Discussion:
[eigen] Implementation of a TensorMultiMap
Douglas McCloskey
2018-10-17 12:24:42 UTC
Hi,


I would like to implement a "TensorMultiMap" that takes as input an array of pointers to multiple Tensors, so that one can perform operations on the combined array of Tensors without having to allocate new memory and run a series of `concatenate` operations beforehand.


The problem I am running into is that I have two arrays of Tensors, of dynamic but equal length (e.g., `std::vector<Tensor> tensor1` and `std::vector<Tensor> tensor2`), where I would like to efficiently multiply each pair of Tensors element-wise and sum the results into a single output Tensor. I can accomplish this easily enough with a for loop, but because I am not able to use `auto` to carry the expression across iterations, an evaluation must be made at each iteration of the loop. With CUDA, each evaluation launches a new kernel, which drastically hurts performance.


I have experimented with a recursive function, but unfortunately this does not work with CUDA 11 (the code compiles, but the stream never syncs).


Is a "TensorMultiMap" possible? If so, how could it best be implemented?


Best,



Douglas McCloskey, PhD
Group Leader, AutoFlow
Liaison, Information Services/Computational Biology

DTU Biosustain

Technical University of Denmark
Novo Nordisk Foundation Center for Biosustainability
Kemitorvet
Building 220, Room 218
2800 Kgs. Lyngby
***@biosustain.dtu.dk
www.dtu.dk/english