Vincent Hui
2018-07-04 08:56:17 UTC
Hi Deven,
Thank you for your contribution. Did you benchmark Eigen with and without
the AMD GPU? Can you give us instructions on how to use Eigen with an AMD
GPU? I have an AMD GPU, so I can try it out. Furthermore, did you benchmark
TensorFlow with and without the AMD GPU after you added AMD GPU support to
Eigen?
Thanks a lot,
Vincent
PR submitted - https://bitbucket.org/eigen/eigen/pull-requests/402/adding-support-for-using-eigen-in-hip/diff
Jason: Thank you for your response. Hoping you will find the initial
level of HIP support to your liking.
Vincent: yes, AMD GPU support is similar to what exists for CUDA / OpenCL.
Thanks
deven
Hi Deven,
Is the AMD GPU support similar to Eigen's existing OpenCL hardware support?
Thanks,
Vincent
Just had to drop in and say cool! It's great to see HIP support
spread through the ecosystem.
I've tried to use Eigen a few times in CUDA, and I ran into a few things:
- Solvers that could have executed on the GPU didn't, because of dynamic
allocations happening somewhere, and I couldn't figure out how to make
that not happen; for things like a batched QR solve of small matrices.
The allocations may never have actually happened, but the problem is
they'd be referenced in the device-side compile, somewhere deep. I
think at the time I was looking at either the SVD or QR solvers.
- It wasn't as flexible as I first hoped. Unfortunately, there are a lot
of strategies you can use to evaluate matrix operations with warp-,
block-, or device-level parallelism, and this is outside of what Eigen
offers. If it were trying to be a device-side library with the kind of
flexibility that makes sense there, it should offer that, for maximum
performance. The cutlass library takes this to the extreme for matrix
multiplication:
https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/
https://github.com/NVIDIA/cutlass
To clarify, by flexibility I don't just mean exploiting the hierarchy
via tiling, but choosing between simpler multiplication techniques given
smaller dimensions, layout, and the amount of shared memory desired (or
registers sacrificed), and choosing how to extract the parallelism into
such evaluations.
This means that each thread id has to do all of its work individually;
that can be somewhat reasonable, depending on the problem's/kernel's
needs.
As for building it with CUDA support, it autodetects the NVCC compiler
through the common macros that compiler defines (__NVCC__ and the like).
You have to explicitly disable it if you're compiling with NVCC but don't
want the CUDA paths (I've had errors and turn it off occasionally when
I'm using Eigen in NVCC on the host side).
I don't know anything about the unit tests, sorry. I also haven't been
watching for any recent changes, so my experiences may be a little out
of date.
I am not a core dev, but what I have seen and used in the past for the
project is to submit PRs to https://bitbucket.org/eigen/eigen/ . I of
course leave plenty of room for any stakeholders to clarify any of the
other questions you asked.
-Jason
Hi All,
I am a software development engineer at AMD, and we are currently
working on enabling support for AMD GPUs in Eigen.
We envision that support for AMD GPUs can be implemented in a fashion
similar to what has already been done for NVidia with CUDA. I have some
questions:
1. What is the purpose of the "EIGEN_USE_GPU" macro in the codebase? I
see a lot of code that is guarded by the EIGEN_CUDACC macro (guards code
that uses CUDA extensions) and the EIGEN_CUDA_ARCH macro (guards code
that is expected to execute on the device), which I think I understand.
What I am not clear about is the need/use for the EIGEN_USE_GPU macro.
2. How do I configure cmake to
 - build Eigen with GPU / CUDA support?
 - enable all the unit tests that target the GPU / CUDA?
I want to make sure that our implementation is consistent with what is
already in place for CUDA, and hence the need to understand the CUDA
implementation. Any information regarding this will be very helpful.
3. What is the correct protocol to use for upstreaming our code (once
done) to the Eigen codebase? Will a simple pull request suffice, or do
we need to do something more? Is there some acceptance criteria/checklist
we need to complete before we can issue the PR?
Please let me know if this is not the correct forum to address these
questions (and point me to the right one :) ). I expect to have quite a
few more questions in the coming days, as we proceed.
Thanks
deven