Discussion:
[eigen] Adding support for AMD GPUs in Eigen
Vincent Hui
2018-07-04 08:56:17 UTC
Hi Deven,

Thank you for your contribution. Did you benchmark Eigen with and without
an AMD GPU? Can you give us instructions on how to use Eigen with an AMD
GPU? I have an AMD GPU, so I can try Eigen with it. Furthermore, did you
benchmark TensorFlow with and without an AMD GPU after you added AMD GPU
support to Eigen?

Thanks a lot,
Vincent
PR submitted - https://bitbucket.org/eigen/eigen/pull-requests/402/adding-support-for-using-eigen-in-hip/diff
Jason: Thank you for your response. Hoping you will find the initial
level of HIP support to your liking.
Vincent: yes, AMD GPU support is similar to what exists for CUDA / OpenCL.
Thanks
deven
Hi Deven,
Is the AMD GPU support similar to the OpenCL hardware support in Eigen?
Thanks,
Vincent
Just had to drop in and say cool! It's great to see HIP support
spread through the ecosystem.
I've tried to use Eigen a few times in CUDA and ran into a few issues:
- Solvers that could have executed on the GPU didn't, because of dynamic
allocations happening somewhere, and I couldn't figure out how to make
that not happen. This was for things like a batched QR solve of small
matrices. The allocations may never actually have happened at runtime,
but the problem is they'd be referenced in the device-side compile,
somewhere deep. I think at the time I was looking at either the SVD or
QR solvers.
- It wasn't as flexible as I first hoped. Unfortunately, there are a lot
of strategies you can use to evaluate matrix operations with warp-,
block-, or device-level parallelism, and this is outside of what Eigen
offers. If it is trying to be a device-side library, it should offer the
kind of flexibility that makes sense there, for maximum performance. The
CUTLASS library takes this to the extreme for matrix multiplication:
https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/
https://github.com/NVIDIA/cutlass
To clarify, by flexibility I don't just mean exploiting the hierarchy via
tiling, but choosing between simpler multiplication techniques given
smaller dimensions, layout, and the amount of shared memory desired (or
registers sacrificed), and choosing how to extract the parallelism into
such evaluations.
Which means that, as things stand, each thread ID has to do all its work
individually. That can be somewhat reasonable, depending on the
problem's/kernel's needs (see the sketch below).
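
A minimal sketch of that per-thread pattern, assuming an Eigen version
with device-side support for fixed-size matrices (3.3 or later); the
kernel and buffer names here are hypothetical, not from the thread:

#include <Eigen/Dense>

using Mat4 = Eigen::Matrix<float, 4, 4>;
using Vec4 = Eigen::Matrix<float, 4, 1>;

// Each thread handles exactly one problem from the batch: it maps the
// i-th 4x4 matrix and 4-vector and computes y = A * x on its own.
__global__ void batched_matvec(const float* A_batch, const float* x_batch,
                               float* y_batch, int num_problems) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= num_problems) return;

  // Eigen::Map does not allocate; it just wraps the existing buffers.
  Eigen::Map<const Mat4> A(A_batch + 16 * i);
  Eigen::Map<const Vec4> x(x_batch + 4 * i);
  Eigen::Map<Vec4>       y(y_batch + 4 * i);

  y = A * x;  // fixed-size product, evaluated entirely by this thread
}

Because the matrices are fixed-size, everything stays in registers/local
memory and no dynamic allocation can occur in device code.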
As for building it with CUDA support, Eigen autodetects the NVCC compiler
through the common macros that compiler defines (__NVCC__ and the like).
You have to explicitly disable it if you're compiling with NVCC but don't
want the device-side path (I've had errors and occasionally turn it off
when I'm using Eigen with nvcc on the host side only).
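
For that host-side-only case, a sketch of the opt-out, assuming an Eigen
version that provides the EIGEN_NO_CUDA switch (recent releases do; check
your copy's Core/Macros headers if unsure):

// Compiled with nvcc, but Eigen is only used on the host side here, so
// its CUDA device path is switched off before the include.
#define EIGEN_NO_CUDA   // or pass -DEIGEN_NO_CUDA on the command line
#include <Eigen/Dense>
#include <cstdio>

int main() {
  Eigen::Matrix3d m = Eigen::Matrix3d::Random();
  std::printf("trace = %f\n", m.trace());
  return 0;
}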
I don't know anything about the unit tests, sorry. I also haven't
been watching for any recent changes so my experiences may also be a
little out of date.
I am not a core dev, but what I have seen and used in the past for the
project is to submit PRs to https://bitbucket.org/eigen/eigen/ - I of
course leave plenty of room for any stakeholders to clarify the other
questions you asked.
-Jason
Hi All,
I am a software development engineer at AMD, and we are currently working
on enabling support for AMD GPUs in Eigen.
We envision that support for AMD GPUs can be implemented in a fashion
similar to what has already been done for NVidia with CUDA. I have some
questions:
1. What is the purpose of the "EIGEN_USE_GPU" macro in the codebase? I see
a lot of code that is guarded by the EIGEN_CUDACC macro (which guards code
that uses CUDA extensions) and the EIGEN_CUDA_ARCH macro (which guards
code that is expected to execute on the device), and I think I understand
those. What I am not clear about is the need for and use of the
EIGEN_USE_GPU macro. (A schematic of how these guards typically relate is
sketched after these questions.)
2. How do I configure cmake to
- build Eigen with GPU / CUDA support?
- enable all the unit tests that target the GPU/CUDA?
I want to make sure that our implementation is consistent with what is
already in place for CUDA, and hence the need to understand the CUDA
implementation.
Any information regarding this will be very helpful.
3. What is the correct protocol to use for upstreaming our code (once
done) to the Eigen codebase? Will a simple pull request suffice, or do we
need to do something more? Is there some acceptance criteria/checklist we
need to complete before we can issue the PR?
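
For reference on question 1, a schematic (not copied from Eigen's
sources) of how the three guards typically relate; exact spellings and
responsibilities vary between Eigen versions:

// EIGEN_USE_GPU: unlike the other two, not defined by the compiler; it is
// a user-facing opt-in, typically defined (or passed as -DEIGEN_USE_GPU)
// before including the headers, e.g. to enable the Tensor module's GPU
// device.
#if defined(EIGEN_USE_GPU)
  // ... pull in GPU-device support ...
#endif

// EIGEN_CUDACC: "this translation unit is compiled by a CUDA compiler",
// so CUDA language extensions such as __host__/__device__ are available.
#if defined(EIGEN_CUDACC)
  #define EIGEN_DEVICE_FUNC __host__ __device__
#else
  #define EIGEN_DEVICE_FUNC
#endif

// EIGEN_CUDA_ARCH: defined only during the device-side compilation pass,
// so it guards code that is expected to execute on the device.
#if defined(EIGEN_CUDA_ARCH)
  // ... device-only code path ...
#endif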
Please let me know if this is not the correct forum to address these
questions (and point me to the right one :) ). I expect to have quite a
few more questions in the coming days as we proceed.
Thanks
deven
Deven Desai
2018-07-06 17:31:07 UTC
Hi Vincent,

We have not done any benchmarking of Eigen with AMD GPUs yet. I am
currently focusing on getting the functionality in place and implementing
all the updates requested in the PR feedback so that we can get the PR
merged. I expect to be able to do some benchmarking once that is done.

Running with an AMD GPU should only require passing "-DEIGEN_USE_HIP" to
the compiler (for code that pulls in Eigen header files). Everything else
should be similar to what you would do for Nvidia GPUs.
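
A hypothetical smoke test of that workflow, assuming the HIP port mirrors
the CUDA path for fixed-size matrices (the file, kernel, and path names
below are made up for illustration):

// Build with something like:
//   hipcc -DEIGEN_USE_HIP -I/path/to/eigen hip_eigen_smoke.cpp -o smoke
#include <hip/hip_runtime.h>
#include <Eigen/Dense>
#include <cstdio>

using Vec3 = Eigen::Matrix<float, 3, 1>;

// One thread computes a dot product of two mapped fixed-size vectors.
__global__ void dot_kernel(const float* a, const float* b, float* out) {
  Eigen::Map<const Vec3> va(a);
  Eigen::Map<const Vec3> vb(b);
  *out = va.dot(vb);
}

int main() {
  float ha[3] = {1.f, 2.f, 3.f}, hb[3] = {4.f, 5.f, 6.f}, hout = 0.f;
  float *da, *db, *dout;
  hipMalloc((void**)&da, sizeof(ha));
  hipMalloc((void**)&db, sizeof(hb));
  hipMalloc((void**)&dout, sizeof(float));
  hipMemcpy(da, ha, sizeof(ha), hipMemcpyHostToDevice);
  hipMemcpy(db, hb, sizeof(hb), hipMemcpyHostToDevice);

  hipLaunchKernelGGL(dot_kernel, dim3(1), dim3(1), 0, 0, da, db, dout);

  hipMemcpy(&hout, dout, sizeof(float), hipMemcpyDeviceToHost);
  std::printf("dot = %f (expected 32)\n", hout);  // error checks omitted
  hipFree(da); hipFree(db); hipFree(dout);
  return 0;
}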

Getting TensorFlow to work with AMD GPUs requires a lot of other changes
in addition to this change in Eigen. There is a separate ongoing project
tasked with getting TensorFlow to work on AMD GPUs. Let me know if you
need more information.

Thanks

deven
Vincent Hui
2018-08-08 06:21:33 UTC
Hi Deven,

How can I try using an AMD GPU with Eigen? Is your code merged into the
default branch? I just need to clone Eigen and build it from the default
branch, passing "-DEIGEN_USE_HIP" to the compiler. Am I right?

Thanks,
Vincent