Discussion:
[eigen] Performance regression with Matrix4f multiplication?
Ryo Miyajima
2017-11-26 04:03:08 UTC
Hi.

I was testing Matrix4{d,f} multiplication performance across different
Eigen versions and found that, since 3.3.0, Matrix4f multiplication has
become significantly slower when compiled with gcc's `-march=native` flag.
The slowdown shows up on a Core i5 and a Core i7 but not on a Xeon.
Is this expected behavior (for example, because Eigen optimizes for matrices
larger than 4x4), or am I doing something wrong, such as not passing the
right compilation flags?

The benchmarks are in the following repo and can be reproduced with the
Docker images provided there:
https://github.com/sergeant-wizard/eigen_matrix4_benchmark
The assembly code is also provided in the repository.

I searched the bug tracker, and this may or may not be related to this
issue: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1342

Thanks in advance for your help.

Ryo Miyajima
Gael Guennebaud
2017-11-26 20:35:02 UTC
In some cases your loop gets over-optimized by the compiler, leading to
inconsistent results depending on the compiler version and flags. See the
attached file for a more correct version. Also, it is better to use 3.3.4
than 3.3.0.

gael
Ryo Miyajima
2017-11-26 21:26:42 UTC
Thank you very much, Gael. That indeed seems to have been the problem.