Discussion:
Eigen 3 is extremely slow
Hari Sundar
2011-03-13 23:48:11 UTC
Dear All,

I am new to Eigen and was using blitz++ for matrix and vector classes until
now. Since it is no longer being developed and I liked what I read about
eigen, I decided to switch. To start I took one of my standard pieces of
code which does an optimization, the cost function mainly involving
projection of points. An average optimization of the code using blitz++
takes around 100-200 ms. I converted the code to use Eigen (without making
use of any special functions which Eigen offers). I am only using
Matrix/Vector storage, matrix products and addition and data access. For
the same dataset, the blitz++ version takes 120 ms, whereas the Eigen
version takes 900 s. The results are the same, so my code is correct.
I initially thought there was some step in my code which was especially
slow, but on profiling, it looks like it is uniformly slow. For example, a
simple 4x4 * 4x1 multiplication takes around 0.5 ms.

Any suggestions on what might be going wrong, or should I stick with blitz++?

best,
Hari
Christoph Hertzberg
2011-03-14 00:55:25 UTC
Post by Hari Sundar
Any suggestions on what might be going wrong, or should I stick with blitz++?
Do you compile with optimization? Without optimization Eigen can get
really slow due to its heavy template machinery.

I tried to find a link to an FAQ entry explaining that but could not
find one; maybe this should be stated somewhere more prominent.

Regards,
Christoph
--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
----------------------------------------------
Robert Lupton the Good
2011-03-14 00:58:49 UTC
Post by Hari Sundar
Any suggestions on what might be going wrong, or should I stick with blitz++?
Do you compile with optimization? Without optimization Eigen can get really slow due to its heavy template machinery.
Older versions of g++ (e.g. 4.1.2) don't handle the heavy templating well either, even with optimisation. 4.2 is OK.

R
Hari Sundar
2011-03-14 13:15:53 UTC
I am using Visual Studio 2008 and have all optimizations turned on. I'll try
to compile this with gcc and see if that improves performance.

I still find it very strange that it performs so poorly.

--
+1 (215) 501 7752
https://www.rad.upenn.edu/sbia/hsundar/
Gael Guennebaud
2011-03-14 13:10:06 UTC
Post by Hari Sundar
For example a simple 4x4 * 4x1 multiplication takes around 0.5 ms.
Hm, that's indeed extremely slow... Which compiler? Which flags?

For instance, here a 4x4 * 4x1 product takes less than 4e-6 ms, and
even in debug mode (-g2) with a very old gcc 3.4 it takes less than
0.0008 ms.
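
To compare such figures on another machine, the time per product can be measured along the following lines. This is only a sketch: it uses plain std::array storage instead of Eigen types so that it stays self-contained, and the names Mat4, Vec4, multiply and millisecondsPerProduct are made up for illustration. The numbers it reports depend heavily on compiler and flags.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

using Vec4 = std::array<float, 4>;
// Column-major 4x4 storage (an assumption): element (i, j) is at m[j * 4 + i].
using Mat4 = std::array<float, 16>;

// y = m * x for a 4x4 matrix and a 4x1 vector.
inline Vec4 multiply(const Mat4& m, const Vec4& x) {
    Vec4 y{};  // zero-initialised accumulator
    for (std::size_t j = 0; j < 4; ++j)
        for (std::size_t i = 0; i < 4; ++i)
            y[i] += m[j * 4 + i] * x[j];
    return y;
}

// Average wall-clock time of one product, in milliseconds, over `repeats` runs.
inline double millisecondsPerProduct(const Mat4& m, Vec4 x, std::size_t repeats) {
    assert(repeats > 0);
    volatile float sink = 0.0f;  // keeps the compiler from discarding the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {
        x = multiply(m, x);  // feed the result back so each run depends on the last
        sink = x[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(repeats);
}
```

Calling millisecondsPerProduct with an identity matrix and a few million repeats keeps the values finite while averaging out timer resolution.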

gael
Benoit Jacob
2011-03-14 13:18:34 UTC
Post by Gael Guennebaud
For example a simple 4x4 * 4x1 multiplication takes around 0.5 ms.
hm, that's indeed extremely slow... which compiler? flags?
for instance here a 4x4 * 4x1 product takes less than 4e-6msec, and
even in debug mode (-g2) with a very old gcc 3.4 it takes less than
0.0008 msec...
If his matrix has NaN values, that could explain another 500x factor.

Benoit
Hauke Heibel
2011-03-14 13:41:17 UTC
Post by Benoit Jacob
If his matrix has NaN values, that could explain another 500x factor.
But that should have affected blitz++ too, so I don't think that is the cause.

I think we need a code sample in order to reproduce the problem.

- Hauke
Hari Sundar
2011-03-14 13:41:28 UTC
I have checked my sample dataset, and it does not contain any NaN values.
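
A check of this kind could be sketched as follows, assuming the data sits in one contiguous float buffer. countNaNs is a hypothetical helper written for illustration, not an Eigen or blitz++ function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of NaN entries among the first `count` floats starting at `data`.
inline std::size_t countNaNs(const float* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::size_t found = 0;
    for (std::size_t k = 0; k < count; ++k) {
        if (std::isnan(data[k])) {
            ++found;
        }
    }
    return found;
}
```

For an Eigen matrix or array m, m.data() and m.size() would supply the pointer and the element count.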

I did some additional tests, and it appears that the matrix multiplication
itself is not what is slow; it looks more like a really bad case of cache
usage.

Once projected, I use the points to sample from an Array (512x512). If I do
not use Eigen for this array (using a standard C array instead), then my
performance is much better.

I hope that is able to point you in the right direction.

I was using an ArrayXXf (also tried MatrixXf) to store the array.

best,
Hari
Benoit Jacob
2011-03-14 13:57:14 UTC
Post by Hari Sundar
I do not have any NaN values and I have checked it for my sample dataset.
I did some additional tests and it appears that it is not the matrix
multiplication itself that is slow, but it appears to be some really bad
case of cache usage.
Once projected, I use the points to sample from an Array (512x512). If I do
not use Eigen for this array (instead using a std. C array), then my
performance is much better.
I hope that is able to point you in the right direction.
As Hauke said, the only useful thing would be a compilable test case.

Benoit
Hari Sundar
2011-03-14 14:00:34 UTC
I am working on creating a simple example that I can share. If I am able to
re-create the slowdown with the simple example, I'll send you both the Eigen
and blitz++ versions.

thanks,
Hari
Gael Guennebaud
2011-03-14 15:01:35 UTC
Perhaps you are accessing Eigen's ArrayXXf in a row-major fashion?
Remember that by default Eigen's arrays and matrices are column-major.
To check that, you can simply try with:

Eigen::Array<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>

and leave the rest of the code unchanged.

I cannot believe that it is operator(i,j) itself that is miscompiled.
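
The difference between the two storage orders can be illustrated without Eigen. The sketch below uses a plain std::vector as backing store, and colMajorOffset, rowMajorOffset and sumColumnMajor are illustrative names, not Eigen functions. In a column-major 512x512 array, stepping from (i, j) to (i, j+1) jumps 512 elements in memory, whereas stepping from (i, j) to (i+1, j) moves to the adjacent element. Whether this accounts for the slowdown reported in this thread has not been established.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (i, j) when columns are contiguous (Eigen's default).
inline std::size_t colMajorOffset(std::size_t i, std::size_t j, std::size_t rows) {
    return j * rows + i;
}

// Offset of element (i, j) when rows are contiguous (a plain C 2-D array).
inline std::size_t rowMajorOffset(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Sums a column-major rows x cols buffer with the row index innermost, so
// that consecutive reads touch adjacent memory.
inline float sumColumnMajor(const std::vector<float>& data,
                            std::size_t rows, std::size_t cols) {
    assert(data.size() == rows * cols);
    float total = 0.0f;
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            total += data[colMajorOffset(i, j, rows)];
        }
    }
    return total;
}
```

Swapping the two loops in sumColumnMajor gives the same sum but strides through memory `rows` elements at a time, which is the access pattern the RowMajor test above is meant to rule out.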

gael