Discussion:
[eigen] problem with MKL bindings for dense Eigen
Peter
2018-04-06 19:14:39 UTC
Permalink
Dear All,

the discussion on MKL and sparse Eigen routines reminded me,
that I used to have problems with the MKL bindings, so just tried it again.

I'm currently investigating a system, where a phase transition
including scaling laws is governed by the precision of the underlying
arithmetic. From a numerical point of view it is a nice testbed for the
precision of the libraries. In case you are curious, there's a preprint
on the arxiv server: < https://arxiv.org/abs/1803.08480 > .

Taking the code for double precision and compiling with -DEIGEN_USE_BLAS
I get similar results as without MKL, albeit with small deviations already
in the stable regime. In this regime, the MKL version appears to be more accurate
than plain Eigen. However, in the precision-sensitive regime, MKL is less accurate.


My real problem is that compiling with -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE_STRICT
gives me a few iteration steps that are still correct, and then complete nonsense.

The only LAPACK routine I'm aware of that could be called through the bindings should
be a diagonalization (yes, the code is written by me, but some parts are a few years old):

Eigen::SelfAdjointEigenSolver<TpMatrix> eigensolverH;
// ...
eigensolverH.compute( *itH->second ); ///< within a loop over matrices

Before I spend time hunting the problem, I'd like to ask whether there's
more to do than the above defines and the correct linking options.
Note, I'm using a 2018 MKL and compiled with g++ 7.2 on Linux 4.15.13

CFLAGS = -std=c++11 -O3 -march=native -DNDEBUG

linking with MKL is done via
MKLLIBS = -L $(HOME)/intel/MKL2018/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
or
MKLLIBS = -L $(HOME)/intel/MKL2018/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

both versions show the same nonsense.

Thanks in advance,
best regards,
Peter
Gael Guennebaud
2018-04-06 20:49:32 UTC
Permalink
Just a guess, but maybe it's just that MKL gives you different values for
the lowest eigenvalues (e.g., 1e-16 vs 1e-30 vs exact 0 while the largest
is 1) but your code is lacking some thresholds to properly deal with them
(e.g., to ignore them). So tracking the largest and the smallest
eigenvalues might be a good start.

gael
Gael Guennebaud
2018-04-06 20:51:03 UTC
Permalink
oh, one more thing, I'm not sure that the eigenvalues are ordered the same
way when calling MKL, just in case you relied on that particular ordering.

gael
Peter
2018-04-06 21:09:37 UTC
Permalink
Dear Gael,
Post by Gael Guennebaud
oh, one more thing, I'm not sure that the eigenvalues are ordered the same way when calling MKL, just in case you relied on that particular ordering.
I thought MKL would also sort the eigenvalues, but that's a good point. I'll check it.

thanks,
Peter
Peter
2018-04-06 21:15:57 UTC
Permalink
Dear Gael,
Post by Gael Guennebaud
Just a guess, but maybe it's just that MKL gives you different values for the lowest eigenvalues (e.g., 1e-16 vs 1e-30 vs exact 0 while the largest is 1) but your code is lacking some thresholds to properly deal with them (e.g., to ignore them). So tracking the largest and the smallest eigenvalues might be a good start.
It's all about tracking the lowest eigenvalues, and that's done properly; at least
I intended to do it properly.
That the MKL BLAS version gives slightly different results is fine.
E.g., if FMAs are handled differently, that will already influence the result.

My concern is about the mess I get from the MKL diagonalization.
I use MKL a lot in my main code (which is still without Eigen) and there
everything is fine.

Best regards,
Peter
Gael Guennebaud
2018-04-06 23:06:19 UTC
Permalink
It's not clear to me what you call "mess". Do you get completely different
eigenvalues for the same input matrix? Or do your iterations slightly
diverge before exploding?

gael.
Peter
2018-04-09 07:52:39 UTC
Permalink
Dear Gael,

to be sure, I checked the eigenvalues I get after calling

eigensolverH.compute( *itH->second )

and all samples I looked at were sorted.
Using nm, the only LAPACK routine that is called is dsyev:

000000000000752e t _GLOBAL__sub_I__ZN10TpOperator11DiagonalizeERN5Eigen6MatrixIdLin1ELi1ELi0ELin1ELi1EEERS_
U __gxx_personality_v0
U LAPACKE_dsyev
Post by Gael Guennebaud
It's not clear to me what you call "mess". Do you get completely different eigenvalues for the same input matrix? Or do your iterations slightly diverge before exploding?
I get a few correct steps, then complete nonsense. I think I found the problem, see below.

Looking at the description it says:

< https://software.intel.com/en-us/node/521120 >

On exit, if jobz = 'V', then if info = 0, array a contains the orthonormal eigenvectors of the matrix A.

If jobz = 'N', then on exit the lower triangle

(if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the diagonal, is overwritten.

w: If info = 0, contains the eigenvalues of the matrix A in ascending order.

So, indeed, the eigenvalues should be sorted.
I tried diagonalizing a copy of the matrix, to check whether something gets overwritten,

TpMatrix HHH = *itH->second;
eigensolverH.compute( HHH );

but that doesn't change anything.

However changing

U( itH->first.first , itH->first.second, itH->second ->rows(), itH->second ->cols() ) = eigensolverH.eigenvectors();

to

U( itH->first.first , itH->first.second, itH->second ->rows(), itH->second ->cols() ) = eigensolverH.eigenvectors().adjoint();

the results look fine. Hence I suspect that the MKL bindings provide a transposed transformation matrix.
The matrices are defined as

typedef Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> TpMatrix;


Switching to ColMajor, the MKL and non-MKL versions both need the first, non-adjoint, version.

Best regards,
Peter
Gael Guennebaud
2018-04-09 11:31:34 UTC
Permalink
I see.

Problem fixed: https://bitbucket.org/eigen/eigen/commits/3a08a093adac (in
3.3 too).

gael
Peter
2018-04-09 13:09:27 UTC
Permalink
Dear Gael,
Post by Gael Guennebaud
I see.
Problem fixed: https://bitbucket.org/eigen/eigen/commits/3a08a093adac (in 3.3 too).
indeed, thanks.

BTW, the MKL version is now consistently more accurate than Eigen without MKL.
That could be a hint at the use of FMA, but I'm just guessing.

Best regards,
Peter
Gael Guennebaud
2018-04-09 14:13:51 UTC
Permalink
Post by Peter
BTW, the MKL version is now consistently more accurate than Eigen without MKL.
Could be a hint on the usage of FMA, but I'm just guessing.
It's more likely due to more accuracy in the lowest eigenvalues, as we assume that
eigenvalues lower than machine_epsilon * largest_eigenvalue are zero.

gael
Peter
2018-04-09 14:39:13 UTC
Permalink
Dear Gael,
Post by Gael Guennebaud
It's more likely due to more accuracy in the lowest eigenvalues, as we assume that eigenvalues lower than machine_epsilon * largest_eigenvalue are zero.
actually, my lowest eigenvalues are not zero, but of order one.
They only get shifted by the lowest eigenvalue, such that the spectrum artificially starts at zero.
And since I already see this using -DEIGEN_USE_BLAS alone, it's at least not just the diagonalization.

Of course, Eigen is tremendously superior to MKL in that it runs flawlessly with GMP.
So let me thank you again for this great library.

Best regards,
Peter
Gael Guennebaud
2018-04-10 07:09:21 UTC
Permalink
Post by Peter
actually my lowest eigen values are not zero, but of order one.
They get only shifted by the lowest eigen value such that the spectrum
starts artificially at zero.
And since I already see this using -DEIGEN_USE_BLAS only, it's at least
not just the diagonalization.
interesting, so that's not related to the diagonalization algorithm. If you
suspect FMA, then compile with -mfma to let Eigen use it for its BLAS-like
routines.

gael
Peter
2018-04-10 08:11:56 UTC
Permalink
Dear Gael,
Post by Gael Guennebaud
interesting, so that's not related to the diagonalization algorithm. If you suspect FMA, then compile with -mfma to let Eigen use it for its BLAS-like routines.
Interesting, it does make a difference.
I thought it would be sufficient to use -march=native.

The Eigen run without MKL is definitely more accurate when compiled with -mfma than without.

Comparing the -mfma version with the MKL run, MKL is slightly more accurate in the
first half of the run, but in the long run the Eigen version without MKL but with
-mfma is more accurate. That is actually the more important regime for me,
as I can improve the first by increasing a parameter (at the cost of more CPU time),
but I can't avoid the second.

I'll look into timings later.

Best regards,
Peter
