Discussion:
[eigen] Intel (R) MKL IE SpBLAS support in Eigen
Zhukova, Maria
2018-04-03 21:39:16 UTC
Hello Eigen community,

My name is Maria Zhukova and I'm a software development engineer on the Intel(R) MKL Sparse team.

My team is interested in contributing to Eigen, so I've investigated our possibilities, and so far this is what I have:
Eigen supports different operations for sparse matrices stored in CSR and CSC formats, which can be implemented on the basis of IE SpBLAS kernels (please refer to https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines for the general idea of the interfaces).
Basically, we want to implement calls to our IE SpBLAS in the following operations:
SparseMatrix + SparseMatrix (mkl_sparse_?_add)
SparseMatrix * DenseVector (mkl_sparse_?_mv)
SparseMatrix * DenseMatrix (mkl_sparse_?_mm)
SparseMatrix * SparseMatrix (mkl_sparse_spmm),
and Triangular solve (mkl_sparse_?_trsv).
I've already started with the implementation of the sparse_time_dense_impl_mkl kernel, which is based on mkl_sparse_?_mv (included in the patch).

This is how it will look for the user:
#include <Eigen/SpBLASSupport> <-- NEW: IE SpBLAS include module

int main () {
  SparseMatrix<double, RowMajor> A;
  Matrix<double, Dynamic, 1> x, y;
  double alpha = 1.0, beta = 0.0;

  A.makeCompressed();     /* Convert matrix A into CSR/CSC format */
  A.createSparseHandle(); /* NEW: creates the handle required for all IE SpBLAS routines */

  // support of IE SpBLAS is here
  y = beta*y + alpha*A*x;             /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_NON_TRANSPOSE */
  y = beta*y + alpha*A.transpose()*x; /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_TRANSPOSE */
  y = beta*y + alpha*A.adjoint()*x;   /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_CONJUGATE_TRANSPOSE */

  A.destroySparseHandle(); /* NEW: deletes the created handle */
}

I've attached a draft patch including all necessary changes and would like to hear your feedback.
Please let me know if you have any questions or comments.

Best regards,
Maria
Edward Lam
2018-04-03 23:27:50 UTC
Hi Maria,

I'm only an Eigen user, but I'm extremely happy about this effort to have more MKL
wrapping in Eigen! As our application depends on Intel TBB, the only real way to
use Eigen with multithreaded support is via MKL.
A.createSparseHandle(); /* NEW: is used to create handle required for all IE SpBLAS routines */
...
A.destroySparseHandle(); /* NEW: is used to delete created handle */
I think that this is very un-C++ like. I'm not sure why it was decided to be
done this way. Why wouldn't the sparse handle be allocated within A itself and
then deallocated in A's destructor? Note also that in C++, we need to worry
about what happens when the matrix is copied, so this state management needs to
be within the matrix itself or else the semantics of these create/destroy
handles are very troublesome.

Best Regards,
-Edward
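[Editor's note: the copy-semantics hazard Edward describes can be sketched with a small RAII wrapper. The MKL handle API is mocked here with hypothetical mock_create/mock_destroy counters; the real routines (mkl_sparse_?_create_csr, mkl_sparse_destroy) have different signatures. The point is only that each matrix copy must own its own handle, so no explicit create/destroy calls are needed and no double-free can occur.]

```cpp
#include <cassert>

// Hypothetical stand-ins for the MKL handle API; real signatures differ.
static int g_live_handles = 0;
struct mock_handle { int id; };

static mock_handle* mock_create() { ++g_live_handles; return new mock_handle{0}; }
static void mock_destroy(mock_handle* h) { --g_live_handles; delete h; }

// RAII owner: the handle lives and dies with the owning object, and a copy
// gets its own freshly created handle instead of sharing a raw pointer,
// so destruction of one copy never invalidates another (rule of three).
class SparseHandleOwner {
public:
  SparseHandleOwner() : h_(mock_create()) {}
  SparseHandleOwner(const SparseHandleOwner&) : h_(mock_create()) {}
  SparseHandleOwner& operator=(const SparseHandleOwner& o) {
    if (this != &o) { mock_destroy(h_); h_ = mock_create(); }
    return *this;
  }
  ~SparseHandleOwner() { mock_destroy(h_); }
private:
  mock_handle* h_;
};
```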
Zhukova, Maria
2018-04-04 00:31:17 UTC
Hello Edward, and thanks so much for your feedback!

This is a good point.
Actually, the correct place for creating the IE SpBLAS handle is one of the key points I want to discuss.
The thing is, all Intel(R) MKL IE SpBLAS functionality requires the matrix to be stored in a compressed row/column storage format. The necessary storage format can be obtained by calling the SparseMatrix::makeCompressed() function.
So I wonder whether the better decision would be to create the sparse handle inside makeCompressed() and destroy it in the SparseMatrix destructor. What do you think about this option?

Best regards,
Maria

Edward Lam
2018-04-04 00:58:07 UTC
Hi Maria,

I think your new proposal sounds much better. I haven't looked at your
patch, but please make sure the resource management is done properly when
the matrix is copied as well (i.e., its copy constructor and/or operator=()
method). Unfortunately, I'm a relative newcomer to Eigen and so don't
understand all the intricacies of its internals.

Thanks in advance for your hard work!

-Edward
Christoph Hertzberg
2018-04-04 09:43:36 UTC
Post by Edward Lam
Hi Maria,
I think your new proposal sounds much better. I haven't looked at your
patch but please make sure the resource management is done properly when
the matrix is copied as well (ie. it's copy constructor and/or operator=()
method). [...].
In other words, obey the rule of 5/3/0:
http://en.cppreference.com/w/cpp/language/rule_of_three

I.e., ideally have a helper struct which obeys the rule of 5 and don't
use custom copy/assignment implementations in the main class -- I'm
aware that Eigen does not strictly follow that everywhere, e.g.,
SparseMatrix only obeys the rule of 3 at the moment.
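[Editor's note: Christoph's "helper struct obeying the rule of five" suggestion can be sketched as follows. The handle API is again mocked with hypothetical counters, not real MKL calls; a copy deliberately starts without a handle and rebuilds it lazily, which matches the idea that the handle is a cache tied to the compressed storage.]

```cpp
#include <cassert>
#include <utility>

// Mock of an opaque C handle (stand-in for sparse_matrix_t); hypothetical.
static int g_handles = 0;
struct raw_handle { };
static raw_handle* create_handle()        { ++g_handles; return new raw_handle; }
static void destroy_handle(raw_handle* h) { if (h) { --g_handles; delete h; } }

// Helper struct obeying the rule of five: it alone manages the handle.
struct HandleGuard {
  raw_handle* h = nullptr;
  HandleGuard() = default;
  explicit HandleGuard(raw_handle* p) : h(p) {}
  HandleGuard(const HandleGuard&) : h(nullptr) {}  // a copy starts empty; the handle is rebuilt lazily
  HandleGuard(HandleGuard&& o) noexcept : h(std::exchange(o.h, nullptr)) {}
  HandleGuard& operator=(HandleGuard o) noexcept { std::swap(h, o.h); return *this; }
  ~HandleGuard() { destroy_handle(h); }
};

// The enclosing matrix-like class then follows the rule of zero: the
// compiler-generated copy/move/destructor are all correct automatically.
struct MatrixWithHandle {
  HandleGuard guard;
  void makeCompressed() { if (!guard.h) guard = HandleGuard(create_handle()); }
};
```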


Christoph
--
Dr.-Ing. Christoph Hertzberg

Besuchsadresse der Nebengeschäftsstelle:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany

Postadresse der Hauptgeschäftsstelle Standort Bremen:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Zentrale: +49 421 178 45-0
E-Mail: ***@dfki.de

Weitere Informationen: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender) Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973
Steuernummer: 19/672/50006
-----------------------------------------------------------------------
Zhukova, Maria
2018-04-05 00:11:13 UTC
Hi Christoph,

Thanks for your comments!
Regarding the sparse handle: the idea of the IE SpBLAS handle is that it should be created just once and contain internal data related to the matrix structure, different optimizations, etc., which are used in subsequent calls, so it would be very inefficient to create and destroy it on the fly.
That's why I suggested adding this handle as an additional member and including the creation routine in a place such as the makeCompressed() function (because it is required anyway to make the matrix format compatible with IE SpBLAS functionality).
I will take care of all the remarks in the next patch version once I gather more feedback.

Best regards,
Maria

Christoph Hertzberg
2018-04-04 09:37:13 UTC
Hi,

I wonder what the overhead of `mkl_sparse_?_create_cs?` and
`mkl_sparse_destroy` is. Ideally, this should be done on-the-fly using
some inline functions (have a look, e.g., at `viewAsCholmod` in
<Eigen/src/CholmodSupport/CholmodSupport.h> )

In general, avoid adding data members to Core types if possible (this
will break ABI compatibility). And if you call methods which return a
status, at least check the status and assert that it is successful
(making the user check that manually is very C-stylish).
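[Editor's note: an on-the-fly adapter in the spirit of viewAsCholmod, as Christoph suggests, might look like the sketch below. The CSR view API here is a hypothetical mock, not the real mkl_sparse_d_create_csr signature; the sketch shows the two points made above: the view wraps existing arrays without copying, and the status is asserted internally rather than pushed onto the user.]

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a status-returning C handle API.
enum status_t { STATUS_SUCCESS = 0, STATUS_ALLOC_FAILED = 1 };
struct handle_t { const double* values; int nnz; };

static status_t create_csr_view(handle_t* h, const double* v, int nnz) {
  if (!v) return STATUS_ALLOC_FAILED;
  h->values = v; h->nnz = nnz;
  return STATUS_SUCCESS;
}
static status_t destroy_view(handle_t*) { return STATUS_SUCCESS; }

// On-the-fly adapter: wraps existing compressed-storage arrays without
// copying them, checks the status internally (so the caller never has to),
// and releases the handle when it goes out of scope.
class ScopedCsrView {
public:
  explicit ScopedCsrView(const std::vector<double>& values) {
    status_t s = create_csr_view(&h_, values.data(), (int)values.size());
    assert(s == STATUS_SUCCESS); (void)s;  // fail loudly instead of returning a status
  }
  ~ScopedCsrView() { destroy_view(&h_); }
  const handle_t& get() const { return h_; }
private:
  handle_t h_;
};
```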

Cheers,
Christoph
Gael Guennebaud
2018-04-05 12:01:03 UTC
Thank you for opening this discussion on the public mailing list.

So let's discuss the public API, which is currently not very convenient, as
already noticed by others. The issues are:

(i1) - Storing MKL's handle in SparseMatrix breaks the ABI and does not sound
very generic.
- We need a way to control:
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.


In order to discuss these issues, let's consider the following typical
pattern (e.g., non-linear optimization, eigenvalues, ...):

SimplicialLDLT<SparseMatrix<double> > llt(A);

while(...) {
  ...
  x = llt.solve(b);
  ...
}

Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of times, but only the user
of SimplicialLDLT knows that, not SimplicialLDLT, nor SparseMatrix.
Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize. Other patterns are likely easier to handle, so let's
focus on this one for now.

Regarding (i1), I would suggest introducing a new type, say
MklSparseMatrix<>, that would enhance SparseMatrix<> through inheritance.
Then for (i2) and (i3) we could imagine something like:

MklSparseMatrix::beginAnalysis(Index expected_calls) const {
  // turn *this to compressed mode
  // create handle
  // store expected_calls
  // enable recording mode
}
MklSparseMatrix::endAnalysis() const {
  // disable recording mode
  // [optional] call mkl_sparse_optimize
}

All states in MklSparseMatrix would be mutable.

Between a pair of beginAnalysis()/endAnalysis() calls, each call to a
supported operation would trigger calls to
mkl_sparse_set_*_hint()/mkl_sparse_optimize().
Optionally, we could even add a "dry-run" mode in which no operation would
be performed; only mkl_sparse_set_*_hint() would be called, and then
mkl_sparse_optimize() would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
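[Editor's note: the recording protocol described above can be sketched with mock hooks. set_mv_hint and sparse_optimize are hypothetical counters standing in for mkl_sparse_set_mv_hint/mkl_sparse_optimize; the mutable state matches Gael's proposal, since analysis happens through const references.]

```cpp
#include <cassert>

// Hypothetical counters standing in for the MKL hint/optimize calls.
static int g_hint_calls = 0, g_optimize_calls = 0;
static void set_mv_hint(int /*expected_calls*/) { ++g_hint_calls; }
static void sparse_optimize() { ++g_optimize_calls; }

class RecordingMatrix {
public:
  void beginAnalysis(int expected_calls) const {
    expected_calls_ = expected_calls;
    recording_ = true;
  }
  void endAnalysis() const {
    recording_ = false;
    sparse_optimize();  // optimize once, after all hints are recorded
  }
  // A supported operation records a hint only while in recording mode.
  void multiply() const {
    if (recording_) set_mv_hint(expected_calls_);
    // ... perform (or, in dry-run mode, skip) the actual product ...
  }
private:
  mutable bool recording_ = false;  // all analysis state is mutable
  mutable int expected_calls_ = 0;
};
```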

And that's it. Our example would look like:


SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
  ...
  if(it==0) llt.matrixL().beginAnalysis(100);
  x = llt.solve(b);
  if(it==0) llt.matrixL().endAnalysis();
  ...
  ++it;
}

or using a "dry-run" mode:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // the permutation and the division by the diagonal matrix D
                  // would still be performed, but the calls to the actual
                  // triangular solves would be bypassed
llt.matrixL().endAnalysis();

while(...) {
  ...
  x = llt.solve(b);
  ...
}


If someone deals directly with the factor L, then we could follow the same
pattern or copy the SparseMatrix factor L into an MklSparseMatrix:


SimplicialLLT<SparseMatrix<double> > llt(A);

MklSparseMatrix<double> L(llt.matrixL());
L.beginAnalysis(100, DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();

while(...) {
  ...
  y = L.triangularView<Lower>() * x;
  ...
}


This design is quite general and extendable to any sparse optimizers, even
built-in ones in the future.

In contrast to the current proposal, only selected operations would be
passed to MKL (one needs to use an MklSparseMatrix plus a begin/end recording phase).

What do you think?


gael
Edward Lam
2018-04-05 13:31:57 UTC
Would it be useful to incorporate lambdas into the interface to avoid begin/end
pairs? So from the user side, we would write code like this instead:

1) Analyze and run

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
for (int it = 0; ...; ++it) {
  if (it == 0)
    llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
  else
    llt.solve(b);
  // ...
}

2) Analyze only

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual

For more complicated algorithms, one can always outline the lambda and pass it
into the analysis.

Cheers,
-Edward
Post by Gael Guennebaud
Thank you for opening this discussion on the public mailing list.
So let's discuss the public API, which currently is not very convenient, as
already noticed by others. The issues are:
(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sound very
generic.
- We need a way to control:
(i2)   - which operations are going to be analyzed/optimized,
(i3)   - and specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
pattern (e.g., non-linear optimization, eigenvalues, ...):
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
    ...
x = llt.solve(b);
...
}
Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of times, but only the user of
SimplicialLDLT knows that, not SimplicialLDLT, nor SparseMatrix. Moreover, the
user does not own the SparseMatrix that we want to analyze/optimize for. Other
patterns are likely easier to handle, so let's focus on this one for now.
Regarding (i1), I would suggest introducing a new type, say MklSparseMatrix<>,
that would enhance SparseMatrix<> through inheritance. Then for (i2) and (i3) we
could imagine something like:
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis each call to a supported operation
would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize.
Optionally, we could even add a "dry-run" mode in which no operation would be
performed; only calls to mkl_sparse_set_*_hint() would be made, and then
mkl_sparse_optimize would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
And that's it. Our example would look like:
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
    ...
    if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
or using a "dry-run" mode:
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D would
still be performed, but calls to actual triangular solves would be by-passed
llt.matrixL().endAnalysis();
while(...) {
    ...
    x = llt.solve(b);
...
}
If someone deals directly with the factor L, then we could follow the same
pattern, or copy the SparseMatrix factor L to an MklSparseMatrix:
SimplicialLLT<SparseMatrix<double> > llt(A);
MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
    ...
y = L.triangularView<Lower>() * x;
...
}
This design is quite general and extendable to any sparse optimizers, even
built-in ones in the future.
In contrast to the current proposal, only selected operations would be passed to
MKL (one needs to use an MklSparseMatrix plus a begin/end recording phase).
What do you think?
gael
Christoph Hertzberg
2018-04-05 21:00:38 UTC
Permalink
Post by Edward Lam
Would it be useful to incorporate lambdas into the interface to avoid
begin/end pairs? So from the user side, we would write code like this instead:
1) Analyze and run
    SimplicialLDLT<MklSparseMatrix<double>> llt(A);
    int it = 0;
    for (int it = 0; ...; ++it) {
        if (it == 0)
            llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
        else
            llt.solve(b);
        // ...
    }
2) Analyze only
    SimplicialLDLT<MklSparseMatrix<double>> llt(A);
    llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
    // and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or
due to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.

However, if this is required, I would suggest adding this method not
only to matrices but also to the decompositions (and passing it through to
the internal matrices). Otherwise, this does not scale for
decompositions which use more than one matrix (like SparseLU). And we
could even let the sparseAnalyze() functions return a proxy to the
decomposition, which would allow writing something like:

x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});

or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_-> solve(b);
y = llt_-> matrixL() * c;
} // destructor of llt_ calls `endAnalyze`

The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes'
an operation (after how many iterations do you actually benefit from the
overhead of analyzing the operation?)


Christoph
--
Dr.-Ing. Christoph Hertzberg

Besuchsadresse der Nebengeschäftsstelle:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany

Postadresse der Hauptgeschäftsstelle Standort Bremen:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Zentrale: +49 421 178 45-0
E-Mail: ***@dfki.de

Weitere Informationen: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender) Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973
Steuernummer: 19/672/50006
-----------------------------------------------------------------------
Gael Guennebaud
2018-04-05 21:54:15 UTC
Permalink
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Post by Christoph Hertzberg
Post by Edward Lam
Would it be useful to incorporate lambda's into the interface to avoid
begin/end pairs? So from the user side, we would write code like this
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
int it = 0;
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or due
to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) it keeps the API
minimal, and 2) the user can easily wrap it to accommodate their taste and
usage. To avoid all sorts of redundancies, one could even write a free
function on top of the begin/end pair to do:

while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}

and with variadic templates:

while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}

Actually, this last example is not conceivable at the moment because the
factors in SparseLU are not stored in a standard compressed format; we
would need to add an option telling SparseLU to convert them to regular
SparseMatrix objects and to use those for solving.

Anyways, all these variants are essentially syntactic sugar/implementation
details, and at this stage I'm more concerned about possible shortcomings
in the general approach. Is it flexible enough? I'm also not a fan of
introducing an MklSparseMatrix inheriting SparseMatrix, but I don't have a
better option for now that is generic and easily applicable to
decompositions.


gael
Zhukova, Maria
2018-04-05 22:49:55 UTC
Permalink
Hello Gael,
Thanks for your feedback! It’s greatly appreciated.

Regarding your list of issues:
(i1) Our initial suggestion was to add all code related to IE SpBLAS guarded by #ifdef EIGEN_USE_MKL_SPBLAS /*Code is here*/ #endif, so I believe this shouldn’t break ABI.
(i2) Calling mkl_sparse_set_?_hint() and mkl_sparse_optimize() on the fly with every operation won’t be a problem, and the overhead is expected to be minimal, because internally we check for previous optimizations and for what data can be reused. So adding them to the code just before calling any IE SpBLAS computational routine means that we will take care of the required optimizations automatically and it won’t be the user’s work.
(i3) Surely in addition we can implement any type of MKLOptimize(/*args TBD*/) routine which will be used to set user-specified hints, desired number of calls, required operations, etc.

The reason why we do not want to implement our own type is that this makes it much easier for customers to benefit from the Intel architecture: the only thing they have to take care of is to include the SpBLASSupport module and link MKL; all the optimization work will be hidden (together with the IE SpBLAS calls). So an existing application that already uses sparse algebra will automatically benefit from MKL.

Best regards,
Maria

From: Gael Guennebaud [mailto:***@gmail.com]
Sent: Thursday, April 5, 2018 2:54 PM
To: eigen <***@lists.tuxfamily.org>
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen



On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <***@informatik.uni-bremen.de<mailto:***@informatik.uni-bremen.de>> wrote:
On 2018-04-05 15:31, Edward Lam wrote:
Would it be useful to incorporate lambda's into the interface to avoid begin/end pairs? So from the user side, we would write code like this instead:

1) Analyze and run

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
int it = 0;
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}

2) Analyze only

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual

For more complicated algorithms, one can always outline the lambda and pass it into the analysis.

That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls might not get called, because it is masked by a wrong if-condition, or due to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.

The advantages of the begin/end pair are that 1) this keeps the API minimal, and 2) the user can easily wrap them to accommodate its taste and usages. To avoid all sorts of redundancies, one could even write a free function on top the begin/end pair to do:

while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}

and with variadic templates:

while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}

Actually, this last example is not conceivable at the moment because the factors in SparseLU are not stored in a standard compressed format, we would need to add an option to tell SparseLU to turn them to regular SparseMatrix and to use them for solving.

Anyways, all these variants are essentially syntactic sugar/implementation details, and at this stage I'm more concerned about possible shortcomings in the general approach. Is it flexible enough? I'm also not fan of introducing a MklSparseMatrix inheriting SparseMatrix, but I don't have a better offer for now that is generic and easily applicable to decompositions.


gael


However, if this is required, I would suggest to add this method not only to matrices but also to the decomposition (and pass it through to the internal matrices). Otherwise, this does not scale for decompositions which use more than one matrix (like SparseLU). And we could even let the sparseAnalyze() functions return a proxy to the decomposition which would allow writing something like:

x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});

or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_-> solve(b);
y = llt_-> matrixL() * c;
} // destructor of llt_ calls `endAnalyze`

The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an operation (after how many iterations do you actually benefit from the overhead of analyzing the operation?)


Christoph


Cheers,
-Edward

On 4/5/2018 8:01 AM, Gael Guennebaud wrote:
Thank you for opening this discussion on the public mailing list.

So let's discuss about the public API, which currently is not very convenient as already noticed by others. Issues are:

(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sounds very generic.
- We need a way to control:
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.


In order to discuss these issues, let's consider the following typical pattern: (e.g., non-linear optimization, eigenvalues, ...)

SimplicialLDLT<SparseMatrix<double> > llt(A);

while(...) {
...
x = llt.solve(b);
...
}

Here the triangular L factor is going to be used for triangular and transposed-triangular solves dozens to hundreds of time but only the user of SimplicialLDLT knowns that, not SimplicialLDLT, nor SparseMatrix. Moreover, the user does not own the SparseMatrix that we want to analyze/optimize for. Other patterns are likely easier to handle, so let's focus on it for now.

Regarding (i1), I would suggest to introduce a new type, say MklSparseMatrix<> that would enhance SparseMatrix<> through inheritance. Then for (i2) and (i3) we could imagine something like:

MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}

All states in MklSparseMatrix would be mutable.

Between a pair of beginAnalysis/endAnalysis each call to a supported operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize.
Optionally, we could even add a "dryrun" mode for which no operation would be performed, only calls to mkl_sparse_set_*_hint() and then mkl_sparse_optimize would be called in endAnalysis(). This way mkl_sparse_optimize() would be called only once.

And that's it. Our example would look-like:


SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}

or using a "dry-run" mode:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D would still be performed, but calls to actual triangular solves would be by-passed
llt.matrixL().endAnalysis();

while(...) {
...
x = llt.solve(b);
...
}


If someone directly deal with the factor L, then we could follow the same pattern or copy the SparseMatrix factor L to a MklSparseMatrix:


SimplicialLLT<SparseMatrix<double> > llt(A);

MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}


This design in quite general and expendable to any sparse-optimizers, even built-in ones in the future.

In contrast to the current proposal, only selected operations would be passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).

What do you think?


gael


On Tue, Apr 3, 2018 at 11:39 PM, Zhukova, Maria <***@intel.com<mailto:***@intel.com> <mailto:***@intel.com<mailto:***@intel.com>>> wrote:

Hello Eigen community,

My name is Maria Zhukova and I’m a software development engineer at Intel ®
MKL Sparse team.

My team is interested in contributing into Eigen, so I’ve investigated our
possibilities and so far this is what I have:
Eigen support different operations for sparse matrices stored in CSR and CSC
format which can be implemented on a basis of IE SpBLAS kernels (please,
refer to
https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines
<https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines>
for the general idea of interfaces)
, basically we want to implement calls to our IE SpBLAS into next
operations:____

SparseMatrix + SparseMatrix (mkl_sparse_?_add)
SparseMatrix * DenseVector (mkl_sparse_?_mv)____

SparseMatrix * DenseMatrix (mkl_sparse_?_mm)____

SparseMatrix * SparseMatrix (mkl_sparse_spmm),
and Triangular solve (mkl_sparse_?_trsv).____

I’ve already started with implementation of sparse_time_dense_impl_mkl
kernel which is based on mkl_sparse_?_mv (included in patch).____

This is how it will look for the user:

#include <Eigen/SpBLASSupport> // NEW: IE SpBLAS include module

int main () {
  SparseMatrix<double, RowMajor> A;
  Matrix<double, Dynamic, 1> x, y;

  A.makeCompressed();      /* Convert matrix A into CSR/CSC format */
  A.createSparseHandle();  /* NEW: creates the handle required by all IE SpBLAS routines */

  // support of IE SpBLAS is here
  y = beta*y + alpha*A*x;             /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_NON_TRANSPOSE */
  y = beta*y + alpha*A.transpose()*x; /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_TRANSPOSE */
  y = beta*y + alpha*A.adjoint()*x;   /* call to mkl_sparse_?_mv with operation = SPARSE_OPERATION_CONJUGATE_TRANSPOSE */

  A.destroySparseHandle(); /* NEW: deletes the created handle */
}


I’ve attached a draft patch including all necessary changes and would like
to hear your feedback.
Please, let me know if you have any questions and comments.


Best regards,
Maria

--
Dr.-Ing. Christoph Hertzberg

Visiting address of the branch office:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany

Postal address of the head office, Bremen site:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Switchboard: +49 421 178 45-0
E-Mail: ***@dfki.de

More information: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Straße 122, D-67663 Kaiserslautern
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Registered seat: Kaiserslautern (HRB 2313)
VAT ID: DE 148646973
Tax number: 19/672/50006
-----------------------------------------------------------------------
Edward Lam
2018-04-06 00:15:35 UTC
Permalink
It's not clear to me what the expected lifetimes are between the matrix,
the matrix algorithm, and the optimizer state for IE SpBLAS.

I liked the other idea here to separate out the optimizer as its own class,
but I'm not clear on the data dependency relationships between these.

For the ABI concerns, does it make sense (or is it possible) to store the
optimizer state within the solver itself? This would make its use less
flexible, but would satisfy the goal of transparently using it with just
different compile options. This of course doesn't preclude the possibility
of having a more general optimizer API as well.

Cheers,
-Edward
Post by Zhukova, Maria
Hello Gael,
Thanks for your feedback! It's greatly appreciated.
(i1) Our initial suggestion was to add all code related to IE SpBLAS guarded
by #ifdef EIGEN_USE_MKL_SPBLAS /* code here */ #endif, so I believe this
shouldn't break ABI.
(i2) Calling mkl_sparse_set_?_hint() and mkl_sparse_optimize() on-the-fly
with every operation won't be a problem and the overhead is expected to be
minimal, because internally we're checking for previous optimizations and
what data can be reused. So adding it into the code just before calling any
IE SpBLAS computational routine means that we will take care of the required
optimizations automatically and it won't be the user's work.
(i3) Surely in addition we can implement any type of MKLOptimize(/*args
TBD*/) routine which will be used to set user-specified hints, desired
number of calls, required operations, etc.
The reason why we do not want to implement our own type is that it will be
much easier for customers to benefit from Intel architecture: the only
thing they should take care of is to include the SpBLASSupport module and
link MKL; all the optimization work will be hidden (together with the IE
SpBLAS calls). So if there was already an application which used sparse
algebra, it will benefit automatically from MKL usage.
Best regards,
Maria
Sent: Thursday, April 5, 2018 2:54 PM
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Would it be useful to incorporate lambdas into the interface to avoid
begin/end pairs? So from the user side, we would write code like this:
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
int it = 0;
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or due
to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) it keeps the API
minimal, and 2) the user can easily wrap it to accommodate their taste and
usage. To avoid all sorts of redundancies, one could even write a free
function, e.g.:
while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}
while(...) {
optimize_if(iter==0, 100,
[&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}
Actually, this last example is not conceivable at the moment because the
factors in SparseLU are not stored in a standard compressed format, we
would need to add an option to tell SparseLU to turn them to regular
SparseMatrix and to use them for solving.
Anyways, all these variants are essentially syntactic sugar/implementation
details, and at this stage I'm more concerned about possible shortcomings
in the general approach. Is it flexible enough? I'm also not a fan of
introducing a MklSparseMatrix inheriting from SparseMatrix, but I don't
have a better offer for now that is generic and easily applicable to
decompositions.
gael
However, if this is required, I would suggest adding this method not only
to matrices but also to the decompositions (and passing it through to the
internal matrices). Otherwise, this does not scale for decompositions which
use more than one matrix (like SparseLU). And we could even let the
sparseAnalyze() functions return a proxy to the decomposition, which would
allow:
x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});
or
{
  auto llt_ = llt.sparseAnalyzeAndRun(100);
  x = llt_->solve(b);
  y = llt_->matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an
operation (after how many iterations do you actually benefit from the
overhead of analyzing the operation?)
Christoph
Thank you for opening this discussion on the public mailing list.
So let's discuss the public API, which currently is not very satisfactory:
(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sound very generic.
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
pattern: (e.g., non-linear optimization, eigenvalues, ...)
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
...
x = llt.solve(b);
...
}
Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of times, but only the user
of SimplicialLDLT knows that; neither SimplicialLDLT nor SparseMatrix does.
Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize for. Other patterns are likely easier to handle, so let's
focus on this one for now.
Regarding (i1), I would suggest introducing a new type, say
MklSparseMatrix<>, that would enhance SparseMatrix<> through inheritance.
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis each call to a supported
operation would trigger calls to
mkl_sparse_set_*_hint()/mkl_sparse_optimize.
Optionally, we could even add a "dryrun" mode for which no operation would
be performed, only calls to mkl_sparse_set_*_hint() and then
mkl_sparse_optimize would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D
would still be performed, but calls to actual triangular solves would be
by-passed
llt.matrixL().endAnalysis();
while(...) {
...
x = llt.solve(b);
...
}
If someone deals directly with the factor L, then we could follow the same pattern:
SimplicialLLT<SparseMatrix<double> > llt(A);
MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}
This design is quite general and extendable to any sparse optimizers, even
built-in ones in the future.
In contrast to the current proposal, only selected operations would be
passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).
What do you think?
gael
Gael Guennebaud
2018-04-06 11:42:11 UTC
Permalink
Post by Zhukova, Maria
Hello Gael,
Thanks for your feedback! It’s greatly appreciated.
(i1) Our initial suggestion was to add all code related to IE SpBLAS guarded
by #ifdef EIGEN_USE_MKL_SPBLAS /* code here */ #endif, so I believe this
shouldn't break ABI.
This makes binaries compiled with and without MKL support incompatible, so
yes, it breaks ABI, and users have to be very careful to enable MKL support
in all translation units, all libraries, and dependencies...
Post by Zhukova, Maria
(i2) Calling mkl_sparse_set_?_hint() and mkl_sparse_optimize() on-the-fly
with every operation won’t be a problem and the overhead is expected to be
minimal, because internally we’re checking for previous optimizations and
what data can be reused. So adding it into the code just before calling any
IE SpBLAS computational routine means that we will take care of required
optimizations automatically and it won’t be user’s work.
(i3) Surely in addition we can implement any type of MKLOptimize(/*args
TBD*/) routine which will be used to set user-specified hints, desired
number of calls, required operations, etc.
The reason why we do not want to implement our own type is that it will be
much easier for customers to benefit from Intel architecture: the only
thing they should take care about is to include SpBLASSupport module and
link MKL, all the optimization work will be hidden (together with IE SpBLAS
calls). So if there was already an application which used sparse algebra
then it will benefit automatically from MKL usage.
I just tried your patch with:

vec1_dense.noalias() += row_major_sparse * vec2_dense

and I observe a very significant drop of performance (almost x10) compared
to pure Eigen (with -fopenmp) if this operation is performed only once, or
very few times... Changing the 'expected_calls' parameter from 10 to 1 does
not reduce the performance drop due to mkl_sparse_optimize. If I repeat
this operation hundreds of times, then both Eigen and MKL achieve the same
level of performance for this particular operation and the matrices I tried
(1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).

So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is
never a loss (we have to bench), but that's clearly not the case of the
optimize step. So, clearly, the optimize step must be explicitly activated
on user selected expressions only.

gael
Gael Guennebaud
2018-04-06 12:06:08 UTC
Permalink
I also tried:

vec1_dense.noalias() += column_major_sparse * vec2_dense

Since this operation is not supported by mkl_sparse_optimize, there is no
overhead here, but for some very sparse matrices (5 nnz/col) mkl_sparse_d_mv
is about 2x slower than Eigen, while for denser matrices both
implementations achieve the same speed.

gael.
Post by Gael Guennebaud
vec1_dense.noalias() += row_major_sparse * vec2_dense
and I observe a very significant drop of performance (almost x10) compared
to pure Eigen (with -fopenmp) if this operation is performed only once, or
very few times... Changing the 'expected_calls' parameter from 10 to 1
does not reduce the performance drop due to mkl_sparse_optimize. If I
repeat this operation hundreds of times, then both Eigen and MKL achieve
the same level of performance for this particular operation and the
matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS
is never a loss (we have to bench), but that's clearly not the case of the
optimize step. So, clearly, the optimize step must be explicitly
activated on user selected expressions only.
gael
Post by Zhukova, Maria
Best regards,
Maria
*Sent:* Thursday, April 5, 2018 2:54 PM
*Subject:* Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Would it be useful to incorporate lambda's into the interface to avoid
begin/end pairs? So from the user side, we would write code like this
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
int it = 0;
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or due
to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) this keeps the API
minimal, and 2) the user can easily wrap them to accommodate its taste and
usages. To avoid all sorts of redundancies, one could even write a free
while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}
while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.m
atrixL(), LU.matrixU());
}
Actually, this last example is not conceivable at the moment because the
factors in SparseLU are not stored in a standard compressed format, we
would need to add an option to tell SparseLU to turn them to regular
SparseMatrix and to use them for solving.
Anyways, all these variants are essentially syntactic
sugar/implementation details, and at this stage I'm more concerned about
possible shortcomings in the general approach. Is it flexible enough? I'm
also not fan of introducing a MklSparseMatrix inheriting SparseMatrix, but
I don't have a better offer for now that is generic and easily applicable
to decompositions.
gael
However, if this is required, I would suggest to add this method not only
to matrices but also to the decomposition (and pass it through to the
internal matrices). Otherwise, this does not scale for decompositions which
use more than one matrix (like SparseLU). And we could even let the
sparseAnalyze() functions return a proxy to the decomposition which would
x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});
or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_-> solve(b);
y = llt_-> matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an
operation (after how many iterations do you actually benefit from the
overhead of analyzing the operation?)
Christoph
Cheers,
-Edward
Thank you for opening this discussion on the public mailing list.
So let's discuss about the public API, which currently is not very
(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sounds very generic.
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
pattern: (e.g., non-linear optimization, eigenvalues, ...)
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
...
x = llt.solve(b);
...
}
Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of time but only the user
of SimplicialLDLT knowns that, not SimplicialLDLT, nor SparseMatrix.
Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize for. Other patterns are likely easier to handle, so let's
focus on it for now.
Regarding (i1), I would suggest to introduce a new type, say
MklSparseMatrix<> that would enhance SparseMatrix<> through inheritance.
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis each call to a supported
operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sp
arse_optimize.
Optionally, we could even add a "dryrun" mode for which no operation
would be performed, only calls to mkl_sparse_set_*_hint() and then
mkl_sparse_optimize would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D
would still be performed, but calls to actual triangular solves would be
by-passed
llt.matrixL().endAnalysis();
while(...) {
...
x = llt.solve(b);
...
}
If someone directly deal with the factor L, then we could follow the same
SimplicialLLT<SparseMatrix<double> > llt(A);
MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}
This design in quite general and expendable to any sparse-optimizers,
even built-in ones in the future.
In contrast to the current proposal, only selected operations would be
passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).
What do you think?
gael
--
Dr.-Ing. Christoph Hertzberg

DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany

DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Zentrale: +49 421 178 45-0
Weitere Informationen: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender) Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973
Steuernummer: 19/672/50006
-----------------------------------------------------------------------
Kalinkin, Alexander A
2018-04-06 18:49:45 UTC
Permalink
Hi All,

First of all, we appreciate your interest in integrating sparse kernels into Eigen. Please see my comments below.

> I also tried:
>
> vec1_dense.noalias() += column_major_sparse * vec2_dense
>
> Since this operation is not supported by mkl_sparse_optimize, the overhead is null here, but for some very sparse matrices (5 nnz/col) mkl_sparse_d_mv is about x2 slower than Eigen, while for denser matrices both implementations achieve the same speed.
[akalinki] That sounds strange. May I ask which MKL version you used for your testing, and what the target architecture is? We implemented matvec support in IE Sparse BLAS a couple of years ago and have significantly improved its performance this year. May I also ask about the size of the matrix, so that I can check it on my side?

> I just tried your patch with:
>
> vec1_dense.noalias() += row_major_sparse * vec2_dense
>
> and I observe a very significant drop of performance (almost x10) compared to pure Eigen (with -fopenmp) if this operation is performed only once, or very few times... Changing the 'expected_calls' parameter from 10 to 1 does not reduce the performance drop due to mkl_sparse_optimize. If I repeat this operation hundreds of times, then both Eigen and MKL achieve the same level of performance for this particular operation and the matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
>
> So I still believe that silently and unconditionally falling back to IE SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is never a loss (we have to bench), but that's clearly not the case of the optimize step. So, clearly, the optimize step must be explicitly activated on user selected expressions only.

[akalinki] This can be an issue with an older version of MKL, because initially we spent additional time doing hints/optimize on every call. Starting from MKL 2018u3 we're planning to remove this overhead – if you want, we can share an engineering build so you can check performance on the same binaries.
Also, about the poor performance of Sparse BLAS matvec functionality in comparison with Eigen's built-in one: we interpret all such cases as issues that need to be investigated and fixed on our side. So I don't think we need to add any dispatch at the Eigen level between the built-in kernels and ours – all this should be hidden inside the MKL libraries.

And back to the main question of this topic – we have customers who use Eigen on Intel processors and are interested in optimizing their code. Introducing a new SparseMatrix type would force them to rewrite their code, which can sometimes be impossible for various reasons. That's why we are asking to find a way to simply extend the current SparseMatrix type to support MKL functionality.

Thanks,
Alexander Kalinkin,
PhD, Team leader, Architect of Sparse components in Intel® MKL


From: Gael Guennebaud [mailto:***@gmail.com]
Sent: Thursday, April 5, 2018 2:54 PM
To: eigen <***@lists.tuxfamily.org>
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen



On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <***@informatik.uni-bremen.de> wrote:
On 2018-04-05 15:31, Edward Lam wrote:
Would it be useful to incorporate lambdas into the interface to avoid begin/end pairs? From the user side, we would then write code like this instead:

1) Analyze and run

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}

2) Analyze only

SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual

For more complicated algorithms, one can always outline the lambda and pass it into the analysis.
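As a rough sketch (not actual Eigen code), such a lambda-taking interface could be shaped as follows; the MKL hint/optimize machinery is stubbed out with counters, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the lambda-based interface proposed above.
// begin()/end() stand in for the real analysis setup and
// mkl_sparse_optimize call; counters let us check the bracketing.
class AnalyzableMatrix {
public:
  // Record hints while running the callback, then optimize.
  template <typename Fn>
  void sparseAnalyzeAndRun(std::size_t expected_calls, Fn&& fn) {
    begin(expected_calls);
    fn();  // operations inside are recorded as hints
    end();
  }

  // Same, but in a dry-run mode: hints are recorded while the actual
  // kernels would be skipped.
  template <typename Fn>
  void sparseAnalyze(std::size_t expected_calls, Fn&& fn) {
    begin(expected_calls);
    dry_run_ = true;
    fn();
    dry_run_ = false;
    end();
  }

  std::size_t beginCount() const { return begins_; }
  std::size_t endCount() const { return ends_; }

private:
  void begin(std::size_t) { ++begins_; }  // would create handle, enable recording
  void end() { ++ends_; }                 // would call mkl_sparse_optimize
  bool dry_run_ = false;
  std::size_t begins_ = 0, ends_ = 0;
};
```

The key property is that the begin/end pair can never be left unbalanced by user code, since the matrix itself brackets the callback.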

That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls might not get called because it is masked by a wrong if-condition, or because an exception is thrown and only caught further outside).
This can usually be avoided with proper C++ constructs.

The advantages of the begin/end pair are that 1) it keeps the API minimal, and 2) the user can easily wrap it to accommodate their taste and usage. To avoid all sorts of redundancies, one could even write a free function on top of the begin/end pair:

while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}

and with variadic templates:

while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}
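A minimal sketch of such an optimize_if helper, assuming only the proposed beginAnalysis/endAnalysis interface (stubbed here by a FakeAnalyzable type; all names are illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for MklSparseMatrix: only the begin/endAnalysis interface
// matters for this sketch; counters record what was called.
struct FakeAnalyzable {
  std::size_t begins = 0, ends = 0;
  void beginAnalysis(std::size_t /*expected_calls*/) { ++begins; }
  void endAnalysis() { ++ends; }
};

// Runs `op`; if `condition` holds, brackets it with begin/endAnalysis
// on every passed matrix (variadic, so e.g. both LU factors can be
// handled in one call). Requires C++17 fold expressions.
template <typename Op, typename... Mats>
void optimize_if(bool condition, std::size_t expected_calls, Op op,
                 Mats&... mats) {
  if (condition) (mats.beginAnalysis(expected_calls), ...);
  op();
  if (condition) (mats.endAnalysis(), ...);
}
```

With this, the loop body reduces to `optimize_if(iter == 0, 100, [&]{ x = llt.solve(b); }, llt.matrixL());`, and the analysis phase runs exactly once.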

Actually, this last example is not conceivable at the moment because the factors in SparseLU are not stored in a standard compressed format; we would need to add an option telling SparseLU to turn them into regular SparseMatrix objects and to use those for solving.

Anyway, all these variants are essentially syntactic sugar/implementation details, and at this stage I'm more concerned about possible shortcomings in the general approach. Is it flexible enough? I'm also not a fan of introducing an MklSparseMatrix inheriting from SparseMatrix, but I don't have a better proposal for now that is generic and easily applicable to decompositions.


gael


However, if this is required, I would suggest adding this method not only to matrices but also to the decompositions (and passing it through to the internal matrices). Otherwise, this does not scale for decompositions which use more than one matrix (like SparseLU). And we could even let the sparseAnalyze() functions return a proxy to the decomposition, which would allow writing something like:

x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});

or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_->solve(b);
y = llt_->matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
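A sketch of that proxy idea: a non-copyable guard whose destructor ends the analysis phase even if an exception escapes the scope. The decomposition is stubbed, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the decomposition; solve() is a dummy for llt.solve(b).
struct FakeDecomposition {
  bool analyzing = false;
  std::size_t ended = 0;
  void beginAnalysis(std::size_t /*expected_calls*/) { analyzing = true; }
  void endAnalysis() { analyzing = false; ++ended; }
  int solve(int b) const { return b + 1; }
};

// Scope guard returned by a hypothetical sparseAnalyzeAndRun(n):
// begins the analysis on construction, ends it on destruction, and
// forwards member access to the decomposition via operator->.
class AnalysisProxy {
public:
  AnalysisProxy(FakeDecomposition& d, std::size_t n) : d_(d) {
    d_.beginAnalysis(n);
  }
  ~AnalysisProxy() { d_.endAnalysis(); }  // runs even on exceptions
  AnalysisProxy(const AnalysisProxy&) = delete;
  AnalysisProxy& operator=(const AnalysisProxy&) = delete;
  FakeDecomposition* operator->() const { return &d_; }

private:
  FakeDecomposition& d_;
};
```

This is the usual RAII answer to the C-style begin/end concern raised earlier in the thread.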

The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an operation (after how many iterations do you actually benefit from the overhead of analyzing the operation?)


Christoph


Cheers,
-Edward

On 4/5/2018 8:01 AM, Gael Guennebaud wrote:
Thank you for opening this discussion on the public mailing list.

So let's discuss the public API, which is currently not very convenient, as already noticed by others. The issues are:

(i1) - Storing MKL's handle in SparseMatrix breaks the ABI and does not sound very generic.
- We need a way to control:
(i2) - which operations are going to be analyzed/optimized,
(i3) - and to specify the 'expected_calls' parameter.


In order to discuss these issues, let's consider the following typical pattern (e.g., non-linear optimization, eigenvalue computations, ...):

SimplicialLDLT<SparseMatrix<double> > llt(A);

while(...) {
...
x = llt.solve(b);
...
}

Here the triangular L factor is going to be used for triangular and transposed-triangular solves dozens to hundreds of times, but only the user of SimplicialLDLT knows that; neither SimplicialLDLT nor SparseMatrix does. Moreover, the user does not own the SparseMatrix that we want to analyze/optimize. Other patterns are likely easier to handle, so let's focus on this one for now.

Regarding (i1), I would suggest introducing a new type, say MklSparseMatrix<>, that would enhance SparseMatrix<> through inheritance. Then for (i2) and (i3) we could imagine something like:

MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}

All states in MklSparseMatrix would be mutable.

Between a pair of beginAnalysis/endAnalysis each call to a supported operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize.
Optionally, we could even add a "dryrun" mode for which no operation would be performed, only calls to mkl_sparse_set_*_hint() and then mkl_sparse_optimize would be called in endAnalysis(). This way mkl_sparse_optimize() would be called only once.
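As a stand-in illustration (not Eigen/MKL code), the recording state machine sketched above could look as follows; the actual MKL calls (mkl_sparse_create_csr, mkl_sparse_set_mv_hint, mkl_sparse_optimize) are replaced by counters so the begin/end logic can be exercised on its own, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the proposed MklSparseMatrix analysis state, with MKL calls
// replaced by counters. Products/solves on a const matrix must be able
// to record hints, hence the mutable state.
class MklSparseMatrixSketch {
public:
  void beginAnalysis(std::size_t expected_calls) const {
    // would: makeCompressed(); mkl_sparse_create_csr(&handle_, ...);
    expected_calls_ = expected_calls;
    recording_ = true;
  }
  void endAnalysis() const {
    recording_ = false;
    ++optimize_calls_;  // would: mkl_sparse_optimize(handle_);
  }
  // Called from inside a supported operation (e.g. sparse * dense).
  void maybeRecordHint() const {
    if (recording_) ++hint_calls_;  // would: mkl_sparse_set_mv_hint(...)
  }

  bool recording() const { return recording_; }
  std::size_t hintCalls() const { return hint_calls_; }
  std::size_t optimizeCalls() const { return optimize_calls_; }

private:
  mutable bool recording_ = false;
  mutable std::size_t expected_calls_ = 0;
  mutable std::size_t hint_calls_ = 0;
  mutable std::size_t optimize_calls_ = 0;
};
```

Only operations issued between beginAnalysis and endAnalysis are recorded; everything outside the window goes through the ordinary code path untouched.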

And that's it. Our example would look like:


SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}

or using a "dry-run" mode:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D would still be performed, but calls to actual triangular solves would be by-passed
llt.matrixL().endAnalysis();

while(...) {
...
x = llt.solve(b);
...
}


If someone deals directly with the factor L, then we could follow the same pattern, or copy the SparseMatrix factor L into an MklSparseMatrix:


SimplicialLLT<SparseMatrix<double> > llt(A);

MklSparseMatrix<double> L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}


This design is quite general and extendable to other sparse optimizers, even built-in ones in the future.

In contrast to the current proposal, only selected operations would be passed to MKL (one needs to use an MklSparseMatrix plus a begin/end recording phase).

What do you think?


gael


On Tue, Apr 3, 2018 at 11:39 PM, Zhukova, Maria <***@intel.com> wrote:
Christoph Hertzberg
2018-04-06 19:16:31 UTC
Permalink
Hi,

your mail client seems to mess up quotations (it deletes all '>' at the beginnings of lines). That makes it a bit hard to read.

There is the possibility to modify SparseMatrix locally by defining the macro EIGEN_SPARSEMATRIX_PLUGIN. The mechanism is documented here for MatrixBase:
http://eigen.tuxfamily.org/dox-devel/TopicCustomizing_Plugins.html
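To illustrate the mechanism with a stand-in class (this is not actual Eigen code): Eigen textually includes a user-named plugin header at the end of the class body, which a macro can emulate here. In real code one would put the extra members in a header, e.g. "SparseMatrixAddons.h", and build with -DEIGEN_SPARSEMATRIX_PLUGIN='"SparseMatrixAddons.h"'.

```cpp
#include <cassert>

// Stand-in for the contents of a user plugin header: extra members
// injected into the class without forking it. In Eigen this text would
// live in its own file named by EIGEN_SPARSEMATRIX_PLUGIN.
#define SPARSEMATRIX_PLUGIN_MEMBERS                        \
  bool handle_created = false;                             \
  void createSparseHandle() { handle_created = true; }     \
  void destroySparseHandle() { handle_created = false; }

// Stand-in for Eigen::SparseMatrix; the macro expansion plays the role
// of `#include EIGEN_SPARSEMATRIX_PLUGIN` at the end of the class body.
class SparseMatrixStandIn {
 public:
  int rows = 0, cols = 0;
  SPARSEMATRIX_PLUGIN_MEMBERS
};
```

This shows why the plugin route adds members (and thus can change the ABI of the core type), which is exactly the concern raised here.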

But, as we both said, we think it is generally not a good idea to change
the ABI of core Eigen types.

Cheers,
Christoph
Edward Lam
2018-04-06 22:04:33 UTC
Permalink
Is it within reason to change the ABI of the solvers?

-Edward

On Fri, Apr 6, 2018, 4:33 PM Christoph Hertzberg, <
Post by Christoph Hertzberg
Hi,
your mail client seems to mess up quotations (it deletes all '>' at
the beginnings of lines). That makes it a bit hard to read.
There is the possibility to modify SparseMatrix locally by defining the
macro EIGEN_SPARSEMATRIX_PLUGIN
http://eigen.tuxfamily.org/dox-devel/TopicCustomizing_Plugins.html
But, as we both said, we think it is generally not a good idea to change
the ABI of core Eigen types.
Cheers,
Christoph
Post by Kalinkin, Alexander A
Hi All,
First of all appreciate your interest in integration of sparse kernels
in Eigen.
Please see my comments below
vec1_dense.noalias() += column_major_sparse * vec2_dense
Since this operation is not supported by mkl_sparse_optimize, the
overhead is null here, but for some very sparse matrices (5 nnz/col)
mkl_sparse_d_mv is about x2 slower than Eigen, while for denser matrices
both implementations achieve the same speed.
[akalinki] That sounds strange. Can I ask which MKL version you use
for your testing and what the target architecture is? We implemented
matvec support in IE Sparse BLAS a couple of years ago and have
significantly improved its performance this year. Could you also tell me
the size of the matrix, so that I can check it on my side?
vec1_dense.noalias() += row_major_sparse * vec2_dense
and I observe a very significant drop of performance (almost x10)
compared to pure Eigen (with -fopenmp) if this operation is performed only
once, or very few times... Changing the 'expected_calls' parameter from 10
to 1 does not reduce the performance drop due to mkl_sparse_optimize. If I
repeat this operation hundreds of times, then both Eigen and MKL achieve
the same level of performance for this particular operation and the
matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is
never a loss (we have to bench), but that's clearly not the case of the
optimize step. So, clearly, the optimize step must be explicitly activated
on user selected expressions only.
[akalinki] This can be an issue with an older version of MKL, because
initially we spent additional time on the hints/optimize step every time
this was called. Starting from MKL 2018 Update 3 we plan to remove this
overhead; if you want, we can share an engineering build so you can check
performance on the same binaries.
[akalinki] Also, regarding the poor performance of the Sparse BLAS matvec
functionality in comparison with Eigen's built-in one: we interpret all
these cases as issues that need to be investigated and fixed on our side.
So I don't think we need to add any dispatch at the Eigen level between
the built-in kernels and ours; all this logic should be hidden inside the
MKL libraries.
And back to the main question of this topic – we have customers who use
Eigen on Intel processors and are interested in optimizing their code.
Introducing a new SparseMatrix type would force them to rewrite their
code, which can sometimes be impossible for various reasons. That's why
we are asking for a way to simply extend the current SparseMatrix type
to support MKL functionality.
Thanks,
Alexander Kalinkin,
PhD, Team Leader, Architect of Sparse components in Intel® MKL
Sent: Friday, April 6, 2018 5:06 AM
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
Sent: Thursday, April 5, 2018 2:54 PM
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Would it be useful to incorporate lambdas into the interface to avoid
begin/end pairs? Then from the user side, we would write code like this:
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
for (int it = 0; ...; ++it) {
    if (it == 0)
        llt.matrixL().sparseAnalyzeAndRun(100, [&] { x = llt.solve(b); });
    else
        x = llt.solve(b);
    // ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { x = llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
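As a rough, self-contained illustration of this idea (not part of the actual patch; MklSparseMatrixSketch, sparseAnalyzeAndRun, beginAnalysis and endAnalysis are hypothetical names taken from this thread, with counters standing in for the real MKL handle), the lambda-based interface could wrap the begin/end pair so the analysis phase closes even if the callable throws:

```cpp
#include <cassert>

// Hypothetical stand-in for the proposed MklSparseMatrix; a real class
// would wrap an MKL sparse handle instead of these flags.
struct MklSparseMatrixSketch {
  bool analyzing = false;
  int expectedCalls = 0;

  void beginAnalysis(int expected_calls) {
    analyzing = true;            // real version: create handle, record hints
    expectedCalls = expected_calls;
  }
  void endAnalysis() {
    analyzing = false;           // real version: call mkl_sparse_optimize
  }

  // Wraps the begin/end pair around a user callable, so endAnalysis()
  // runs even if the callable throws.
  template <class F>
  void sparseAnalyzeAndRun(int expected_calls, F&& f) {
    beginAnalysis(expected_calls);
    try { f(); }
    catch (...) { endAnalysis(); throw; }
    endAnalysis();
  }
};
```

A RAII guard would give the same exception safety with even less boilerplate on the caller's side.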
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or due
to an exception, which is caught only outside).
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) it keeps the API
minimal, and 2) the user can easily wrap it to accommodate their own
taste and usage. To avoid all sorts of redundancy, one could even write a
free function:
while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}
while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}
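For illustration only, such a free helper could be sketched as below (the factor type and its beginAnalysis/endAnalysis members are assumptions from this thread, not existing Eigen API; a dummy type replaces llt.matrixL() / LU.matrixU()):

```cpp
#include <cassert>

// Dummy factor type standing in for llt.matrixL(), LU.matrixU(), etc.
struct FactorSketch {
  int begun = 0, ended = 0;
  void beginAnalysis(int /*expected_calls*/) { ++begun; }
  void endAnalysis() { ++ended; }
};

// Free helper: when `cond` holds, open an analysis phase on every passed
// factor, run the callable, then close the phase on all of them.
template <class F, class... Mats>
void optimize_if(bool cond, int expected_calls, F&& f, Mats&... mats) {
  if (cond) (mats.beginAnalysis(expected_calls), ...);  // C++17 fold
  f();
  if (cond) (mats.endAnalysis(), ...);
}
```

The variadic parameter pack is what lets the same helper cover both the single-factor SimplicialLDLT case and the two-factor SparseLU case.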
Actually, this last example is not conceivable at the moment, because the
factors in SparseLU are not stored in a standard compressed format; we
would need to add an option telling SparseLU to turn them into regular
SparseMatrix objects and to use those for solving.
Anyway, all these variants are essentially syntactic
sugar/implementation details, and at this stage I'm more concerned about
possible shortcomings in the general approach. Is it flexible enough? I'm
also not a fan of introducing an MklSparseMatrix inheriting from
SparseMatrix, but I don't have a better proposal for now that is generic
and easily applicable to decompositions.
gael
However, if this is required, I would suggest adding this method not
only to matrices but also to the decompositions (and passing it through to
the internal matrices). Otherwise, this does not scale to decompositions
which use more than one matrix (like SparseLU). And we could even let the
sparseAnalyze() functions return a proxy to the decomposition, which would
allow:
x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});
or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_->solve(b);
y = llt_->matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
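A minimal sketch of that proxy idea, using a stand-in decomposition type (none of these names exist in Eigen; they follow the naming used in this thread, and solve() is reduced to a placeholder):

```cpp
#include <cassert>

// Stand-in for a decomposition such as SimplicialLDLT.
struct DecompositionSketch {
  bool analyzing = false;
  void beginAnalysis(int /*expected_calls*/) { analyzing = true; }
  void endAnalysis() { analyzing = false; }
  int solve(int b) { return b; }  // placeholder for llt.solve(b)
};

// Proxy returned by sparseAnalyzeAndRun(n): forwards member access to the
// decomposition and closes the analysis phase in its destructor, so the
// phase ends even on early return or exception.
class AnalyzeGuard {
  DecompositionSketch& dec_;
public:
  AnalyzeGuard(DecompositionSketch& d, int expected_calls) : dec_(d) {
    dec_.beginAnalysis(expected_calls);
  }
  ~AnalyzeGuard() { dec_.endAnalysis(); }
  DecompositionSketch* operator->() { return &dec_; }
};

AnalyzeGuard sparseAnalyzeAndRun(DecompositionSketch& d, int expected_calls) {
  return AnalyzeGuard(d, expected_calls);  // copy elided (C++17)
}
```

The scoped variant from the example above then falls out naturally: the analysis window is exactly the lifetime of the guard object.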
The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes'
an operation (after how many iterations do you actually benefit from the
overhead of analyzing the operation?)
Christoph
Cheers,
-Edward
Thank you for opening this discussion on the public mailing list.
So let's discuss the public API, which currently raises a few issues:
(i1) - Storing MKL's handle in SparseMatrix breaks the ABI and does not
sound very generic,
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
pattern: (e.g., non-linear optimization, eigenvalues, ...)
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
...
x = llt.solve(b);
...
}
Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of times, but only the
user of SimplicialLDLT knows that; neither SimplicialLDLT nor SparseMatrix
does. Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize for. Other patterns are likely easier to handle, so let's
focus on this one for now.
Regarding (i1), I would suggest introducing a new type, say
MklSparseMatrix<>, that would enhance SparseMatrix<> through inheritance.
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis each call to a supported
operation would trigger calls to
mkl_sparse_set_*_hint()/mkl_sparse_optimize.
Optionally, we could even add a "dryrun" mode for which no operation
would be performed, only calls to mkl_sparse_set_*_hint() and then
mkl_sparse_optimize would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D
                  // would still be performed, but calls to the actual
                  // triangular solves would be by-passed
llt.matrixL().endAnalysis();
while(...) {
...
x = llt.solve(b);
...
}
If someone deals directly with the factor L, then we could follow the
same approach:
SimplicialLLT<SparseMatrix<double> > llt(A);
MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}
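To make the intended dry-run semantics concrete, here is a toy model of the recording matrix (counters replace the real mkl_sparse_set_*_hint()/mkl_sparse_optimize calls; DryRun and all member names are assumptions taken from this proposal, not MKL's actual API):

```cpp
#include <cassert>

enum AnalysisMode { Record, DryRun };

// Toy model of the proposed recording matrix: between beginAnalysis and
// endAnalysis every supported operation registers a hint; in DryRun mode
// the actual kernel call is by-passed.
struct RecordingMatrixSketch {
  bool recording = false;
  bool dryRun = false;
  int hints = 0;       // stands in for mkl_sparse_set_*_hint() calls
  int kernelRuns = 0;  // stands in for actual triangular solves/products

  void beginAnalysis(int /*expected_calls*/, AnalysisMode m = Record) {
    recording = true;
    dryRun = (m == DryRun);
  }
  void endAnalysis() {
    recording = false;  // real version: single mkl_sparse_optimize call here
    dryRun = false;
  }
  void triangularSolve() {
    if (recording) ++hints;
    if (!(recording && dryRun)) ++kernelRuns;  // skip kernel in dry run
  }
};
```

In this model the dry-run pass records the hint without paying for the kernel, matching the description above where only the permutation and diagonal steps would still execute.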
This design is quite general and extendable to any sparse optimizer,
even built-in ones in the future.
In contrast to the current proposal, only selected operations would be
passed to MKL (one needs to use an MklSparseMatrix plus a begin/end
recording phase).
What do you think?
gael
Gael Guennebaud
2018-04-07 06:08:26 UTC
Permalink
Post by Edward Lam
Is it within reason to change the ABI of the solvers?
Yes, changing the ABI of solvers is acceptable, because they are very
unlikely to be passed across libraries/applications.

gael
Post by Edward Lam
-Edward
On Fri, Apr 6, 2018, 4:33 PM Christoph Hertzberg, <
Post by Christoph Hertzberg
Hi,
your mail-client seems to mess-up quotations (it is deleting all '>' at
the line beginnings) That makes it a bit hard to read.
There is the possibility to modify SparseMatrix locally by defining the
macro EIGEN_SPARSEMATRIX_PLUGIN
http://eigen.tuxfamily.org/dox-devel/TopicCustomizing_Plugins.html
But, as we both said, we think it is generally not a good idea to change
the ABI of core Eigen types.
Cheers,
Christoph
Post by Kalinkin, Alexander A
Hi All,
First of all appreciate your interest in integration of sparse kernels
in Eigen.
Post by Kalinkin, Alexander A
Please see my comments below
vec1_dense.noalias() += column_major_sparse * vec2_dense
Since this operation is not supported by mkl_sparse_optimize, the
overhead is null here, but for some very sparse matrices (5 nnz/col)
mkl_sparse_d_mv is about x2 slower than Eigen, while for denser matrices
both implementations achieve the same speed.
Post by Kalinkin, Alexander A
[akalinki] That’s sound strange. Can I ask you which MKL version do you
use for your testing and what architecture is the target? We implemented
matvec support in IE SparseBlas couple of years ago and significantly
improve its performance during this year. Can I ask you also about the size
of matrix, so I will be able to check it on my side?
Post by Kalinkin, Alexander A
vec1_dense.noalias() += row_major_sparse * vec2_dense
and I observe a very significant drop of performance (almost x10)
compared to pure Eigen (with -fopenmp) if this operation is performed only
once, or very few times... Changing the 'expected_calls' parameter from 10
to 1 does not reduce the performance drop due to mkl_sparse_optimize. If I
repeat this operation hundreds of times, then both Eigen and MKL achieve
the same level of performance for this particular operation and the
matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
Post by Kalinkin, Alexander A
So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is
never a loss (we have to bench), but that's clearly not the case of the
optimize step. So, clearly, the optimize step must be explicitly activated
on user selected expressions only.
Post by Kalinkin, Alexander A
[akalinki] This can be an issue for an older version of MKL, because
initially we spent additional time on doing hints/optimize when calling
this every time. Starting from MKL2018u3 we’re planning to remove this
overhead – if you want we can share eng. build to check performance on same
binaries.
Post by Kalinkin, Alexander A
Also about poor performance of SparseBlas matvec functionality in
comparison with Eigen built-in one: all these cases we interpret as issue
that need to be investigated and should be fixed on our side. So I don’t
think that we need to add any dispatch on Eigen level between built-in
kernels and ours – all this stuff should be hidden in MKL libraries.
Post by Kalinkin, Alexander A
And back to the main question of this topic – we have customers who use
Eigen on Intel processors and are interested in optimizing their code.
Implementation of new SparseMatrix type will forced them to rewrite the
code, that sometimes can be impossible due to different reasons. That’s why
we are asking to find the way how we can just extend current SparseMatrix
type to support MKL functionality.
Post by Kalinkin, Alexander A
Thanks,
Alexander Kalinkin,
PhD, Team leader, Architect or of Sparse components in Intel®MKL
Sent: Friday, April 6, 2018 5:06 AM
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
vec1_dense.noalias() += column_major_sparse * vec2_dense
Since this operation is not supported by mkl_sparse_optimize, the
overhead is null here, but for some very sparse matrices (5 nnz/col)
mkl_sparse_d_mv is about x2 slower than Eigen, while for denser matrices
both implementations achieve the same speed.
Post by Kalinkin, Alexander A
gael.
On Fri, Apr 6, 2018 at 1:42 PM, Gael Guennebaud <
vec1_dense.noalias() += row_major_sparse * vec2_dense
and I observe a very significant drop of performance (almost x10)
compared to pure Eigen (with -fopenmp) if this operation is performed only
once, or very few times... Changing the 'expected_calls' parameter from 10
to 1 does not reduce the performance drop due to mkl_sparse_optimize. If I
repeat this operation hundreds of times, then both Eigen and MKL achieve
the same level of performance for this particular operation and the
matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
Post by Kalinkin, Alexander A
So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is
never a loss (we have to bench), but that's clearly not the case of the
optimize step. So, clearly, the optimize step must be explicitly activated
on user selected expressions only.
Post by Kalinkin, Alexander A
gael
Best regards,
Maria
Sent: Thursday, April 5, 2018 2:54 PM
Subject: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Would it be useful to incorporate lambda's into the interface to avoid
begin/end pairs? So from the user side, we would write code like this
Post by Kalinkin, Alexander A
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
int it = 0;
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] {
llt.solve(b); });
Post by Kalinkin, Alexander A
else
llt.solve(b);
// ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always outline the lambda and
pass it into the analysis.
Post by Kalinkin, Alexander A
That would certainly clean up things a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called, because it is masked by a wrong if-condition, or due
to an exception, which is caught only outside).
Post by Kalinkin, Alexander A
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) this keeps the API
minimal, and 2) the user can easily wrap them to accommodate its taste and
usages. To avoid all sorts of redundancies, one could even write a free
Post by Kalinkin, Alexander A
while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}
while(...) {
optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(),
LU.matrixU());
Post by Kalinkin, Alexander A
}
Actually, this last example is not conceivable at the moment because
the factors in SparseLU are not stored in a standard compressed format, we
would need to add an option to tell SparseLU to turn them to regular
SparseMatrix and to use them for solving.
Post by Kalinkin, Alexander A
Anyways, all these variants are essentially syntactic
sugar/implementation details, and at this stage I'm more concerned about
possible shortcomings in the general approach. Is it flexible enough? I'm
also not fan of introducing a MklSparseMatrix inheriting SparseMatrix, but
I don't have a better offer for now that is generic and easily applicable
to decompositions.
Post by Kalinkin, Alexander A
gael
However, if this is required, I would suggest to add this method not
only to matrices but also to the decomposition (and pass it through to the
internal matrices). Otherwise, this does not scale for decompositions which
use more than one matrix (like SparseLU). And we could even let the
sparseAnalyze() functions return a proxy to the decomposition which would
Post by Kalinkin, Alexander A
x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});
or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_-> solve(b);
y = llt_-> matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes'
an operation (after how many iterations do you actually benefit from the
overhead of analyzing the operation?)
Post by Kalinkin, Alexander A
Christoph
Cheers,
-Edward
Thank you for opening this discussion on the public mailing list.
So let's discuss about the public API, which currently is not very
(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not
sounds very generic.
Post by Kalinkin, Alexander A
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
pattern: (e.g., non-linear optimization, eigenvalues, ...)
Post by Kalinkin, Alexander A
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
...
x = llt.solve(b);
...
}
Here the triangular L factor is going to be used for triangular and
transposed-triangular solves dozens to hundreds of time but only the user
of SimplicialLDLT knowns that, not SimplicialLDLT, nor SparseMatrix.
Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize for. Other patterns are likely easier to handle, so let's
focus on it for now.
Post by Kalinkin, Alexander A
Regarding (i1), I would suggest to introduce a new type, say
MklSparseMatrix<> that would enhance SparseMatrix<> through inheritance.
Post by Kalinkin, Alexander A
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis each call to a supported
operation would trigger calls to mkl_sparse_set_*_hint()/mkl_
sparse_optimize.
Post by Kalinkin, Alexander A
Optionally, we could even add a "dryrun" mode for which no operation
would be performed, only calls to mkl_sparse_set_*_hint() and then
mkl_sparse_optimize would be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
Post by Kalinkin, Alexander A
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D
would still be performed, but calls to actual triangular solves would be
by-passed
Post by Kalinkin, Alexander A
llt.matrixL().endAnalysis();
while(...) {
...
x = llt.solve(b);
...
}
If someone directly deal with the factor L, then we could follow the
SimplicialLLT<SparseMatrix<double> > llt(A);
MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}
This design in quite general and expendable to any sparse-optimizers,
even built-in ones in the future.
Post by Kalinkin, Alexander A
In contrast to the current proposal, only selected operations would be
passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).
Post by Kalinkin, Alexander A
What do you think?
gael
On Tue, Apr 3, 2018 at 11:39 PM, Zhukova, Maria <
Hello Eigen community,
My name is Maria Zhukova and I’m a software development engineer
at Intel ®
Post by Kalinkin, Alexander A
MKL Sparse team.
My team is interested in contributing into Eigen, so I’ve
investigated our
Post by Kalinkin, Alexander A
Eigen support different operations for sparse matrices stored in
CSR and CSC
Post by Kalinkin, Alexander A
format which can be implemented on a basis of IE SpBLAS kernels
(please,
Post by Kalinkin, Alexander A
refer to
https://software.intel.com/en-us/mkl-developer-reference-c-
inspector-executor-sparse-blas-routines
Post by Kalinkin, Alexander A
<https://software.intel.com/en-us/mkl-developer-reference-
c-inspector-executor-sparse-blas-routines>
Post by Kalinkin, Alexander A
for the general idea of interfaces)
, basically we want to implement calls to our IE SpBLAS into next
operations:____
SparseMatrix + SparseMatrix (mkl_sparse_?_add)
SparseMatrix * DenseVector (mkl_sparse_?_mv)____
SparseMatrix * DenseMatrix (mkl_sparse_?_mm)____
SparseMatrix * SparseMatrix (mkl_sparse_spmm),
and Triangular solve (mkl_sparse_?_trsv).____
I’ve already started with implementation of
sparse_time_dense_impl_mkl
Post by Kalinkin, Alexander A
kernel which is based on mkl_sparse_?_mv (included in patch).____
*#include <Eigen/SpBLASSupport> *<-- *NEW:* IE SpBLAS include
module ____
Post by Kalinkin, Alexander A
void main () {
SparseMatrix<double, RowMajor> A;
Matrix<double, Dynamic, 1> x, y;
A.makeCompressed(); /* Convert matrix A into CSR/CSC format */
*A.createSparseHandle();*/* *NEW*: is used to create handle
required for all
Post by Kalinkin, Alexander A
IE SpBLAS routines */____
// support of IE SpBLAS is here
y = beta*y + alpha*A*x; /* call to mkl_sparse_?_mv with operation =
SPARSE_OPERATION_NON_TRANSPOSE */
y = beta*y + alpha*A.transpose()*x; /* call to mkl_sparse_?_mv with
operation = SPARSE_OPERATION_TRANSPOSE */
y = beta*y + alpha*A.adjoint()*x; /* call to mkl_sparse_?_mv with
operation
Post by Kalinkin, Alexander A
= SPARSE_OPERATION_CONJUGATE_TRANSPOSE */____
*A.destroySparseHandle();* /* *NEW*: is used to delete created
handle */
Post by Kalinkin, Alexander A
}____
__ __
I’ve attached a draft patch including all necessary changes and
would like
Post by Kalinkin, Alexander A
to hear your feedback.
Please, let me know if you have any questions and comments.____
__ __
Best regards,
Maria____
__ __
__ __
__ __
--
Dr.-Ing. Christoph Hertzberg
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany
Tel.: +49 421 178 45-4021
Zentrale: +49 421 178 45-0
Weitere Informationen: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender) Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973
Steuernummer: 19/672/50006
-----------------------------------------------------------------------
Gael Guennebaud
2018-04-06 23:02:37 UTC
Permalink
Hi,

to answer your questions, let me clarify that I'm using the 2017.5.220
version of MKL because the 2018 version is not compatible with my OS (OS X
10.11). The CPU is a quad-core Haswell (i7-4960HQ).

I tried various symmetric matrices of about 1M x 1M with 5M to 15M
nonzeros, and also some from the SuiteSparse collection such as 'bmw3_2'.
To be fair, some are also significantly faster (about 2x) when using MKL's
IE SpBLAS, for instance 'pkustk14', but again only when doing a dozen
products or more; otherwise calling mkl_sparse_optimize is counterproductive.


On Fri, Apr 6, 2018 at 8:49 PM, Kalinkin, Alexander A <
Post by Kalinkin, Alexander A
[akalinki] This can be an issue for older versions of MKL, because
initially we spent additional time on hints/optimize when calling
this every time. Starting from MKL 2018u3 we plan to remove this
overhead; if you want, we can share an engineering build so you can check
performance on the same binaries.
Also, regarding the poor performance of the SparseBlas matvec functionality
in comparison with Eigen's built-in one: we interpret all such cases as
issues that need to be investigated and fixed on our side. So I don't
think we need any dispatch at the Eigen level between built-in
kernels and ours; all this should be hidden inside the MKL libraries.
I have no doubt that SparseBlas can always beat or match Eigen
without calling mkl_sparse_optimize, but I have strong doubts that
calling mkl_sparse_optimize with expected_calls=10, even when the matrix
will be used only once for the given operation, can come with negligible
overhead.
Post by Kalinkin, Alexander A
And back to the main question of this topic: we have customers who use
Eigen on Intel processors and are interested in optimizing their code.
Introducing a new SparseMatrix type would force them to rewrite their
code, which can sometimes be impossible for various reasons. That's why
we are asking how we can simply extend the current SparseMatrix
type to support MKL functionality.
I agree that ideally it would be nice to have the same level of
transparency as we have for dense BLAS, but since here you have to
attach/store additional information with the matrices and (IMO) have to
control for which operations/matrices mkl_sparse_optimize is called,
this seems hardly doable.

If you want to stick with your approach of calling mkl_sparse_optimize
unconditionally and storing the required information within SparseMatrix,
then you can still ship a header-only "module" that does so without
touching Eigen. Your current proof of concept is actually very close to
that. You would need to:

- inject data members and some accessors into SparseCompressedBase through
the currently available plugin mechanism,
- access them through free functions with overloads for Transpose<> (better
than injecting code into Transpose<>),
- create sparse handles on demand (the first time one is needed),
- and delete them automatically when the matrix is destroyed (to this end,
wrap the handle within a tiny class that becomes a member of
SparseCompressedBase and calls mkl_sparse_destroy in its destructor).

The main problem with this approach is that if someone modifies a
SparseMatrix after having performed some operations on it, the
sparse handle becomes invalid, and you would need to ask users to update
their code to explicitly invalidate the sparse handle at adequate places.
This issue is also present in your proof of concept.

gael
Post by Kalinkin, Alexander A
Thanks,
Alexander Kalinkin,
PhD, Team leader, Architect of Sparse components in Intel® MKL
*Sent:* Friday, April 6, 2018 5:06 AM
*Subject:* Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
vec1_dense.noalias() += column_major_sparse * vec2_dense
Since this operation is not supported by mkl_sparse_optimize, the
overhead is null here, but for some very sparse matrices (5 nnz/col)
mkl_sparse_d_mv is about 2x slower than Eigen, while for denser matrices
both implementations achieve the same speed.
gael.
vec1_dense.noalias() += row_major_sparse * vec2_dense
and I observe a very significant drop in performance (almost 10x) compared
to pure Eigen (with -fopenmp) if this operation is performed only once, or
very few times... Changing the 'expected_calls' parameter from 10 to 1
does not reduce the performance drop caused by mkl_sparse_optimize. If I
repeat this operation hundreds of times, then both Eigen and MKL achieve
the same level of performance for this particular operation and the
matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).
So I still believe that silently and unconditionally falling back to IE
SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is
never a loss (we would have to benchmark), but that's clearly not the case
for the optimize step. So, clearly, the optimize step must be explicitly
activated on user-selected expressions only.
gael
Best regards,
Maria
*Sent:* Thursday, April 5, 2018 2:54 PM
*Subject:* Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen
On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <
Would it be useful to incorporate lambdas into the interface to avoid
begin/end pairs? From the user side, we would then write code like this:
1) Analyze and run
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
for (int it = 0; ...; ++it) {
if (it == 0)
llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });
else
llt.solve(b);
// ...
}
2) Analyze only
SimplicialLDLT<MklSparseMatrix<double>> llt(A);
llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });
// and solve as usual
For more complicated algorithms, one can always define the lambda
separately and pass it into the analysis.
That would certainly clean things up a lot. Having to call
A.beginXY();
doStuff();
A.endXY();
is a very C-stylish API, which is error-prone (e.g., one of the calls
might not get called because it is masked by a wrong if-condition, or due
to an exception which is caught only outside).
This can usually be avoided with proper C++ constructs.
The advantages of the begin/end pair are that 1) it keeps the API
minimal, and 2) the user can easily wrap it to accommodate their own taste
and usage. To avoid all sorts of redundancies, one could even write a free
function:
while(...) {
optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());
}
while(...) {
    optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.matrixL(), LU.matrixU());
}
Actually, this last example is not conceivable at the moment because the
factors in SparseLU are not stored in a standard compressed format; we
would need to add an option telling SparseLU to turn them into regular
SparseMatrix objects and to use those for solving.
Anyway, all these variants are essentially syntactic sugar/implementation
details, and at this stage I'm more concerned about possible shortcomings
in the general approach. Is it flexible enough? I'm also not a fan of
introducing an MklSparseMatrix inheriting from SparseMatrix, but I don't
have a better offer for now that is generic and easily applicable to
decompositions.
gael
However, if this is required, I would suggest adding this method not only
to matrices but also to the decompositions (and passing it through to the
internal matrices). Otherwise, this does not scale for decompositions which
use more than one matrix (like SparseLU). And we could even let the
sparseAnalyze() functions return a proxy to the decomposition, which would
allow:
x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});
or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_->solve(b);
y = llt_->matrixL() * c;
} // destructor of llt_ calls `endAnalyze`
The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an
operation (after how many iterations do you actually benefit, given the
overhead of analyzing the operation?)
Christoph
Cheers,
-Edward
Thank you for opening this discussion on the public mailing list.
So let's discuss the public API, which currently is not very satisfactory:
(i1) - storing MKL's handle in SparseMatrix breaks the ABI and does not sound very generic,
(i2) - there is no way to control which operations are going to be analyzed/optimized,
(i3) - nor to specify the 'expected_calls' parameter.
In order to discuss these issues, let's consider the following typical
usage pattern (e.g., non-linear optimization, eigenvalue computations, ...):
SimplicialLDLT<SparseMatrix<double> > llt(A);
while(...) {
...
x = llt.solve(b);
...
}
Here the triangular factor L is going to be used for triangular and
transposed-triangular solves dozens to hundreds of times, but only the user
of SimplicialLDLT knows that, not SimplicialLDLT nor SparseMatrix.
Moreover, the user does not own the SparseMatrix that we want to
analyze/optimize. Other patterns are likely easier to handle, so let's
focus on this one for now.
Regarding (i1), I would suggest introducing a new type, say
MklSparseMatrix<>, that would enhance SparseMatrix<> through inheritance.
MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}
All states in MklSparseMatrix would be mutable.
Between a pair of beginAnalysis/endAnalysis calls, each call to a supported
operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize().
Optionally, we could even add a "dry-run" mode in which no operation would
be performed: only the calls to mkl_sparse_set_*_hint() would be made, and
mkl_sparse_optimize() would then be called in endAnalysis(). This way
mkl_sparse_optimize() would be called only once.
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
...
if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
...
++it;
}
SimplicialLDLT<MklSparseMatrix<double> > llt(A);
llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D
would still be performed, but calls to actual triangular solves would be
by-passed
llt.matrixL().endAnalysis();
while(...) {
...
x = llt.solve(b);
...
}
If someone deals directly with the factor L, then we could follow the same
approach:
MklSparseMatrix<double> L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
...
y = L.triangularView<Lower>() * x;
...
}
This design is quite general and extendable to any sparse optimizer, even
built-in ones in the future.
In contrast to the current proposal, only selected operations would be
passed to MKL (one needs to use an MklSparseMatrix plus a begin/end
recording phase).
What do you think?
gael
Zhukova, Maria
2018-04-11 17:41:26 UTC
Permalink
Zhukova, Maria
2018-05-25 00:33:54 UTC
Permalink
Edward Lam
2018-04-06 01:06:46 UTC
Permalink
On Thu, Apr 5, 2018, 7:00 PM Gael Guennebaud wrote:
Post by Gael Guennebaud
The advantages of the begin/end pair are that 1) this keeps the API
minimal, and 2) the user can easily wrap them to accommodate its taste and
usages. To avoid all sorts of redundancies, one could even write a free
With regard to syntax, it's a strong reason why I'm using a C++ library in
the first place; otherwise I might as well be using MKL directly. My point
is that the API should be safe by default. I don't mind having additional
means for something more flexible, but the user should have to go out of
their way to use it.

I think that having a safe and easy API while accommodating more general
needs should be possible. It will likely require more implementation
effort, of course. As mentioned in my other post, there might be good ideas
in having a separate optimizer (optimizer_hint?) class, but that raises
questions about the lifetimes of this class, the algorithm, and the
matrix.

On a related topic, I have similar questions about the solver and matrix
classes too. What happens if the matrix is destroyed before the solver in
Eigen? I've always been careful that this doesn't happen when I write code,
because I'm not sure of the consequences.

-Edward


Gael Guennebaud
2018-04-06 11:53:49 UTC
Permalink
Post by Edward Lam
On a related topic, I have similar questions about this with the solver
and matrix class too. What happens if the matrix is destroyed before the
solver in Eigen? I've always been careful that doesn't happen when I write
code because I'm not sure of the consequences.
Basically, there is no dependency for direct solvers, whereas iterative
solvers store a reference. There are red warnings in the reference doc, e.g.:


https://eigen.tuxfamily.org/dox/classEigen_1_1ConjugateGradient.html#ac10f778fcd137eca1f6057c8ddd3d644

https://eigen.tuxfamily.org/dox/classEigen_1_1IterativeSolverBase.html#a7dfa55c55e82d697bde227696a630914

but I agree that this is not clear from the user manual pages.

gael