Jason Newton
2016-09-02 07:47:23 UTC
Hey Gael et al,
I was trying to understand the underlying implementation of the General
Matrix-Matrix and Matrix-Vector products - I haven't succeeded yet (is
there a write-up on it somewhere?) - but I thought I'd inquire about adding
a code path whose results other software can match, provided it plays by
the same rules. Maybe this means evaluating products the way reference
BLAS/CBLAS does; maybe it means the naive approach; maybe the user could
be allowed to switch between those two implementations. Perhaps the
MKL/BLAS code path could be leveraged and extended for this (to support
reference BLAS, and general inverse as well).
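To make "the naive approach" concrete: the reproducible reference path I have in mind is a plain triple loop with a fixed accumulation order - a minimal sketch, not Eigen's actual kernel:

```cpp
#include <cstddef>

// Naive GEMM: C = A * B for row-major dense matrices.
// The accumulation order (innermost p loop, left to right, single scalar
// accumulator) is fully specified, so any implementation following the
// same order and rounding mode produces bit-identical results.
void naive_gemm(const double* A, const double* B, double* C,
                std::size_t m, std::size_t k, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j]; // fixed summation order
            C[i * n + j] = acc;
        }
    }
}
```

Fast GEMM kernels block and reorder these sums (and use FMA/SIMD), which is exactly why their results differ in the low bits from a loop like this.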
The idea for control is a macro set up top to select this preference,
like the default storage order or the parallelization/vectorization knobs.
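For illustration, it could sit alongside Eigen's existing compile-time knobs; the reproducible-products macro name below is hypothetical, not an existing Eigen option:

```cpp
// Hypothetical knob (illustrative name, does not exist in Eigen today):
// #define EIGEN_REPRODUCIBLE_PRODUCTS

// Existing knobs of the same flavor it would live next to:
#define EIGEN_DEFAULT_TO_ROW_MAJOR_STORAGE  // default storage order
#define EIGEN_DONT_VECTORIZE                // disable explicit SIMD
#define EIGEN_DONT_PARALLELIZE              // disable OpenMP parallelism

#include <Eigen/Dense>
```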
The advantage is that when porting code from one context to another (be it
GPUs, or different languages like Python/NumPy), we can get a 100%
bit-exact match as long as both domains follow the same algorithms (and
handle rounding the same way - another topic). That provides a fairly
strong guarantee that the ported code is correct (provided a large enough
input space is used for coverage), and when not performing that
verification, we can go back to the fast paths.
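The verification step itself can then be as strict as a bitwise comparison of the output buffers - a sketch:

```cpp
#include <cstring>
#include <cstddef>

// Bit-exact check: two double buffers match iff every byte matches.
// Stricter than an epsilon comparison: it distinguishes +0.0 from -0.0
// and catches any difference in evaluation order or rounding.
bool bit_exact(const double* a, const double* b, std::size_t n) {
    return std::memcmp(a, b, n * sizeof(double)) == 0;
}
```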
Further, the result is by necessity reproducible across machines (maybe
not across architectures, but certainly across processor configurations).
That also has value to the many people who are willing to take a
performance penalty to attain it - I think Eigen already achieves this
when OpenMP is disabled, and on mixed-generation machines possibly when
vectorization is disabled too - but I thought I'd throw it out there as
well.
I can tell you that for small matrices on a GPU (small enough to fit in
local memory), the naive approach works very well for parallel
computation, and those are probably the most frequently used matrix sizes.
Python/NumPy doesn't care and relies entirely on the underlying BLAS
implementation, which users can fairly easily inspect, and it supports
CBLAS by default.
I thought of bringing this up on the ML after reading the following
(listed to show the property is desirable, not that this is my first time
encountering the issue):
http://stackoverflow.com/questions/22116553/why-result-of-matrix-multiplication-with-eigen-is-different-from-standard-vector
https://forum.kde.org/viewtopic.php?f=74&t=119907#p303319
-Jason