Discussion:
[eigen] Efficiently creating a matrix of pairwise vector differences.
Smith, Louis
2017-12-20 23:10:19 UTC
Permalink
Hello,


I'm trying to use Eigen to compute the distances between M-length vectors i and j, which are each rows in an NxM matrix (note that M is often much, much smaller than N; in my test case N is about 250,000 and M is 6). What I'm currently working with is an expression like:


MatrixXd data = readDataFromFile(is); // This works when data is written to cout, so elided.


MatrixXd distances = (data.rowwise() - data.transpose().colwise().transpose()).norm();


Which gives me the following error:

error: no member named 'transpose' in 'Eigen::VectorwiseOp<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0>'
MatrixXd distances = (data.rowwise() - data.transpose().colwise().transpose()).norm();


When I get rid of the third transpose I'm then subtracting a column vector from a row vector, which also doesn't work. Calling transpose on the row vector gives a similar error.


What I expect distances to be is an NxN symmetric matrix containing the distances (norms of difference vectors) for each pair of row vectors in data.


Sorry for the newbie question, but I'd really appreciate some insight on this, since it seems like there should be a more idiomatic Eigen way to write it than the double for loop over the data (sketched below), which also works but is very slow.
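
Roughly, the double-loop version is a sketch like this (my actual code elided; the function name is just illustrative):

#include <Eigen/Dense>
using Eigen::MatrixXd;

// Naive O(N^2 * M) pairwise distances between the rows of data.
MatrixXd pairwiseDistancesNaive(const MatrixXd& data)
{
    const Eigen::Index N = data.rows();
    MatrixXd distances(N, N);
    for (Eigen::Index i = 0; i < N; ++i)
        for (Eigen::Index j = 0; j < N; ++j)
            distances(i, j) = (data.row(i) - data.row(j)).norm();
    return distances;
}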


Regards,

Louis
Gael Guennebaud
2017-12-22 08:12:02 UTC
Permalink
If I got your question right, you end up with an NxN symmetric matrix, that is a 250k x 250k matrix of doubles, so about 500GB... this does not sound right. Perhaps you want to compute the differences between the column vectors, or you inverted N and M? Anyway, for that task you can exploit that:

||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.dot(b)

with something like:

VectorXd D2 = data.rowwise().squaredNorm();
MatrixXd dist = D2.rowwise().replicate(N)
              + D2.transpose().colwise().replicate(N);
dist -= 2.*data*data.transpose();

that will be considerably faster.
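
Spelled out as a self-contained sketch under the row-vector convention of the original post (the final clamp and square root are additions not in the snippet above; they guard against tiny negative values from floating-point cancellation and convert squared distances to distances):

#include <Eigen/Dense>
using Eigen::MatrixXd;
using Eigen::VectorXd;

// ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.dot(b), applied to every pair of rows at once.
MatrixXd pairwiseDistancesGemm(const MatrixXd& data)
{
    const Eigen::Index N = data.rows();
    VectorXd D2 = data.rowwise().squaredNorm();              // D2(i) = ||row i||^2
    MatrixXd dist2 = D2.rowwise().replicate(N)               // adds ||row i||^2 to entry (i,j)
                   + D2.transpose().colwise().replicate(N);  // adds ||row j||^2 to entry (i,j)
    dist2.noalias() -= 2.0 * data * data.transpose();        // subtracts 2 * row_i.dot(row_j)
    return dist2.cwiseMax(0.0).cwiseSqrt();                  // clamp rounding noise, then sqrt
}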

gael


Jeff Hammond
2017-12-24 16:58:15 UTC
Permalink
If you're writing a 250000x250000 matrix, I don't know how fast you think it should be. If you did a GEMM between 250000x6 and 6x250000, it is likely limited by write bandwidth, because the inner loop is only 1-6 cycles on modern hardware. You can compare GEMM to the triple loop for these dimensions to verify.

I suspect the double loop is actually going to win, especially if you convince the compiler to generate non-temporal stores (assuming x86) to minimize RFO traffic.
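
A rough harness for that comparison might look like the sketch below (it reuses the two hypothetical functions sketched earlier in the thread, and a reduced N so the result fits in memory):

#include <Eigen/Dense>
#include <chrono>
#include <iostream>

// Crude timing comparison; scale N toward the real problem size as memory allows.
int main()
{
    const Eigen::Index N = 10000, M = 6;   // 250k x 250k doubles would need ~500GB
    Eigen::MatrixXd data = Eigen::MatrixXd::Random(N, M);

    auto time = [](const char* label, auto&& f) {
        auto t0 = std::chrono::steady_clock::now();
        Eigen::MatrixXd result = f();
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label << ": " << std::chrono::duration<double>(t1 - t0).count()
                  << " s (checksum " << result.sum() << ")\n";
    };

    time("double loop", [&] { return pairwiseDistancesNaive(data); });
    time("GEMM-based ", [&] { return pairwiseDistancesGemm(data); });
    return 0;
}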

Jeff
--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/