Discussion:
[eigen] Signed or unsigned indexing
Henrik Mannerström
2017-01-20 13:25:06 UTC
This issue seems to warrant a separate message thread.

I'd like to offer my two cents: as much as I like Eigen, I think there is a
strict ordering "C++ language" > "STL" > "Eigen". More than once I have
developed something and only at some later point brought in Eigen; code
written in std::size_t fashion has then needed refactoring. So if there
were a vote, I'd vote for size_t indexing. I think smooth
interoperability with the STL is valuable.

Best,
Henrik
Hi all,
while I would not want to argue with Gael or with the numerous C++
experts advocating signed integers, today's reality is different. Some
libraries, the most prominent being the standard library, use unsigned
integers as indices. Therefore, mixing signed and unsigned types is sadly
unavoidable when using Eigen, which is a major annoyance (at least for me,
working with -pedantic -Werror).
In my opinion, using Eigen with -DEIGEN_DEFAULT_DENSE_INDEX_TYPE=size_t
should just work, regardless of the developers' personal preference for
signed integers.
Best,
Marton
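To illustrate the annoyance Marton describes, here is a minimal sketch of
mixing STL and Eigen sizes under -Wall -Wsign-compare -Werror (the function
and variable names are hypothetical; Eigen::Index is Eigen's signed index
type):

#include <Eigen/Dense>
#include <vector>

void copy_into(const std::vector<double>& src, Eigen::VectorXd& dst) {
  // src.size() is std::size_t (unsigned), dst.size() is Eigen::Index (signed);
  // comparing them directly triggers -Wsign-compare, so casts are required.
  const Eigen::Index n = static_cast<Eigen::Index>(src.size());
  for (Eigen::Index i = 0; i < n && i < dst.size(); ++i) {
    dst[i] = src[static_cast<std::size_t>(i)];
  }
}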
I wonder if a rethink of reshape could allow a move to
unsigned index types, assuming I understand correctly
that Dynamic would be of another type. It's always been a
bit clunky getting “size_t-correctness” right for mixed
Eigen/STL code, and compilers complain increasingly
nowadays. Perhaps now is the time to give it a try?
See also: http://eigen.tuxfamily.org/index.php?title=FAQ#Why_Eigen.27s_API_is_using_signed_integers_for_sizes.2C_indices.2C_etc..3F
Maybe one day we'll get a "fixed" std-v2 that would be more compatible
with libraries that made the right choice of using signed types.
For sparse matrices, I agree that we might try to allow for unsigned
types as the StorageIndex type. This should be doable while keeping signed
64-bit integers for the API (rows, cols, nonZeros, etc.).
We might also think about solutions to ease the mix of Eigen/STL code...
gael
I see the “downcounting” argument at https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/03/msg00099.html,
but that appears fairly strongly to be a special case where
one would anyway want to benchmark, check sizes, etc.
Finally, I think we are in a world where sparse arrays with
entries in the 2-4 billion range are reasonably common,
and one could conceivably be pleased to get the extra bit
back.

Thanks again for a great library!
A.
Dr Andrew Fitzgibbon FREng FBCS FIAPR
Partner Scientist
Microsoft HoloLens, Cambridge, UK
http://aka.ms/awf
*Sent:* 13 January 2017 12:26
*Subject:* Re: [eigen] Let's get reshape in shape for 3.4
Also, regarding the RowMajor/ColMajor int/type issue - perhaps stuff
them in a new namespace or class - Storage? Too bad StorageOrder is
already used in so many places. Honestly, I'm all for you making them
types and things working uniformly from there. I have used them
myself as integers with the flags bitset, but only for enable_if logic,
which would be rendered obsolete if you had a collection of C++11-inspired
type traits (instead they get repeated on the web in a few
places). Sorry if I'm not being very detailed, it's been a while
since I've needed these, but my point is that it was basically a flaw
to use them as ints in the first place, in user code - and so I
encourage you to change things so it all works fluidly in the new API
without fear of upsetting users. Although perhaps that is a daunting
task...
I think you are mixing Eigen::RowMajor with Eigen::RowMajorBit. I agree
that the bit flags could be managed differently using individual type
traits, but regarding Eigen::RowMajor, it is currently used as a template
option, as in

Matrix<...., RowMajor|DontAlign>

which is pretty convenient to write compared to having to subclass some
default_matrix_traits class to customize the options. With
RowMajor|DontAlign, RowMajor could still be an instance of an
integral_constant-like type with operator| overloaded... Actually, I've
started to think about such an approach for Eigen::Dynamic, so that one could write

M = N*2+1

and get M==Dynamic if N==Dynamic. Currently we always have to write
M = (N==Dynamic ? Dynamic : 2*N+1), which is error-prone because it's easy to
forget about checking for Dynamic, especially when combining multiple
compile-time identifiers.
gael
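A minimal sketch of the integral_constant-like idea Gael describes, assuming
C++14 (all names are hypothetical, not Eigen API; Dynamic == -1 as in Eigen):

constexpr int Dynamic = -1;

// compile-time extent arithmetic that propagates Dynamic automatically
constexpr int dyn_mul(int a, int b) { return (a == Dynamic || b == Dynamic) ? Dynamic : a * b; }
constexpr int dyn_add(int a, int b) { return (a == Dynamic || b == Dynamic) ? Dynamic : a + b; }

template<int N> struct Extent {
  static constexpr int value = N;
  template<int M> constexpr Extent<dyn_mul(N, M)> operator*(Extent<M>) const { return {}; }
  template<int M> constexpr Extent<dyn_add(N, M)> operator+(Extent<M>) const { return {}; }
};

int main() {
  constexpr auto N = Extent<Dynamic>{};
  constexpr auto M = N * Extent<2>{} + Extent<1>{};  // M = N*2+1, no manual Dynamic check
  static_assert(decltype(M)::value == Dynamic, "Dynamic propagates through arithmetic");
  static_assert((Extent<3>{} * Extent<2>{} + Extent<1>{}).value == 7, "3*2+1 == 7");
  return 0;
}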
-Jason
Hi Gael,
Glad to see all the new APIs you're moving in for the new year.
I actually prefer C if C is a superset of B - that is the way it works
in NumPy - order is overridable in several places, but mainly things
follow the matrix types you are working with (which would be
expressions here).
I haven't thought about the details, but is there any reason
A.reshaped(4, n/2) couldn't work via constexpr or something on the 4? I
imagine even if it did, you're trying to cover for C++98, but I
think fix<4> is a fair bit ugly.
As for the placeholder for a solvable dimension - the Matlab
convention is the empty matrix, and I welcome that notion (wrapped as a
Null, Nil, Empty, Filled, DontCare, Placeholder, CalcSize (this and
the next are more explicit), or AutoSized).
-Jason
On Thu, Jan 12, 2017 at 10:35 AM, Gael Guennebaud wrote:
Hi everyone,

just after generic indexing/slicing, another long-standing missing feature
is reshape. So let's make it for 3.4.

This is not the first time we have discussed it. There is an old bug report
entry [1] and an old pull request with various discussions [2]. The Tensor
module also supports reshape [3].

However, the feature is still not there because we never converged on how
to properly handle the ambiguity between col-major / row-major orders, also
called Fortran versus C style orders (e.g., in the numpy doc [4]); see the
small example after the options below.

A) Interpret the indices in column-major order only, regardless of the
storage order.
- used in MatLab and Armadillo
- pros: simple strategy
- cons: not very friendly for row-major inputs (needs to transpose twice)

B) Follow the storage order of the given expression.
- used by the Tensor module
- pros: easiest implementation
- cons:
  * results depend on storage order (need to be careful in generic code)
  * not all expressions have a natural storage order (e.g., a+a^T, a*b)
  * needs a hard copy if, e.g., the user wants to stack the columns of a
    row-major input

C) Give the user an option to decide which order to use between ColMajor,
RowMajor, and Auto.
- used by numpy [4], with RowMajor (aka C-like order) as the default
- pros: gives full control to the user
- cons: the API is a bit more complicated
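For concreteness, reshaping the 2x3 matrix

A = [ 1 2 3
      4 5 6 ]

to 3x2 in column-major order reads the coefficients as 1,4,2,5,3,6 and
refills by columns, giving

[ 1 5
  4 3
  2 6 ]

whereas row-major order reads 1,2,3,4,5,6 and refills by rows, giving

[ 1 2
  3 4
  5 6 ]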
At this stage, option C) seems to be the only reasonable one. However, we
have yet to specify how to pass this option at compile time, what Auto
means, and what the default strategy is.

Regarding 'Auto', it is similar to option (B) above. However, as I already
mentioned, some expressions do not have any natural storage order. We could
address this issue by limiting the use of 'Auto' to expressions whose
storage order is unambiguous:
- any expression with the DirectAccessBit flag (meaning we are dealing
  with a Matrix, Map, sub-matrix, Ref, etc., but not with a generic
  expression)
- any expression with the LinearAccessBit flag, meaning the expression can
  be efficiently processed as a 1D vector.
Any other situation would raise a static_assert.

But what if I really don't care and just want to, e.g., get a linear view
with no constraint on the stacking order? Then we could add a fourth option
meaning 'IDontCare', perhaps 'AnyOrder'?

For the default behavior, I would propose 'ColMajor', which is perhaps the
most common and predictable choice given that the default storage is
column-major too.

The proposed API:

template<typename RowsType=Index, typename ColType=Index, typename Order=Xxxx>
DenseBase::reshaped(RowsType rows, ColType cols, Order = Order());

template<typename Order=Xxxx>
DenseBase::reshaped(Order = Order());

Note that I used "reshaped" with a "d" on purpose.

The storage order of the resulting expression would match the optional
order.
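A sketch of how the proposed API might be used (hypothetical: it assumes the
reshaped() signatures above, the fix<> helper from the indexing/slicing work,
and a ColMajor default):

MatrixXd A(4, 6);
Index n = 24;
auto B = A.reshaped(2, 12);        // 2x12 view, column-major order by default
auto C = A.reshaped(fix<3>, n/3);  // compile-time row count, runtime column count
auto v = A.reshaped();             // linear 1D view of all coefficients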
Then, for the names of the options, we cannot use "RowMajor"/"ColMajor"
because they are already defined as "static const int" and we need objects
with different types here. Moreover, col-major/row-major does not extend
well to multi-dimensional tensors. I also don't really like the reference to
Fortran/C as in numpy. "Forward"/"Backward" are confusing too. Any ideas?

The rows/cols parameters could also be a mix of compile-time and runtime
values:

A.reshaped(fix<4>, n/2);

And maybe we could even allow a placeholder to automatically compute one of
the dimensions to match the given matrix size. We cannot reuse "Auto" here:

A.reshaped(5, Auto);

Again, any ideas for a good placeholder name? (numpy uses -1, but we need a
compile-time identifier.)

cheers,
gael
[1] http://eigen.tuxfamily.org/bz/show_bug.cgi?id=437
[2] https://bitbucket.org/eigen/eigen/pull-requests/41
[3] https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default#markdown-header-operation-reshapeconst-dimensions-new_dims
[4] https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.reshape.html
Brad Bell
2017-01-20 13:52:12 UTC
Post by Henrik Mannerström
This issue seems to warrant a separate message thread.
I'd like to offer my two cents: as much as I like Eigen, I think there
is a strict ordering "C++ language" > "STL" > "Eigen".
I would like to suggest that value_type should be defined (same as
Scalar) for Eigen matrices. This seems to be the convention in the C++
standard library; for example, see the standard containers listed at
http://en.cppreference.com/w/cpp/container/deque
... snip ...
Christoph Hertzberg
2017-01-20 18:27:35 UTC
Post by Brad Bell
I would like to suggest that value_type should be defined (same as
Scalar) for Eigen matrices.
... snip ...
Please check:

http://eigen.tuxfamily.org/bz/show_bug.cgi?id=360
https://bitbucket.org/eigen/eigen/commits/2a120c7a1519/


Cheers,
Christoph
--
Dipl. Inf., Dipl. Math. Christoph Hertzberg

Universität Bremen
FB 3 - Mathematics and Computer Science
Robotics Group
Robert-Hooke-Straße 1
28359 Bremen, Germany

Switchboard: +49 421 178 45-6611

Visitor address of the branch office:
Robert-Hooke-Straße 5
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Reception: +49 421 178 45-6600
Fax: +49 421 178 45-4150
E-Mail: ***@informatik.uni-bremen.de

Further information: http://www.informatik.uni-bremen.de/robotik
Benoit Jacob
2017-01-20 14:33:20 UTC
It's complicated. The use of signed indexing has a long history,
specifically in numerical linear algebra: see FORTRAN/BLAS/LAPACK. Also,
some large C++ shops such as Google have entirely turned away from unsigned
indices; their C++ style guide says:
You should not use the unsigned integer types such as uint32_t, unless
there is a valid reason such as representing a bit pattern rather than a
number, or you need defined overflow modulo 2^N. In particular, do not use
unsigned types to say a number will never be negative. Instead, use
assertions for this.
I only mention this to say, there are valid points on both sides of this
argument. Neither choice will make much more than half of users happy. (I
actually don't like the Google C++ style that much, and I wouldn't mind
unsigned personally).

The only thing that would be really bad would be to try to make both
signednesses fully supported. That would mean keeping only the intersection
of the sets of advantages of either signedness, while having to deal with
the union of the sets of constraints.

Benoit
Post by Henrik Mannerström
This issue seems to warrant a separate message thread.
... snip ...
Francois Fayard
2017-01-20 15:54:04 UTC
Let me give you my 2 cents on this signed/unsigned problem:

- The core C++ language uses signed integers for indexing: if p is a pointer, p[k] is defined for k of type std::ptrdiff_t, which is the Index type used by Eigen.
- The C++ standard library uses unsigned integers for indexing: std::size_t.

So if you respect the ordering (core C++) > (STL) > (Eigen), you have to use signed integers. Most people on the standard committee think that using unsigned integers for indexing was a mistake, including Bjarne Stroustrup, Herb Sutter and Chandler Carruth. You can see their arguments in this video, at 42:38 and 1:02:50 (http://youtu.be/Puio5dly9N8).

Using unsigned integers is just error-prone. For instance, suppose you have an array v and you are looking for the first index k, going from b down to a, such that v[k] == 0. With signed integers, you just write:
for (int k = b; k >= a; --k) {
  if (v[k] == 0) {
    return k;
  }
}
If you do that with unsigned integers, good luck writing something simple that does not go wrong when a = 0. (It was a real bug I found in an image program that was searching for something from right to left inside a rectangle of interest.)
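With unsigned indices, one ends up with something like the following sketch;
the "k-- > a" trick runs k from b down to a inclusive without ever computing
a value below zero (assuming b + 1 does not itself overflow):

for (std::size_t k = b + 1; k-- > a;) {
  if (v[k] == 0) {
    return k;
  }
}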

In terms of performance, the compiler can also do more with signed integers, because signed overflow is undefined behavior; this is not the case for unsigned integers (they wrap). As a consequence, a compiler can assume that p[k] and p[k + 1] are next to each other in memory if k is signed; it cannot assume that if k is unsigned (think of the mess this creates for people working on auto-vectorization). With signed integers, 10 * k / 2 == 5 * k; this does not hold for unsigned integers. I believe that "unsigned integers" should be renamed "modulo integers", because unsigned integers are just Z/(2^n Z). And I have taught enough mathematics to know that most people are (or should be) scared of Z/(2^n Z) :-)

The only performance advantage of unsigned integers is division: it is faster than signed division. But I have never hit that bottleneck in real programs.

An argument used by “unsigned” people is that you get an extra bit for array indices. This is just nonsense. First, it only matters on 32-bit systems with char arrays of more than 2 billion elements. Have you ever used *char* arrays that long? Now, the funniest part: if you look at the implementation of std::vector<T>, it stores 3 pointers, including the begin and the end of the array. To get its size, the vector returns static_cast<std::size_t>(end - begin), and end - begin is a std::ptrdiff_t, so if the vector held more than 2 billion elements you would get undefined behavior. So std::vector cannot even use that extra bit! By the way, this is why std::vector<T>::max_size() is about 2 billion with libc++ (clang) on 32-bit systems. And nobody seems to complain.
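A minimal sketch of that last point (the member names are hypothetical, not
libc++'s actual ones): the size is recovered from a pointer difference, which
is a signed std::ptrdiff_t, so the unsigned return type buys nothing:

#include <cstddef>

template <typename T>
struct vector_layout {
  T* begin_;
  T* end_;
  T* end_of_storage_;
  std::size_t size() const {
    // end_ - begin_ has type std::ptrdiff_t; if the true size exceeded
    // PTRDIFF_MAX, the subtraction itself would be undefined behavior
    return static_cast<std::size_t>(end_ - begin_);
  }
};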

On the other hand, I understand people who use std::size_t for indexing because it is consistent with the standard library.

François Fayard
Founder & Consultant - Inside Loop
Applied Mathematics & High Performance Computing
Post by Benoit Jacob
... snip ...
Massimiliano Janes
2017-01-20 16:53:55 UTC
Post by Francois Fayard
- The core C++ language uses signed integers for indexing: std::ptrdiff_t, the Index type used by Eigen.
- The C++ standard library uses unsigned integers for indexing: std::size_t.
actually, std random access iterators do accept signed indices as well; size_t indices are used at the container level only.
This actually makes sense to me, considering that interfaces should be minimal and that unsigneds have a simpler
set of invalid values when representing non-negative quantities (signeds need to specify the behavior of
passing two bounds instead of one, which increases the complexity of documentation, pre/postcondition checks, and tests).

Premising that I'm definitely in the signed party, I think most signed/size_t interoperability problems come from using
containers where iterators/ranges/views should be used; in my experience (for what it's worth :) ), if containers decay to views
ASAP, most unsigned-to-signed index conversions disappear, leaving just a few controlled casts/checks where appropriate.
Francois Fayard
2017-01-20 17:24:57 UTC
Permalink
Post by Massimiliano Janes
actually, std random access iterators do accept signed indices as well; size_t indices are used at the container level only.
... snip ...
For pre/postcondition checks, this is not really an issue, as

T& operator[](std::ptrdiff_t i) {
  // one unsigned comparison also rejects negative i: casting a negative
  // std::ptrdiff_t to std::size_t yields a huge value >= size()
  assert(static_cast<std::size_t>(i) < static_cast<std::size_t>(size()));
  return data_[i];
}

does no more work than the unsigned version. I agree that it looks strange, but as it is inside a library, I am not worried.
Post by Massimiliano Janes
premising that I'm definitely in the signed party, I think most signed/size_t interoperability problems come from using
containers where iterators/ranges/views should be used; if containers decay to views ASAP, most unsigned-to-signed index conversions disappear.
Many people still need to work with indices. If you work with an array of structures,

struct Point {
  double x;
  double y;
  double z;
};
std::vector<Point> points;

you can work with iterators. But if you work with a structure of arrays,

struct ArrayPoint {
  std::vector<double> x;
  std::vector<double> y;
  std::vector<double> z;
};

it is much easier to work with indices. Also, there are structures for which it is impossible to get efficient iterators, for instance a matrix with padding at the end of each column. Such structures cannot be traversed with iterators as efficiently as with indices. So indices are still needed today.
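An index-based traversal of the structure-of-arrays above, as a sketch (the
squared-norm computation is just for illustration):

#include <cstddef>

double sum_squared_norms(const ArrayPoint& p) {
  double sum = 0.0;
  // a single signed index addresses the i-th point across all three arrays
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(p.x.size()); ++i) {
    sum += p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i];
  }
  return sum;
}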
Mark Borgerding
2017-01-20 17:48:47 UTC
Well made points. These greatly expand on points I made on this list
in May 2010.


From the linked video...

"I think one of the sad things about the standard library is that the
indices are unsigned whereas array indices are unsigned.
You are sort of doomed to have confusion and problems with that. [...]
It is a major bug source. Which is why I'm saying, 'stay as simple as
you can. Use [signed] integers until you really, really need something
else.' "
-- Bjarne Stroustrop

This opinion was echoed by several of the esteemed panelists and opposed by
none: Bjarne Stroustrup, Andrei Alexandrescu, Herb Sutter, Scott
Meyers, Chandler Carruth, Sean Parent, Michael Wong, and Stephan T. Lavavej.

" They [the unsigned ordinals in the STL] are wrong. We're sorry. We
were young."


I would add...
It is human nature to forget that the "integer" data types we're
discussing are only approximations of integers over a finite range.
The approximation is most horribly broken at the point of
over/underflow. For signed types, that point is often comfortably far
from where our calculations happen.
For unsigned types, that boundary sits at the busiest spot: zero.


-- Mark
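That boundary-at-zero hazard in four lines (well-defined for unsigned, but
the cliff edge sits exactly where loops terminate):

std::size_t n = 0;
std::size_t m = n - 1;  // well-defined, but wraps: m == SIZE_MAX
int i = 0;
int j = i - 1;          // simply -1; the signed cliffs are ~2^31 away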
Post by Francois Fayard
... snip ...
Benoit Jacob
2017-01-20 18:41:21 UTC
Sure, from the compiler authors' perspective, the current behavior of signed
integers in C++ is ideal: since overflow is undefined behavior, it creates
optimization opportunities. They can do all sorts of optimizations
thanks to being able to assume that overflow never happens.

But from the perspective of software developers, the undefined behavior of
signed overflow is a danger.

That leads to one of the strongest arguments in favor of unsigned: at
least it has well-defined (wrapping) overflow behavior. That might be
'broken', but it isn't nearly as dangerous as the undefined behavior of
signed overflow!

The point I want to make is: compiler users (us, here) and compiler authors
have inherently different, sometimes conflicting, agendas.

Again --- not advocating for unsigned indices! Just saying it's not
clear-cut either way.

Benoit
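The contrast in a short sketch (both increments overflow, but only one of
them has defined behavior):

#include <climits>

int main() {
  unsigned u = UINT_MAX;
  ++u;  // well-defined: wraps to 0
  int i = INT_MAX;
  ++i;  // undefined behavior: the optimizer may assume this never happens
  return 0;
}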
Well made points. These greatly expand on points I made on this list in
May 2010.
From the linked video...
"I think one of the sad things about the standard library is that the
indices are unsigned whereas array indices are unsigned.
You are sort of doomed to have confusion and problems with that. [...] It
is a major bug source. Which is why I'm saying, 'stay as simple as you
can. Use [signed] integers until you really, really need something else.' "
-- Bjarne Stroustrop
This opinion was echoed by several of the esteemed panel and opposed by
none: Bjarne Stroustrup, Andrei Alexandrescu, Herb Sutter, Scott Meyers,
Chandler Carruth, Sean Parent, Michael Wong, and Stephan T. Lavavej.
" They [the unsigned ordinals in the STL] are wrong. We're sorry. We were
young."
I would add...
It is human nature to forget that these "integer" data type we're
discussing are only approximations of integers over a finite range.
The approximation is most horribly broken at the point of over/underflow.
For signed types, that point is often comfortably far from where our
calculations happen.
For unsigned types, that boundary point at the busiest spot, zero.
-- Mark
Post by Francois Fayard
- The core C++ language is using signed integers for indexing. If p is a
pointer, p[k] is defined for k being a std::ptrdiff_t which is the Index
type used by Eigen
- The C++ standard library is using unsigned integers for indexing: std::size_t
So, if you want to choose: (core C++) > (STL) > (Eigen). You have to go
and use signed integers. Most people in the standard committees think that
using unsigned integers for indexing was a mistake, including Bjarne
Stroustrup, Herb Sutter and Chandler Carruth. You can see their argument in
this video, at 42:38 and 1:02:50 ( https://www.youtube.com/watch?
v=Puio5dly9N8 ).
Using unsigned integers is just an error-prone. For instance, if you have
an array v and you are looking from the first element k when you go from b
for (int k = b; k >= a, —k) {
if (v[k] == 0) {
return k;
}
}
If you do that with unsigned integers, good luck to write something
simple that does not get crazy when a = 0 (It was a real bug I found in an
image program that was searching for something from right to left inside a
rectangle of interest).
In terms of performance, the compiler can also do more things with signed
integers as overflow for signed integers is undefined behavior. This is not
the case for unsigned integers (they warp). As a consequence, a compiler
can assume that p[k] and p[k + 1] are next to each other in memory if k is
signed. It can’t do that if k is unsigned (think of the mess it can lead
for people working on auto-vectorization). With signed integer 10 * k / 2
= 5 * k. This is not the case with unsigned integers. I believe that
"unsigned integers" should be renamed "modulo integers” because unsigned
integers are just Z/(2^n Z). And I have taught enough Mathematics to know
that most people are (or should be) scared of Z/(2^n Z) :-)
The only advantage of unsigned integers on a performance point of view is
division: it is faster with unsigned integers. But I never had this
bottleneck in real programs.
An argument used by “unsigned” people is that you get an extra bit for
array indices. This is just nonsense. First, it only happens on 32-bit
systems with char arrays of more than 2 million elements. Have you ever
considered *char* arrays that long? Now, the must funny part is that if you
look at the implementation of std::vector<T>, it stores 3 pointers,
including the begin and the end of the array. To get its size, the vector
returns static_cast<std::size_t>(end - begin). And end - begin is just at
std::ptrdiff_t, so if the vector has more than 2 million elements, you just
get undefined behavior. So std::vector can’t even use that bit !!! By the
way, this is why std::vector<T>::max_size() is 2 million with libc++
(clang). And nobody seems to complain.
On the other side, I understand people who use std::size_t for indexing
because it is consistent with the standard library.
François Fayard
Founder & Consultant - Inside Loop
Applied Mathematics & High Performance Computing
Post by Benoit Jacob
It's complicated. The use of signed indexing has a long history
specifically in numerical linear algebra, see FORTRAN/BLAS/LAPACK. Also
some large C++ shops such as Google have entirely turned away from unsigned
You should not use the unsigned integer types such as uint32_t, unless
there is a valid reason such as representing a bit pattern rather than a
number, or you need defined overflow modulo 2^N. In particular, do not use
unsigned types to say a number will never be negative. Instead, use
assertions for this.
I only mention this to say, there are valid points on both sides of this
argument. Neither choice will make much more than half of users happy. (I
actually don't like the Google C++ style that much, and I wouldn't mind
unsigned personally).
The only thing that would be really bad would be to try to make both
signednesses fully supported. That would mean keeping only the intersection
of the sets of advantages of either signedness, having to deal with the
union of the sets of constraints.
Benoit
2017-01-20 8:25 GMT-05:00 Henrik Mannerström <
This issue seems to warrant a separate message thread.
I'd like to offer my two cents: As much as I like Eigen I think there is
a strict ordering "c++ language" > "stl" > "Eigen". More than once have I
developed something and only at some later point brought in Eigen. Code
written in std:size_t fashion has then needed refactoring. So, if there
would be a vote, I'd vote for size_t indexing. I think smooth
interoperability with stl is valuable.
Best,
Henrik
Hi all,
while I would not want to argue with Gael nor with the numerous C++
experts advocating signed integers, today's reality is different. Some
libraries, the most prominent being the standard library, use unsigned
integers as indices. Therefore, mixing signed and unsigned types is sadly
unavoidable when using Eigen, which is a major annoyance (at least for me,
working with -pedantic -Werror).
In my opinion, using Eigen with -DEIGEN_DEFAULT_DENSE_INDEX_TYPE=size_t
should just work, regardless of the developers' personal preference for
signed integers.
Best,
Marton
I wonder if a rethink of reshape could allow a move to
unsigned index types, assuming I understand correctly
that Dynamic would be of another type. It’s always been a
bit clunky getting “size_t-correctness” right for mixed
Eigen/STL code, and compilers complain increasingly
nowadays. Perhaps now might be a time to give it a try?
See also: http://eigen.tuxfamily.org/index.php?title=FAQ#Why_Eigen.27s
_API_is_using_signed_integers_for_sizes.2C_indices.2C_etc..3F
Maybe one day we'll get a "fixed" std-v2 that would be more compatible
with libraries that made the right choice of using signed types.
For sparse matrices, I agree that we might try to allow for unsigned
types as the StorageIndex type. This should be doable while keeping signed
64 bits integers for the API (rows, cols, nonZeros, etc.)
We might also think about solutions to ease the mix of Eigen/STL code...
gael
I see the “downcounting” argument at https://listengine.tuxfamily.o
rg/lists.tuxfamily.org/eigen/2009/03/msg00099.html,
but that appears fairly strongly to be a special case where
one would anyway want to benchmark, check sizes etc.
Finally, I think we are in a world where sparse arrays with
entries in the 2-4billion range are reasonably common,
and one could conceivably be pleased to get the extra bit
back

Thanks again for a great library!
A.
Dr Andrew Fitzgibbon FREng FBCS FIAPR
Partner Scientist
Microsoft HoloLens, Cambridge, UK
http://aka.ms/awf
Sent: 13 January 2017 12:26
Subject: Re: [eigen] Let's get reshape in shape for 3.4
Also, regarding them RowMajor/ColMajor int/type issue - perhaps stuff
them in a new namespace or class - storage ? Too bad StorageOrder is
already used in so many places. Honestly I'm all for you making them
types and things working uniformly from there. I have used them
myself as integers with the flags bitset, but only for enable_if logic
which would be rendered obsolete if you had a collection of C++11
inspired type traits (instead they get repeated on the web a few
places). Sorry if I'm not being very detailed, it's been a while
since I've needed these, but my point is that it was basically a flaw
to use them as int's in the first place, in user code - and so I
encourage you to change things so it all works fluidly in the new api
without fear of upsetting users. Although perhaps that is a daunting
task...
I think you are mixing Eigen::RowMajor with Eigen::RowMajorBit. I agree
that the bit flags could be managed differently using individual type
traits, but regarding Eigen::RowMajor, it is currently used as a template
parameter to Matrix, Array, etc., e.g.:
Matrix<...., RowMajor|DontAlign>
which is pretty convenient to write compared to having to subclass some
default_matrix_traits class to customize the options. With
RowMajor|DontAlign, RowMajor could still be an instance of an
integral_constant-like type with operator | overloaded.... Actually I've
started to think about such an approach for Eigen::Dynamic, so that one can write:
M = N*2+1
and get M==Dynamic if N==Dynamic. Currently we always have to write: M =
N==Dynamic ? Dynamic : 2*N+1 which is error prone because it's easy to
forget about checking for Dynamic, especially when combining multiple
compile-time identifiers.
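For the archive, here is a minimal C++11 sketch of such an integral_constant-like approach (my own code, with a Dyn stand-in for Eigen::Dynamic; not what Eigen actually ships):

const int Dyn = -1;  // stand-in for Eigen::Dynamic

template <int N> struct CtSize { static const int value = N; };

// Arithmetic that absorbs Dyn, so forgetting a Dynamic check is impossible.
constexpr int ct_mul(int a, int b) { return (a == Dyn || b == Dyn) ? Dyn : a * b; }
constexpr int ct_add(int a, int b) { return (a == Dyn || b == Dyn) ? Dyn : a + b; }

template <int A, int B> CtSize<ct_mul(A, B)> operator*(CtSize<A>, CtSize<B>) { return {}; }
template <int A, int B> CtSize<ct_add(A, B)> operator+(CtSize<A>, CtSize<B>) { return {}; }

// M = N*2+1 now propagates Dynamic by itself:
// decltype(CtSize<Dyn>{} * CtSize<2>{} + CtSize<1>{})::value == Dyn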
gael
-Jason
Hi Gael,
Glad to see all the new api's you're moving in for the new year.
I actually prefer C if C is a superset of B - that is the way it works
in Numpy - order is overridable in several places, but mainly things
follow the matrix types you are working with (which would be
expressions here).
I haven't thought about the details but is there any reason
A.reshaped(4, n/2) couldn't work via constexprs or something on the 4? I
imagine even if it did you're trying to cover for C++98 though, but I
think fix<4> is a fair bit ugly.
As for the placeholder for a solvable dimension - the matlab
convention is the empty matrix and I welcome that notion (wrapped as a
Null, Nil, Empty, Filled, DontCare, Placeholder, CalcSize (this and
the next are more explicit), or AutoSized)
-Jason
On Thu, Jan 12, 2017 at 10:35 AM, Gael Guennebaud
Hi everyone,
just after generic indexing/slicing, another long standing missing feature
is reshape. So let's make it for 3.4.
This is not the first time we discuss it. There is an old bug report entry
[1] and an old pull-request with various discussions [2]. The Tensor module
also supports reshape [3].
However, the feature is still not there because we never converged on how
to properly handle the ambiguity between col-major / row-major orders, also
called Fortran versus C style orders (e.g., in numpy doc [4]).
A) Interpret the indices in column major only, regardless of the storage
order.
- used in MatLab and Armadillo
- pros: simple strategy
- cons: not very friendly for row-major inputs (needs to transpose twice)
B) Follows the storage order of the given expression
- used by the Tensor module
- pros: easiest implementation
- cons:
* results depend on storage order (need to be careful in generic code)
* not all expressions have a natural storage order (e.g., a+a^T, a*b)
* needs a hard copy if, e.g., the user wants to stack columns of a
row-major input
C) Give the user an option to decide which order to use between: ColMajor,
RowMajor, Auto
- used by numpy [4] with default to RowMajor (aka C-like order)
- pros: give full control to the user
- cons: the API is a bit more complicated
At this stage, option C) seems to be the only reasonable one. However, we
yet have to specify how to pass this option at compile-time, what Auto
means, and what is the default strategy.
Regarding 'Auto', it is similar to option (B) above. However, as I already
mentioned, some expressions do not have any natural storage order. We could
address this issue by limiting the use of 'Auto' to the following expressions:
- Any expression with the DirectAccessBit flag (it means we are dealing
with a Matrix, Map, sub-matrix, Ref, etc. but not with a generic expression)
- Any expression with the LinearAccessBit flag: it means the expression can
be efficiently processed as a 1D vector.
Any other situation would raise a static_assert.
But what if I really don't care and just want to, e.g., get a linear view
with no constraints on the stacking order? Then we could add a fourth option
meaning 'IDontCare', perhaps 'AnyOrder' ?
For the default behavior, I would propose 'ColMajor' which is perhaps the
most common and predictable choice given that the default storage is column
major too.
template<typename RowsType=Index,typename ColType=Index,typename Order=Xxxx>
DenseBase::reshaped(RowsType rows,ColType cols,Order = Order());
template<typename Order= Xxxx >
DenseBase::reshaped(Order = Order());
Note that I used "reshaped" with a "d" on purpose.
The storage order of the resulting expression would match the optional
order.
Then for the name of the options we cannot use "RowMajor"/"ColMajor" because
they already are defined as "static const int" and we need objects with
different types here. Moreover, col-major/row-major does not extend well to
multi-dimension tensors. I also don't really like the reference to Fortran/C
as in numpy. "Forward"/"Backward" are confusing too. Any ideas?
The rows/cols parameters could also be a mix of compile-time & runtime values, e.g.:
A.reshaped(fix<4>,n/2);
And maybe we could even allow a placeholder to automatically compute one of
the dimensions to match the given matrix size, as in:
A.reshaped(5,Auto);
We cannot reuse "Auto" here, though, since it already denotes the storage order above.
Again, any ideas for a good placeholder name? (numpy uses -1 but we need a
compile-time identifier)
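To make the proposal concrete, here is hypothetical usage under the API sketched above (all names are placeholders, nothing final):

MatrixXd A(4, 6);
auto v = A.reshaped();                    // linear (1D) view, default order
auto B = A.reshaped(8, 3);                // runtime sizes, default ColMajor order
auto C = A.reshaped(fix<4>, A.size()/4);  // mixed compile-time / runtime sizes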
cheers,
gael
[1] http://eigen.tuxfamily.org/bz/show_bug.cgi?id=437
[2] https://bitbucket.org/eigen/eigen/pull-requests/41
[3]
https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default#markdown-header-operation-reshapeconst-dimensions-new_dims
[4]
https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.reshape.html
François Fayard
2017-01-20 19:04:06 UTC
Permalink
Hi Benoit,

Could you please elaborate on why it is dangerous? In most cases I know of, when there is overflow, the signed version leads to undefined behaviour, and the unsigned version leads to something such as a crash or an infinite loop. In the end, I would say that both cases are equally bad.

Do you have a concrete example where the signed version leads to more dangerous bugs than the unsigned version?
But from the perspective of software developers, the undefined behavior in signed overflow is a danger.
That leads to one of the strongest arguments in favor of unsigned: at least, that has well-defined (wrapping) overflow behavior. That might be 'broken', but it isn't nearly as dangerous as the undefined behavior of signed overflow!
Benoit
Benoit Jacob
2017-01-20 19:50:35 UTC
Permalink
Hi Francois,

There is no question that unsigned overflow can already be very dangerous,
leading to crashes etc.

"Undefined behavior" is its own class of danger, though. When the compiler
sees undefined behavior, it can do absolutely anything... an old version of
GCC was emitting code to run some video game, to illustrate that point.

Here is a little 'study' of the actual behavior of compilers on undefined
behavior that I made while I was working at a Web browser vendor:

https://raw.githubusercontent.com/bjacob/builtin-unreachable-study/master/notes

That was actually a study of the effect of using __builtin_unreachable. But
any kind of undefined behavior is in principle equivalent to that.

As you can see there, the optimizations that GCC performs when it takes
undefined behavior seriously are pretty crazy, e.g.
- removing 'ret' instructions at the end of functions, allowing execution to
continue running past the end of a function's code!
- removing bounds checks on switch statements implemented as jump-tables,
thus allowing jumps to unintended addresses!
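To make the flavour of this concrete, here is a tiny illustration of my own (not from the study) of the kind of deduction such optimizations rest on:

// Signed overflow is undefined, so the compiler may assume it never happens
// and fold this whole function to 'return true'.
bool lt_next_signed(int i) { return i + 1 > i; }

// Unsigned wraparound is defined: i + 1 > i is false for i == UINT_MAX,
// so the comparison must actually be performed.
bool lt_next_unsigned(unsigned i) { return i + 1 > i; }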

Benoit
François Fayard
2017-01-21 09:22:52 UTC
Permalink
Hi Benoit,

Thanks for your notes on undefined behaviour, which are quite interesting, especially the "switch" one, which looks scary. But we always come back to those "nasal demons" and the stories about the big bad wolf that you hear a lot but never really meet in real life. Yes, with undefined behaviour, anything can happen, but I believe that it should be seen from a practical point of view:

-1- Do we know of any disaster (such as Ariane 5) that could have been avoided using signed or unsigned integers?
-2- Do we know of any security flaw that could have been avoided using signed or unsigned integers?
-3- Do we know of any bugs that could have been avoided using signed or unsigned integers?

For the third question, my answer is yes. I've heard of so many bugs caused by a misuse of unsigned integers that could have been avoided by using signed integers.

Now, I understand that you are concerned about the answers to questions 1 and 2. I understand why people could be scared of "undefined behaviour". But I would like to get some concrete examples of things that really went wrong because of the usage of signed integers. I've never seen any, but I have to admit that I don't have any experience in security. If you look at the CERT C coding standard, they are equally concerned about the wrapping behaviour of unsigned integers and the undefined behaviour of signed integers.

As a side note, I would be more concerned about the fact that Eigen does not use any error reporting mechanism. For instance, I have no idea what could happen when I ask for a Cholesky decomposition of a matrix that is symmetric but not positive definite. And it is not something that can be easily checked before asking for the decomposition.

François
Gael Guennebaud
2017-01-21 11:20:15 UTC
Permalink
Post by François Fayard
For instance, I have no idea what could happen when I ask for a Cholesky
decomposition of a matrix that is symmetric but not positive definite. And it
is not something that can be easily checked before asking for the decomposition.
https://eigen.tuxfamily.org/dox/classEigen_1_1LDLT.html#ae0eaebda205d5578085c3f9ea7b042d4
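For the archive, the reporting pattern that link refers to, as a minimal sketch (assuming the usual info() idiom; whether a given matrix actually triggers NumericalIssue is up to the decomposition):

#include <Eigen/Dense>
#include <cstdio>

int main() {
  Eigen::MatrixXd A(2, 2);
  A << 1, 2,
       2, 1;  // symmetric, but not positive definite
  Eigen::LDLT<Eigen::MatrixXd> ldlt(A);
  // Failure is signalled via info(), not via exceptions.
  if (ldlt.info() != Eigen::Success)
    std::printf("LDLT reported a numerical issue\n");
  return 0;
}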

gael
Martin Beeger
2017-01-20 19:07:56 UTC
Permalink
Hey!
Post by Benoit Jacob
That leads to one of the strongest arguments in favor of unsigned: at
least, that has well-defined (wrapping) overflow behavior. That might
be 'broken', but it isn't nearly as dangerous as the undefined
behavior of signed overflow!
But if you take this point of view, the unsigned-signed promotion and
comparison rules are just as big a hazard. We should never be forced to
really think about them.
E.g. try to wrap your head around the following:
signed int a{-1 };
unsigned int b{1};
according to C++ language rules, a < b is now false.
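For the archive, here is the same trap as a complete program (my own illustration):

#include <iostream>

int main() {
  signed int a{-1};
  unsigned int b{1};
  // The usual arithmetic conversions turn a into unsigned int, wrapping it
  // to 4294967295 (for 32-bit int), so the comparison uses the wrapped value:
  std::cout << std::boolalpha << (a < b) << "\n";  // prints "false"
  return 0;
}

Compilers do warn here with -Wsign-compare, which is exactly the noise that -pedantic -Werror users in this thread complain about.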

Avoiding this is possible through consistent use of one type for indexes,
sizes, slices etc. And negative slices and strides have a well-defined
meaning and should be allowed, so they must be signed. Deciding on a
case-by-case basis forces people into all the nitty-gritty details of
integer promotion and comparison, which is exactly what we should IMHO
protect users from.

So that argument works both ways.

Martin
Martin Beeger
2017-01-20 18:51:15 UTC
Permalink
Some more thoughts on this topic:

* Jon Kalb (of the signed integer faction) has pointed out that obnoxious
bugs with unsigned integer underflow and bounds wraparound when
subtracting sizes repeatedly went unnoticed for a very long time, even in
libraries like the STL & Boost.

* Andrei Alexandrescu pointed out that whether using indices or pointers
in algorithmic loops is faster seems to change every five years or so.
That means that using indices instead of pointers is a valid strategy
and may increase your performance.

* The specified wraparound behaviour of unsigned loop indices
inhibits a class of optimizations, so especially for fixed but
unknown loop lengths there is an inherent performance hit to using
unsigned. See CppCon 2016, Michael Spencer - "My Little Optimizer:
Undefined Behavior is Magic": http://youtu.be/g7entxbQOCc
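A sketch of the kind of loop that talk is about (my own reduction, assuming a 64-bit target; not taken from the talk):

// 32-bit unsigned counter: i + offset must wrap modulo 2^32, so the
// compiler has to preserve that wraparound and keep re-extending the
// index to 64 bits.
float sum_u(const float* a, unsigned offset, unsigned n) {
  float s = 0.0f;
  for (unsigned i = 0; i < n; ++i) s += a[i + offset];
  return s;
}

// 32-bit signed counter: overflow of i + offset would be undefined
// behavior, so the compiler may run the whole induction on a single
// 64-bit counter and vectorize more freely.
float sum_s(const float* a, int offset, int n) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i) s += a[i + offset];
  return s;
}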


* When pressed to answer why or how it came about that std::size_t became
unsigned, Stroustrup replied: someone had a case for needing to reserve
more than half the address space in a vector with the number of elements
strictly positive (sadly I could not find the reference here).
Which is funny, because you cannot actually use more than half, as Mark
pointed out.
Post by Francois Fayard
On the other side, I understand people who use std::size_t for
indexing because it is consistent with the standard library.
* Eric Niebler's STL2 project has plans on using signed. See the discussion
here: https://github.com/ericniebler/stl2/issues/182. So if Eigen now
chooses unsigned for compatibility and the STL2 project ever flies, we
might look like fools ;)

For me, the argument that loop optimization suffers from using unsigned,
and that this is inherent in the language (and cannot be fully solved by
compiler vendors), is the strongest one, as performance matters for
Eigen's loops and indexes will likely end up being used in their exit
conditions.

Regards, Martin
Post by Francois Fayard
Well made points. These greatly expand on points I made on this list
in May 2010.
From the linked video...
"I think one of the sad things about the standard library is that the
indices are unsigned whereas array indices are unsigned.
You are sort of doomed to have confusion and problems with that. [...]
It is a major bug source. Which is why I'm saying, 'stay as simple as
you can. Use [signed] integers until you really, really need something
else.' "
-- Bjarne Stroustrup
This opinion was echoed by several of the esteemed panel and opposed
by none: Bjarne Stroustrup, Andrei Alexandrescu, Herb Sutter, Scott
Meyers, Chandler Carruth, Sean Parent, Michael Wong, and Stephan T. Lavavej.
" They [the unsigned ordinals in the STL] are wrong. We're sorry. We
were young."
I would add...
It is human nature to forget that the "integer" data types we're
discussing are only approximations of integers over a finite range.
The approximation is most horribly broken at the point of
over/underflow. For signed types, that point is often comfortably far
from where our calculations happen.
For unsigned types, that boundary point is at the busiest spot: zero.
-- Mark
Post by Francois Fayard
- The core C++ language is using signed integers for indexing. If p
is a pointer, p[k] is defined for k being a std::ptrdiff_t which is
the Index type used by Eigen
- The C++ standard library is using unsigned integers for indexing: std::size_t
So, if you want to follow the ordering (core C++) > (STL) > (Eigen), you have
to go and use signed integers. Most people in the standard committees
think that using unsigned integers for indexing was a mistake,
including Bjarne Stroustrup, Herb Sutter and Chandler Carruth. You
can see their argument in this video, at 42:38 and 1:02:50 (
http://youtu.be/Puio5dly9N8 ).
Using unsigned integers is just error-prone. For instance, suppose you
have an array v and you are looking for the first index k, going from b
down to a, such that v[k] == 0. With signed integers, you just need:
for (int k = b; k >= a; --k) {
  if (v[k] == 0) {
    return k;
  }
}
If you do that with unsigned integers, good luck writing something
simple that does not go crazy when a = 0 (it was a real bug I found
in an image program that was searching for something from right to left
inside a rectangle of interest).
In terms of performance, the compiler can also do more things with
signed integers, as overflow for signed integers is undefined
behavior. This is not the case for unsigned integers (they wrap). As
a consequence, a compiler can assume that p[k] and p[k + 1] are next
to each other in memory if k is signed. It can't do that if k is
unsigned (think of the mess this creates for people working on
auto-vectorization). With signed integers, 10 * k / 2 = 5 * k. This is
not the case with unsigned integers. I believe that "unsigned
integers" should be renamed "modulo integers" because unsigned
integers are just Z/(2^n Z). And I have taught enough Mathematics to
know that most people are (or should be) scared of Z/(2^n Z) :-)
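To see the 10 * k / 2 point in code (my own illustration):

// Signed: overflow is undefined, so the compiler may fold this to 5 * k.
int fold_s(int k) { return 10 * k / 2; }

// Unsigned: 10 * k is computed modulo 2^32, so the result is NOT always
// 5 * k (e.g. k = 2^31 yields 0 here but 2^31 for 5 * k), and the
// compiler cannot apply the simplification.
unsigned fold_u(unsigned k) { return 10 * k / 2; }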
The only advantage of unsigned integers from a performance point of
view is division: it is faster with unsigned integers. But I have never
had this bottleneck in real programs.
An argument used by "unsigned" people is that you get an extra bit
for array indices. This is just nonsense. First, it only matters on
32-bit systems with char arrays of more than 2 billion elements. Have
you ever considered *char* arrays that long? Now, the funniest part
is that if you look at the implementation of std::vector<T>, it
stores 3 pointers, including the begin and the end of the array. To
get its size, the vector returns static_cast<std::size_t>(end -
begin). And end - begin is just a std::ptrdiff_t, so if the vector
has more than 2 billion elements, you just get undefined behavior. So
std::vector can't even use that bit!!! By the way, this is why
std::vector<T>::max_size() is 2 billion with libc++ (clang). And
nobody seems to complain.
On the other side, I understand people who use std::size_t for
indexing because it is consistent with the standard library.
François Fayard
Founder & Consultant - Inside Loop
Applied Mathematics & High Performance Computing
Benoit Jacob
2017-01-20 18:59:53 UTC
Permalink
a couple of notes:

- I don't believe there is any chance at all of Eigen changing this now --
at least I had been writing in this thread assuming only idle discussion.
That would be a very intrusive breaking change to make at this point.

- I do agree with the argument that unsigned makes decrementing loops
error-prone at 0. It is the same argument that was made on this mailing
list many years ago and motivated the choice of signed indices. We are in
violent agreement! :-) I just wanted to point out how it's not clear-cut as
both signednesses come with strengths and weaknesses.

The C++ committee could theoretically acknowledge that 2's complement has
won and make signed overflow well-defined behavior (as wrapping); if they
did that, it would remove one of the main issues with signed and arguably
signed would start to look like a no-brainer. But compiler authors
wouldn't like that, because it would take away the optimization
opportunities.

Benoit
François Fayard
2017-01-20 19:45:31 UTC
Permalink
My experience is to always take Andrei's advice with a grain of salt:

- For the pointer/index difference, I have never been able to measure any difference between the two. I would be glad to get sample code where one can measure any difference.

- Once the war of signed/unsigned comes to an end, there is also the problem of 32/64-bit integers on a 64-bit machine. If you have bandwidth problems, or vectorized code, the choice is obvious: use 32-bit indices. But what should you do in other cases? One has to know that if p is a pointer and k is a 32-bit integer, to get the address of p[k] one first needs to convert k to a 64-bit integer. So, you might think that using 64-bit indices is better on x86-64. I even found a very contrived benchmark where it is indeed more efficient. But for almost every code, it just does not make any difference.
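The address-computation point is visible in two one-liners (my own illustration; on x86-64 the first typically needs a sign-extension such as movslq before the load, the second does not):

double load32(const double* p, int k)  { return p[k]; }  // extend k, then load
double load64(const double* p, long k) { return p[k]; }  // load directly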
Andrei Alexandrescu claims that it is better to use 32-bit integers because Intel processors have more ports that can work on them, so it increases instruction-level parallelism. I did not try too hard, but I have never been able to find a benchmark that shows a difference.

So be careful with Andrei. His claims are extremely difficult to check.
* Andrei Alexandrescu pointed out that whether using indices or pointers in algorithmic loops is faster seems to change every five years or so. That means that using indices instead of pointers is a valid strategy and may increase your performance.
Gael Guennebaud
2017-01-21 09:42:29 UTC
Permalink
Post by François Fayard
- Once the war of signed/unsigned comes to an end, there is also the
problem of 32/64 bits integers on a 64-bit machine. If you have bandwidth
problems, or vectorized code, the choice is obvious: use 32-bit indices.
But what should you do in other cases? One has to know that if p is a
pointer and k is a 32-bit integer, to get the address of p[k] one first needs
to convert k to a 64-bit integer. So, you might think that using 64-bit
indices is better on x86-64. I even found a very contrived benchmark where
it is indeed more efficient. But for almost every code, it just does not
make any difference.
I remember we observed a clear speedup when we moved from int to ptrdiff_t
in Eigen. That was before we released 2.0, so with much older compilers (gcc
4.2) and older hardware (core2).

Actually, both questions are highly related, because when you start mixing
32- and 64-bit integers, I have to admit that unsigned types win here
because the conversion is a no-op in this case, whereas signed types
require a special treatment. In Eigen, this happens when using SparseMatrix
or PermutationMatrix, for which 32-bit signed integers are used by default
to save memory. All these conversions are explicitly handled by a
convert_index<To,From>(From x) helper function with assertion checking. To
overcome the signed conversion overhead, we could add a
convert_positive_index<To,From>(From x) variant asserting that x>0 and
enforcing a cheap conversion (as in the unsigned case).
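For the archive, a rough sketch of what such a checked conversion can look like (my own code; Eigen's actual internal helper surely differs in details):

#include <cassert>

template <typename To, typename From>
inline To convert_index(const From& idx) {
  // Round-trip check: fails when idx is not representable in To, e.g. a
  // negative value cast to an unsigned type, or an out-of-range narrowing.
  assert(idx == static_cast<From>(static_cast<To>(idx)) && "index out of range");
  return static_cast<To>(idx);
}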

Then regarding Eigen moving to unsigned integers (or supporting unsigned
integers, that's the same): that's not gonna happen, because there are
numerous places where we truly need signed integers, and as previously
stated by others this would mean that for every use of the current Index
type we would have to carefully think about whether it should be signed or
unsigned (considering possible future usages for which negative indices
could make sense), and then be extremely careful for every operation
(addition, comparison, assignment, etc.) involving two Index types to be
sure they are both signed or both unsigned. We have enough subtleties to
take care of. Sorry.

Regarding the "extra bit" argument, it does make sense for the StorageIndex
type when the number of non-zeros is in [2^31..2^32] on a 64 bits system.
For instance, SparseMatrix<double,0,unsigned int> would save 1/4 of memory
consumption compared to SparseMatrix<double,0,Index>. I'm sure this justify
all the pain of supporting both signed and unsigned types as the
StorageIndex.

gael
Francois Fayard
2017-01-21 11:40:06 UTC
Permalink
I remember we observed a clear speedup when we moved from int to ptrdiff_t in Eigen. That was before we release 2.0, so with much older compilers (gcc 4.2) and older hardware (core2).
I am not sure that you would see that with newer compilers. The reason is that the undefined behavior of overflow with signed integers allows 32-bit signed integers to be implemented as 64-bit integers when compiled, if the compiler believes that this is useful. I am sure that this optimization was not made in earlier versions of compilers, which is why you got a speedup at that time. I have not measured any difference with recent compilers, except in very contrived examples like the one attached at the end of this email (with a recursive function).
Actually, both questions are highly related because when you start mixing 32 and 64 bits integers, I have to admit that unsigned types win here because the conversion is a no op in this case, whereas signed types require a special treatment.
Even though I understand your point, I still disagree that unsigned integers win here. Performance is not always about counting operations, but more often about letting the compiler reason about the code. As I said before, a compiler is allowed to change your integer from 32-bit to 64-bit if it thinks that it makes a difference. It is not allowed to with unsigned integers. Have a look at the concrete example from bzip2 given by Chandler Carruth at 39:20 in http://youtu.be/yG1OZ69H_-o
Then regarding Eigen moving to unsigned integers (or supporting unsigned integers, that's the same), that's not gonna happen because there are numerous places where we truly need signed integers, and as previously stated by others this would mean that for every use of the current Index type we would have to carefully think whether it should be signed or unsigned (considering possible future usages for which negative indices could make sense), and then be extremely careful for every operations (addition, comparison, assignment,etc) involving two Index types to be sure they are both signed or both unsigned. We have enough subtleties to take of. Sorry.
I am supporting the signed camp, so I am very happy with the current position of Eigen :-)
Regarding the "extra bit" argument, it does make sense for the StorageIndex type when the number of non-zeros is in [2^31..2^32] on a 64 bits system. For instance, SparseMatrix<double,0,unsigned int> would save 1/4 of memory consumption compared to SparseMatrix<double,0,Index>. I'm sure this justify all the pain of supporting both signed and unsigned types as the StorageIndex.
I totally understand this one.

==== weird example where 64-bit integer indexing is faster than 32-bit integer indexing ====

//==============================================================================
//
// InsideLoop
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.txt for details.
//
//==============================================================================

// This benchmark was designed to figure out if there is a difference in terms
// of performance between 32-bit integers and 64-bit integers.
//
// int long
//
// Intel compiler 17.0 37 39
// Clang 3.9 37 37
// Gcc 4.7 44 38
// Gcc 6.2 45 38
//
// So it seems that gcc is the only one to struggle with int where the
// performance goes down.

#define INT long

#include <cstdio>
#include <limits>
#include <random>

const INT n = 10;
const INT x_finish = n - 2;
const INT y_finish = n - 2;
char maze[n * n];

const char free_cell = 0;
const char barrier_cell = 1;
const char traversed_cell = 2;

INT minimum(INT a, INT b) { return a < b ? a : b; }

INT find_min_path_32(INT x, INT y, INT best_path_length,
INT current_path_length) {
maze[y * n + x] = traversed_cell;
++current_path_length;
if (current_path_length >= best_path_length) {
maze[y * n + x] = free_cell;
return std::numeric_limits<INT>::max();
}
if (x == x_finish && y == y_finish) {
maze[y * n + x] = free_cell;
return current_path_length;
}
INT length = std::numeric_limits<INT>::max();
if (x > 0 && maze[y * n + (x - 1)] == free_cell) {
INT rest_length =
find_min_path_32(x - 1, y, best_path_length, current_path_length);
length = minimum(rest_length, length);
}
if (x < n - 1 && maze[y * n + (x + 1)] == free_cell) {
INT rest_length =
find_min_path_32(x + 1, y, best_path_length, current_path_length);
length = minimum(rest_length, length);
}
if (y > 0 && maze[(y - 1) * n + x] == free_cell) {
INT rest_length =
find_min_path_32(x, y - 1, best_path_length, current_path_length);
length = minimum(rest_length, length);
}
if (y < n - 1 && maze[(y + 1) * n + x] == free_cell) {
INT rest_length =
find_min_path_32(x, y + 1, best_path_length, current_path_length);
length = minimum(rest_length, length);
}
if (length >= best_path_length) {
maze[y * n + x] = free_cell;
return std::numeric_limits<INT>::max();
} else {
maze[y * n + x] = free_cell;
return length;
}
}

void run_maze() {  // renamed from maze(): a function cannot share its name with the global maze[] array
const INT x_start = 1;
const INT y_start = 1;

std::mt19937 generator{};
std::uniform_int_distribution<INT> distribution{0, 4};

for (INT i = 0; i < n * n; ++i) {
if (distribution(generator) == 0) {
maze[i] = barrier_cell;
} else {
maze[i] = free_cell;
}
}
maze[y_start * n + x_start] = free_cell;
maze[y_finish * n + x_finish] = free_cell;
for (INT y = n - 1; y >= 0; --y) {
for (INT x = 0; x < n; ++x) {
if (maze[y * n + x] == free_cell) {
std::printf(".");
} else {
std::printf("O");
}
}
std::printf("\n");
}

INT length =
find_min_path_32(x_start, y_start, std::numeric_limits<INT>::max(), 0);

std::printf("Best path length: %d\n", static_cast<int>(length));
}

// Entry point added so the benchmark compiles and runs standalone.
int main() {
  run_maze();
  return 0;
}
Gael Guennebaud
2017-01-21 21:56:09 UTC
Permalink
Post by Gael Guennebaud
I remember we observed a clear speedup when we moved from int to
ptrdiff_t in Eigen. That was before we release 2.0, so with much older
compilers (gcc 4.2) and older hardware (core2).
I am not sure that you would see that with newer compilers. The reason is
that the undefined behavior of overflow with signed integers allows 32-bit
signed integers to be implemented as 64-bit integers whenever the compiler
believes that this is useful. I am sure that this
optimization was not made by earlier versions of compilers, which is why you
got a speedup at that time. I have not measured any difference with recent
compilers except in very contrived examples like the one that is attached
at the end of this email (with a recursive function).
That's easy to check; I simply measured:

#include <Eigen/Dense>
using Eigen::Index;    // Index follows EIGEN_DEFAULT_DENSE_INDEX_TYPE (int vs ptrdiff_t)
using Eigen::MatrixXf;

EIGEN_DONT_INLINE
void foo(MatrixXf &A, MatrixXf &B, Index i, Index j, Index n, Index m)
{
  A.block(i, j, n, m) += B.block(i+1, j+1, n, m);
}

and got a 15% speed difference between int and ptrdiff_t using either gcc 6
or clang 3.9.
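
The timing harness is not shown here; a minimal sketch (the matrix size and iteration count are my guesses, and foo is assumed to be in the same translation unit):

int main() {
  MatrixXf A = MatrixXf::Zero(32, 32);
  MatrixXf B = MatrixXf::Zero(32, 32);
  for (int k = 0; k < 10000000; ++k) {
    foo(A, B, 0, 0, 31, 31);  // the i+1, j+1 offsets stay within the 32x32 bounds
  }
  return 0;
}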
Post by Gael Guennebaud
Actually, both questions are highly related, because when you start
mixing 32- and 64-bit integers, I have to admit that unsigned types win
here because the conversion is a no-op in this case, whereas signed types
require special treatment.
Even though I understand your point, I still disagree that unsigned
integers win here. Performance is not always about counting operations, but
more often about letting the compiler reason about the code. As I said
before, a compiler is allowed to change your integer from 32-bit to 64-bit
if it thinks that it makes a difference. It is not allowed to do so with
unsigned integers. Have a look at the concrete example from bzip2 given by
Chandler Carruth at 39:20 in http://youtu.be/yG1OZ69H_-o
I measured the conversion of 1000 integers from 'int' or 'unsigned int' to
'long' and observed a 25% speed difference using either clang or gcc (in
favour of unsigned int). But when used to index another buffer, like
data[indices[i]] in sparse stuff, I cannot see any difference anymore.
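
A sketch of the two measurements as I read them (my reconstruction, not the actual benchmark code):

// Widening integers to 'long'. On x86-64, zero-extending 'unsigned int' is
// free (32-bit moves clear the upper half), whereas 'int' needs an explicit
// sign extension, which matches the 25% gap.
void widen(const int* src, long* dst) {
  for (int k = 0; k < 1000; ++k) dst[k] = src[k];
}

// Indirect indexing as in the sparse code paths: the widening is hidden
// behind the dependent load, so signed versus unsigned no longer matters.
double gather(const double* data, const int* indices, int n) {
  double s = 0.0;
  for (int k = 0; k < n; ++k) s += data[indices[k]];
  return s;
}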

gael
f***@insideloop.io
2017-01-22 00:01:05 UTC
Permalink
Post by Gael Guennebaud
EIGEN_DONT_INLINE
void foo(MatrixXf &A, MatrixXf &B, Index i, Index j, Index n, Index m)
{
  A.block(i, j, n, m) += B.block(i+1, j+1, n, m);
}
and got a 15% speed difference between int and ptrdiff_t using either
gcc 6 or clang 3.9.
I have just tested this kind of code on a 32x32 matrix, and with Eigen the
version using 64-bit integers is indeed 25% faster than the version using
32-bit integers.

But if I code the same thing in plain C++ (without Eigen), with for loops,
I get exactly the same timings with 32-bit integers and 64-bit integers
with both the Intel compiler and clang. With gcc, the version with 32-bit
integers is 4% faster (see the listings below).

My feeling is that the fact that Eigen is written with intrinsics turns
off many compiler optimizations. Therefore, the difference between 32-bit
and 64-bit integers is very Eigen-specific. If there were such a
difference between 32-bit and 64-bit integer indexing in plain C/C++,
nobody would index with ints in C anymore.
Post by Gael Guennebaud
I measured the conversion of 1000 integers from 'int' or 'unsigned
int' to 'long' and observed a 25% speed difference using either clang
or gcc (in favour of unsigned int).
My claim was not that there was no difference between converting int and
unsigned to long, but rather that this conversion does not happen as often
as you might think and almost never makes any difference. For instance, a
compiler might change

void f(double* p, long n) {
  for (int k = 0; k < n; ++k) {
    p[k] += 1.0;
  }
}

to (look at the new type for k)

void f(double* p, long n) {
  for (long k = 0; k < n; ++k) {
    p[k] += 1.0;
  }
}

because overflow for signed integers is undefined behavior. This
transformation would be impossible with unsigned integers because of their
wrapping behavior. So signed integers give more flexibility to the
compiler. Even though the conversion from int to long is more costly than
the conversion from unsigned to long, with signed integers the compiler is
often allowed to remove this conversion, either completely or at least
from the inner loop.
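
For contrast, the unsigned counterpart (my illustration) cannot be transformed this way: if n were larger than 2^32, the wrap-around of k would be well-defined and the loop would legitimately keep revisiting the first 2^32 elements, so the compiler has to keep k at 32 bits:

void g(double* p, long n) {
  for (unsigned int k = 0; k < n; ++k) {  // k wraps to 0 at 2^32: defined behavior
    p[k] += 1.0;
  }
}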

For 32-bit/64-bit integers, there are some corner cases where one is
clearly better than the other. Take this code, which tries to find the
index of the smallest element of an array:

#include <limits>

int index_smallest(float* p, int n) {
  float small_value = std::numeric_limits<float>::max();
  int small_k = -1;  // index of the smallest element seen so far
  for (int k = 0; k < n; ++k) {
    if (p[k] < small_value) {
      small_value = p[k];
      small_k = k;
    }
  }
  return small_k;
}

The Intel compiler is smart enough to vectorize this loop (it is a
reduction). But as both small_value and small_k have to be carried in
vector registers, one can see that it is more efficient if their types
have the same size. This is what happens. So if your array is of floats,
use int for the indices; using 64-bit integers might lead to a 2x
slowdown. But if your array is made of doubles, it is better to use a
64-bit integer :-)
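
Following the same reasoning, here is a sketch of the double variant (my code, not from the original mail), where a 64-bit index matches the 8-byte data lanes:

#include <cstdint>
#include <limits>

std::int64_t index_smallest_64(const double* p, std::int64_t n) {
  double small_value = std::numeric_limits<double>::max();
  std::int64_t small_k = -1;  // index lanes as wide as the double lanes
  for (std::int64_t k = 0; k < n; ++k) {
    if (p[k] < small_value) {
      small_value = p[k];
      small_k = k;
    }
  }
  return small_k;
}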

Francois


PS: Here are the codes used for my benchmark:
==== Plain C++
#include <cstddef>

//#define INT std::ptrdiff_t
#define INT int

int main() {
  const int nb_time = 10000000;

  const INT n = 32;
  float* A = new float[n * n];
  float* B = new float[n * n];
  for (INT j = 0; j < n; ++j) {
    for (INT i = 0; i < n; ++i) {
      A[j * n + i] = 0.0f;
      B[j * n + i] = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    for (INT j = 0; j < n; ++j) {
      for (INT i = 0; i < n; ++i) {
        A[j * n + i] += B[j * n + i];
      }
    }
  }

  return 0;
}
====

==== Eigen
//#define EIGEN_DEFAULT_DENSE_INDEX_TYPE std::ptrdiff_t
#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int

#include <Eigen/Dense>

int main() {
  const int nb_time = 10000000;

  const int n = 32;
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> A(n, n);
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> B(n, n);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < n; ++i) {
      A(i, j) = 0.0f;
      B(i, j) = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    A.block(0, 0, n, n) += B.block(0, 0, n, n);
  }

  return 0;
}
====

Francois Fayard
2017-01-20 16:03:17 UTC
Permalink
Here is an example of the Intel compiler missing vectorization because of unsigned integers, together with the explanation from Intel Advisor: [screenshot not preserved]
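
A loop of roughly this shape illustrates the situation (my reconstruction; the nb_point name comes from the follow-up message):

// Intel Advisor reported that the trip count could not be computed (unsigned
// wrap-around being well-defined) and the loop was left unvectorized, even
// though it is plainly countable with trip count nb_point.
void scale(float* p, unsigned int nb_point) {
  for (unsigned int k = 0; k < nb_point; ++k) {
    p[k] *= 2.0f;
  }
}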

François Fayard
Founder & Consultant - Inside Loop
Applied Mathematics & High Performance Computing
Post by Henrik Mannerström
This issue seems to warrant a separate message thread.
I'd like to offer my two cents: As much as I like Eigen I think there is a strict ordering "c++ language" > "stl" > "Eigen". More than once have I developed something and only at some later point brought in Eigen. Code written in std:size_t fashion has then needed refactoring. So, if there would be a vote, I'd vote for size_t indexing. I think smooth interoperability with stl is valuable.
Best,
Henrik
Francois Fayard
2017-01-20 16:20:49 UTC
Permalink
By the way, in this case the Intel compiler is wrong to complain. The loop is countable, and its trip count is nb_point.

I have sent this as a bug to the Intel compiler team and it seems that they don’t really care. There are so many cases where unsigned integers are a nightmare for them that they just don’t seem to bother anymore.

Another example is this one, from Chandler Carruth at 39:20: http://youtu.be/yG1OZ69H_-o


François Fayard
Founder & Consultant - Inside Loop
Applied Mathematics & High Performance Computing
Great points, thank you!
- Henrik