When Turning A List Of Lists Of Tuples To An Array, How Can I Stop Tuples From Creating A 3rd Dimension?
Solution 1:
To np.array, your list of lists of tuples isn't any different from a list of lists of lists. It's iterables all the way down. np.array tries to create as high a dimensional array as possible. In this case that is 3d.
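A quick check of that claim: the block below builds the array from the question's nested list (the values are taken from the outputs shown in this thread), and again from the same data with each tuple turned into a list. Both produce the same 3-d integer array, which shows np.array makes no distinction between the two.

```python
import numpy as np

# The question's data: a 3x2 nested list whose innermost items are 2-tuples
alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
# The same values with every tuple replaced by a list
as_lists = [[list(t) for t in row] for row in alist]

a_tuples = np.array(alist)
a_lists = np.array(as_lists)

# np.array keeps descending while every level is a sequence of equal length,
# so both inputs end up as the same (3, 2, 2) integer array
print(a_tuples.shape, a_lists.shape)        # (3, 2, 2) (3, 2, 2)
print(np.array_equal(a_tuples, a_lists))    # True
```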
There are ways of sidestepping that and making a 2d array that contains objects, where those objects are things like tuples. But as noted in the comments, why would you want that?
In a recent SO question, I came up with this way of turning an n-d array into an object array of (n-m)-d shape (alist is the question's data, [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]):
In [267]: res = np.empty((3,2),object)
In [268]: arr = np.array(alist)
In [269]: for ij in np.ndindex(res.shape):
     ...:     res[ij] = arr[ij]
     ...:
In [270]: res
Out[270]:
array([[array([1, 2]), array([2, 3])],
       [array([4, 5]), array([5, 6])],
       [array([7, 8]), array([8, 9])]], dtype=object)
But that's a 2d array of arrays, not of tuples.
In [271]: for ij in np.ndindex(res.shape):
     ...:     res[ij] = tuple(arr[ij].tolist())
     ...:
In [272]: res
Out[272]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=object)
That's better (or is it?)
Or I could index the nested list directly:
In [274]: for i,j in np.ndindex(res.shape):
     ...:     res[i,j] = alist[i][j]
     ...:
In [275]: res
Out[275]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=object)
I'm using ndindex to generate all the indices of a (3,2) array.
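To make the ndindex part concrete, the sketch below prints the indices it yields for a (3,2) shape and then wraps the "index the nested list directly" loop in a small reusable function. The function name nested_to_object_array is hypothetical (it is not a NumPy API); it is just the Out[275] approach with the depth taken from a shape argument.

```python
import numpy as np

# np.ndindex yields every index tuple of the given shape in C (row-major) order
print(list(np.ndindex(3, 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

def nested_to_object_array(nested, shape):
    """Hypothetical helper: copy the leading `shape` levels of a nested list
    into an object array, so deeper items (here, tuples) stay Python objects."""
    res = np.empty(shape, dtype=object)
    for idx in np.ndindex(*shape):
        item = nested
        for i in idx:          # walk down one nesting level per index
            item = item[i]
        res[idx] = item        # a full index stores the object itself
    return res

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
res = nested_to_object_array(alist, (3, 2))
print(res.shape, res[1, 0], type(res[1, 0]))   # (3, 2) (4, 5) <class 'tuple'>
```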
The structured array mentioned in the comments works because for a compound dtype, tuples are distinct from lists.
In [277]: np.array(alist, 'i,i')
Out[277]:
array([[(1, 2), (2, 3)],
       [(4, 5), (5, 6)],
       [(7, 8), (8, 9)]], dtype=[('f0', '<i4'), ('f1', '<i4')])
Technically, though, that isn't an array of tuples. It just represents the elements (or records) of the array as tuples.
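One way to see the difference between a record and a tuple: indexing a single element of the structured array returns a numpy.void record, fields are accessed by name ('f0', 'f1' are NumPy's default names for an 'i,i' dtype), and only tolist() converts the records back into Python tuples. A minimal sketch:

```python
import numpy as np

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]
rec = np.array(alist, 'i,i')   # compound dtype: each tuple becomes one record

print(rec.shape)               # (3, 2): the tuples did not add a dimension
print(type(rec[0, 0]))         # numpy.void: a record, not a tuple
print(rec['f0'])               # field access: the first number of every pair
print(rec.tolist()[0])         # [(1, 2), (2, 3)]: tolist() yields real tuples
```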
In the object dtype array, the elements of the array are pointers to the tuples in the list (at least in the Out[275] case). In the structured array case, the numbers are stored the same way as in a 3d array, as bytes in the array's data buffer.
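Both storage claims can be checked directly. The sketch below assumes the 'i' in 'i,i' is the platform C int, so it compares against np.intc: the object array built by indexing the list holds the very same tuple objects (an `is` check), while the structured array's raw bytes match those of a plain (3, 2, 2) integer array.

```python
import numpy as np

alist = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]

# Object array filled by indexing the list: each slot references the original tuple
obj = np.empty((3, 2), dtype=object)
for i, j in np.ndindex(obj.shape):
    obj[i, j] = alist[i][j]
print(obj[0, 0] is alist[0][0])          # True: same tuple object, no copy

# Structured array: the numbers live in the array's own data buffer
rec = np.array(alist, 'i,i')
dense = np.array(alist, dtype=np.intc)   # 3-d array of the same C int type
print(rec.tobytes() == dense.tobytes())  # True: identical bytes
```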
Solution 2:
Here are two more methods to complement @hpaulj's answer. One of them, the frompyfunc method, seems to scale a bit better than the others, although hpaulj's preallocation method is also not bad if we get rid of the loop. See timings below:
import numpy as np
import itertools

bi_grams = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]

def f_pp_1(bi_grams):
    # zero-input ufunc that fills an empty object array with successive tuples
    return np.frompyfunc(itertools.chain.from_iterable(bi_grams).__next__, 0, 1)(np.empty((len(bi_grams), len(bi_grams[0])), dtype=object))

def f_pp_2(bi_grams):
    # preallocate, then assign the whole nested list in one step
    res = np.empty((len(bi_grams), len(bi_grams[0])), dtype=object)
    res[...] = bi_grams
    return res

def f_hpaulj(bi_grams):
    # preallocate, then copy element by element
    res = np.empty((len(bi_grams), len(bi_grams[0])), dtype=object)
    for i, j in np.ndindex(res.shape):
        res[i, j] = bi_grams[i][j]
    return res

# all three methods should give the same object array of tuples
print(np.all(f_pp_1(bi_grams) == f_pp_2(bi_grams)))
print(np.all(f_pp_1(bi_grams) == f_hpaulj(bi_grams)))

from timeit import timeit
kwds = dict(globals=globals(), number=1000)
print(timeit('f_pp_1(bi_grams)', **kwds))
print(timeit('f_pp_2(bi_grams)', **kwds))
print(timeit('f_hpaulj(bi_grams)', **kwds))

big = 10000 * bi_grams   # 30000 rows of 2 tuples each
print(timeit('f_pp_1(big)', **kwds))
print(timeit('f_pp_2(big)', **kwds))
print(timeit('f_hpaulj(big)', **kwds))
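The f_pp_1 one-liner is dense, so here is the same trick unpacked on a smaller input. As far as I can tell it relies on two things: np.frompyfunc(func, 0, 1) builds a ufunc with zero inputs and one output, so the object array passed positionally becomes its output; and with a single output, each value returned by func() (here a tuple) is stored in its slot as-is rather than unpacked.

```python
import itertools
import numpy as np

pairs = [[(1, 2), (2, 3)], [(4, 5), (5, 6)]]

# Iterator over the tuples in row-major order: (1, 2), (2, 3), (4, 5), (5, 6)
flat = itertools.chain.from_iterable(pairs)

# Zero-input, one-output ufunc: each call to flat.__next__() fills one slot
fill = np.frompyfunc(flat.__next__, 0, 1)
res = fill(np.empty((2, 2), dtype=object))   # the empty array is the output

print(res.shape, res[1, 0], type(res[1, 0]))   # (2, 2) (4, 5) <class 'tuple'>
```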
Sample output:
True <- same result for
True <- different methods
0.004281356999854324 <- frompyfunc small input
0.002839841999957571 <- prealloc ellipsis small input
0.02361366100012674 <- prealloc loop small input
2.153144505 <- frompyfunc large input
5.152567720999741 <- prealloc ellipsis large input
33.13142323599959 <- prealloc loop large input
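The "prealloc ellipsis" variant (f_pp_2) works because of a difference between creating and assigning. np.array(..., dtype=object) on its own still descends into the tuples, but assigning a nested list into an existing object array only descends as many levels as the target array has dimensions, so each slot of a (3, 2) target receives a whole tuple. A minimal check:

```python
import numpy as np

pairs = [[(1, 2), (2, 3)], [(4, 5), (5, 6)], [(7, 8), (8, 9)]]

# Without a target shape, dtype=object still makes the tuples a 3rd dimension
print(np.array(pairs, dtype=object).shape)   # (3, 2, 2)

# Assigning into a preallocated (3, 2) object array stops at its two dimensions
res = np.empty((3, 2), dtype=object)
res[...] = pairs
print(res.shape, type(res[0, 0]))            # (3, 2) <class 'tuple'>
```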