What Are The Downsides Of Always Using Numpy Arrays Instead Of Python Lists?
Solution 1:
With your small example, the list comprehension is faster than the array method, even when taking the array creation out of the timing loop:
In [204]: list_of_lists = [["a","b","c"], ["d","e","f"], ["g","h","i"]]
...: flattened_list = [i for j in list_of_lists for i in j]
In [205]: timeit [i for j in list_of_lists for i in j]
757 ns ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [206]: np.ravel(list_of_lists)
Out[206]: array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'], dtype='<U1')
In [207]: timeit np.ravel(list_of_lists)
8.05 µs ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [208]: %%timeit x = np.array(list_of_lists)
...: np.ravel(x)
2.33 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With a much larger example, I expect [208] to get better.
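That expectation can be checked with a quick sketch. The 1000×3 size below is just illustrative, and the key point is that once the array already exists, `np.ravel` of a contiguous 2d array returns a view without copying, so its cost barely grows with size:

```python
import timeit
import numpy as np

# A larger, regular input: 1000 sublists of 3 strings each (illustrative size)
list_of_lists = [["a", "b", "c"] for _ in range(1000)]
x = np.array(list_of_lists)  # 2d '<U1' array, built outside the timed code

t_list = timeit.timeit(lambda: [i for j in list_of_lists for i in j], number=1000)
t_ravel = timeit.timeit(lambda: np.ravel(x), number=1000)

# ravel on a contiguous array is essentially metadata work (it returns a view),
# while the comprehension still touches all 3000 elements per run
print(f"comprehension: {t_list:.4f}s  ravel of prebuilt array: {t_ravel:.4f}s")
```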
If the sublists differ in size, the result is a 1d object array rather than a 2d array, and ravel/flatten does nothing useful:
In [209]: list_of_lists = [["a","b","c",23], ["d",None,"f"], ["g","h","i"]]
...: flattened_list = [i for j in list_of_lists for i in j]
In [210]: flattened_list
Out[210]: ['a', 'b', 'c', 23, 'd', None, 'f', 'g', 'h', 'i']
In [211]: np.array(list_of_lists)
Out[211]:
array([list(['a', 'b', 'c', 23]), list(['d', None, 'f']),
list(['g', 'h', 'i'])], dtype=object)
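For ragged sublists a pure-Python flattener such as `itertools.chain.from_iterable` does what `np.ravel` cannot; a minimal sketch (forcing `dtype=object` explicitly so newer numpy versions accept the ragged input):

```python
from itertools import chain
import numpy as np

list_of_lists = [["a", "b", "c", 23], ["d", None, "f"], ["g", "h", "i"]]

# chain.from_iterable flattens exactly one level, regardless of sublist lengths
flattened = list(chain.from_iterable(list_of_lists))
print(flattened)  # ['a', 'b', 'c', 23, 'd', None, 'f', 'g', 'h', 'i']

# The object array stays a 1d array of three list objects; ravel leaves it alone
arr = np.array(list_of_lists, dtype=object)
print(np.ravel(arr).shape)  # (3,)
```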
Growing lists is more efficient:
In [217]: alist = []
In [218]: for row in list_of_lists:
...: alist.append(row)
...:
In [219]: alist
Out[219]: [['a', 'b', 23], ['d', None, 'f'], ['g', 'h', 'i']]
In [220]: np.array(alist)
Out[220]:
array([['a', 'b', 23],
['d', None, 'f'],
['g', 'h', 'i']], dtype=object)
We strongly discourage iterative concatenation. Collect the sublists or arrays in a list first.
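A minimal sketch of the two patterns (the row shapes are assumed for illustration): repeated `np.concatenate` copies the growing result on every iteration, which is quadratic overall, while appending to a list and converting once is linear:

```python
import numpy as np

rows = [np.arange(5) for _ in range(100)]  # pretend these arrive one at a time

# Discouraged: concatenate inside the loop, copying the whole result each time
out_slow = np.empty((0, 5), dtype=int)
for r in rows:
    out_slow = np.concatenate([out_slow, r[None, :]])

# Preferred: collect in a list, then build the array once at the end
collected = []
for r in rows:
    collected.append(r)
out_fast = np.array(collected)

print(out_fast.shape)  # (100, 5)
```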
Solution 2:
Yes, there are. The rule of thumb is that numpy.array
is better for data of a single datatype (all integers, all double-precision floats, all booleans, strings of the same length, etc.) rather than a mixed bag of things. In the latter case you might just as well use a generic list, considering this:
In [93]: a = [b'5', 5, '55', 'ab', 'cde', 'ef', 4, 6]
In [94]: b = np.array(a)
In [95]: %timeit 5 in a
65.6 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [96]: %timeit 6 in a # worst case
219 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [97]: %timeit 5 in b
10.9 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Look at that: several orders of magnitude of performance difference, with numpy.array
the slower one! Certainly this depends on the length of the list, and in this particular case on the index of 5 or 6 (a linear search is O(n) in the worst case), but you get the idea.
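The gap comes from how `in` works on each container: a list's `__contains__` compares elements one by one and stops at the first match, while `x in arr` on an ndarray is equivalent to `(arr == x).any()`, a full elementwise comparison with no early exit (and, with mixed types, each comparison goes through slow object-level dispatch). A rough sketch, forcing `dtype=object` to keep the mixed values intact (the original transcript let numpy pick the dtype):

```python
import numpy as np

a = [b'5', 5, '55', 'ab', 'cde', 'ef', 4, 6]
b = np.array(a, dtype=object)  # mixed bytes/int/str, kept as Python objects

# The list stops at index 1; the array compares all eight elements
print(5 in a, 5 in b)

# Array membership behaves like a full elementwise comparison
print((5 in b) == bool((b == 5).any()))
```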
Solution 3:
Numpy arrays and functions are better for the most part. Here is an article if you want to look into it more: https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference