
What Are The Downsides Of Always Using Numpy Arrays Instead Of Python Lists?

I'm writing a program in which I want to flatten an array, so I used the following code: list_of_lists = [['a','b','c'], ['d','e','f'], ['g','h','i']] flattened_list = [i for j in

Solution 1:

With your small example, the list comprehension is faster than the array method, even when taking the array creation out of the timing loop:

In [204]: list_of_lists = [["a","b","c"], ["d","e","f"], ["g","h","i"]] 
     ...: flattened_list = [i for j in list_of_lists for i in j]    

In [205]: timeit [i for j in list_of_lists for i in j]
757 ns ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [206]: np.ravel(list_of_lists)                                                                            
Out[206]: array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'], dtype='<U1')

In [207]: timeit np.ravel(list_of_lists)
8.05 µs ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [208]: %%timeit x = np.array(list_of_lists)
     ...: np.ravel(x)
2.33 µs ± 22.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

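With a much larger example, I expect [208] to look better relative to the list comprehension: ravel on an existing contiguous array has a roughly fixed cost, while the comprehension's time grows with the number of elements.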
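A minimal sketch for checking this on a larger input yourself, assuming a hypothetical 500 × 500 list of one-character strings (the size, the `big`/`big_arr` names and the `flatten_list` helper are illustrative, not from the original). Absolute timings will depend on your machine and NumPy version, so none are claimed here.

```python
import timeit

import numpy as np

# Hypothetical larger input: 500 rows of 500 one-character strings.
big = [[chr(97 + (i + j) % 26) for j in range(500)] for i in range(500)]
big_arr = np.array(big)  # converted once, outside the timed calls, as in [208]


def flatten_list(lol):
    """Flatten a list of lists with a nested list comprehension."""
    return [item for row in lol for item in row]


n = 20
t_comp = timeit.timeit(lambda: flatten_list(big), number=n)
t_ravel_list = timeit.timeit(lambda: np.ravel(big), number=n)  # includes list -> array conversion
t_ravel_arr = timeit.timeit(lambda: np.ravel(big_arr), number=n)

print(f"list comprehension:       {t_comp / n:.2e} s per call")
print(f"np.ravel(list of lists):  {t_ravel_list / n:.2e} s per call")
print(f"np.ravel(existing array): {t_ravel_arr / n:.2e} s per call")

# Both approaches yield the same elements in the same row-major order.
assert flatten_list(big) == np.ravel(big_arr).tolist()
```

Because `big_arr` is C-contiguous, `np.ravel` returns a view without copying, which is why the last timing stays small; if you then need a Python list again (for example via `.tolist()`), that conversion costs time proportional to the number of elements.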

If the sublists differ in size, the array is not 2d but a 1d object array whose elements are the original lists, so ravel/flatten has nothing to flatten. (Recent NumPy versions, 1.24 and later, raise a ValueError for such ragged input unless dtype=object is given explicitly.)

In [209]: list_of_lists = [["a","b","c",23], ["d",None,"f"], ["g","h","i"]]
     ...: flattened_list = [i for j in list_of_lists for i in j]
In [210]: flattened_list
Out[210]: ['a', 'b', 'c', 23, 'd', None, 'f', 'g', 'h', 'i']
In [211]: np.array(list_of_lists, dtype=object)
Out[211]: 
array([list(['a', 'b', 'c', 23]), list(['d', None, 'f']),
       list(['g', 'h', 'i'])], dtype=object)
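The contrast can be made explicit with a small sketch, assuming NumPy 1.24 or later (hence the explicit `dtype=object`). It also uses `itertools.chain.from_iterable`, a standard-library alternative to the nested comprehension that the original answer does not mention.

```python
from itertools import chain

import numpy as np

# Ragged input: the sublists have different lengths and mixed element types.
ragged = [["a", "b", "c", 23], ["d", None, "f"], ["g", "h", "i"]]

# Pure-Python flattening handles ragged, mixed data without any fuss.
flat_comp = [item for row in ragged for item in row]
flat_chain = list(chain.from_iterable(ragged))
print(flat_chain)  # ['a', 'b', 'c', 23, 'd', None, 'f', 'g', 'h', 'i']

# NumPy can only store this as a 1-d object array whose elements are the lists.
arr = np.array(ragged, dtype=object)
print(arr.shape)          # (3,): one element per sublist, not a 2-d grid
print(arr.ravel().shape)  # (3,): ravel has nothing to flatten
```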

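Growing a list is more efficient than growing an array. (Between [211] and [217], list_of_lists was redefined with equal-length rows, as the output of [219] shows, so np.array builds a 2d object array here.)

In [217]: alist = []
In [218]: for row in list_of_lists:
     ...:     alist.append(row)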
     ...:                                                                                                    
In [219]: alist                                                                                              
Out[219]: [['a', 'b', 23], ['d', None, 'f'], ['g', 'h', 'i']]
In [220]: np.array(alist)                                                                                    
Out[220]: 
array([['a', 'b', 23],
       ['d', None, 'f'],
       ['g', 'h', 'i']], dtype=object)

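We strongly discourage building an array by iterative concatenation (calling np.concatenate or np.append in a loop), because every call allocates and copies a whole new array. Collect the sublists or arrays in a list first, then convert once at the end.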
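A minimal sketch of the two patterns, assuming hypothetical integer rows of a fixed width of 3 (`rows`, `build_by_concatenation` and `build_by_collecting` are illustrative names). The first rebuilds the array on every iteration, so the total copying grows quadratically with the number of rows; the second appends to a list, which is amortized constant time per append, and converts once.

```python
import numpy as np

# Hypothetical input: 1000 rows of three integers each.
rows = [[i, i + 1, i + 2] for i in range(0, 3000, 3)]


def build_by_concatenation(rows):
    """Discouraged: each np.concatenate call allocates and copies a new array."""
    result = np.empty((0, 3), dtype=int)
    for row in rows:
        result = np.concatenate([result, np.array([row])])
    return result


def build_by_collecting(rows):
    """Preferred: append to a Python list, then convert once at the end."""
    collected = []
    for row in rows:
        collected.append(row)
    return np.array(collected)


a = build_by_concatenation(rows)
b = build_by_collecting(rows)
assert a.shape == b.shape == (1000, 3)
assert np.array_equal(a, b)
```

Both functions produce the same array; only the amount of intermediate copying differs, and that difference grows with the number of rows.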

Solution 2:

Yes, there are. As a rule of thumb, remember that numpy.array is better suited to data of a single datatype (all integers, all double-precision floats, all booleans, strings of the same length, etc.) than to a mixed bag of things. In the latter case you might as well use a generic list. Consider this:

In [93]: a = [b'5', 5, '55', 'ab', 'cde', 'ef', 4, 6]

In [94]: b = np.array(a)

In [95]: %timeit 5 in a
65.6 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [96]: %timeit 6 in a  # worst case
219 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [97]: %timeit 5 in b
10.9 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Look at that: a roughly 50- to 170-fold performance difference, with the NumPy array being the slower one! Certainly this depends on the length of the list and, in this particular case, on the position of 5 or 6 (a list membership test is O(n) in the worst case), but you get the idea.
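Part of the gap comes from how `in` works: a list compares elements one by one and stops at the first match, whereas, as far as I know, NumPy evaluates `x in arr` as `(arr == x).any()`, building a full boolean array before checking it, on top of the fixed per-call overhead of a NumPy operation. A minimal sketch with homogeneous integer data (the size and probe values are arbitrary choices, not from the original):

```python
import timeit

import numpy as np

data = list(range(10_000))  # homogeneous integers (arbitrary size)
arr = np.array(data)

# Both containers give the same answers for homogeneous integer data;
# for the array, `x in arr` agrees with `(arr == x).any()`.
assert (5 in data) == (5 in arr) == bool((arr == 5).any())
assert -1 not in data and -1 not in arr

# Early hit: the list can stop after a few comparisons, the array cannot.
print("list,  early hit:", timeit.timeit(lambda: 5 in data, number=1_000))
print("array, early hit:", timeit.timeit(lambda: 5 in arr, number=1_000))

# Miss: now the list has to scan every element as well.
print("list,  miss:     ", timeit.timeit(lambda: -1 in data, number=1_000))
print("array, miss:     ", timeit.timeit(lambda: -1 in arr, number=1_000))
```

On larger homogeneous data the array's vectorized comparison typically wins in the miss case, which fits the rule of thumb above: the fixed overhead of a NumPy call matters most for small, mixed, or early-exit workloads.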

Solution 3:

NumPy arrays and functions are better for the most part. Here is an article if you want to look into it more: https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference
