Numpy: In A Sorted List, Find The First And The Last Index For Each Unique Value
Having a sorted list, how can anyone find (using numpy) the first and the last index for each unique value? Example: Initial sorted list: >>> import numpy as np >>&
Solution 1:
Here's one approach leveraging the sorted nature of input data, making use of the very efficient NumPy array-slicing
and other NumPy functions -
def start_stop_arr(initial_list):
a = np.asarray(initial_list)
mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
idx = np.flatnonzero(mask)
l = np.diff(idx)
start = np.repeat(idx[:-1], l)
stop = np.repeat(idx[1:]-1, l)
return start, stop
Further performance boost is possible with concatenated repetitions -
defstart_stop_arr_concat_repeat(initial_list):
a = np.asarray(initial_list)
mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
idx = np.flatnonzero(mask)
l = np.diff(idx)
idx2 = np.concatenate((idx[:-1,None], (idx[1:,None]-1)),axis=1)
ss = np.repeat(idx2, l, axis=0)
return ss[:,0], ss[:,1]
Sample run -
In [38]: initial_list
Out[38]: array([0, 0, 0, 1, 1, 2, 3, 3, 3])
In [39]: start_stop_arr(initial_list)
Out[39]: (array([0, 0, 0, 3, 3, 5, 6, 6, 6]), array([2, 2, 2, 4, 4, 5, 8, 8, 8]))
Runtime test -
Other approach(es) -
# @Mohammed Elmahgiubi's solndefreversed_app(initial_list): # input expected is a list
reversed_initial_list = list(reversed(initial_list))
first = [initial_list.index(i) for i in initial_list]
last = list(reversed([(len(initial_list) -
(reversed_initial_list.index(i) + 1))
for i in reversed_initial_list]))
return first, last
defunique_app(a): # @B. M.'s soln
_,ind1,inv1,cou1 = np.unique(a, return_index=True, return_inverse=True,
return_counts=True)
return ind1[inv1],(ind1+cou1-1)[inv1]
Timings -
Case #1 : Smaller dataset
In [295]: initial_list = np.random.randint(0,1000,(10000))
...: initial_list.sort()
In [296]: input_list = initial_list.tolist()
In [297]: %timeit reversed_app(input_list)
1 loop, best of 3: 789 ms per loop
In [298]: %timeit unique_app(initial_list)
1000 loops, best of 3: 353 µs per loop
In [299]: %timeit start_stop_arr(initial_list)
10000 loops, best of 3: 96.3 µs per loop
Case #2 : Bigger dataset
In [438]: initial_list = np.random.randint(0,100000,(1000000))
...: initial_list.sort()
In [439]: %timeit unique_app(initial_list) # @B. M.'s soln10 loops, best of 3: 53 ms per loop
In [440]: %timeit start_stop_arr(initial_list)
100 loops, best of 3: 9.64 ms per loop
In [441]: %timeit start_stop_arr_concat_repeat(initial_list)
100 loops, best of 3: 6.76 ms per loop
Solution 2:
This is my approach:
initial_list = [0, 0, 0, 1, 1, 2, 3, 3, 3]
reversed_initial_list = list(reversed(initial_list))
first = [initial_list.index(i) for i in initial_list]
last = list(reversed([(len(initial_list) - (reversed_initial_list.index(i) + 1)) for i in reversed_initial_list]))
print("initial_list = {}\nfirst = {}\nlast = {}".format(initial_list, first, last))
results in :
initial_list = [0, 0, 0, 1, 1, 2, 3, 3, 3]
first = [0, 0, 0, 3, 3, 5, 6, 6, 6]
last = [2, 2, 2, 4, 4, 5, 8, 8, 8]
Solution 3:
first = [np.min(np.where(initial_list==i)) for i in initial_list]
last = [np.max(np.where(initial_list==i)) for i in initial_list]
reference: Is there a Numpy function to return the first index of something in an array?
Solution 4:
numpy.unique
computes all the useful values for that :
a=np.array([0, 0, 0, 1, 1, 2, 3, 3, 3])
_,ind1,inv1,cou1 = np.unique(a, return_index=True, return_inverse=True, return_counts=True)
print(ind1[inv1],(ind1+cou1-1)[inv1])
#[0 0 0 3 3 5 6 6 6] [2 2 2 4 4 5 8 8 8]
Post a Comment for "Numpy: In A Sorted List, Find The First And The Last Index For Each Unique Value"