Use Information Of Two Arrays To Create A Third One

February 08, 2023

I have two NumPy arrays and want to create a third one from the information in both. Here is a simple example: have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]) use = np.array([[

Solution 1:

If the arrays are small and performance is not an issue, a simple list comprehension is enough:

np.array([ [a[0]]*b[0]+list(a[b[0]:]) for a,b in zip(have,use)])
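To make the one-liner easier to follow, here is a minimal sketch that unpacks it into a loop. It assumes use = np.array([[2], [3]]), the value used in Solution 3, and the helper name fill_prefix_listcomp is made up for illustration only.

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])  # assumed value, taken from Solution 3


def fill_prefix_listcomp(have, use):
    """Hypothetical helper: same logic as the list comprehension, spelled out."""
    rows = []
    for row, count in zip(have, use):
        k = int(count[0])               # how many leading entries to overwrite
        prefix = [row[0]] * k           # first value of the row, repeated k times
        suffix = list(row[k:])          # the rest of the row, unchanged
        rows.append(prefix + suffix)
    return np.array(rows)


result = fill_prefix_listcomp(have, use)
print(result)
# [[1 1 3 4]
#  [5 5 5 8]]
```

Each row is rebuilt as a Python list, which is why this approach gets slow when there are many rows.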

Solution 2:

Simply iterate over the rows of have and overwrite the leading values of each row based on use.

Use:

for i in range(use.shape[0]):
    have[i, :use[i, 0]] = np.repeat(have[i, 0], use[i, 0])
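Note that the loop above overwrites have in place. If the original array is still needed afterwards, one option is to work on a copy. The sketch below assumes use = np.array([[2], [3]]) as in Solution 3; the function name fill_prefix_loop is hypothetical, and it relies on scalar broadcasting in the slice assignment instead of np.repeat.

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])  # assumed value, taken from Solution 3


def fill_prefix_loop(have, use):
    """Hypothetical helper: row-wise loop that leaves the input untouched."""
    out = have.copy()
    for i in range(use.shape[0]):
        k = use[i, 0]
        # Assigning a scalar to a slice broadcasts it over the slice.
        out[i, :k] = out[i, 0]
    return out


filled = fill_prefix_loop(have, use)
print(filled)
# [[1 1 3 4]
#  [5 5 5 8]]
```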

Using only numpy operations:

First create a boolean mask with the same shape as have. mask[i, j] is True if j < use[i, 0] and False otherwise, so the mask is True exactly at the positions that should be replaced by the first-column value of that row. Then use np.where to do the replacement.

n, m = have.shape
mask = np.repeat(np.arange(m)[None, :], n, axis = 0) < use
have = np.where(mask, have[:, 0:1], have)

Output:

>>> have
array([[1, 1, 3, 4],
       [5, 5, 5, 8]])
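As a side note, the np.repeat call in the mask construction is optional: comparing a 1-D np.arange(m) against use, which has shape (n, 1), already broadcasts to an (n, m) mask. A minimal sketch, again assuming use = np.array([[2], [3]]):

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])  # assumed value, taken from Solution 3

# Shape (m,) compared with shape (n, 1) broadcasts to shape (n, m).
mask = np.arange(have.shape[1]) < use
print(mask)
# [[ True  True False False]
#  [ True  True  True False]]

# have[:, :1] keeps the first column as shape (n, 1) so it broadcasts per row.
result = np.where(mask, have[:, :1], have)
print(result)
# [[1 1 3 4]
#  [5 5 5 8]]
```

Unlike the assignment in the answer, this keeps have unchanged and stores the new array under a separate name.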

Solution 3:

Another option is np.apply_along_axis(), although, as the update below notes, it is not as fast as originally expected.

import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])


def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep+1:]])
    return res


solution = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))
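The trick here is that np.apply_along_axis passes only one 1-D slice to the function, so the repeat count is prepended to each row first. The following sketch walks through the first row by hand to show what rep1st receives and returns; the intermediate variable names are only for illustration.

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

# Prepend the repeat count to each row so the row function sees both pieces.
combined = np.concatenate([use, have], axis=1)
print(combined)
# [[2 1 2 3 4]
#  [3 5 6 7 8]]

row = combined[0]                # [2, 1, 2, 3, 4]
rep = row[0]                     # 2: number of leading entries to overwrite
head = np.repeat(row[1], rep)    # [1, 1]: first data value repeated rep times
tail = row[rep + 1:]             # [3, 4]: untouched remainder of the data
first_row = np.concatenate([head, tail])
print(first_row)
# [1 1 3 4]
```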

Update:

As @hpaulj pointed out, the apply_along_axis method above is not as efficient as I expected; I had misunderstood how it works. Reference: "numpy np.apply_along_axis function speed up?".

However, I ran some tests on the methods proposed so far:

import numpy as np
from timeit import timeit


def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep + 1:]])
    return res


def test(row, col, run):
    have = np.random.randint(0, 100, size=(row, col))
    use = np.random.randint(0, col, size=(row, 1))
    d = locals()
    d.update(globals())
    # method by me
    t1 = timeit("np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))", number=run, globals=d)
    # method by @quantummind
    t2 = timeit("np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])", number=run, globals=d)
    # method by @Amit Vikram Singh
    t3 = timeit(
        "np.where(np.repeat(np.arange(have.shape[1])[None, :], have.shape[0], axis=0) < use, have[:, 0:1], have)",
        number=run, globals=d
    )
    print(f"{t1:8.6f}, {t2:8.6f}, {t3:8.6f}")


test(1000, 10, 10)
test(100, 100, 10)
test(10, 1000, 10)

test(1000000, 10, 1)
test(100000, 100, 1)
test(10000, 1000, 1)
test(1000, 10000, 1)
test(100, 100000, 1)
test(10, 1000000, 1)

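Results (in seconds; the columns are apply_along_axis, the list comprehension, and np.where, in that order):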

0.062488, 0.028484, 0.000408
0.010787, 0.013811, 0.000270
0.001057, 0.009146, 0.000216

6.146863, 3.210017, 0.044232
0.585289, 1.186013, 0.034110
0.091086, 0.961570, 0.026294
0.039448, 0.917052, 0.022553
0.028719, 0.919377, 0.022751
0.035121, 1.027036, 0.025216

The results show that the second method proposed by @Amit Vikram Singh (np.where with a boolean mask) is the fastest in every test, even when the arrays are huge.
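A timing comparison is only meaningful if the methods produce the same result. The benchmark above does not check this, so here is a small sanity check under assumed settings: a fixed seed and a 50 x 8 array, both chosen arbitrarily for illustration.

```python
import numpy as np


def rep1st(arr):
    # arr[0] is the repeat count, arr[1:] is the original row.
    rep = arr[0]
    return np.concatenate([np.repeat(arr[1], rep), arr[rep + 1:]])


rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
row, col = 50, 8                 # arbitrary small sizes
have = rng.integers(0, 100, size=(row, col))
use = rng.integers(0, col, size=(row, 1))

by_apply = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))
by_listcomp = np.array(
    [[a[0]] * int(b[0]) + list(a[b[0]:]) for a, b in zip(have, use)]
)
by_where = np.where(np.arange(col) < use, have[:, 0:1], have)

all_agree = (
    np.array_equal(by_apply, by_listcomp)
    and np.array_equal(by_apply, by_where)
)
print(all_agree)
```

This also covers rows where the count in use is 0, in which case all three methods leave the row unchanged.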
