Skip to content Skip to sidebar Skip to footer

Python - Efficient Function With Scipy Sparse Matrices

for a project, I need an efficient function in python that solves to following task: Given a very large List X of long sparse Vectors (=> big sparse Matrix) and another Matrix Y

Solution 1:

I revised your function and ran it in

import numpy as np
from scipy import sparse

def get_distances(X, Y):
   ret=[]
   for row in X:            
       sample = row.A
       test = Y.getrow(0).A   
       dist = np.minimum(sample[0,:], test[0,:]).sum()
       ret.append(dist)    
   return ret

X = [[0,0,2],   
     [1,0,0],
     [3,1,0]]

Y = [[1,0,2]]

XM = sparse.csr_matrix(X)
YM = sparse.csr_matrix(Y)

print( get_distances(XM,YM))

print (np.minimum(XM.A, YM.A).sum(axis=1))

producing

1255:~/mypy$ python3 stack37056258.py 
[2, 1, 1]
[211]

np.minimum takes element wise minimum of two arrays (may be 2d), so I don't need to iterate on columns. I also don't need to use indexing.

minimum is also implemented for sparse matrices, but I get a segmenation error when I try to apply it to your X (with 3 rows) and Y (with 1). If they are the same size this works:

Ys = sparse.vstack((YM,YM,YM))
print(Ys.shape)
print (XM.minimum(Ys).sum(axis=1))

Converting the single row matrix to an array also gets around the error - because it ends up using the dense version, np.minimum(XM.todense(), YM.A).

print (XM.minimum(YM.A).sum(axis=1))

When I try other element by element operations on these 2 matrices I get ValueError: inconsistent shapes, e.g. XM+YM, or XM<YM. Looks like sparse does not implement broadcasting as numpy arrays does.

=======================

Comparison of ways of replicating a 1 row sparse matrix many times

In [271]: A=sparse.csr_matrix([0,1,0,0,1])

In [272]: timeit sparse.vstack([A]*3000).A
10 loops, best of 3: 32.3 ms per loop

In [273]: timeit sparse.kron(A,np.ones((3000,1),int)).A
1000 loops, best of 3: 1.27 ms per loop

For many times, kron is better than vstack.

=======================

There's an overlap in issues with Scipy sparse matrix alternative for getrow()

Solution 2:

Try below code for sparse matrix:

from scipy.sparse import csr_matrix, vstack
X = csr_matrix([[0,0,2],[1,0,0],[3,1,0]])
Y = csr_matrix([[1,0,2]])
def matrix_dist(x,y):
    y=vstack([y]*x.shape[1])
    return (((x+y)-(x-y).multiply((x-y).sign())).sum(1)/2).A.ravel()

Post a Comment for "Python - Efficient Function With Scipy Sparse Matrices"