Skip to content Skip to sidebar Skip to footer

Multiply Slice Of Scipy Sparse Matrix Without Changing Sparsity

In scipy, when I multiply a slice of a sparse matrix with an array containing only zeros, the result is a matrix that is less or equally sparse than before, even though it should b

Solution 1:

In working with sparse matrices, changing the sparsity pattern is generally a very expensive operation, and so scipy does not do this silently.

If you want to remove explicitly stored zeros from a sparse matrix, you should use the eliminate_zeros() method; for example:

>>> M = csr(np.random.random((1000,1000))>0.9, dtype=float)
>>> M
<1000x1000 sparse matrix of type'<class 'numpy.float64'>'with99740 stored elements in Compressed Sparse Row format>

>>> M[:, 0] *= 0>>> M
<1000x1000 sparse matrix of type'<class 'numpy.float64'>'with99740 stored elements in Compressed Sparse Row format>

>>> M.eliminate_zeros()
>>> M
<1000x1000 sparse matrix of type'<class 'numpy.float64'>'with99657 stored elements in Compressed Sparse Row format>

Scipy could call the eliminate_zeros routine automatically after doing this kind of operation, but the developers chose to give the user more flexibility and control when doing something as expensive as changing the sparsity structure.

Solution 2:

To recreate your code (using int type for a more compact display):

In [16]: M = sparse.csr_matrix(np.random.random((8,8))>.7).astype(int)
In [17]: M
Out[17]: 
<8x8 sparse matrix of type '<class 'numpy.int32'>'
    with 17 stored elements in Compressed Sparse Row format>
In [18]: M.A
Out[18]: 
array([[0, 0, 1, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 1, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 1, 0],
       [1, 0, 1, 1, 1, 1, 0, 1]])
In [19]: M.tolil().data       # show nonzero values by row
Out[19]: 
array([list([1, 1]), list([1, 1]), list([1]), list([1, 1]), list([]),
       list([1, 1]), list([1, 1]), list([1, 1, 1, 1, 1, 1])], dtype=object)

Setting a row (or column) explicitly. Note the efficiency warning:

In [20]: M[0,:] = 0
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:774: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [21]: M.tolil().data
Out[21]: 
array([list([0, 0, 0, 0, 0, 0, 0, 0]), list([1, 1]), list([1]),
       list([1, 1]), list([]), list([1, 1]), list([1, 1]),
       list([1, 1, 1, 1, 1, 1])], dtype=object)

So yes it has set all values in the row to the specified value. And it doesn't attempt to distinguish between setting 0s as opposed to 1s. You can see the code used in M.__setitem__ and M._set_many (this is where the efficiency warning is generated).

As @jakevpd shows, you need to explicitly tell it to eliminate excess 0s. It does not attempt to do that during the assignment.

In [22]: M.eliminate_zeros()
In [23]: M.tolil().data
Out[23]: 
array([list([]), list([1, 1]), list([1]), list([1, 1]), list([]),
       list([1, 1]), list([1, 1]), list([1, 1, 1, 1, 1, 1])], dtype=object)

In general setting values of a matrix explicitly is discouraged, especially with csr. It's not even allowed with coo. lil is the recommended format if you need to do that.

In [24]: Ml = M.tolil()
In [25]: Ml[1,:] = 0
In [26]: Ml.data
Out[26]: 
array([list([]), list([]), list([1]), list([1, 1]), list([]), list([1, 1]),
       list([1, 1]), list([1, 1, 1, 1, 1, 1])], dtype=object)

lil does take care to eliminate 0s.

Multiplication of a row by an array of 0s does not change the sparsity. Nor does it act in-place. It produces a new matrix:

In [29]: M[1,:].multiply(np.zeros((1,8)))
Out[29]: 
<1x8 sparse matrix of type'<class 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
In [30]: _.A
Out[30]: array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
In [31]: M[1,:].A
Out[31]: array([[1, 0, 0, 0, 0, 0, 1, 0]], dtype=int32)

Multiplication with a sparse matrix does eliminate 0s (again, not in-place):

In [32]: M[1,:].multiply(sparse.csr_matrix(np.zeros((1,8))))
Out[32]: 
<1x8 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Row format>

(Notice also the different in format between Out[29] and Out[32].)

As a general rule, multiplication, both element-wise and matrix, is the most efficient operation with csr matrices, especially if the other is also sparse. In fact row/column sums are performed with matrix multiplication, as is advanced indexing.

Post a Comment for "Multiply Slice Of Scipy Sparse Matrix Without Changing Sparsity"