
Preallocating Ndarrays

How can I preallocate arrays of arrays so that appending is a bit more efficient? In MATLAB there is a function called cell(required_length) which preallocates 'cells' which can

Solution 1:

This isn't just a question of preallocating an array, such as np.empty((100,), dtype=int). It's as much a question of how to collect a large number of lists into one structure, whether that is a list or a numpy array. The comparison with MATLAB cells is enough, in my opinion, to warrant further discussion.


I think you should be using Python lists. They can contain lists or other objects (including arrays) of varying sizes. You can easily append more items (or use extend to add multiple objects). Python has had them forever; MATLAB added cells to approximate that flexibility.
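As a small illustration of the list approach, here is a sketch of collecting variable-length items with append and extend. The names cells and prealloc are made up for this example; the [None] * n idiom is only a rough stand-in for MATLAB's cell(n), not an exact equivalent.

```python
# A plain list can hold sublists of different lengths.
cells = []
cells.append([1, 2, 3])             # add one item
cells.append([2, 3])
cells.extend([[4], [5, 6, 7, 8]])   # add several items at once

# Rough analogue of MATLAB's cell(3): a list of 3 empty placeholders.
prealloc = [None] * 3
prealloc[0] = [1, 2]                # fill a slot by index

lengths = [len(c) for c in cells]
print(cells)
print(lengths)
```

Preallocating a list this way rarely buys much, since list append is already cheap, but it is handy when slots are filled out of order.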

np.arrays with dtype=object are similar: they are arrays of pointers to objects such as lists. For the most part they are just lists with an array wrapper. You can initialize an array to some large size, and then insert/set items.

A = np.empty((10,),dtype=object)

produces an array with 10 elements, each None.

 A[0] = [1,2,3]
 A[1] = [2,3]
 ...
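A runnable version of the same idea, as a sketch: it preallocates a small object array, fills two slots, and checks which slots are still empty. The size 4 and the name filled are chosen only for this example.

```python
import numpy as np

# Four slots, each initially None.
A = np.empty((4,), dtype=object)

# Assign lists of different lengths to individual slots.
A[0] = [1, 2, 3]
A[1] = [2, 3]

# Unfilled slots are still None, so they are easy to detect.
filled = [x is not None for x in A]
print(A)
print(filled)
```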

You can also concatenate elements to an existing array, but the result is a new array; the original is not modified in place. There is a np.append function, but it is just a cover for concatenate; it should not be confused with the list append.
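The difference can be seen in a short sketch. The variable names (extra, B, C, lst, rv) are only for illustration. Note that np.append with no axis flattens its second argument, so a list passed to it is added element by element rather than as one item.

```python
import numpy as np

A = np.empty((2,), dtype=object)
A[0] = [1, 2, 3]
A[1] = [2, 3]

# To add one list as a single item, wrap it in a 1-element object array.
extra = np.empty((1,), dtype=object)
extra[0] = [4, 5]
B = np.concatenate([A, extra])   # new 3-element array; A still has 2 elements

# np.append flattens the values, so 4 and 5 become two separate elements.
C = np.append(A, [4, 5])         # new 4-element array

# List append works in place and returns None.
lst = [[1, 2, 3], [2, 3]]
rv = lst.append([4, 5])

print(A.shape, B.shape, C.shape, len(lst))
```

Because every concatenate call copies the whole array, calling it repeatedly in a loop tends to cost more than appending to a list.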

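If it must be an array, you can easily construct it from the list at the end. That's what your np.array([[1,2],[1],[2,3,4]]) does. Note that recent NumPy versions (1.24 and later) raise a ValueError for ragged input like this unless you pass dtype=object explicitly.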
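A sketch of the conversion step, with one caveat worth knowing: if the sublists all happen to have the same length, np.array builds a 2-D array even with dtype=object. The names rows, arr, same and safe are illustrative only.

```python
import numpy as np

rows = [[1, 2], [1], [2, 3, 4]]

# Ragged input with an explicit dtype gives a 1-D array of list objects.
arr = np.array(rows, dtype=object)

# Equal-length sublists are turned into a 2-D array instead.
same = np.array([[1, 2], [3, 4]], dtype=object)

# Preallocating and assigning slot by slot always gives a 1-D array of lists.
equal_rows = [[1, 2], [3, 4]]
safe = np.empty(len(equal_rows), dtype=object)
for i, row in enumerate(equal_rows):
    safe[i] = row

print(arr.shape, same.shape, safe.shape)
```

If the lengths of the sublists cannot be guaranteed to differ, the preallocate-and-assign form is the more predictable choice.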

Related question: How to add to numpy array entries of different size in a for loop (similar to Matlab's cell arrays)?


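On the issue of speed, let's try some simple time tests:

import numpy as np

def witharray(n):
    # preallocate an object array, then fill each slot
    result = np.empty((n,), dtype=object)
    for i in range(n):
        result[i] = list(range(i))
    return result

def withlist(n):
    # start with an empty list and append each sublist
    result = []
    for i in range(n):
        result.append(list(range(i)))
    return result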

These produce:

In [111]: withlist(4)
Out[111]: [[], [0], [0, 1], [0, 1, 2]]

In [112]: witharray(4)
Out[112]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)

In [113]: np.array(withlist(4))
Out[113]: array([[], [0], [0, 1], [0, 1, 2]], dtype=object)

Time tests:

In [108]: timeit withlist(400)
1000 loops, best of 3: 1.87 ms per loop

In [109]: timeit witharray(400)
100 loops, best of 3: 2.13 ms per loop

In [110]: timeit np.array(withlist(400))
100 loops, best of 3: 8.95 ms per loop

Simply constructing a list of lists is fastest (1.87 ms). But if the result must be an object-dtype array, then assigning values to a preallocated empty array (2.13 ms) is faster than building the list and converting it with np.array (8.95 ms).
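The timings above come from IPython's timeit. For anyone who wants to repeat the comparison in a plain script, here is a sketch using the standard library timeit module. The repeat count of 20 is an arbitrary choice, dtype=object is passed explicitly so the conversion also works on recent NumPy, and the numbers will differ by machine and version.

```python
import timeit
import numpy as np

def witharray(n):
    # preallocate an object array, then fill each slot
    result = np.empty((n,), dtype=object)
    for i in range(n):
        result[i] = list(range(i))
    return result

def withlist(n):
    # start with an empty list and append each sublist
    result = []
    for i in range(n):
        result.append(list(range(i)))
    return result

n = 400
runs = 20  # arbitrary; raise it for steadier numbers

t_list = timeit.timeit(lambda: withlist(n), number=runs)
t_array = timeit.timeit(lambda: witharray(n), number=runs)
t_convert = timeit.timeit(lambda: np.array(withlist(n), dtype=object), number=runs)

for label, t in [("withlist", t_list), ("witharray", t_array), ("np.array(withlist)", t_convert)]:
    print(f"{label}: {1000 * t / runs:.3f} ms per call")
```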
