Pandas Data Frame Behavior
Solution 1:
No, apply does not work inplace*.
Here's another for you: the inplace flag doesn't guarantee that the operation actually happens in place (!). To give an example:
In [11]: s = pd.Series([1, 2, np.nan, 4])
In [12]: s._data._values
Out[12]: array([ 1., 2., nan, 4.])
In [13]: vals = s._data._values
In [14]: s.fillna(s.mean(), inplace=True)
In [15]: vals is s._data._values  # values are the same
Out[15]: True
In [16]: vals
Out[16]: array([ 1. , 2. , 2.33333333, 4. ])
In [21]: s = pd.Series([1, 2, np.nan, 4]) # start again
In [22]: vals = s._data._values
In [23]: s.fillna('mean', inplace=True)
In [24]: vals is s._data._values  # values are *not* the same
Out[24]: False
In [25]: s._data._values
Out[25]: array([1.0, 2.0, 'mean', 4.0], dtype=object)
Note: if the dtype is unchanged, the values array is often the same object, but pandas does not guarantee this.
In general apply is slow (since you are basically iterating through each row in Python), and the "game" is to rewrite that function in terms of pandas/numpy native functions and indexing. If you want to delve into the internals, check out the BlockManager in core/internals.py; this is the object which holds the underlying numpy arrays. But to be honest, I think your most useful tools are %timeit and looking at the source code for specific functions (?? in IPython).
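As a rough illustration of that "game", here is a minimal sketch comparing a row-wise apply with the equivalent native column arithmetic. The frame df_demo and the helpers total_with_apply and total_vectorized are made up for this example, and it uses the standard-library timeit module so it also runs outside IPython. Timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical demo data: 2,000 rows of random floats in two columns.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.random((2_000, 2)), columns=["A", "B"])


def total_with_apply(frame):
    # Calls the lambda once per row, in Python.
    return frame.apply(lambda row: row["A"] + row["B"], axis=1)


def total_vectorized(frame):
    # One column-wise operation, evaluated by numpy on the whole arrays.
    return frame["A"] + frame["B"]


apply_seconds = timeit.timeit(lambda: total_with_apply(df_demo), number=3)
vector_seconds = timeit.timeit(lambda: total_vectorized(df_demo), number=3)

print(f"apply:      {apply_seconds:.4f} s for 3 runs")
print(f"vectorized: {vector_seconds:.4f} s for 3 runs")
```

Both helpers return the same Series; only the route taken to compute it differs.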
In this specific example I would consider using fillna in an explicit for loop over the columns you want:
In [31]: df = pd.DataFrame([[1, 2, np.nan], [4, np.nan, 6]], columns=['A', 'B', 'C'])
In [32]: for col in ["A", "B"]:
   ....:     df[col].fillna(df[col].mean(), inplace=True)
   ....:
In [33]: df
Out[33]:
   A  B   C
0  1  2 NaN
1  4  2   6
(Perhaps it makes sense for fillna to have a columns argument for this use case?)
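One way to get a similar per-column effect without a loop is to pass fillna a Series of fill values indexed by column name; columns missing from that Series are left alone. The sketch below assumes that is the behaviour you want, and the names cols_to_fill and filled are made up for the example. It also assigns the result rather than using inplace, because in recent pandas versions an inplace call on df[col] may be treated as chained assignment and act on a temporary copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [4, np.nan, 6]], columns=["A", "B", "C"])

# Hypothetical selection of the columns whose NaNs should become the column mean.
cols_to_fill = ["A", "B"]

# df[cols_to_fill].mean() is a Series indexed by "A" and "B";
# fillna only fills the columns named in that index, so "C" keeps its NaN.
filled = df.fillna(df[cols_to_fill].mean())

print(filled)
```

The original df is left unchanged; rebind it with df = filled if you want to keep the same name.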
None of this is to say pandas is memory inefficient... but writing efficient (and memory-efficient) code sometimes takes some thought.
*apply is not usually going to make sense inplace (and IMO this behaviour would rarely be desired).
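To make the footnote concrete, here is a small sketch showing that apply hands back a new object and leaves the original alone. The names original and result are made up for the example.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})

# apply builds and returns a new DataFrame; `original` is not modified.
result = original.apply(lambda col: col.fillna(col.mean()))

print(original.isna().sum().sum())  # NaN count in the untouched input
print(result.isna().sum().sum())    # NaN count in the returned frame
```

If you want the change to stick, assign the return value back to a name yourself.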