Fill Multiple Missing Values With Series Based On Index Values
Consider the pd.DataFrame df:
df = pd.DataFrame([
    [np.nan, 1, np.nan],
    [2, np.nan, np.nan],
    [np.nan, np.nan, 3],
], list('abc'), list('xyz'))
Solution 1:
pandas handles this on a column basis with no issues. Suppose we had a different s, indexed by the column labels:
s = pd.Series([10, 20, 30], ['x', 'y', 'z'])
Then we could fill each column with its matching value:
df.fillna(s)
      x     y     z
a  10.0   1.0  30.0
b   2.0  20.0  30.0
c  10.0  20.0   3.0
But that's not what you want. Using your s, indexed by the row labels:
s = pd.Series([10, 20, 30], ['a', 'b', 'c'])
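then
df.fillna(s)
does nothing, because fillna matches the index of s against the column labels, and none of 'a', 'b', 'c' is a column of df. But we know that it works for columns, so transpose, fill, and transpose back: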
df.T.fillna(s).T
      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
c  30.0  30.0   3.0
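For readers who want to run the transpose trick end to end, here is a minimal self-contained sketch. It assumes the df and s from the question; the name filled is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 1, np.nan], [2, np.nan, np.nan], [np.nan, np.nan, 3]],
    index=list("abc"),
    columns=list("xyz"),
)
s = pd.Series([10, 20, 30], index=list("abc"))

# fillna matches a Series against column labels, so move the row labels
# into the columns, fill, and then transpose back to the original layout.
filled = df.T.fillna(s).T
print(filled)
```

The original df is left unchanged; only the returned frame has the gaps filled.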
Solution 2:
Another way:
def fillnull(col):
    # Replace the missing entries of this column with the values of s at the same row labels
    col[col.isnull()] = s[col.isnull()]
    return col
df.apply(fillnull)
Note that this is less efficient than @Brian's way (9 ms per loop versus 1.5 ms per loop on my computer).
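A variant of the same per-column idea that avoids writing into the column in place is to let Series.fillna do the label alignment. This is a sketch rather than the answer's own code, it assumes the df and s from the question, and the name filled is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 1, np.nan], [2, np.nan, np.nan], [np.nan, np.nan, 3]],
    index=list("abc"),
    columns=list("xyz"),
)
s = pd.Series([10, 20, 30], index=list("abc"))

# Each column is a Series indexed by the row labels, so filling it with s
# aligns on those labels and touches only the missing entries.
filled = df.apply(lambda col: col.fillna(s))
print(filled)
```

Because nothing is assigned into col, the input frame is not modified.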
Solution 3:
Here's a NumPy approach:
mask = np.isnan(df.values)
df.values[mask] = s[s.index.searchsorted(df.index)].repeat(mask.sum(1))
Sample run:
In [143]: df
Out[143]:
     x    y    z
a  NaN  1.0  NaN
b  2.0  NaN  NaN
d  4.0  NaN  7.0
c  NaN  NaN  3.0
In [144]: s
Out[144]:
a    10
b    20
c    30
d    40
e    50
dtype: int64
In [145]: mask = np.isnan(df.values)
...: df.values[mask]= s[s.index.searchsorted(df.index)].repeat(mask.sum(1))
...:
In [146]: df
Out[146]:
      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
d   4.0  40.0   7.0
c  30.0  30.0   3.0
Please note that if the index values of s are not sorted, we need to pass the extra argument sorter to searchsorted.
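The unsorted case can be sketched as follows. This is an illustrative variant, not the answer's exact code: it works on an explicit float copy of the data instead of writing through df.values, which newer pandas versions may not permit, and it assumes every row label of df is present in s. The names keys, order, pos, row_fill, arr and filled are introduced here for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 1, np.nan], [2, np.nan, np.nan], [4, np.nan, 7], [np.nan, np.nan, 3]],
    index=list("abdc"),
    columns=list("xyz"),
)
# Deliberately unsorted index to exercise the sorter argument.
s = pd.Series([30, 10, 50, 20, 40], index=list("caebd"))

keys = s.index.to_numpy()
order = np.argsort(keys)  # permutation that sorts the labels of s
# searchsorted returns positions within the sorted labels; indexing order
# with them maps back to positions in the original, unsorted s.
pos = order[np.searchsorted(keys, df.index.to_numpy(), sorter=order)]
row_fill = s.to_numpy()[pos]  # one fill value per row of df

arr = df.to_numpy(dtype=float, copy=True)
mask = np.isnan(arr)
# Boolean assignment walks the array row by row, so repeat each row's
# fill value once per missing cell in that row.
arr[mask] = np.repeat(row_fill, mask.sum(axis=1))
filled = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(filled)
```

If a label of df were missing from s, searchsorted would still return a position, so a membership check beforehand is advisable in real data.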