Fill Multiple Missing Values With Series Based On Index Values

Consider the pd.DataFrame df:

df = pd.DataFrame([ [np.nan, 1, np.nan], [2, np.nan, np.nan], [np.nan, np.nan, 3], ], list('abc'), list('xyz

Solution 1:

pandas handles this on a column basis with no issues: fillna matches the index labels of s against the column names of df. Suppose we had a different s:

s = pd.Series([10, 20, 30], ['x', 'y', 'z'])

then we could call

df.fillna(s)

      x     y     z
a  10.0   1.0  30.0
b   2.0  20.0  30.0
c  10.0  20.0   3.0

But that's not what you want. Using your s

s = pd.Series([10, 20, 30], ['a', 'b', 'c'])

then df.fillna(s) does nothing. But we know that it works for columns, so:

df.T.fillna(s).T

      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
c  30.0  30.0   3.0
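The steps above can be put together as a minimal, self-contained sketch. It assumes the frame from the question has rows a, b, c and columns x, y, z (the column labels are inferred from the printed output, since the original definition is cut off), and the names unchanged and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Frame from the question; columns x, y, z are inferred from the printed output
df = pd.DataFrame(
    [[np.nan, 1, np.nan],
     [2, np.nan, np.nan],
     [np.nan, np.nan, 3]],
    index=list("abc"),
    columns=list("xyz"),
)

# Series keyed by the ROW labels of df
s = pd.Series([10, 20, 30], index=list("abc"))

# fillna matches the Series index against column names, so nothing is filled here
unchanged = df.fillna(s).equals(df)

# Transpose so the row labels become columns, fill, then transpose back
filled = df.T.fillna(s).T
print(unchanged)
print(filled)
```

The double transpose is convenient for a small, single-dtype frame like this one; on a frame with mixed column dtypes, transposing may upcast everything to object, so it is worth checking the resulting dtypes.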

Solution 2:

Another way:

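def fillnull(col):
    # Work on a copy so the column passed in by apply is not modified in place
    col = col.copy()
    mask = col.isnull()
    # Boolean indexing aligns on the index, so s must share df's index labels
    col[mask] = s[mask]
    return col

df.apply(fillnull)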

Note that this is less efficient than @Brian's way (9 ms per loop versus 1.5 ms per loop on my computer).
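A shorter variant of the same column-by-column idea is sketched below. It relies on Series.fillna accepting another Series and aligning it on the index, so each column is filled from s by row label. This is an alternative sketch, not the answer's own code; the frame is rebuilt under the same assumption that its columns are x, y, z, and no timing claim is made for it.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (columns x, y, z assumed from the printed output)
df = pd.DataFrame(
    [[np.nan, 1, np.nan],
     [2, np.nan, np.nan],
     [np.nan, np.nan, 3]],
    index=list("abc"),
    columns=list("xyz"),
)
s = pd.Series([10, 20, 30], index=list("abc"))

# Each column is a Series indexed by a, b, c; fillna(s) fills by matching labels
filled = df.apply(lambda col: col.fillna(s))

# df itself is left untouched because fillna returns a new Series
remaining_nans = int(df.isna().sum().sum())
print(filled)
```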

Solution 3:

Here's a NumPy approach -

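# Boolean mask of the NaN cells
mask = np.isnan(df.values)
# Positional lookup into s for each row label, repeated once per NaN in that row
df.values[mask] = s.values[s.index.searchsorted(df.index)].repeat(mask.sum(1))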

Sample run -

In [143]: df
Out[143]: 
     x    y    z
a  NaN  1.0  NaN
b  2.0  NaN  NaN
d  4.0  NaN  7.0
c  NaN  NaN  3.0

In [144]: s
Out[144]: 
a    10
b    20
c    30
d    40
e    50
dtype: int64

In [145]: mask = np.isnan(df.values)
     ...: df.values[mask]= s[s.index.searchsorted(df.index)].repeat(mask.sum(1))
     ...: 

In [146]: df
Out[146]: 
      x     y     z
a  10.0   1.0  10.0
b   2.0  20.0  20.0
d   4.0  40.0   7.0
c  30.0  30.0   3.0

Please note that if the index values of s are not sorted, we need to pass the extra sorter argument to searchsorted, and then map the result back through that sorter to get positions in s.
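The unsorted case might look like the following sketch. It assumes every row label of df is present in s, uses a hypothetical unsorted s made up for illustration, and writes into a copy of the array rather than into df.values, since writing through df.values is not guaranteed to update the frame in every pandas version.

```python
import numpy as np
import pandas as pd

# Sample frame from the run above (rows a, b, d, c)
df = pd.DataFrame(
    [[np.nan, 1, np.nan],
     [2, np.nan, np.nan],
     [4, np.nan, 7],
     [np.nan, np.nan, 3]],
    index=list("abdc"),
    columns=list("xyz"),
)

# Hypothetical Series whose index is NOT sorted
s = pd.Series([30, 10, 50, 20, 40], index=list("caebd"))

keys = s.index.to_numpy()
sorter = np.argsort(keys)  # order that would sort the index labels
# Positions within the sorted view of keys for each row label of df
found = np.searchsorted(keys, df.index.to_numpy(), sorter=sorter)
positions = sorter[found]  # map back to positions in the original s

mask = df.isna().to_numpy()
values = df.to_numpy(copy=True)  # explicit copy; df is not modified
# One value per row, repeated once for each NaN in that row (row-major order)
values[mask] = s.to_numpy()[positions].repeat(mask.sum(axis=1))

filled = pd.DataFrame(values, index=df.index, columns=df.columns)
print(filled)
```

If a row label of df is missing from s, searchsorted still returns an insertion position, so the lookup would silently pick a wrong value or go out of bounds; checking df.index.isin(s.index).all() first is a reasonable guard.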
