
Dataframe List Comprehension "zip(...)": Loop Through Chosen Df Columns Efficiently With Just A List Of Column Name Strings

This is just a nitpicking syntactic question... I have a dataframe, and I want to use list comprehension to evaluate a function using lots of columns. I know I can do this df['resu

Solution 1:

This should work, but honestly, the OP figured it out himself as well, so +1 OP :)

df['result_col'] = [some_func(*var) for var in zip(*[df[col] for col in ['col_1', 'col_2', ..., 'col_n']])]
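
For anyone who wants to run it, here is a minimal sketch of the same comprehension. The data, the column list `cols`, and the body of `some_func` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data and function, only for illustration.
df = pd.DataFrame({
    "col_1": [1, 2, 3],
    "col_2": [10, 20, 30],
    "col_3": [100, 200, 300],
})
cols = ["col_1", "col_2", "col_3"]


def some_func(a, b, c):
    # Stand-in for whatever per-row computation is needed.
    return a + b + c


# zip(*[...]) turns the list of columns into one tuple per row,
# and *var spreads that tuple over the function's arguments.
df["result_col"] = [some_func(*var) for var in zip(*[df[col] for col in cols])]
print(df["result_col"].tolist())
```

Only the list of column name strings has to change to loop over a different set of columns, provided `some_func` accepts that many arguments.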

Solution 2:

As mentioned in the comments above, you should use apply instead:

df['result_col'] = df.apply(lambda x: some_func(*tuple(x.values)), axis=1)
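
Note that the line above passes every column of the row to some_func. A small sketch of restricting it to chosen columns, assuming a hypothetical two-argument `some_func` and made-up data:

```python
import pandas as pd

# Hypothetical data; "other" is a column we do not want to pass along.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "other": ["x", "y"]})
cols = ["col_1", "col_2"]


def some_func(a, b):
    # Stand-in computation for illustration.
    return a * b


# Select the chosen columns first, then apply row-wise (axis=1).
# Each x is a Series holding one row, so *x unpacks its values.
df["result_col"] = df[cols].apply(lambda x: some_func(*x), axis=1)
print(df["result_col"].tolist())
```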

Solution 3:

df.apply() is almost as slow as df.iterrows(), and neither is recommended; see "How to iterate over rows in a DataFrame in Pandas", search for "An Obvious Example" in @cs95's answer, and look at the comparison graph. Since the fastest approaches (vectorization, Cython routines) are not always easy to implement, the third-best option, a list comprehension, is usually the practical choice:

# print 3rd col
def some_func(row):
    print(row[2])


df['result_col'] = [some_func(*row) for row in zip(df[['col_1', 'col_2', ..., 'col_n']].to_numpy())]

or

# print 3rd col
def some_func(row):
    print(row[2])

df['result_col'] = [some_func(row[0]) for row in zip(df[['col_1', 'col_2', ..., 'col_n']].to_numpy())]

or

# print 3rd col
def some_func(x):
    print(x)

df['result_col'] = [some_func(row[0][2]) for row in zip(df[['col_1', 'col_2', ..., 'col_n']].to_numpy())]
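
In all three variants, zip() receives a single 2-D array, so each item it yields is a 1-tuple wrapping one row; that is why the row has to be reached via *row or row[0]. A sketch that iterates the array directly instead, with made-up data and a `some_func` that returns the value rather than printing it (both are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "col_3": [5, 6]})
cols = ["col_1", "col_2", "col_3"]


def some_func(row):
    # Return the 3rd selected column so the new column holds real values.
    return row[2]


# zip() over one 2-D array wraps every row in a tuple of length 1.
wrapped = list(zip(df[cols].to_numpy()))
print(len(wrapped[0]))

# Iterating the array itself yields the rows without the extra wrapper.
df["result_col"] = [some_func(row) for row in df[cols].to_numpy()]
print(df["result_col"].tolist())
```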


EDIT:

Please use df.iloc and df.loc instead of df[[...]], see Selecting multiple columns in a pandas dataframe
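
A short sketch of the same list comprehension with the columns selected through df.loc, again with made-up data and a hypothetical two-argument `some_func`:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col_1": [5, 7], "col_2": [1, 2], "other": ["x", "y"]})
cols = ["col_1", "col_2"]


def some_func(a, b):
    # Stand-in computation for illustration.
    return a - b


# df.loc[:, cols] selects all rows and the named columns by label.
df["result_col"] = [some_func(*row) for row in df.loc[:, cols].to_numpy()]
print(df["result_col"].tolist())
```

With df.iloc the selection would be by integer position instead, e.g. df.iloc[:, [0, 1]] for the first two columns of this frame.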
