
Transform Pandas Dataframe, Add Row Values As Column Headers

I have a pandas dataframe like this:

COMMIT_ID | FILE_NAME | COMMITTER | CHANGE TYPE
-------------------------------------------------------------
1 | package.json | A

Solution 1:

I think you need set_index + unstack:

df = (df.set_index(['COMMIT_ID','COMMITTER','FILE_NAME'])['CHANGE TYPE']
        .unstack()
        .reset_index())
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B     DELETE     ADD         None
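
To try this end to end, here is a minimal sketch. The sample rows are an assumption inferred from the printed result above, because the table in the question is truncated. It also shows why the next approach is needed: as far as I know, unstack raises a ValueError as soon as the same (COMMIT_ID, COMMITTER, FILE_NAME) combination appears twice.

```python
import pandas as pd

# Sample rows inferred from the printed result above (an assumption;
# the table in the question is truncated).
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java'],
    'COMMITTER': ['A', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE'],
})

keys = ['COMMIT_ID', 'COMMITTER', 'FILE_NAME']

# Parentheses allow the method chain to span several lines.
wide = (df.set_index(keys)['CHANGE TYPE']
          .unstack()
          .reset_index())
print(wide)

# Repeat the last row so one (COMMIT_ID, COMMITTER, FILE_NAME) key occurs twice.
dup = pd.concat([df, df.tail(1)], ignore_index=True)
try:
    dup.set_index(keys)['CHANGE TYPE'].unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print('unstack failed:', exc)
```

Depending on the pandas version, the missing cells may print as NaN rather than None.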

If there are duplicates, use pivot_table instead. It needs an aggregate function: sum concatenates the strings without a separator, and '_'.join concatenates them with a separator:

print (df)
   COMMIT_ID     FILE_NAME COMMITTER CHANGE TYPE
0          1  package.json         A      MODIFY
1          2       main.js         B         ADD
2          2    class.java         B      DELETE
3          2    class.java         B         ADD


df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='sum').reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B  DELETEADD     ADD         None

Or:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='_'.join).reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER  class.java main.js package.json
0                  1         A        None    None       MODIFY
1                  2         B  DELETE_ADD     ADD         None
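
A self-contained version of the join variant, as a rough sketch. The frame is rebuilt from the sample printed in this answer; the name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the printed frame in this answer;
# class.java occurs twice for commit 2.
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# '_'.join receives the strings of each group and glues them together
# in row order, so the duplicated key yields one combined cell.
wide = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                      columns='FILE_NAME',
                      values='CHANGE TYPE',
                      aggfunc='_'.join).reset_index()
print(wide)
```

Any other separator works the same way, for example ', '.join.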

Aggregating with first also works, but the duplicate values are lost:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='first').reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B     DELETE     ADD         None
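
Before settling on first, it may help to see which rows would be collapsed. This is only a sketch using DataFrame.duplicated on the same sample data; the names `key` and `dupes` are illustrative.

```python
import pandas as pd

# Sample data copied from the printed frame in this answer.
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

# Columns that together identify one cell of the reshaped table.
key = ['COMMIT_ID', 'COMMITTER', 'FILE_NAME']

# keep=False marks every row of a duplicated key, not only the repeats.
dupes = df[df.duplicated(subset=key, keep=False)]
print(dupes)
```

If `dupes` is empty, set_index + unstack is enough; otherwise an aggregate function has to decide what ends up in the cell.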

Finally, to remove the columns name (FILE_NAME), add rename_axis:

df = df.rename_axis(None, axis=1)
print (df)
   COMMIT_ID COMMITTER class.java main.js package.json
0          1         A       None    None       MODIFY
1          2         B  DELETEADD     ADD         None
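
To see what rename_axis changes, a small sketch that checks the name of the columns index before and after. It uses first as the aggregate function purely for illustration.

```python
import pandas as pd

# Sample data copied from the printed frame in this answer.
df = pd.DataFrame({
    'COMMIT_ID': [1, 2, 2, 2],
    'FILE_NAME': ['package.json', 'main.js', 'class.java', 'class.java'],
    'COMMITTER': ['A', 'B', 'B', 'B'],
    'CHANGE TYPE': ['MODIFY', 'ADD', 'DELETE', 'ADD'],
})

wide = df.pivot_table(index=['COMMIT_ID', 'COMMITTER'],
                      columns='FILE_NAME',
                      values='CHANGE TYPE',
                      aggfunc='first').reset_index()

# The pivoted column keeps its name as the label of the columns index.
name_before = wide.columns.name
print(name_before)

# rename_axis(None, axis=1) clears that label; the data is unchanged.
wide = wide.rename_axis(None, axis=1)
print(wide.columns.name)
```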
