How To Use Multiple Separators In A Pandas Series And Split Into Multiple Rows
I have a dataframe like this. df = pd.DataFrame({ 'Name' : ['ABC LLC Ram corp', 'IJK Inc'], 'id' : [101, 102] }) Name id 0 ABC LLC Ram corp 101 1
Solution 1:
You can use str.findall
to find all the occurrence of matching regex pattern in column Name
, then assign these matching occurrences to the column Name
and explode
the dataframe on Name
:
pat = fr"(?i)(.*?(?:{'|'.join(separators)}))"
df.assign(Name=df['Name'].str.findall(pat)).explode('Name')
Regex details:
(?i)
: Case insensitive flag(
: Start of capturing group.*?
: Matches any character except line terminators between zero and unlimited times, as few times as possible (lazy match
).(?:
: start of a non capturing group{'|'.join(separators)}
:f-string
expression which evaluates toinc|corp|llc
)
: End of non-capturing group)
: End of capturing group
Name id
0 ABC LLC 101
0 Ram corp 101
1 IJK Inc 102
Solution 2:
A bit verbose approach , by replacing the spaces after the words with comma and then split:
d = dict(zip([f'{i} 'for i in separators],[f'{i},'for i in separators]))
#{'inc ': 'inc,', 'corp ': 'corp,', 'llc ': 'llc,'}
out = (df.assign(Name=df['Name'].str.lower()
.replace(d,regex=True).str.title().str.split(",")).explode("Name"))
print(out)
Name id
0 Abc Llc 101
0 Ram Corp 101
1 Ijk Inc 102
Post a Comment for "How To Use Multiple Separators In A Pandas Series And Split Into Multiple Rows"