How To Use Multiple Separators In A Pandas Series And Split Into Multiple Rows

January 18, 2024

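I have a dataframe like this:

df = pd.DataFrame({'Name': ['ABC LLC Ram corp', 'IJK Inc'], 'id': [101, 102]})

               Name   id
0  ABC LLC Ram corp  101
1           IJK Inc  102

and a list of separators, separators = ['inc', 'corp', 'llc'], matched regardless of case. I want to split each Name after every separator so that each chunk ends up in its own row.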

Solution 1:

You can use str.findall to find all occurrences of the regex pattern in column Name, which gives a list of matches for each row. Then assign these lists back to the Name column and explode the dataframe on Name, so that each match gets its own row:

separators = ['inc', 'corp', 'llc']
pat = fr"(?i)\s*(.*?(?:{'|'.join(separators)}))"
df.assign(Name=df['Name'].str.findall(pat)).explode('Name')

Regex details:

(?i) : Case-insensitive flag
\s* : Skips any whitespace before the next chunk, so the second and later matches do not start with a space
( : Start of capturing group (str.findall returns only the captured text)
.*? : Matches any character except line terminators, zero or more times, as few times as possible (lazy match)
(?: : Start of a non-capturing group
{'|'.join(separators)} : f-string expression that evaluates to inc|corp|llc
) : End of non-capturing group
) : End of capturing group

        Name   id
0    ABC LLC  101
0   Ram corp  101
1    IJK Inc  102
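For reference, here is a self-contained version of Solution 1 that can be run as is. It is a sketch with two extra assumptions that are not part of the answer above: each separator is passed through re.escape (in case a separator ever contains a regex metacharacter such as "."), and the alternation is wrapped in \b word boundaries so that, for example, "inc" is not matched inside a longer word.

```python
import re

import pandas as pd

# Sample data from the question; the separator list comes from the answers.
df = pd.DataFrame({'Name': ['ABC LLC Ram corp', 'IJK Inc'], 'id': [101, 102]})
separators = ['inc', 'corp', 'llc']

# Assumption: escape each separator and require whole-word matches via \b.
alternation = '|'.join(re.escape(s) for s in separators)
pat = rf"(?i)\s*(.*?\b(?:{alternation})\b)"

# findall yields a list of chunks per row; explode gives each chunk its own row
# and repeats the original index and id for every chunk.
out = df.assign(Name=df['Name'].str.findall(pat)).explode('Name')
print(out)
```

A row that contains no separator at all produces an empty list from findall, and explode turns an empty list into NaN for that row, so such rows may need separate handling.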

Solution 2:

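A slightly more verbose approach: lowercase the names, replace the space after each separator word with a comma, and then split on the commas. Note that this normalizes capitalization, so "ABC LLC" comes out as "Abc Llc":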


d = {f'{s} ': f'{s},' for s in separators}
#{'inc ': 'inc,', 'corp ': 'corp,', 'llc ': 'llc,'}

out = (df.assign(Name=df['Name'].str.lower()
                      .replace(d, regex=True)
                      .str.title()
                      .str.split(','))
         .explode('Name'))

print(out)

       Name   id
0   Abc Llc  101
0  Ram Corp  101
1   Ijk Inc  102
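If the original capitalization should be kept, one option is to skip the lowercasing and do the replacement with a case-insensitive regex through Series.str.replace instead. This is a swapped-in variant, not the answer's own code. It is a minimal sketch that assumes the same df and separators and assumes the names contain no commas of their own, since the comma is used as the split marker.

```python
import re

import pandas as pd

df = pd.DataFrame({'Name': ['ABC LLC Ram corp', 'IJK Inc'], 'id': [101, 102]})
separators = ['inc', 'corp', 'llc']

# Match a whole separator word followed by whitespace, ignoring case.
# The replacement keeps the separator exactly as written (\1) and turns the
# whitespace after it into a comma, which is then used to split.
alternation = '|'.join(re.escape(s) for s in separators)
pat = rf"(?i)\b({alternation})\s+"

out = (df.assign(Name=df['Name'].str.replace(pat, r'\1,', regex=True)
                 .str.split(','))
         .explode('Name'))
print(out)
```

Because only the whitespace after a separator is replaced, a separator at the very end of a name (like "corp" in the first row) leaves no trailing empty chunk.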
