Extract Sub-string Between 2 Special Characters From One Column Of Pandas Dataframe

December 15, 2023

I have a Python Pandas DataFrame like this:

Name
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan

I would like to extract only the title (Mr, Miss, Mrs, Master) from the Name column.

Solution 1:

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
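For readers who want to run the first approach outside IPython, here is a self-contained sketch. The sample DataFrame is rebuilt from the question; the regex is the one used above, with each piece annotated.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"Name": ["Jim, Mr. Jones", "Sara, Miss. Baker",
                            "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]})

# ,\s*      : the comma and any whitespace after it
# ([^\.]*)  : capture everything up to the next period (the title)
# \s*\.     : optional whitespace, then a literal period
# expand=False returns a Series because there is a single capture group.
df["Title"] = df["Name"].str.extract(r",\s*([^\.]*)\s*\.", expand=False)
print(df)
```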

In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]

In [164]: df
Out[164]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
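The split-based alternative works because each name breaks into three pieces around the comma and the period, with the title always in the middle. The sketch below prints the intermediate lists to make that visible; it relies on pandas treating a split pattern longer than one character as a regular expression.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jim, Mr. Jones", "Sara, Miss. Baker",
                            "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]})

# Split on either a comma or a period, swallowing surrounding whitespace.
# A multi-character pattern is interpreted as a regular expression here.
parts = df["Name"].str.split(r"\s*,\s*|\s*\.\s*")
print(parts.tolist())
# [['Jim', 'Mr', 'Jones'], ['Sara', 'Miss', 'Baker'],
#  ['Leila', 'Mrs', 'Jacob'], ['Ramu', 'Master', 'Kuttan']]

# The title is the middle piece, i.e. index 1 of each list.
df["Title"] = parts.str[1]
```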

Solution 2:

Have a look at str.extract.

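The regexp you are looking for is (?<=, )(\w+)(?=\.). In words: take the substring that is preceded by ", " (but do not include it), consists of at least one word character, and is followed by a literal "." (but do not include it). The period must be escaped, because an unescaped . matches any character, and the parentheses around \w+ form the capture group that str.extract requires. In future, use an online regexp tester such as regex101; building and debugging regexps is much easier that way.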
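A minimal sketch of the lookaround pattern with str.extract, assuming the same sample data as in the question. Note that str.extract raises a ValueError when the pattern has no capture group, which is why \w+ is wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jim, Mr. Jones", "Sara, Miss. Baker",
                            "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]})

# (?<=, )  : must be preceded by ", " (lookbehind, not part of the match)
# (\w+)    : one or more word characters, captured as the title
# (?=\.)   : must be followed by a literal "." (lookahead, not part of the match)
df["Title"] = df["Name"].str.extract(r"(?<=, )(\w+)(?=\.)", expand=False)
print(df)
```

Since str.extract returns only the captured group, the lookarounds could equally be written as plain literals, r", (\w+)\."; the lookaround form is kept here to stay close to the answer.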

This assumes that every entry in the Name column is formatted the same way: a comma and a space, then the title, then a period.
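To see what happens when that assumption breaks, the sketch below adds two hypothetical rows that do not follow the pattern ("Baker, Dr Sara" has no period after the title, "Sara Baker" has no comma). With str.extract, rows that do not match get NaN rather than raising an error, so they can be found with isna and inspected separately.

```python
import pandas as pd

# First row follows the question's format; the other two are hypothetical
# examples that do not.
df = pd.DataFrame({"Name": ["Jim, Mr. Jones", "Baker, Dr Sara", "Sara Baker"]})

# Plain-literal equivalent of the lookaround pattern: ", " + title + "."
df["Title"] = df["Name"].str.extract(r", (\w+)\.", expand=False)

# Rows where no title could be extracted
unmatched = df.loc[df["Title"].isna(), "Name"].tolist()
print(unmatched)
# ['Baker, Dr Sara', 'Sara Baker']
```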
