Extract Sub-string Between 2 Special Characters From One Column Of Pandas Dataframe
I have a Python pandas DataFrame like this:
Name
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan
I would like to extract only the title from the Name column.
Solution 1:
In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)
In [158]: df
Out[158]:
Name Title
0 Jim, Mr. Jones Mr
1 Sara, Miss. Baker Miss
2 Leila, Mrs. Jacob Mrs
3 Ramu, Master. Kuttan Master
or
In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]
In [164]: df
Out[164]:
Name Title
0 Jim, Mr. Jones Mr
1 Sara, Miss. Baker Miss
2 Leila, Mrs. Jacob Mrs
3 Ramu, Master. Kuttan Master
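The two sessions above assume df already exists. As a minimal, self-contained sketch, the script below rebuilds the sample data from the question and runs both approaches side by side. The variable names by_extract and by_split are only illustrative, and the split variant relies on pandas treating a multi-character pattern as a regular expression, which is the default behaviour in the versions I am aware of.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# Approach 1: capture the text between the comma and the next period.
by_extract = df["Name"].str.extract(r",\s*([^.]*)\s*\.", expand=False)

# Approach 2: split on the comma or the period and keep the middle piece.
by_split = df["Name"].str.split(r"\s*,\s*|\s*\.\s*").str[1]

df["Title"] = by_extract
print(df)

# Both approaches should agree on this data.
print(by_extract.tolist() == by_split.tolist())
```

The extract version is probably the safer default: a row without a comma or period yields a missing value, whereas the split version returns whatever happens to land in position 1.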
Solution 2:
Have a look at str.extract.
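The regexp you are looking for is (?<=, )\w+(?=\.) with the period escaped, because an unescaped . matches any character and would not actually require a period. In words: take the substring that is preceded by a comma and a space (not included in the match), consists of at least one word character, and is followed by a period (also not included). Since str.extract requires a capture group, wrap the word part in parentheses: (?<=, )(\w+)(?=\.). In future, use an online regexp tester such as regex101; it makes patterns like this much easier to work out.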
This assumes each entry in the Name column is formatted the same way.
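A short sketch of this lookaround approach with str.extract follows. Two details are assumptions on my part rather than part of the answer as posted: the period in the lookahead is escaped as \. so that it matches a literal period, and \w+ is wrapped in a capture group because str.extract raises an error for a pattern with no groups. The third row is a made-up, deliberately malformed entry to show what happens when the formatting assumption does not hold.

```python
import pandas as pd

names = pd.Series(
    [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Jones",  # hypothetical malformed row: no comma, no title
    ],
    name="Name",
)

# Lookbehind for ", ", capture the word characters, lookahead for a literal period.
titles = names.str.extract(r"(?<=, )(\w+)(?=\.)", expand=False)

print(titles)

# Rows that do not match the pattern come back as missing values.
print(titles.isna().tolist())
```

Checking isna() on the result is a cheap way to find rows that do not follow the expected "Last, Title. First" layout before relying on the extracted column.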