Splitting On Group Of Capital Letters In Python

I'm trying to tokenize a number of strings using a capital letter as a delimiter. I have landed on the following code: token = ([a for a in re.split(r'([A-Z][a-z]*)', 'ABCowDog') i

Solution 1:

re.split is not always easy to use and can be limiting. You can try a different approach with re.findall, which matches the tokens themselves instead of the delimiters between them:

>>> s = 'ABCowDog'
>>> re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', s)
['AB', 'Cow', 'Dog']
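
To make the pattern easier to reuse, here is a minimal sketch that wraps it in a function. The names TOKEN_RE and tokenize_caps are hypothetical, chosen only for illustration. The pattern reads as: one capital, followed either by more capitals (stopping before a capital that starts a lowercase word) or by a run of lowercase letters.

```python
import re

# One capital letter, then either:
#   - more capitals, as long as the run is not followed by a lowercase letter, or
#   - a run of lowercase letters.
TOKEN_RE = re.compile(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)')


def tokenize_caps(text):
    """Return the capital-led tokens found in text (hypothetical helper)."""
    return TOKEN_RE.findall(text)


print(tokenize_caps('ABCowDog'))    # ['AB', 'Cow', 'Dog']
print(tokenize_caps('HTTPServer'))  # ['HTTP', 'Server']
```

Note that findall only returns what the pattern matches, so any text that does not start with a capital (for example the leading "foo" in 'fooBar') is silently dropped.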

Solution 2:

You can split on the following zero-width lookahead with the third-party regex module:

(?=[A-Z][a-z])

Code:

import regex
regex.split(r'(?=[A-Z][a-z])', "ABCowDog", flags=regex.VERSION1)
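
If you would rather not install the regex module, the same lookahead should also work with the standard library on Python 3.7 or later, where re.split accepts patterns that can match an empty string. A minimal sketch, assuming Python 3.7+ (the function name split_before_caps is hypothetical):

```python
import re


def split_before_caps(text):
    """Split before every capital that is followed by a lowercase letter.

    Requires Python 3.7+, where re.split can split on empty matches.
    """
    parts = re.split(r'(?=[A-Z][a-z])', text)
    # A match at position 0 (e.g. 'CowDog') leaves a leading empty string.
    return [p for p in parts if p]


print(split_before_caps('ABCowDog'))  # ['AB', 'Cow', 'Dog']
print(split_before_caps('CowDog'))    # ['Cow', 'Dog']
```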

Solution 3:

([A-Z][a-z]+)

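You should split on this pattern with re.split. Because the group is capturing, the matched tokens are kept in the result, and you need to filter out the empty strings left between adjacent matches.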
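
As a rough illustration of how this behaves with the standard re module (the variable names raw and tokens are arbitrary):

```python
import re

# The capturing group makes re.split keep each matched token in the output.
raw = re.split(r'([A-Z][a-z]+)', 'ABCowDog')
print(raw)     # ['AB', 'Cow', '', 'Dog', '']

# Drop the empty strings produced between and after adjacent matches.
tokens = [t for t in raw if t]
print(tokens)  # ['AB', 'Cow', 'Dog']
```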
