Splitting On Group Of Capital Letters In Python
I'm trying to tokenize a number of strings using a capital letter as a delimiter. I have landed on the following code: token = ([a for a in re.split(r'([A-Z][a-z]*)', 'ABCowDog') i
Solution 1:
re.split isn't always easy to use and can be limiting for this kind of task. You can try a different approach with re.findall:
>>> import re
>>> s = 'ABCowDog'
>>> re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', s)
['AB', 'Cow', 'Dog']
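The pattern reads as: one capital, followed either by more capitals (stopping before a capital that starts a lowercase word) or by a run of lowercase letters. Below is a minimal sketch that wraps it in a helper. The name split_caps and the second sample string are my own illustrations, not part of the question.

```python
import re

# Pattern taken from the answer above; the helper name is hypothetical.
CAPS_TOKEN = re.compile(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)')

def split_caps(s):
    """Return runs of capitals and capitalized words found in s."""
    return CAPS_TOKEN.findall(s)

print(split_caps('ABCowDog'))    # ['AB', 'Cow', 'Dog']
print(split_caps('HTTPServer'))  # ['HTTP', 'Server']
```

Note that findall only returns what the pattern matches, so any characters before the first capital (for example the 'foo' in 'fooBar') are silently dropped.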
Solution 2:
You can split on the following zero-width lookahead using the third-party regex module:
(?=[A-Z][a-z])
Code:
regex.split(r'(?=[A-Z][a-z])', "ABCowDog", flags=regex.VERSION1)
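If installing the regex module is not an option, the same lookahead should also work with the standard re module on Python 3.7 or later, where re.split accepts patterns that can match the empty string. This is a sketch under that version assumption; the helper name split_before_caps is hypothetical.

```python
import re

def split_before_caps(s):
    """Split s before every capital that is followed by a lowercase letter."""
    # Requires Python 3.7+: older versions do not split on empty matches.
    parts = re.split(r'(?=[A-Z][a-z])', s)
    # An empty match at position 0 yields a leading '', so drop empty pieces.
    return [p for p in parts if p]

print(split_before_caps('ABCowDog'))  # ['AB', 'Cow', 'Dog']
print(split_before_caps('CowDog'))    # ['Cow', 'Dog']
```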
Solution 3:
([A-Z][a-z]+)
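You should split by this. Because the pattern is a capturing group, re.split keeps the matched words in the result; it also yields empty strings, which need to be filtered out.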
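A minimal sketch of this split with the empty strings removed. The helper name split_on_words is my own, used only for illustration.

```python
import re

def split_on_words(s):
    """Split s on capitalized words, keeping the words and dropping empty pieces."""
    pieces = re.split(r'([A-Z][a-z]+)', s)
    return [p for p in pieces if p]

# The raw split contains empty strings between and after adjacent matches.
print(re.split(r'([A-Z][a-z]+)', 'ABCowDog'))  # ['AB', 'Cow', '', 'Dog', '']
print(split_on_words('ABCowDog'))              # ['AB', 'Cow', 'Dog']
```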