Regular Expression: Matching And Grouping A Variable Number Of Space Separated Words
Solution 1:
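re.match only matches at the start of the string. Use re.search instead, which scans the whole string for the first match.
.*? returns the shortest match between two words/expressions (. means any character except a newline, * means 0 or more occurrences, and the trailing ? makes the match lazy, i.e. as short as possible).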
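import re
my_str = "foo hello world baz 33"
# Lazily capture everything between "foo" and "baz"
my_pattern = r'foo\s(.*?)\sbaz'
p = re.search(my_pattern, my_str, re.I)  # re.I makes the match case-insensitive
result = p.group(1).split()
print(result)
# Output: ['hello', 'world']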
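To see why re.search is the safer choice here, a minimal sketch follows. It assumes a hypothetical input in which foo is not the first word; the variable names are illustrative only.

```python
import re

# Hypothetical input: "foo" does not sit at the very start of the string
text = "say foo hello world baz 33"
pattern = r'foo\s(.*?)\sbaz'

match_result = re.match(pattern, text)    # only tries to match at position 0
search_result = re.search(pattern, text)  # scans forward for the first match

print(match_result)                       # None
print(search_result.group(1).split())     # ['hello', 'world']
```

With the original string, which does start with foo, both calls would succeed; the difference only shows up once something precedes foo.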
EDIT:
In case foo or baz is missing and you need to return the entire string, use an if-else:
if p is not None:
    result = p.group(1).split()
else:
    result = my_str  # no match: fall back to the whole string (a str, not a list)
Why the ? in the pattern:
Suppose there are multiple occurrences of the word baz:
my_str = "foo hello world baz 33 there is another baz"
Using pattern = r'foo\s(.*)\sbaz' gives the longest (greedy) match:
'hello world baz 33 there is another'
whereas using pattern = r'foo\s(.*?)\sbaz' returns the shortest match:
'hello world'
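The greedy and lazy behaviours can be checked side by side. This is a small sketch using the same string and the two patterns quoted above; the names greedy and lazy are just illustrative.

```python
import re

my_str = "foo hello world baz 33 there is another baz"

# Greedy: .* grabs as much as possible, so the match runs to the LAST " baz"
greedy = re.search(r'foo\s(.*)\sbaz', my_str).group(1)

# Lazy: .*? grabs as little as possible, so the match stops at the FIRST " baz"
lazy = re.search(r'foo\s(.*?)\sbaz', my_str).group(1)

print(greedy)  # hello world baz 33 there is another
print(lazy)    # hello world
```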
Solution 2:
[This is not a solution, but an attempt to explain why it is not possible]
What you're after is something like this:
foo\s(\w+\s)+baz\s(\d+)
The cool part would be (\w+\s)+, which would repeat the capturing group.
The problem is that most regex flavors store only the last match in that capturing group; earlier captures are overwritten.
I recommend looping over the string with a simpler regex.
Hope it helps
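In Python's re module the overwriting is easy to observe. The sketch below first shows the repeated group keeping only its last iteration, then one possible workaround (not necessarily what the answer had in mind): wrap the repetition in an outer group, capture the whole run of words once, and split it afterwards.

```python
import re

my_str = "foo hello world baz 33"

# Repeated capturing group: only the final iteration survives in group(1)
repeated = re.search(r'foo\s(\w+\s)+baz\s(\d+)', my_str)
print(repr(repeated.group(1)))  # 'world '

# Workaround: make the inner group non-capturing and capture the whole run,
# then split that run into individual words in a second step
whole = re.search(r'foo\s((?:\w+\s)+)baz\s(\d+)', my_str)
words = whole.group(1).split()
number = whole.group(2)
print(words, number)  # ['hello', 'world'] 33
```

The second step uses plain str.split rather than a regex loop, which is enough when the words are separated by whitespace.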
Solution 3:
Use index to find foo and baz, then split the substring between them.
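def find_between(s, first, last):
    try:
        # Position just past the first delimiter
        start = s.index(first) + len(first)
        # Look for the closing delimiter only after that position
        end = s.index(last, start)
        return s[start:end].split()
    except ValueError:
        # str.index raises ValueError when a delimiter is missing;
        # return an empty list so the return type stays consistent
        return []

s = "foo hello world baz 33"
start = "foo"
end = "baz"
print(find_between(s, start, end))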