Do Python Regexes Support Something Like Perl's \G?

May 30, 2024

I have a Perl regular expression (shown here, though hopefully you don't need to understand the whole thing to answer this question) that contains the \G metacharacter. I'd lik

Solution 1:

Try these:

import re
re.sub()
re.findall()
re.finditer()

For example:

# Finds all words of length 3 or 4
s = "the quick brown fox jumped over the lazy dogs."
print(re.findall(r'\b\w{3,4}\b', s))

# prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']

Solution 2:

Python's regexes have no /g modifier, and so they have no \G regex token either. A pity, really.
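A quick way to check this for yourself is sketched below. It assumes CPython 3.6 or later, where the standard re module rejects an unknown escape made of a backslash and an ASCII letter at compile time. The helper name supports_backslash_G is made up for this illustration.

```python
import re

def supports_backslash_G():
    """Return True if the stdlib re module accepts \\G in a pattern.

    Hypothetical helper, written only to illustrate the point above.
    """
    try:
        re.compile(r'\Gfoo')
    except re.error:
        # Python 3.6+ reports a "bad escape" error for \G.
        return False
    return True

print(supports_backslash_G())  # False on Python 3.6+
```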

Solution 3:

You can use a compiled pattern's match() method to match anchored patterns. It matches only at the beginning of the text (position 0), or at the position you pass as pos.

import re

def match_sequence(pattern, text, pos=0):
  pat = re.compile(pattern)
  match = pat.match(text, pos)  # anchored at pos, like \G
  while match:
    yield match
    if match.end() == pos:
      break # infinite loop otherwise
    pos = match.end()
    match = pat.match(text, pos)

This matches the pattern only at the given position, and then yields any further matches that begin exactly where the previous one ended (0 characters after it).

>>> for match in match_sequence(r'[^\W\d]+|\d+', "he11o world!"):
...     print(match.group())
...
he
11
o
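To make the difference from an ordinary scan visible, here is a minimal sketch that runs the same token pattern both ways. The function name contiguous_matches is an assumption made for this example, and unlike the generator above it does not yield an empty match before stopping.

```python
import re

def contiguous_matches(pattern, text, pos=0):
    """Yield matches that each start exactly where the previous one ended."""
    pat = re.compile(pattern)
    while True:
        match = pat.match(text, pos)
        if match is None or match.end() == pos:
            # No match here, or an empty match that would never advance.
            break
        yield match
        pos = match.end()

token = r'[^\W\d]+|\d+'
text = "he11o world!"

# Anchored, \G-style: stops at the first gap (the space after "o").
anchored = [m.group() for m in contiguous_matches(token, text)]
# Ordinary scan: finditer skips over the gap and keeps going.
scanning = [m.group() for m in re.finditer(token, text)]

print(anchored)  # ['he', '11', 'o']
print(scanning)  # ['he', '11', 'o', 'world']
```

The anchored version stops at the space, which is the behaviour \G gives you in Perl; re.finditer happily resumes at "world".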

Solution 4:

I know I'm a little late, but here's an alternative to the \G approach:

import re

def replace(match):
    # A match starting with '/' is a commented line: return it unchanged.
    if match.group(0)[0] == '/': return match.group(0)
    else: return '<' + match.group(0) + '>'

source = '''http://a.com http://b.com
//http://etc.'''

pattern = re.compile(r'(?m)^//.*$|http://\S+')
result = re.sub(pattern, replace, source)
print(result)

output (via Ideone):

<http://a.com> <http://b.com>
//http://etc.

The idea is to use a regex that matches both kinds of string: a URL or a commented line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.
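One way to make the "which one did I match?" check more explicit is to give each alternative a named group and inspect Match.lastgroup in the callback. This is only a sketch of the same idea; the group names comment and url and the function name wrap_urls are invented for the example.

```python
import re

# Two alternatives, each in a named group so the callback can tell them apart.
PATTERN = re.compile(r'(?m)(?P<comment>^//.*$)|(?P<url>http://\S+)')

def wrap_urls(match):
    # lastgroup is the name of the last capturing group that matched;
    # here exactly one of the two groups takes part in any match.
    if match.lastgroup == 'comment':
        return match.group(0)  # leave commented lines untouched
    return '<' + match.group('url') + '>'

source = 'http://a.com http://b.com\n//http://etc.'
result = PATTERN.sub(wrap_urls, source)
print(result)
```

With the sample input from above this produces the same two lines of output as the original callback.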

As a matter of fact, this is my preferred approach even in flavors that do support \G, and even in Java, where I have to write a bunch of boilerplate code to implement the callback.

(I'm not a Python guy, so forgive me if the code is terribly un-pythonic.)

Solution 5:

Don't try to put everything into one expression, as it becomes very hard to read, translate (as you can see for yourself) and maintain.

import re
# text_block is the input text; lines starting with '//' are skipped entirely
lines = [re.sub(r'http://[^\s]+', r'<\g<0>>', line)
         for line in text_block.splitlines()
         if not line.startswith('//')]
print('\n'.join(lines))

Python is usually not at its best when you translate literally from Perl; it has its own programming patterns.
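Note that the list comprehension in this answer drops the commented lines from the output. If you would rather keep them unchanged, a line-by-line variant might look like the sketch below. The function name bracket_urls is made up, and the sample input is borrowed from Solution 4.

```python
import re

URL = re.compile(r'http://\S+')

def bracket_urls(text_block):
    """Wrap URLs in <...> on every line that is not a // comment."""
    out = []
    for line in text_block.splitlines():
        if line.startswith('//'):
            out.append(line)  # keep comment lines as they are
        else:
            out.append(URL.sub(r'<\g<0>>', line))
    return '\n'.join(out)

print(bracket_urls('http://a.com http://b.com\n//http://etc.'))
```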
