Python Regex, Find And Replace Second Tab Character

March 31, 2024

I am trying to find and replace the second tab character in a string using regex.

booby = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

This works fine: re.sub(

Solution 1:

You may be overthinking it a little.

>>> import re
>>> text = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
>>> re.sub(r'(\t[^\t]*)\t', r'\1###', text, count=1)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

Simply match the first tab, any number of non-tab characters after it, and then the next tab. The group (\1) puts back everything except that final tab, which is replaced with whatever you want (### here). count=1 limits the substitution to the first match.
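The same idea can be stretched to the n-th tab by repeating the "non-tabs followed by a tab" part. The following is only a sketch: the helper name replace_nth_tab and its parameters are made up for illustration, and it assumes n is 1-based and that a string with fewer than n tabs should be returned unchanged.

```python
import re

def replace_nth_tab(text, n, repl):
    """Replace the n-th tab (1-based) in text with repl (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Group 1 captures n-1 "field + tab" pairs plus the n-th field;
    # the tab right after the group is the one that gets replaced.
    pattern = r'^((?:[^\t]*\t){%d}[^\t]*)\t' % (n - 1)
    # A function replacement avoids backslash processing in repl.
    return re.sub(pattern, lambda m: m.group(1) + repl, text, count=1)

text = 'Joe Bloggs\tNULL\tNULL\tNULL\r\n'
print(repr(replace_nth_tab(text, 2, '###')))
# 'Joe Bloggs\tNULL###NULL\tNULL\r\n'
```

Anchoring with ^ means the count always starts from the beginning of the string, so the pattern cannot match a later pair of tabs by accident.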

Solution 2:

>>> re.sub(r'^((?:(?!\t).)*\t(?:(?!\t).)*)\t',r'\1###', booby)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'

You are almost there: add \1 before ### in the replacement string so that the captured text is kept.

Because of the comments, here is another way to solve it without regex:

>>> booby.replace("\t", "###",2).replace("###", "\t",1)
'Joe Bloggs\tNULL###NULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
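One thing to watch with the double replace: it seems to assume that the replacement text (### here) does not already occur before the second tab, because the second replace() turns the first ### it finds back into a tab. A possible alternative, sketched below with a made-up helper name, locates the second tab with str.find and slices around it.

```python
def replace_second_tab(text, repl):
    """Replace only the second tab in text with repl (hypothetical helper)."""
    first = text.find('\t')
    if first == -1:
        return text  # no tab at all
    second = text.find('\t', first + 1)
    if second == -1:
        return text  # only one tab
    # Keep everything before the second tab, drop the tab, append the rest.
    return text[:second] + repl + text[second + 1:]

tricky = 'x###y\tNULL\tNULL'
print(repr(tricky.replace('\t', '###', 2).replace('###', '\t', 1)))
# 'x\ty###NULL###NULL'  (the pre-existing ### was turned into a tab)
print(repr(replace_second_tab(tricky, '###')))
# 'x###y\tNULL###NULL'
```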

Solution 3:

With regex

This is the shortest regex I could find:

import re
booby = 'Joe Bloggs\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\tNULL\r\n'
print(re.sub(r'(\t.*?)\t', r'\1###', booby, count=1))

It uses the non-greedy .*? so that the match stops at the next tab instead of gobbling up several tabs. It outputs:

Joe Bloggs    NULL###NULL    NULL    NULL    NULL    NULL    NULL    NULL
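To see what the non-greedy quantifier buys you, here is a small comparison on a shorter, made-up string. With a greedy .* the match runs to the last tab on the line, so the wrong tab is replaced.

```python
import re

text = 'a\tb\tc\td'

# Greedy: .* grabs as much as possible, so the LAST tab is replaced.
greedy = re.sub(r'(\t.*)\t', r'\1###', text, count=1)
# Non-greedy: .*? stops at the first following tab, i.e. the second tab.
lazy = re.sub(r'(\t.*?)\t', r'\1###', text, count=1)

print(repr(greedy))  # 'a\tb\tc###d'
print(repr(lazy))    # 'a\tb###c\td'
```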

With split and join

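The regex might get ugly if you need it for other indices. You could use split and join for the general case:

n = 2        # which tab to replace (1-based)
sep = '\t'
cells = booby.split(sep)
# Rejoin the first n cells and the remaining cells, with ### between them.
print(sep.join(cells[:n]) + "###" + sep.join(cells[n:]))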

It outputs:

Joe Bloggs    NULL###NULL    NULL    NULL    NULL    NULL    NULL    NULL
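The split-and-join idea can be wrapped in a small function. This is a sketch under a few assumptions: the name replace_nth_sep and its parameters are invented here, n is 1-based, and a string with fewer than n separators is returned unchanged (the snippet above would append ### at the end in that case). It passes n as the maxsplit argument of str.split so that the tail of the string is left in one piece.

```python
def replace_nth_sep(text, n, repl, sep='\t'):
    """Replace the n-th occurrence of sep in text with repl (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # At most n splits, so at most n + 1 pieces; the last piece is the untouched tail.
    parts = text.split(sep, n)
    if len(parts) <= n:
        return text  # fewer than n separators: nothing to replace
    return sep.join(parts[:n]) + repl + parts[n]

booby = 'Joe Bloggs\tNULL\tNULL\tNULL\r\n'
print(repr(replace_nth_sep(booby, 2, '###')))
# 'Joe Bloggs\tNULL###NULL\tNULL\r\n'
```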
