
Python Unique Lines

Hi, I have a text file in the following format:

Sam
John
Peter
Sam
Peter
John

I want to extract the unique records from the file using a regular expression, so that the result is:

Sam
John
Peter

Solution 1:

Use set:

In [1]: name="""
   ...: Sam
   ...: John
   ...: Peter
   ...: Sam 
   ...: Peter
   ...: John"""

In [2]: print(name)

Sam
John
Peter
Sam 
Peter
John

In [3]: a=name.split()

In [4]: a
Out[4]: ['Sam', 'John', 'Peter', 'Sam', 'Peter', 'John']

In [5]: set(a)
Out[5]: {'John', 'Peter', 'Sam'}
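
Note that a set does not keep the original order, while the output asked for in the question (Sam, John, Peter) is in first-seen order. A minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dicts keep insertion order) and the same sample string as above; the variable names are only illustrative:

```python
names = """
Sam
John
Peter
Sam 
Peter
John"""

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping the first occurrence of each name
unique_in_order = list(dict.fromkeys(names.split()))
print(unique_in_order)  # ['Sam', 'John', 'Peter']
```

split() with no arguments also discards the trailing space after the second "Sam", so it is not treated as a different name.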

Solution 2:

Don't listen to them!

Of course this can be done in Regex. Never mind that they have the correct, linear-time solution that's readable and concise, or that any Regex solution will be at least quadratic-time and about as readable as a drunkard's scrawl.

What matters is that it's Regex, and Regex must be good. Here you go:

import re

# target_string holds the file contents, one name per line
re.findall(r"""(?ms)^([^\n]*)$(?!.*^\1$)""", target_string)
#>>> ['Sam', 'Peter', 'John']
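
For anyone who wants to run this: a self-contained sketch, assuming target_string holds the six sample names one per line with no trailing whitespace or final newline (the variable names here are illustrative). The negative lookahead rejects a line whenever an identical line appears later in the string, so it is the last occurrence of each name that survives, which explains the order of the result.

```python
import re

# Assumed input: the sample names, one per line, no trailing spaces or newline
target_string = "Sam\nJohn\nPeter\nSam\nPeter\nJohn"

# (?m) makes ^ and $ match at line boundaries; (?s) lets . cross newlines.
# The lookahead (?!.*^\1$) fails if the captured line occurs again later.
pattern = re.compile(r"(?ms)^([^\n]*)$(?!.*^\1$)")

last_occurrences = pattern.findall(target_string)
print(last_occurrences)  # ['Sam', 'Peter', 'John']
```

Because the comparison is on the exact line text, a line such as "Sam " with a trailing space would count as different from "Sam".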

Solution 3:

It seems you want to build a list by splitting the input on newlines and then remove the duplicates using set(). You can convert the result back to a list using list(). It looks something like the line below. strip() is used to remove the newline characters.

with open('names.txt') as f:
    names = list(set(line.strip() for line in f))
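
If the file is large or the order of first appearance matters, the same idea can be written as a generator that reads line by line. This is only a sketch: unique_lines is a hypothetical helper, and io.StringIO stands in for open('names.txt') so the example runs without a file on disk.

```python
import io

def unique_lines(lines):
    """Yield each distinct stripped line once, in first-seen order."""
    seen = set()
    for line in lines:
        name = line.strip()
        # skip blank lines and anything already emitted
        if name and name not in seen:
            seen.add(name)
            yield name

# io.StringIO is a stand-in for an open file object (assumption for the demo)
sample = io.StringIO("Sam\nJohn\nPeter\nSam \nPeter\nJohn\n")
result = list(unique_lines(sample))
print(result)  # ['Sam', 'John', 'Peter']
```

With a real file, pass the open file object instead: `with open('names.txt') as f: names = list(unique_lines(f))`.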
