Find And Update Duplicates In A List Of Lists
Solution 1:
from collections import defaultdict

lists = [['apple', 'window', 'pear', 2, 1.55, 'banana'],
         ['apple', 'orange', 'kiwi', 3, 1.80, 'banana'],
         ['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana'],
         ['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'],
         ['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']]

dic = defaultdict(int)  # occurrences seen so far for each (first, third) key
fts = []                # keys that turned out to be duplicated
for lst in lists:
    first_third = lst[0], lst[2]
    dic[first_third] += 1
    if dic[first_third] == 2:
        fts.append(first_third)
    # Tag the row with its running count: 1, 2, 3, ...
    lst.append(dic[first_third])

# Rows whose key occurred only once were tagged 1; change that to 0.
for lst in lists:
    if (lst[0], lst[2]) not in fts:
        lst[-1] -= 1

print(lists)
Edit: Thanks utdemir. first_third = lst[0], lst[2] is correct, not first_third = lst[0] + lst[2].
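To see why the tuple matters, here is a small illustration with two made-up rows (row_a and row_b are hypothetical, not from the question). Concatenating the first and third fields can make different rows look equal, while a tuple keeps the fields apart. Tuples are also hashable, so they work as dictionary keys, which a list would not.

```python
# Two hypothetical rows whose first and third fields differ,
# but whose concatenation happens to be identical.
row_a = ['ab', 'x', 'c']
row_b = ['a', 'y', 'bc']

# String concatenation merges the fields, so distinct keys collide.
concat_a = row_a[0] + row_a[2]
concat_b = row_b[0] + row_b[2]
print(concat_a == concat_b)  # True: both are 'abc'

# A tuple keeps the fields separate, so the keys stay distinct.
tuple_a = row_a[0], row_a[2]
tuple_b = row_b[0], row_b[2]
print(tuple_a == tuple_b)  # False: ('ab', 'c') != ('a', 'bc')
```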
Edit2: Changed variable names for clarity.
Edit3: Changed to reflect what the original poster really wanted, and his updated list. Not pretty any more; the desired changes are just tacked on.
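If the tacked-on second loop bothers you, the same numbering can be produced by counting first and tagging second. This is a sketch using collections.Counter, not the answerer's code. It assumes the goal implied by Edit3 and by the output of Solution 3: rows sharing the same first and third fields are numbered 1, 2, 3, ... in their original order, and rows whose key is unique get 0. The helper name key is an illustrative choice.

```python
from collections import Counter

lists = [['apple', 'window', 'pear', 2, 1.55, 'banana'],
         ['apple', 'orange', 'kiwi', 3, 1.80, 'banana'],
         ['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana'],
         ['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'],
         ['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']]

def key(row):
    # The fields that define a duplicate: first and third.
    return row[0], row[2]

# First pass: how many rows share each key.
totals = Counter(key(row) for row in lists)

# Second pass: number duplicates 1, 2, 3, ... in their original order;
# rows whose key occurs only once get 0.
seen = Counter()
for row in lists:
    k = key(row)
    if totals[k] > 1:
        seen[k] += 1
        row.append(seen[k])
    else:
        row.append(0)

print([row[-1] for row in lists])  # [1, 0, 0, 2, 3]
```

Because the totals are known before any row is tagged, no correction pass is needed and the rows keep their original order.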
Solution 2:
Your best bet is to sort the list first, using itemgetter() to select the fields to be matched as the key. This will cause all rows with matching key fields to appear together, so they can easily be compared and tagged. For example, the sort for matching the first and third fields would be:
lst.sort(key=itemgetter(0, 2))
The comparison of each item with its predecessor is straightforward.
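A minimal sketch of the sort-then-compare idea, assuming the first-and-third-field key and reusing the sample rows from this answer's test run. Since the first row of each group has no matching predecessor, the sketch checks the following row as well; dup_flags is a hypothetical name.

```python
from operator import itemgetter

rows = [[1, 2, 3, 4, 5], [1, 3, 5, 7, 7], [1, 4, 3, 5, 8],
        [4, 3, 2, 7, 5], [1, 6, 3, 7, 4]]

keyfunc = itemgetter(0, 2)  # returns the tuple (row[0], row[2])
rows.sort(key=keyfunc)      # rows with equal keys are now adjacent

# Flag each row whose key equals its predecessor's or successor's key.
dup_flags = []
for i, row in enumerate(rows):
    k = keyfunc(row)
    same_as_prev = i > 0 and keyfunc(rows[i - 1]) == k
    same_as_next = i + 1 < len(rows) and keyfunc(rows[i + 1]) == k
    dup_flags.append(same_as_prev or same_as_next)

print(dup_flags)  # [True, True, True, False, False]
```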
Okay, here is the complete solution (uses itemgetter and groupby):
from operator import itemgetter
from itertools import groupby

def tagdups(input_seq, tag, key_indexes):
    keygetter = itemgetter(*key_indexes)
    # Sort so that rows with equal keys sit next to each other.
    sorted_list = sorted(input_seq, key=keygetter)
    for key, group in groupby(sorted_list, keygetter):
        group_list = list(group)
        # A group of one row is not a duplicate.
        if len(group_list) <= 1:
            continue
        for item in group_list:
            item.append(tag)
    return sorted_list
And here is a sample test run to show usage:
>>> samp = [[1,2,3,4,5], [1,3,5,7,7],[1,4,3,5,8],[4,3,2,7,5],[1,6,3,7,4]]
>>> tagdups(samp, 'dup', (0,2))
[[1, 2, 3, 4, 5, 'dup'], [1, 4, 3, 5, 8, 'dup'], [1, 6, 3, 7, 4, 'dup'], [1, 3, 5, 7, 7], [4, 3, 2, 7, 5]]
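One thing worth knowing about tagdups: sorted() returns a new outer list, but the rows inside it are the same list objects as in input_seq, so the tags are appended to the caller's rows too. The block below repeats tagdups so it runs on its own, then shows both the in-place effect and one way to avoid it by passing shallow copies of the rows (fresh and tagged are illustrative names).

```python
from itertools import groupby
from operator import itemgetter

def tagdups(input_seq, tag, key_indexes):
    # Same logic as the answer above, repeated so this block runs on its own.
    keygetter = itemgetter(*key_indexes)
    sorted_list = sorted(input_seq, key=keygetter)
    for key, group in groupby(sorted_list, keygetter):
        group_list = list(group)
        if len(group_list) <= 1:
            continue
        for item in group_list:
            item.append(tag)
    return sorted_list

samp = [[1, 2, 3, 4, 5], [1, 3, 5, 7, 7], [1, 4, 3, 5, 8],
        [4, 3, 2, 7, 5], [1, 6, 3, 7, 4]]
tagdups(samp, 'dup', (0, 2))
# sorted() built a new outer list, but the inner lists are shared,
# so the tag was appended to the rows of samp itself.
print(samp[0])  # [1, 2, 3, 4, 5, 'dup']

# Passing shallow copies of the rows leaves the caller's data untouched.
fresh = [[1, 2, 3, 4, 5], [1, 3, 5, 7, 7], [1, 4, 3, 5, 8]]
tagged = tagdups([row[:] for row in fresh], 'dup', (0, 2))
print(fresh[0])   # [1, 2, 3, 4, 5]
print(tagged[0])  # [1, 2, 3, 4, 5, 'dup']
```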
Solution 3:
Here is my solution (commented code):
import itertools

l = [
    ['apple', 'window', 'pear', 2, 1.55, 'banana'],
    ['apple', 'orange', 'kiwi', 3, 1.80, 'banana'],
    ['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana'],
    ['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'],
    ['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']
]

# Here you can select the important fields
key = lambda i: (i[0], i[2])
l.sort(key=key)
grp = itertools.groupby(l, key=key)

# Look at the itertools documentation.
# grouped must be a list, not a generator: it is iterated twice below,
# and a generator would be exhausted after the first loop.
grouped = [list(j) for i, j in grp]

for i in grouped:
    if len(i) == 1:
        i[0].append(0)
    else:
        # You want duplicates to start from 1
        for pos, item in enumerate(i, 1):
            item.append(pos)

# Just a little loop for flattening the list
result = []
for i in grouped:
    for j in i:
        result.append(j)

print(result)
Output:
[['apple', 'orange', 'kiwi', 3, 1.8, 'banana', 0],
['apple', 'window', 'pear', 2, 1.55, 'banana', 1],
['apple', 'orange', 'pear', 2, 0.8, 'coffee_cup', 2],
['apple', 'orange', 'pear', 2, 3.8, 'coffee_cup', 3],
['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana', 0]]
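The l.sort(key=key) call before itertools.groupby is essential: groupby only merges consecutive items with equal keys, so unsorted input can split one key into several groups. A small illustration with made-up strings (data is a hypothetical list):

```python
from itertools import groupby

data = ['pear', 'kiwi', 'pear']

# Without sorting, equal items that are not adjacent form separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(data)]
print(unsorted_groups)  # [('pear', 1), ('kiwi', 1), ('pear', 1)]

# After sorting, equal items are adjacent and fall into one group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(data))]
print(sorted_groups)  # [('kiwi', 1), ('pear', 2)]
```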