
Replacing Python List Elements With Key

I have a list of non-unique strings: list = ['a', 'b', 'c', 'a', 'a', 'd', 'b'] I would like to replace each element with an integer key which uniquely identifies each string: lis

Solution 1:

This guarantees that the ids are unique and contiguous, starting from 0:

id_s = {c: i for i, c in enumerate(set(list))}
li = [id_s[c] for c in list]

On a different note, you should not use 'list' as a variable name, because it shadows the built-in list type.
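One caveat: a set has no defined order, so which string receives which id can differ from what you expect. If the ids should follow the order of first appearance, one option is dict.fromkeys, which removes duplicates while keeping insertion order (Python 3.7+). A minimal sketch, where the names items, ids and keys are only illustrative:

```python
items = ['a', 'b', 'c', 'a', 'a', 'd', 'b']

# dict.fromkeys drops duplicates while preserving first-appearance order
# (dicts keep insertion order in Python 3.7+).
ids = {s: i for i, s in enumerate(dict.fromkeys(items))}

# Look up the id of every element in the original list.
keys = [ids[s] for s in items]
print(keys)  # [0, 1, 2, 0, 0, 3, 1]
```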

Solution 2:

Here's a single-pass solution with defaultdict:

from collections import defaultdict
seen = defaultdict()
seen.default_factory = lambda: len(seen)  # you could instead bind to seen.__len__

In [11]: [seen[c] for c in list]
Out[11]: [0, 1, 2, 0, 0, 3, 1]

It's kind of a trick but worth mentioning!


An alternative, suggested by @user2357112 in a related question/answer, is to increment with itertools.count. This lets you set everything up in the constructor:

from itertools import count
seen = defaultdict(count().__next__)  # .next in python 2

This may be preferable as the default_factory method won't look up seen in global scope.
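Put together, the count-based variant could be wrapped in a small function so that the mapping does not leak into the surrounding scope. This is only a sketch for Python 3; assign_ids is a made-up name, not something from the answers above:

```python
from collections import defaultdict
from itertools import count


def assign_ids(items):
    """Return one integer id per element, numbered by first appearance."""
    # A missing key triggers default_factory, which pulls the next
    # integer from the counter and stores it under that key.
    seen = defaultdict(count().__next__)
    return [seen[s] for s in items]


print(assign_ids(['a', 'b', 'c', 'a', 'a', 'd', 'b']))  # [0, 1, 2, 0, 0, 3, 1]
```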

Solution 3:

>>> lst = ["a", "b", "c", "a", "a", "d", "b"]
>>> nums = [ord(x) for x in lst]
>>> print(nums)
[97, 98, 99, 97, 97, 100, 98]
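Keep in mind that this works only because every string here is a single character: ord raises a TypeError for longer strings, and the results are Unicode code points rather than ids starting from 0. A small sketch that makes the restriction explicit (ord_ids is a hypothetical helper name):

```python
def ord_ids(items):
    """Map single-character strings to their Unicode code points."""
    for s in items:
        # ord() accepts exactly one character, so reject anything else early.
        if len(s) != 1:
            raise ValueError(f"expected a single character, got {s!r}")
    return [ord(s) for s in items]


print(ord_ids(["a", "b", "c", "a"]))  # [97, 98, 99, 97]
```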

Solution 4:

If you are not picky, then use the hash function: it returns an integer. For strings that are the same, it returns the same hash within a single interpreter run, although the values are large and not contiguous:

li = ["a", "b", "c", "a", "a", "d", "b"]
li = map(hash, li)                # Turn list of strings into list of ints
li = [hash(item) for item in li]  # Same as above
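In Python 3, map returns a lazy iterator, and string hashes are randomized per process unless PYTHONHASHSEED is set, so the actual numbers change between runs. The sketch below therefore checks only that equal strings get equal values; hashed and same_a are illustrative names:

```python
li = ["a", "b", "c", "a", "a", "d", "b"]

# map() is lazy in Python 3, so wrap it in list() to materialize the values.
hashed = list(map(hash, li))

# Equal strings hash to the same integer within one interpreter run.
same_a = hashed[0] == hashed[3] == hashed[4]
print(same_a)  # True
```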

Solution 5:

A functional approach:

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]
from itertools import count
from operator import itemgetter

mapped = itemgetter(*l)(dict(zip(l, count())))
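Because dict(zip(l, count())) keeps the last index seen for each string, the ids produced this way are unique per string but not contiguous. A sketch of what the call yields for the list above, with the intermediate dict given the illustrative name lookup:

```python
from itertools import count
from operator import itemgetter

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]

# Later duplicates overwrite earlier ones, so each string maps to its last index.
lookup = dict(zip(l, count()))

# itemgetter(*l) fetches lookup[x] for every x in l and returns a tuple.
mapped = itemgetter(*l)(lookup)
print(mapped)  # (4, 6, 2, 4, 4, 5, 6, 9, 8, 9)
```

Note that itemgetter returns a bare value rather than a tuple when it is given a single key, so a one-element list needs separate handling.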

You could also use a simple generator function:

from itertools import count

def uniq_ident(l):
    cn, d = count(), {}
    for ele in l:
        if ele not in d:
            c = next(cn)
            d[ele] = c
            yield c
        else:
            yield d[ele]


In [35]: l = ["a", "b", "c", "a", "a", "d", "b"]

In [36]: list(uniq_ident(l))
Out[36]: [0, 1, 2, 0, 0, 3, 1]
