Identifying Implicit String Literal Concatenation

According to Guido (and some other Python programmers), implicit string literal concatenation is considered harmful. Thus, I am trying to identify logical lines containing su

Solution 1:

Interesting question. I just had to play with it, and since there is no answer yet, I'm posting my solution to the problem:

#!/usr/bin/env python3

import tokenize
import token
import sys

with open(sys.argv[1]) as f:
    toks = list(tokenize.generate_tokens(f.readline))
    for i in range(len(toks) - 1):
        tok = toks[i]
        # print(tok)
        tok2 = toks[i + 1]
        if tok[0] == token.STRING and tok[0] == tok2[0]:  # two adjacent STRING tokens
            print("implicit concatenation in line " \
                "{} between {} and {}".format(tok[2][0], tok[1], tok2[1]))

You can feed the program to itself, and the result should be:

implicit concatenation in line 14 between "implicit concatenation in line " and "{} between {} and {}"
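
Note that comparing only directly neighbouring tokens misses literals separated by a line break inside brackets or by a comment, because `tokenize` emits `NL` and `COMMENT` tokens between them. Below is a minimal sketch of a variant that skips those tokens. The helper name `find_implicit_concatenations` and the sample source are illustrative assumptions, not part of the answer above.

```python
import io
import tokenize

# Tokens that can sit between two string literals without separating them.
SKIPPABLE = {tokenize.NL, tokenize.COMMENT}


def find_implicit_concatenations(source):
    """Return (line, first_literal, second_literal) for each implicit concatenation.

    Hypothetical helper: `line` is the line on which the first literal starts.
    """
    hits = []
    prev = None  # last token seen that was not skipped
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPABLE:
            continue
        if (tok.type == tokenize.STRING
                and prev is not None
                and prev.type == tokenize.STRING):
            hits.append((prev.start[0], prev.string, tok.string))
        prev = tok
    return hits


sample = '''names = [
    "one",
    "two"  # comma forgotten here
    "three",
]
greeting = "a" + "b"
'''

for line, first, second in find_implicit_concatenations(sample):
    print("implicit concatenation in line {} between {} and {}".format(
        line, first, second))
```

On the sample this reports line 3, between "two" and "three". The explicit `"a" + "b"` on the last line is left alone.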

Solution 2:

I decided to take the advice from user2357112 and extend it a bit, which led to the following solution, written here as an extension to the pep8 module:

import io
import tokenize


def python_illegal_concatenation(logical_line):
    """
    A language design mistake from the early days of Python.
    https://mail.python.org/pipermail/python-ideas/2013-May/020527.html

    Okay: val = "a" + "b"
    W610: val = "a" "b"
    """
    w = "W610 implicit string literal concatenation considered harmful"
    sio = io.StringIO(logical_line)
    tgen = tokenize.generate_tokens(sio.readline)
    state = None
    # pos is the column at which each token starts
    for token_type, _, (_, pos), _, _ in tgen:
        if token_type == tokenize.STRING:
            if state == tokenize.STRING:
                # a string literal directly follows another one
                yield pos, w
            else:
                state = tokenize.STRING
        else:
            state = None
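
To try this kind of check outside of pep8, it can be called directly on a single logical line. The following is a minimal standalone sketch of the same idea. The names `implicit_concatenation_offsets` and `W610` are illustrative assumptions, and the two inputs are the examples from the docstring above.

```python
import io
import tokenize

W610 = "W610 implicit string literal concatenation considered harmful"


def implicit_concatenation_offsets(logical_line):
    """Yield (column, message) for each string literal that directly follows another."""
    previous_was_string = False
    readline = io.StringIO(logical_line).readline
    for token_type, _, (_, column), _, _ in tokenize.generate_tokens(readline):
        is_string = token_type == tokenize.STRING
        if is_string and previous_was_string:
            yield column, W610
        previous_was_string = is_string


# Explicit concatenation with "+": nothing is reported.
print(list(implicit_concatenation_offsets('val = "a" + "b"')))
# Implicit concatenation: one warning, at the column where "b" starts.
print(list(implicit_concatenation_offsets('val = "a" "b"')))
```

The first call prints an empty list. The second prints a single `(10, ...)` entry, since the second literal starts at column 10.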

Solution 3:

One way to make this easier to spot is to put a space (or two) after the closing quote of each item when you have a list:

aList = [
   'one'  ,
   'two'  ,
   'three'
   'four'  ,
]

Now it's more obvious that 'three' is missing its trailing comma.
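
For reference, this is what the missing comma does at runtime. A small sketch (the name `a_list` is only illustrative): the two adjacent literals are merged into one element, so the list silently ends up one item short.

```python
a_list = [
    'one'  ,
    'two'  ,
    'three'
    'four'  ,
]

print(len(a_list))  # 3, not 4
print(a_list[-1])   # threefour
```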

PROPOSAL: I suggest that Python have a pragma indicating that string literal concatenation is forbidden in a region:

@nostringliteralconcat
a = "this" "and" "that"   # Would cause a compiler failure
@stringliteralconcat
a = "this" "and" "that"   # Would compile successfully

Allowing the concatenation would be the default (to maintain compatibility).

There is also this thread:

https://groups.google.com/forum/#!topic/python-ideas/jP1YtlyJqxs%5B1-25%5D
