
How To Print Different Character Of Repeated Words In Python?

My data in a text file PDBs.txt looks like this:

150L_A 150L_B 150L_C 150L_D 16GS_A 16GS_B 17GS_A 17GS_B

The end result needed is: 'First chain of 150L is A and second is B

Solution 1:

You can do this in two passes. First, read the file and collect the labels into a dictionary, here called results, that maps each PDB ID to the list of its chain labels. Then write "chains.txt" line by line, iterating over results and building each output line in the format you described:

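from collections import defaultdict

# Map each PDB ID to its chain labels, e.g. {"150L": ["A", "B", "C", "D"], ...}
results = defaultdict(list)
with open("PDBs.txt") as fh:
    for line in fh:
        # split() with no argument handles entries separated by spaces,
        # tabs or newlines, and yields nothing for blank lines
        for entry in line.split():
            pdb, chain = entry.split("_")
            results[pdb].append(chain)

# Ordinal words keyed by 1-based chain position.
# Note that you would need to extend this if more than 4 chains are possible
prefix = {2: "second", 3: "third", 4: "fourth"}

with open("chains.txt", "w") as fh:
    for pdb, chains in results.items():
        fh.write(f"First chain of {pdb} is {chains[0]}")
        # Positions start at 2 because the first chain is already written
        for position, chain in enumerate(chains[1:], start=2):
            fh.write(f" and {prefix[position]} is {chain}")
        fh.write("\n")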

Content of "chains.txt":

First chain of 150L is A and second is B and third is C and fourth is D
First chain of 16GS is A and second is B
First chain of 17GS is A and second is B
First chain of 18GS is A and second is B
First chain of 19GS is A and second is B
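The prefix dictionary in Solution 1 only covers up to four chains, as its comment notes. One way to lift that limit, sketched here on the assumption that plain English ordinals are acceptable in the output, is a small helper that uses words for the first ten positions and numeric ordinals ("11th", "12th", ...) beyond that. The names ordinal, ORDINAL_WORDS and describe_chains are hypothetical and not part of the original answer.

```python
# Hypothetical helper: ordinal words for chain positions, with a numeric
# fallback ("11th", "12th", ...) once the word list runs out.
ORDINAL_WORDS = ["first", "second", "third", "fourth", "fifth",
                 "sixth", "seventh", "eighth", "ninth", "tenth"]


def ordinal(n):
    """Return an ordinal label for the 1-based position n."""
    if 1 <= n <= len(ORDINAL_WORDS):
        return ORDINAL_WORDS[n - 1]
    # Numbers ending in 10..20 (11th, 12th, 13th, 112th) always take "th";
    # otherwise the suffix depends on the last digit.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe_chains(pdb, chains):
    """Build one output line such as 'First chain of 150L is A and second is B'."""
    parts = [f"First chain of {pdb} is {chains[0]}"]
    for position, chain in enumerate(chains[1:], start=2):
        parts.append(f"{ordinal(position)} is {chain}")
    return " and ".join(parts)


print(describe_chains("150L", ["A", "B", "C", "D"]))
# First chain of 150L is A and second is B and third is C and fourth is D
print(ordinal(12), ordinal(22), ordinal(103))
# 12th 22nd 103rd
```

In Solution 1, the body of the writing loop could then become fh.write(describe_chains(pdb, chains) + "\n"), and the prefix dictionary would no longer be needed.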

Solution 2:

You can do this with a couple of split operations and a loop. First split the data on whitespace to get the individual entries as a list. Each entry consists of a key (the PDB ID) and a value (the chain label) separated by an underscore, so iterate over the entries, split each one on the underscore, and collect the values in a Python dictionary that maps each key to a list of its values.

data = "150L_A 150L_B 150L_C 150L_D 16GS_A 16GS_B 17GS_A 17GS_B 18GS_A 18GS_B 19GS_A 19GS_B"

# Split on whitespace to get entries such as "150L_A"
chunks = data.split()
result = {}

for chunk in chunks:
    # Each entry has the form "<PDB ID>_<chain>"
    key, value = chunk.split('_')
    if key not in result:
        result[key] = []
    result[key].append(value)

print(result)
# {'150L': ['A', 'B', 'C', 'D'], '16GS': ['A', 'B'], '17GS': ['A', 'B'], '18GS': ['A', 'B'], '19GS': ['A', 'B']}
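Solution 2 stops at the dictionary. A minimal sketch of the remaining step, assuming result has the shape printed above (PDB ID mapped to a list of chain labels), turns each entry into the requested sentence. The LATER list and lines variable are illustrative names, and LATER must be extended if a PDB ID can have more than four chains.

```python
# Assumption: result has the same shape as the dictionary printed above.
result = {'150L': ['A', 'B', 'C', 'D'], '16GS': ['A', 'B'], '17GS': ['A', 'B']}

# Ordinal words for the 2nd chain onwards; extend this list if a PDB ID can
# have more than four chains (an IndexError is raised otherwise).
LATER = ["second", "third", "fourth"]

lines = []
for pdb, chains in result.items():
    sentence = f"First chain of {pdb} is {chains[0]}"
    for i, chain in enumerate(chains[1:]):
        sentence += f" and {LATER[i]} is {chain}"
    lines.append(sentence)

print("\n".join(lines))
```

Dictionaries keep insertion order in Python 3.7 and later, so the sentences come out in the same order as the IDs appear in the input.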
