
Split Lines/sentence With Over 10 Words Where The First Comma Appears

I have the following code that splits the line every 10 words. #!/bin/bash while read line do counter=1; for word in $line do echo -n $word' '; if (($coun

Solution 1:

A better approach is to use awk: test whether the line has 15 or more words and, if so, substitute ",\n" for the first ", ", e.g.

awk 'NF >= 15 {sub (", ", ",\n")}1' file

Example Use/Output

With your input in file, you would have:

$ awk 'NF >= 15 {sub (", ", ",\n")}1' file
phrase from a test line,
which I want to split, and I don't know how.

(If you have a large number of lines, awk will typically be orders of magnitude faster than a shell loop.)
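
For readers more at home in Python, the same logic can be sketched as follows. This is only an illustration, not part of the original answer: the function name `split_at_first_comma` and the `min_words` parameter are invented for this sketch. Here `len(line.split())` plays the role of awk's `NF`, and the count argument of `str.replace` mirrors `sub`, which replaces only the first match.

```python
def split_at_first_comma(line, min_words=15):
    """Break `line` after its first ", " when it has at least `min_words` words."""
    # str.split() with no arguments splits on runs of whitespace,
    # much like awk's default field splitting.
    if len(line.split()) >= min_words:
        # The count argument 1 limits the replacement to the first occurrence.
        return line.replace(", ", ",\n", 1)
    return line


if __name__ == "__main__":
    text = "phrase from a test line, which I want to split, and I don't know how."
    print(split_at_first_comma(text))
```

Lines below the threshold are returned unchanged, which corresponds to the trailing `1` in the awk command printing every line.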


Solution 2:

I am not sure whether you want to split lines with more than 10 words or more than 15. The commands below use 10; simply replace it with 15 if that is your threshold.

awk 'NF > 10 { sub(/, */, ",\n") } 1' input.txt

or more clearly:

#!/bin/bash

awk '
# Enter this block only if the line has more than 10 words.
NF > 10 {
    # Replace the first comma, and any spaces after it,
    # with a comma followed by a newline.
    sub(/, */, ",\n")
}
# Print every line, whether or not it was modified.
{ print }
' input.txt

Solution 3:

Here is a simple solution that checks the number of words in a string. If the string has more than 10 words, it is split at the first comma, and this repeats while the remainder still has more than 10 words:

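output = []
s = 'phrase from a test line, which I want to split, and I dont know how'
# Keep splitting at the first comma while the remainder has more than 10 words;
# stop when no comma is left, so the unpacking below cannot fail.
while len(s.split()) > 10 and ',' in s:
    first_sent, s = s.split(',', 1)
    output.append(first_sent)
    s = s.lstrip()  # drop the space that followed the comma
output.append(s)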

Solution 4:

This is a simpler version of the question "for loop in bash simply prints n times the command instead of reiterating".

The simple version can be handled with sed:

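# For each line with more than 10 words, insert a newline after the first comma.
# Words are counted as runs of non-space characters, so "line," and "don't" count too.
# \n in the replacement is a GNU sed extension.
sed -E '/([^ ]+ ){10}/s/, */,\n/' input.txt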
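
One detail worth checking in any regex-based word count is how punctuation is treated: `\w` matches only letters, digits and the underscore, so `\w+` does not match a token such as `line,` or `don't` as a whole. The Python sketch below is not part of the original answer, and `LONG_LINE` and `split_long_line` are names invented here. It counts whitespace-delimited tokens with `\S+` instead and compares the two patterns on the sample line.

```python
import re

# Ten whitespace-delimited tokens, each followed by a space, means the line
# has at least ten words with more text after them.
LONG_LINE = re.compile(r"(?:\S+ ){10}")


def split_long_line(line):
    """Insert a newline after the first comma of a line with more than 10 words."""
    if LONG_LINE.search(line):
        # count=1 restricts the substitution to the first comma; " ?" also
        # swallows one following space so the next line does not start with it.
        return re.sub(r", ?", ",\n", line, count=1)
    return line


if __name__ == "__main__":
    sample = "phrase from a test line, which I want to split, and I don't know how."
    # \w+ cannot span "line," or "don't", so no run of ten such words exists here.
    print(bool(re.search(r"(?:\w+ ){10}", sample)))
    print(split_long_line(sample))
```

On this sample the `\w`-based pattern finds nothing, because the commas and the apostrophe interrupt every run of words, while the `\S`-based pattern matches and the line is split after `line,`.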
