Split Lines/sentence With Over 10 Words Where The First Comma Appears
I have the following code that splits the line every 10 words. #!/bin/bash while read line do counter=1; for word in $line do echo -n $word' '; if (($coun
Solution 1:
A better approach is to use awk: test for 15 or more words and, if so, substitute ",\n" for the first ", ", e.g.
awk 'NF >= 15 {sub (", ", ",\n")}1' file
Example Use/Output
With your input in file, you would have:
$ awk 'NF >= 15 {sub (", ", ",\n")}1' file
phrase from a test line,
which I want to split, and I don't know how.
(If you have a large number of lines, awk will be orders of magnitude faster than a shell loop.)
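For readers who want to see the logic of the awk one-liner spelled out, here is a minimal Python sketch of the same idea. The function name `split_at_first_comma` and the `min_words` parameter are made up for illustration; the threshold of 15 is taken from the awk command above.

```python
def split_at_first_comma(line: str, min_words: int = 15) -> str:
    """Insert a newline after the first ", " when the line has at least min_words words."""
    # str.split() with no arguments splits on runs of whitespace,
    # which is how awk counts its default fields (NF).
    if len(line.split()) >= min_words:
        # The count argument 1 limits the replacement to the first match,
        # like awk's sub() (as opposed to gsub()).
        return line.replace(", ", ",\n", 1)
    return line


if __name__ == "__main__":
    sample = "phrase from a test line, which I want to split, and I don't know how."
    print(split_at_first_comma(sample))
```

Lines below the threshold are returned unchanged, which corresponds to the trailing `1` in the awk command that prints every line.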
Solution 2:
I am not sure whether you want to split lines with over 10 words or 15 words. Simply replace the 10 with 15 if you are dealing with 15 words.
awk 'NF > 10 { sub(/, */, ",\n") } 1' input.txt
or more clearly:
#!/bin/bash
awk 'NF > 10 {
    # enter this block only if the line has more than 10 words
    # replace the first comma, and any spaces after it,
    # with a comma followed by a newline
    sub(/, */, ",\n")
}
# print every line, whether or not it was split
{ print }' input.txt
Solution 3:
Here is a simple solution that checks the number of words in a string. If the string has more than 10 words, it is split at the first comma:
output = []
s = 'phrase from a test line, which I want to split, and I dont know how'
# split at the next comma while the remainder has more than 10 words
while len(s.split()) > 10 and ',' in s:
    first_sent, s = s.split(',', 1)
    output.append(first_sent)
output.append(s)
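The loop above handles one hard-coded string. As a rough sketch of how the same loop could be applied to every line of a file, the generator below takes any iterable of lines. The name `split_long_lines` and the `max_words` parameter are hypothetical, and keeping the comma on the first piece is an assumption based on the expected output shown in Solution 1.

```python
from typing import Iterable, Iterator


def split_long_lines(lines: Iterable[str], max_words: int = 10) -> Iterator[str]:
    """Yield each line, breaking after a comma while the remainder exceeds max_words."""
    for line in lines:
        rest = line.rstrip("\n")
        # Keep splitting while the remainder is too long and still contains a comma.
        while len(rest.split()) > max_words and "," in rest:
            head, rest = rest.split(",", 1)
            yield head + ","      # keep the comma on the first piece
            rest = rest.lstrip()  # drop the space that followed the comma
        yield rest
```

It could be used as `for piece in split_long_lines(open("input.txt")): print(piece)`, where `input.txt` is the file name used in Solution 2. A line that ends in a comma and is still too long yields an empty final piece, much as a substitution on `, *` would leave an empty line.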
Solution 4:
This is a simple version of the question "for loop in bash simply prints n times the command instead of reiterating".
The simple version can be handled with
# For each line with more than 10 words, insert a newline after the first comma (GNU sed)
sed -r '/([^ ]+ ){10}/s/, */,\n/' input.txt
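The same pattern-then-substitute idea can be sketched in Python with the `re` module. This is an illustration rather than a translation of the sed command: it assumes a "word" is any run of non-space characters (`\S+`), so that tokens such as `line,` or `don't` still count, and the names `LONG_LINE` and `break_after_first_comma` are made up here.

```python
import re

# Ten "word + space" units in a row, so a match implies at least 11 words.
# \S+ (any run of non-space characters) is an assumption about what a word is.
LONG_LINE = re.compile(r"(?:\S+ ){10}")


def break_after_first_comma(line: str) -> str:
    """Select long lines by pattern, then substitute once, as a sed address + s command does."""
    if LONG_LINE.search(line):
        # count=1 replaces only the first comma (plus any spaces after it).
        return re.sub(r", *", ",\n", line, count=1)
    return line
```

Note that a line with exactly 10 words contains only 9 separating spaces, so the pattern does not match it. That fits the "over 10 words" wording in the title.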