I have a large corpus that is segmented at the sentence level, so each line contains one sentence. Some of these lines end with a full stop (period) and some don't. I am looking for an efficient way to add a full stop to the end of every line that doesn't already end with one, for instance a shell script that uses sed or awk.
linux shell - adding full stop (period) to end of lines which do not end with full stop, in a corpus
1 Answer
Sed is probably the simplest approach for this:
$ cat file
sentence one
sentence two.
sentence three
$ sed 's/[^.]$/&./' file
sentence one.
sentence two.
sentence three.
On lines that don't end with a period (`[^.]$`), this replaces the last character with the matched last character followed by a period (`&.`). Watch out for lines with trailing spaces: there the last character is a space, so a period is appended after it even if the sentence itself already ends with one.
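The trailing-whitespace caveat can be handled by stripping trailing blanks before testing the last character. This is a minimal sketch, assuming it is acceptable to drop trailing whitespace from the corpus; the function name `add_full_stops` is made up for illustration:

```shell
# Hypothetical helper (the name is not from the original answer).
# Step 1: delete any trailing whitespace on each line.
# Step 2: append a period when the last remaining character is not one.
# Empty lines are left empty because [^.]$ needs a character to match.
add_full_stops() {
    sed -e 's/[[:space:]]*$//' -e 's/[^.]$/&./' "$@"
}

# Sample input: the first two lines carry trailing spaces.
printf 'sentence one  \nsentence two. \nsentence three\n' | add_full_stops
```

With file arguments the function reads those files instead of standard input, so something like `add_full_stops corpus.txt > corpus.fixed.txt` (hypothetical file names) writes the corrected corpus to a new file.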
Edit: With awk I would do:
$ awk '/[^.]$/{$(NF+1)="."}1' FS= OFS= file
sentence one.
sentence two.
sentence three.

Chris Seymour
- Can I challenge you to do it with awk? :D – fedorqui Apr 09 '13 at 14:47
- @fedorqui not really a challenge haha :P – Chris Seymour Apr 09 '13 at 14:51
- Alternatively a sed-ish awk: `awk '{sub(/[^.]$/, "&.", $0); print}' file` – mike3996 Apr 09 '13 at 14:53
- You are a master, I would +1 again :D – fedorqui Apr 09 '13 at 14:58
- @progo cutting the fat `awk '{sub(/[^.]$/,"&.")}1' file` – Chris Seymour Apr 09 '13 at 14:59