
I have a big corpus that is segmented at the sentence level, meaning each line contains one sentence. Some of these lines end with a full stop (period) and some don't. I am looking for an efficient way to add a full stop to the end of each line that doesn't already end with one, for instance a shell script that uses sed or awk.

MAZDAK

1 Answer


Sed is probably the simplest approach for this:

$ cat file
sentence one
sentence two.
sentence three

$ sed 's/[^.]$/&./' file
sentence one.
sentence two.
sentence three.

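The pattern [^.]$ matches the last character of any line that does not end with a period. In the replacement, & stands for that matched character, so &. puts it back followed by a period. Empty lines are left untouched because they have no character to match. Watch out for lines with trailing whitespace: if a period is followed by spaces, the last character is a space rather than the period, so the line gets a second period appended after the spaces.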
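One way to guard against the trailing-whitespace case is to strip trailing blanks before testing the last character. The following is only a sketch: the function name add_periods is made up for illustration, and it assumes that dropping trailing whitespace from the corpus is acceptable.

```shell
# Hypothetical helper (name is illustrative, not part of the original answer).
# First expression: remove trailing whitespace from every line.
# Second expression: append a period to lines whose last character is not one.
add_periods() {
    sed -e 's/[[:space:]]*$//' -e 's/[^.]$/&./' "$@"
}

# Reads the named files, or standard input when no file is given.
printf 'sentence one  \nsentence two. \nsentence three\n' | add_periods
```

Called as add_periods file, it writes the result to standard output, so the original file is not modified unless the output is redirected to a new file.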

Edit:

With awk I would do:

$ awk '/[^.]$/{$(NF+1)="."}1' FS= OFS= file
sentence one.
sentence two.
sentence three.
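
The one-liner above relies on an empty FS splitting each line into single-character fields, which gawk supports but POSIX leaves unspecified. A possibly more portable sketch appends to $0 directly. The function name add_periods_awk is hypothetical, and stripping trailing spaces and tabs is an added assumption rather than part of the original command.

```shell
# Hypothetical helper (name is illustrative): append a period to each
# non-empty line that does not already end with one.
add_periods_awk() {
    awk '
        { sub(/[ \t]+$/, "") }      # drop trailing spaces and tabs
        /[^.]$/ { $0 = $0 "." }     # last character is not a period: append one
        1                           # print every line, changed or not
    ' "$@"
}

printf 'sentence one\nsentence two.\nsentence three \n' | add_periods_awk
```

Because no field splitting is involved, this version does not depend on how a particular awk treats an empty FS.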
Chris Seymour