
We process a lot of SRT files on Linux to generate derivatives, but some of them contain Ctrl-M (carriage return) characters because they were created on Windows. Right now I run two commands to strip the hidden characters and collapse the extra blank lines:

tr -d '\015' <${file}.srt >${file}.srt

awk '/^$/{ if (! blank++) print; next } { blank=0; print }'  ${file}.srt | tee ${file}.srt

but some SRT files still slip through and contain Ctrl-M characters. Does anyone have a solution that keeps only one empty line between subtitle entries? For example, if the pre-processed SRT file looks like

1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream, 


2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services


3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.

then after removing the Ctrl-M characters and the extra blank lines it should be

1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream, 

2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services

3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.

Any help is appreciated, thanks!

Developer Guy
Calvin

1 Answer

The UNIX command to remove those line-end control-Ms is

dos2unix

The UNIX command to squeeze multiple blank lines between records to one blank line is:

awk -v RS= -v ORS='\n\n' '1'
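
The two steps can be chained and written back safely. One likely reason files "slip through" in the question is that `tr ... <file >file` truncates the file before it is read, and `awk file | tee file` races with its own input. The sketch below is only an illustration: `clean_srt` is a hypothetical helper name, and `tr -d '\r'` stands in for `dos2unix` on the assumption that `dos2unix` may not be installed everywhere.

```shell
#!/bin/sh
# clean_srt FILE  (hypothetical helper, not part of the original answer)
# Removes carriage returns and squeezes runs of blank lines between
# subtitle records down to a single blank line, rewriting FILE in place.
clean_srt() {
    src=$1
    tmp=$(mktemp) || return 1

    # Strip CRs first so that CR-only lines become truly empty; awk's
    # paragraph mode (RS=) then treats any run of blank lines as one
    # record separator and ORS='\n\n' writes exactly one blank line back.
    if tr -d '\r' < "$src" | awk -v RS= -v ORS='\n\n' '1' > "$tmp"; then
        # Copy the result back only after the pipeline has finished,
        # so the input is never truncated while it is still being read.
        cat "$tmp" > "$src"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It could then be called as `clean_srt "${file}.srt"`. Note that `ORS='\n\n'` also leaves one blank line after the last record.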
Ed Morton
  • I tend to use `sed -i 's/\r$//' file`, but that probably requires GNU sed. – glenn jackman May 03 '18 at 20:18
  • @glennjackman it'd require OSX or GNU sed for the `-i`, but of course you don't **need** that, and I'm not sure what POSIX says about `\r` but I'd suspect it has to be something like `'s/'$'\r''$//'` (but then you're bash-only which may not be any more portable) or use a literal control-M. You could use `awk '{sub(/\r$/,"")}1'` of course and that'd work anywhere. – Ed Morton May 03 '18 at 20:20