This code reads all the .stm files from a directory, processes them, and saves the result to a file.
cat db/some_path/stm/*.stm | sort -k1,1 -k2,2 -k4,4n | \
sed -e 's:<F0_M>:<o,f0,male>:' \
-e 's:<F0_F>:<o,f0,female>:' \
-e 's:([0-9])::g' \
-e 's:<sil>::g' \
-e 's:([^ ]*)$::' | \
awk '{ $2 = "A"; print $0; }'
} | local/join_suffix.py > data/dev.orig/stm
Sample output:
AimeeMullins_2009P A inter_segment_gap 0 17.82 <o,,unknown> ignore_time_segment_in_scoring
AimeeMullins_2009P A AimeeMullins 17.82 28.81 <o,f0,female> i'd like to share with you a discovery that i made a few months ago while writing an article for italian wired i always keep my thesaurus handy whenever i'm writing anything but
AimeeMullins_2009P A AimeeMullins 28.81 40.266 <o,f0,female> i'd already finished editing the piece and i realized that i had never once in my life looked up the word disabled to see what i'd find let me read you the entry
AimeeMullins_2009P A inter_segment_gap 40.266 41.418 <o,,unknown> ignore_time_segment_in_scoring
I don't understand how the sed -e expressions reformat each line.
What I understand so far:
awk '{ $2 = "A"; print $0; }'
I think this line means: for each row, take the 2nd word, check whether it equals A, and then print the 1st word. But what do expressions such as -e 's:<sil>::g' mean?
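One way to see what each piece does is to run the same sed and awk commands on a single line. The following is a minimal sketch, assuming a POSIX sed and awk; the input line and the name utt-001 are made up for illustration and do not come from the real .stm files. Each -e adds one editing command, and s:old:new: is an ordinary substitution that uses : as the delimiter instead of /. In awk, a single = is an assignment, so $2 = "A" overwrites the second field rather than comparing it.

```shell
# Hypothetical input line, invented to exercise every expression below.
line='AimeeMullins_2009P 1 AimeeMullins 17.82 28.81 <F0_F> hello <sil> world(2) (utt-001)'

# s:<F0_M>:<o,f0,male>:    replace the first <F0_M> tag with <o,f0,male>
# s:<F0_F>:<o,f0,female>:  replace the first <F0_F> tag with <o,f0,female>
# s:([0-9])::g             delete every single digit in literal parentheses, e.g. (2)
# s:<sil>::g               delete every <sil> marker
# s:([^ ]*)$::             delete a parenthesized token at the very end of the line
# awk: assign "A" to field 2, then print the rebuilt line (fields joined by one space)
result=$(printf '%s\n' "$line" |
  sed -e 's:<F0_M>:<o,f0,male>:' \
      -e 's:<F0_F>:<o,f0,female>:' \
      -e 's:([0-9])::g' \
      -e 's:<sil>::g' \
      -e 's:([^ ]*)$::' |
  awk '{ $2 = "A"; print $0; }')

printf '%s\n' "$result"
```

Removing one -e expression at a time and rerunning shows which part of the line that expression is responsible for.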