I'm looking for a sed command to clean up some KML files. Each file is a single line and looks like this:

<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>

Ideally I want only the part of each file from the first <Placemark> element through the last </Placemark> element (both inclusive), with these sections from all the KML files written to a single output file.

I'd be happy with a command that either deletes all text before the first <Placemark> and all text after the last </Placemark>, or extracts the content from the first <Placemark> to the last </Placemark>.

A command that I've managed to cobble together so far is:

find . -name 'kmlFiles00*' -exec sed -r 's/^.{879}/ /' {} \; | sed -e 's/<\/Document><\/kml>//g' > placemarks_`date +%d-%m-%Y`.list

This works by stripping the first 879 characters of each file and then removing every instance of </Document></kml> before writing everything to the final file. It is pretty messy, though, so I'm looking for a cleaner command. I have also tried

sed -e 's/^.*<Placemark> //' -e 's/<\/Placemark>.*$//' 

which I know is getting closer, but it still fails.

Kevin

2 Answers

awk NF=NF FPAT='<Placemark>.*</Placemark>'
  • FPAT defines a field as anything matching <Placemark>.*</Placemark>
  • NF=NF forces a rebuild of the line from its fields, so only the matched text is printed
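
FPAT is a GNU awk extension, so the one-liner above may not behave the same under other awks such as mawk. As a hedged alternative (a swapped-in technique, not the command above), the sketch below uses the POSIX match() function to extract the same span and writes the result from several files into one output file. The temporary directory, the file contents, and the output name placemarks.list are made up for illustration.

```shell
# Two tiny single-line files standing in for the real KML files (illustrative data).
tmpdir=$(mktemp -d)
printf '%s\n' '<x><kml><Document><Placemark><name> Hotel 01 </name></Placemark></Document></kml>' > "$tmpdir/kmlFiles001"
printf '%s\n' '<y><kml><Document><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>' > "$tmpdir/kmlFiles002"

# match() sets RSTART and RLENGTH for the leftmost-longest match, which runs
# from the first <Placemark> to the last </Placemark> on the line.
# Lines without a match are skipped.
awk 'match($0, /<Placemark>.*<\/Placemark>/) { print substr($0, RSTART, RLENGTH) }' \
    "$tmpdir"/kmlFiles00* > "$tmpdir/placemarks.list"

cat "$tmpdir/placemarks.list"
```

Each input file contributes one line to the output, in the order the shell expands the glob.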
Zombo

This might work for you (GNU sed):

sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/' file
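
As a rough check of how the two substitutions behave, the sketch below runs the same command on a shortened version of the sample line from the question. GNU sed is assumed (for -r and for \n in the replacement), and the variable names sample and result are only for illustration.

```shell
# Shortened sample line modeled on the one in the question (illustrative data).
sample='<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark></Document></kml>'

# First s///: insert a newline just before the first <Placemark>.
# Second s///: drop everything up to that newline and everything
# after the last </Placemark>, keeping only the captured group.
result=$(printf '%s\n' "$sample" | sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/')

printf '%s\n' "$result"
```

The newline works as a temporary marker: sed reads input line by line, so the pattern space cannot already contain one, and the second substitution can anchor on it safely.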
potong