
I'd like to take my current file, strip out the excess XML, and keep just a few values.

<project name="C0016">
    <marker value="Test 1" completed="0"/>
    <marker value="Test 2" completed="0"/>
    <marker value="Test 3" completed="0"/>
    <marker value="Test 4" completed="0"/>
<project name="C0017">
    <marker value="Test 5" completed="0"/>
    <marker value="Test 6" completed="0"/>
    <marker value="Test 7" completed="0"/>
    <marker value="Test 8" completed="0"/>

This is the data I need to clean. I want to output the 'project name' attribute and the 'marker value' attribute, separated by commas or by newlines (I'm hoping to import the result into Excel as a CSV).

Output I'd like:

Project Name: C0016
Test 1
Test 2
Test 3
Test 4
Project Name: C0017
Test 5
Test 6
Test 7
Test 8

or

Project Name: C0016,Test 1,Test 2,Test 3,Test 4,Project Name: C0017,Test 5,Test 6,Test 7,Test 8
Chris CIS
  • So what is your current approach with xmlstarlet? – tkruse Jul 29 '19 at 00:20
  • Your input does not look well-formed: the `<project>` tag is not closed. Is that intentional? – tkruse Jul 29 '19 at 00:21
  • Look at How to Ask and provide a minimal reproducible example. What have you tried, and where does your solution fail? SO is not a free coding service; it's more a help to get you done with what you have started. Do your research first. That being said, a combination of `while`, `grep` and `cut` would work. – Nic3500 Jul 29 '19 at 00:24
  • Sorry, I've been trying to figure this out for days and have tried dozens of different approaches. Ultimately I'm trying to get the above output from some Final Cut Pro XML files, and I've had a ton of problems getting xmlstarlet to do what I need. With this approach I was able to use grep to filter down to the lines I actually need, but I don't know much about shell scripts and wasn't sure where to go from here. I'll try to update my question, but I already posted one with more detail and got no responses, so I wrote this one to try to simplify what I'm hoping to do :/ – Chris CIS Jul 29 '19 at 00:46
  • If you're willing to accept answers that use XMLStarlet or other tools that backend into libxslt / libxml2, you need to provide well-formed input so those tools will actually work (and thus, so folks providing answers can test them). If your input is genuinely in the format given, it isn't XML at all. – Charles Duffy Jul 29 '19 at 02:54
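For reference, here is a rough sketch of the while/grep/cut combination suggested in the comments, wrapped in a function named `extract_values` (a made-up name). It assumes bash, one tag per line, and that the wanted attribute is the first double-quoted value on each line. That holds for the sample above but not necessarily for real Final Cut Pro XML.

```shell
# Sketch of the while/grep/cut approach (bash).
# Assumptions: one tag per line, and the wanted attribute is the first
# double-quoted value on that line (true for the sample input only).
extract_values() {
  local line value
  # "|| [ -n "$line" ]" also processes a last line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Splitting on '"', field 2 is the first quoted attribute value.
    value=$(printf '%s\n' "$line" | cut -d'"' -f2)
    if printf '%s\n' "$line" | grep -q '<project '; then
      printf 'Project Name: %s\n' "$value"
    elif printf '%s\n' "$line" | grep -q '<marker '; then
      printf '%s\n' "$value"
    fi
  done
}
```

Usage would be something like `extract_values < file`. It spawns a few processes per line, so it will be slow on large files; the sed answer does the same work in a single pass.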

1 Answer


Since your input isn't actually well-formed XML, here's an approach with GNU sed (it may work with other seds or may need tweaking; I don't know):

$ sed -E 's/^<([[:alpha:]]+ )([[:alpha:]]+)="([^"]+).*/\u\1\u\2: \3/; s/.*value="([^"]+).*/\1/' file
Project Name: C0016
Test 1
Test 2
Test 3
Test 4
Project Name: C0017
Test 5
Test 6
Test 7
Test 8
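To get the second format from the question (everything on one comma-separated line), piping the sed output through `paste` should work. Here is a sketch with the sed command wrapped in two hypothetical helper functions, `fcp_markers` and `fcp_markers_csv`. The `\u` in the replacement is a GNU sed extension, so this still assumes GNU sed.

```shell
# Hypothetical wrapper around the sed command above (requires GNU sed for \u).
# With no file arguments, sed reads standard input.
fcp_markers() {
  sed -E 's/^<([[:alpha:]]+ )([[:alpha:]]+)="([^"]+).*/\u\1\u\2: \3/; s/.*value="([^"]+).*/\1/' "$@"
}

# Join all output lines into a single line separated by commas:
# -s pastes serially (one output line), -d, sets the delimiter, - reads stdin.
fcp_markers_csv() {
  fcp_markers "$@" | paste -sd, -
}
```

For the sample input, `fcp_markers_csv file` should print `Project Name: C0016,Test 1,Test 2,Test 3,Test 4,Project Name: C0017,Test 5,Test 6,Test 7,Test 8`.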
Ed Morton
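If you'd rather use XMLStarlet, as discussed in the comments, the fragment first has to be made well-formed. One way, sketched below, is to close each project and wrap everything in a root element. It assumes each `<project ...>` line starts a new project with its markers following it, and `<projects>` is an invented root element name. If the original Final Cut Pro files are themselves well-formed, querying them directly would skip this step, though their structure isn't shown here.

```shell
# Sketch (bash): turn the grep-filtered fragment into well-formed XML so that
# libxml2-based tools such as XMLStarlet will accept it.
# Assumptions: one tag per line; each <project ...> line opens a new project
# whose <marker/> lines follow it; <projects> is a made-up root element.
make_wellformed() {
  local line open=0
  printf '<projects>\n'
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *'<project '*)
        # Close the previous project before the next one opens.
        if [ "$open" -eq 1 ]; then printf '</project>\n'; fi
        open=1 ;;
    esac
    printf '%s\n' "$line"
  done
  # Close the last project and the root element.
  if [ "$open" -eq 1 ]; then printf '</project>\n'; fi
  printf '</projects>\n'
}

# Example pipeline (the nested -m 'marker' runs inside each matched project):
#   make_wellformed < file |
#     xmlstarlet sel -t -m '//project' -o 'Project Name: ' -v '@name' -n \
#       -m 'marker' -v '@value' -n
```

That pipeline should print the same newline-separated output as the sed answer, while letting an XML parser, rather than regexes, handle quoting and attribute order.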