0

I'm trying to read a tcx file (from a bike trainer) using the XML package. This question Import TCX into R using XML package got me started. In that 2011 example, the data looked like this:

<?xml version="1.0" encoding="UTF-8"?>
<TrainingCenterDatabase xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
<Activities>
    <Activity Sport="Running">
        <Id>2011-10-30T16:05:48Z</Id>
        <Lap StartTime="2011-10-30T16:05:48Z">
            <TotalTimeSeconds>3855.99</TotalTimeSeconds>
            <DistanceMeters>12498.8115</DistanceMeters>
            <MaximumSpeed>4.45662498</MaximumSpeed>
            <Calories>1011</Calories>
            <Intensity>Active</Intensity>
            <TriggerMethod>Manual</TriggerMethod>
            <Track>
                <Trackpoint>
                    <Time>2011-10-30T16:05:48Z</Time>
                    <Position>
                        <LatitudeDegrees>52.33613318</LatitudeDegrees>
                        <LongitudeDegrees>-1.58814317</LongitudeDegrees>
                    </Position>
                    <AltitudeMeters>77.5234375</AltitudeMeters>
                    <DistanceMeters>0.00000000</DistanceMeters>
                </Trackpoint>
                <Trackpoint>
                    <Time>2011-10-30T16:05:49Z</Time>
                    <Position>
                        <LatitudeDegrees>52.33614810</LatitudeDegrees>
                        <LongitudeDegrees>-1.58814283</LongitudeDegrees>
                    </Position>
                    <AltitudeMeters>77.5234375</AltitudeMeters>
                    <DistanceMeters>1.77584004</DistanceMeters>
                </Trackpoint>
...

The code to read this is simple:

library(XML)
doc = xmlParse("testfile.tcx")
basic = xmlToDataFrame(nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns"))

I have a more complex tcx file:

<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd" xmlns:ns5="http://www.garmin.com/xmlschemas/ActivityGoals/v1" xmlns:ns3="http://www.garmin.com/xmlschemas/ActivityExtension/v2" xmlns:ns2="http://www.garmin.com/xmlschemas/UserProfile/v2" xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns4="http://www.garmin.com/xmlschemas/ProfileExtension/v1" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Activities xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
    <Activity Sport="Biking">
      <Id>2017-01-21T17:33:40.86Z</Id>
      <Lap StartTime="2017-01-21T17:33:40Z">
        <TotalTimeSeconds>720</TotalTimeSeconds>
        <DistanceMeters>3565.9106917557524</DistanceMeters>
        <MaximumSpeed>5.39404950335725</MaximumSpeed>
        <Calories>68</Calories>
        <AverageHeartRateBpm>
          <Value>123</Value>
        </AverageHeartRateBpm>
        <MaximumHeartRateBpm>
          <Value>128</Value>
        </MaximumHeartRateBpm>
        <Intensity>Active</Intensity>
        <Cadence>75</Cadence>
        <TriggerMethod>Time</TriggerMethod>
        <Track>
          <Trackpoint>
            <Time>2017-01-21T17:33:40Z</Time>
            <DistanceMeters>4.68</DistanceMeters>
            <Cadence>79</Cadence>
            <Extensions>
              <ns3:TPX>
                <ns3:Watts>87</ns3:Watts>
                <ns3:Speed>4.68072232948508</ns3:Speed>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
          <Trackpoint>
            <Time>2017-01-21T17:33:41Z</Time>
            <DistanceMeters>9.41</DistanceMeters>
            <Cadence>81</Cadence>
            <Extensions>
              <ns3:TPX>
                <ns3:Watts>88</ns3:Watts>
                <ns3:Speed>4.726922499738</ns3:Speed>
              </ns3:TPX>
            </Extensions>
          </Trackpoint>
        </Track>
        <Extensions>
          <ns3:LX>
            <AvgSpeed xmlns:ns3="http://www.garmin.com/xmlschemas/ActivityExtension/v2" xmlns="">4.45096542560575</AvgSpeed>
          </ns3:LX>
          <ns3:LX>
            <ns3:MaxBikeCadence>111</ns3:MaxBikeCadence>
          </ns3:LX>
          <ns3:LX>
            <ns3:MaxWatts>161</ns3:MaxWatts>
          </ns3:LX>
          <ns3:LX>
            <ns3:AvgWatts>81</ns3:AvgWatts>
          </ns3:LX>
        </Extensions>
      </Lap>
    </Activity>
  </Activities>
</TrainingCenterDatabase>

(Many lines omitted from the middle)

I am interested in each Trackpoint. That is, I don't care about the Extensions that come after the last Trackpoint, but I do want to record the Extensions (Watts and Speed) that are recorded within each Trackpoint.

Using the code suggested above generates the following:

> head(basic,2)
                  Time DistanceMeters Cadence         Extensions HeartRateBpm
1 2017-01-21T17:33:40Z           4.68      79 874.68072232948508         <NA>
2 2017-01-21T17:33:41Z           9.41      81   884.726922499738         <NA>

Meaning that the extensions are extracted, but are concatenated and their names are lost.

IOW, I want a data frame that looks like this:

> head(basic,2)
                  Time DistanceMeters Cadence         Watts    Speed HeartRateBpm
1 2017-01-21T17:33:40Z           4.68      79       87     4.68072232948508         <NA>
2 2017-01-21T17:33:41Z           9.41      81       88     4.726922499738         <NA>

I've tried various muckings about with the XPath syntax and the namespace, but the closest I have got is

> basic2 = xmlToDataFrame(nodes <- getNodeSet(doc, "//ns:Trackpoint//ns:Extensions", "ns"))
> head(basic2,2)
                 TPX
1 874.68072232948508
2   884.726922499738

Which of course is no improvement.

Can anyone help?

Community
  • 1
  • 1
Ken
  • 41
  • 1
  • 8
  • The result appear sto be a data frame, so wouldn't something like `head(basic,2)[,1:3]` give you just columns you need? – Andrew Lavers Jan 23 '17 at 01:58
  • No, the point is that the data aren't extracted from the XML correctly-- so that the data frame is not constructed right. I think your code would give me time, distance and cadence, but would still not get me the watts and speed. – Ken Jan 23 '17 at 02:10
  • 1
    XML naturally corresponds more to a list than a data.frame, so it may be more useful to turn it into a list first so you can either rearrange it or pull out the parts you need to construct your data.frame. – alistaire Jan 23 '17 at 02:24
  • @Ken, Check/edit - "but I don want to record the Extensions (Watts and Speed)" – Andrew Lavers Jan 23 '17 at 02:48
  • Thanks! Just needed to delete that n! – Ken Jan 23 '17 at 04:00
  • And, added a cheesy demo of what I'm hoping to get out of this. – Ken Jan 23 '17 at 04:04
  • @alistaire, is there a way to do that? I had the idea that this package was the "right" way to read XML in R. – Ken Jan 23 '17 at 04:07
  • `XML` and `xml2` are both popular, so there is no "right" way. Both can turn XML into a list, via `XML::xmlToList` and `xml2::as_list` respectively. Some kung-fu is likely required regardless of what path you take; turning it into a list is just a trailhead. – alistaire Jan 23 '17 at 04:19
  • Look at http://stackoverflow.com/questions/40787171/transforming-data-from-xml-into-r-dataframe/40790195#40790195 and http://stackoverflow.com/questions/41221840/xml-transformation-in-r-final-bit/41224044#41224044 – G. Grothendieck Jan 23 '17 at 14:15
  • Thanks, @alistaire – Ken Jan 23 '17 at 15:21
  • Thanks @G.Grothendieck. – Ken Jan 23 '17 at 15:22

2 Answers2

3

Since TPX has child nodes, xmlToDataFrame will mash them together. Try converting xmlToList to a data.frame

nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
rows <-  lapply(nodes, function(x) data.frame(xmlToList(x) )) 
do.call("rbind", rows)
                  Time DistanceMeters Cadence Extensions.TPX.Watts Extensions.TPX.Speed
1 2017-01-21T17:33:40Z           4.68      79                   87     4.68072232948508
2 2017-01-21T17:33:41Z           9.41      81                   88       4.726922499738
Chris S.
  • 2,185
  • 1
  • 14
  • 14
  • That's awesome, @Chris! Thanks. I found a less elegant solution late last night using the suggestions linked above that I'll post when I get home. – Ken Jan 25 '17 at 22:07
0

Here's an ugly way to do it using xmltoDataFrame twice, based on Transforming data from xml into R dataframe, pointed out by (and contributed by) https://stackoverflow.com/users/516548/g-grothendieck

doc = xmlParse("file.tcx")

# get the stuff in the Trackpoint main node
basic = xmlToDataFrame(nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns"))

# get the watts and speed
extra = do.call("rbind", xpathApply(doc, "//ns3:TPX", function(N) {
   data.frame(
         Watts = xmlValue(N[["Watts"]]),
         Speed = xmlValue(N[["Speed"]])
         )
    }))

# paste them together
ride = cbind(basic[,-4], extra)

> ride
                  Time DistanceMeters Cadence Watts            Speed
1 2017-01-21T17:33:40Z           4.68      79    87 4.68072232948508
2 2017-01-21T17:33:41Z           9.41      81    88   4.726922499738

Thanks again, all.

Community
  • 1
  • 1
Ken
  • 41
  • 1
  • 8