
I'm trying to analyse some GPS data in a .tcx file containing multiple laps. I want to do something very similar to this - basically extract each trackpoint into a dataframe for further analysis. But I need to keep the information defining each lap.

I'm a complete xml novice - my failed attempts are below, along with an extract of my data. Note that the position data is missing as my GPS failed when I was creating the test data. Just pretend that each trackpoint contains lat and long as well.

library(XML)
library(plyr)
doc <- xmlInternalTreeParse("test.tcx")
doc
  <Lap>
    <Track>
      <Trackpoint>
        <Time>2017-05-03T08:22:56.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:22:57.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:22:58.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:22:59.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:00.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:01.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
    </Track>
  </Lap>
  <Lap>
    <Track>
      <Trackpoint>
        <Time>2017-05-03T08:23:02.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:03.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:04.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:05.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:06.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
      <Trackpoint>
        <Time>2017-05-03T08:23:07.000Z</Time>
        <SensorState>Present</SensorState>
      </Trackpoint>
    </Track>
  </Lap>

> nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
> ldply(nodes, as.data.frame(xmlToList))
                  value.Time value.SensorState
1   2017-05-03T08:22:56.000Z           Present
2   2017-05-03T08:22:57.000Z           Present
3   2017-05-03T08:22:58.000Z           Present
4   2017-05-03T08:22:59.000Z           Present
5   2017-05-03T08:23:00.000Z           Present
6   2017-05-03T08:23:01.000Z           Present
7   2017-05-03T08:23:02.000Z           Present
...

Following the steps in that answer gets me 90% of the way there, as you can see, but I lose the lap information. I've tried splitting the data by lap/track (I've yet to work out when there's a <Track> that isn't immediately preceded by <Lap>, but that doesn't really matter), but then struggled to get any further.

nodes2 <- getNodeSet(doc, "//ns:Track", "ns") successfully breaks the xml up by laps into something that looks similar to a list, but is of the class XMLNodeSet, and I then can't use ldply or getNodeSet on nodes2. I've played around with xmlApply and xmlToList but no luck.

I've also tried a bit of a botch using a loop, but have even had problems there. Seemingly getNodeSet(nodes2[[i]],...) performs the operation on all of the trackpoints contained in nodes2 rather than just those in nodes2[[i]].

test <- nodes2[[1]] 
#successfully pulls out just the 6 trackpoints in lap 1
ldply(getNodeSet(test,"//ns:Trackpoint", "ns"), as.data.frame(xmlToList))
#creates a dataframe containing all 18 trackpoints in `nodes`.

So I'm completely confused by that.
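The behaviour in the last snippet most likely comes from XPath itself rather than the loop. A path starting with `//` is absolute, so it is evaluated from the document root even when the first argument to `getNodeSet` is a single node. Prefixing it with `.` (`.//`) makes it relative to that node. Here is a minimal sketch on a made-up two-lap document without a namespace (the real file's `ns:` prefix is left out for brevity):

```r
library(XML)

# Made-up two-lap document with the same shape as the extract above
# (no namespace, no position data), parsed from a string for illustration.
txt <- "<Activity>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:22:56.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:22:57.000Z</Time></Trackpoint>
  </Track></Lap>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:23:02.000Z</Time></Trackpoint>
  </Track></Lap>
</Activity>"
doc <- xmlParse(txt, asText = TRUE)

tracks <- getNodeSet(doc, "//Track")
first_track <- tracks[[1]]

# "//Trackpoint" is absolute: it starts at the document root, so it returns
# every trackpoint in the file (3 here), not just those in first_track.
n_abs <- length(getNodeSet(first_track, "//Trackpoint"))

# ".//Trackpoint" is relative to first_track, so it returns only its 2 trackpoints.
n_rel <- length(getNodeSet(first_track, ".//Trackpoint"))

c(absolute = n_abs, relative = n_rel)
```

In the loop from the question, that would mean writing `getNodeSet(nodes2[[i]], ".//ns:Trackpoint", "ns")`.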

The other alternative is not to split the data up by laps but to have one big dataframe with a factor variable for lap. The only way I can think of to do that would be such a botch that it makes me gag a bit.
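One way to get that single data frame with a lap factor, without a botch, is to loop over the `<Lap>` nodes rather than the trackpoints, query each lap with a relative `.//` path, and tag the rows with the lap's position in the file. This is a sketch on a made-up document without a namespace. `child_text` is a hypothetical helper, and with the real file every path would need the `ns:` prefix:

```r
library(XML)

# Made-up two-lap document (no namespace), parsed from a string for illustration.
txt <- "<Activity>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:22:56.000Z</Time><SensorState>Present</SensorState></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:22:57.000Z</Time><SensorState>Present</SensorState></Trackpoint>
  </Track></Lap>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:23:02.000Z</Time><SensorState>Present</SensorState></Trackpoint>
  </Track></Lap>
</Activity>"
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: text of the first child called `tag`, or NA if missing.
child_text <- function(node, tag) {
  hit <- getNodeSet(node, paste0("./", tag))
  if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]])
}

laps <- getNodeSet(doc, "//Lap")

# For each lap, pull only its own trackpoints (relative ".//" path)
# and tag every row with the lap's position in the file.
per_lap <- lapply(seq_along(laps), function(i) {
  pts <- getNodeSet(laps[[i]], ".//Trackpoint")
  data.frame(
    Time        = vapply(pts, child_text, character(1), tag = "Time", USE.NAMES = FALSE),
    SensorState = vapply(pts, child_text, character(1), tag = "SensorState", USE.NAMES = FALSE),
    Lap         = rep(i, length(pts)),
    stringsAsFactors = FALSE
  )
})

track <- do.call(rbind, per_lap)
track$Lap <- factor(track$Lap)
track
```

Extracting each field per trackpoint, rather than collecting all `<Time>` values and all `<SensorState>` values separately, keeps the columns aligned if some trackpoints are missing an element.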

Any suggestions or nudges in the right direction greatly appreciated.

Thanks in advance,

James


Update: turns out I made a hash of simplifying my input data and deleted some information that was needed. Chris S's solution works for the extract of data that I included originally, but there are some higher levels to the XML, <TrainingCenterDatabase>, <Activities> and <Activity>. Like I said, I'm a complete beginner. Here is the very start of another XML doc in the same format as the last.

<TrainingCenterDatabase>
    <Activities>
        <Activity Sport="Other">
            <Id>2017-05-11T08:27:04.000Z</Id>
            <Lap StartTime="2017-05-11T08:27:05.000Z">
                <TotalTimeSeconds>106.0</TotalTimeSeconds>
                <DistanceMeters>157.1999969482422</DistanceMeters>
                <MaximumSpeed>1.6944444179534912</MaximumSpeed>
                <Calories>20</Calories>
                <Intensity>Active</Intensity>
                <TriggerMethod>Manual</TriggerMethod>
                <Track>
                    <Trackpoint>
                        <Time>2017-05-11T08:27:05.000Z</Time>
                        <Position>
                            <LatitudeDegrees>51.50305517</LatitudeDegrees>
                            <LongitudeDegrees>-0.09115383</LongitudeDegrees>
                        </Position>
                        <DistanceMeters>1.6944444179534912</DistanceMeters>
                        <SensorState>Present</SensorState>
                    </Trackpoint>
                    <Trackpoint>
                        <Time>2017-05-11T08:27:06.000Z</Time>
                        <Position>
                            <LatitudeDegrees>51.50305517</LatitudeDegrees>
                            <LongitudeDegrees>-0.09115383</LongitudeDegrees>
                        </Position>
                        <DistanceMeters>3.3888888359069824</DistanceMeters>
                        <SensorState>Present</SensorState>
                    </Trackpoint>

On the plus side, I've got the output to include an attribute under Lap for StartTime, which can be what goes into the final dataframe. I think all that needs adjusting is

xpathSApply(doc, "//Trackpoint/..", xmlSize)
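For the fuller structure in the update, the same lap-by-lap idea should still work, because `//ns:Lap` finds the laps however deeply they are nested under `<Activities>` and `<Activity>`, and each row can carry the lap's `StartTime` attribute. This sketch assumes the file declares the Garmin TCX v2 default namespace on `<TrainingCenterDatabase>`, as downloaded .tcx files normally do (the extract above doesn't show it). `tcx_ns` and `node_text` are hypothetical names:

```r
library(XML)

# Assumption: the root element declares the Garmin TrainingCenterDatabase v2
# default namespace; bind it to the prefix "ns" for XPath queries.
tcx_ns <- c(ns = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2")

# Trimmed-down copy of the extract above (second trackpoint has no Position).
txt <- '<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
 <Activities><Activity Sport="Other">
  <Id>2017-05-11T08:27:04.000Z</Id>
  <Lap StartTime="2017-05-11T08:27:05.000Z">
   <TotalTimeSeconds>106.0</TotalTimeSeconds>
   <DistanceMeters>157.1999969482422</DistanceMeters>
   <Track>
    <Trackpoint>
     <Time>2017-05-11T08:27:05.000Z</Time>
     <Position>
      <LatitudeDegrees>51.50305517</LatitudeDegrees>
      <LongitudeDegrees>-0.09115383</LongitudeDegrees>
     </Position>
     <DistanceMeters>1.6944444179534912</DistanceMeters>
    </Trackpoint>
    <Trackpoint>
     <Time>2017-05-11T08:27:06.000Z</Time>
     <DistanceMeters>3.3888888359069824</DistanceMeters>
    </Trackpoint>
   </Track>
  </Lap>
 </Activity></Activities>
</TrainingCenterDatabase>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: text of the first node matching a relative XPath,
# or NA if there is none (e.g. a trackpoint recorded without a GPS fix).
node_text <- function(node, path) {
  hit <- getNodeSet(node, path, namespaces = tcx_ns)
  if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]])
}

# Apply node_text to every trackpoint node with the same relative path.
field <- function(pts, path) {
  vapply(pts, node_text, character(1), path = path, USE.NAMES = FALSE)
}

laps <- getNodeSet(doc, "//ns:Lap", namespaces = tcx_ns)

per_lap <- lapply(seq_along(laps), function(i) {
  pts <- getNodeSet(laps[[i]], ".//ns:Trackpoint", namespaces = tcx_ns)
  data.frame(
    Lap       = rep(i, length(pts)),
    StartTime = rep(xmlGetAttr(laps[[i]], "StartTime", default = NA_character_), length(pts)),
    Time      = field(pts, "./ns:Time"),
    Lat       = as.numeric(field(pts, "./ns:Position/ns:LatitudeDegrees")),
    Lon       = as.numeric(field(pts, "./ns:Position/ns:LongitudeDegrees")),
    Distance  = as.numeric(field(pts, "./ns:DistanceMeters")),
    stringsAsFactors = FALSE
  )
})
track <- do.call(rbind, per_lap)
track
```

The `./` paths matter here: `<Lap>` has its own `<DistanceMeters>`, so an absolute `//ns:DistanceMeters` would mix lap totals in with the per-trackpoint values.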
  • What is your desired output? *I lose the Lap information* ... what info as the Lap nodes carry no attribute or text by itself? – Parfait May 10 '17 at 14:47
  • @Parfait By lap information, I just mean "which lap was I on at this point? 1, 2, 3 etc?" – James May 10 '17 at 18:02

1 Answer


This should get your trackpoint data...

x <- xmlToDataFrame(doc["//Trackpoint"])

If you need to add values or attributes from a parent node to that table, then get the size of the parent node (6 and 6) and repeat the attribute or value (since you have neither, I repeated numbers).

n <- xpathSApply(doc, "//Lap/Track", xmlSize) #OR
n <- xpathSApply(doc, "//Trackpoint/..", xmlSize)
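# if Lap had an attribute (e.g. a hypothetical "number"), you could repeat it instead:
# x$Lap <- rep( xpathSApply(doc, "//Lap", xmlGetAttr, "number"), n)
# otherwise repeat the lap index once per trackpoint in that lap
x$Lap <- rep(seq_along(n), n)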
x
                       Time SensorState Lap
1  2017-05-03T08:22:56.000Z     Present   1
2  2017-05-03T08:22:57.000Z     Present   1
3  2017-05-03T08:22:58.000Z     Present   1
4  2017-05-03T08:22:59.000Z     Present   1
5  2017-05-03T08:23:00.000Z     Present   1
6  2017-05-03T08:23:01.000Z     Present   1
7  2017-05-03T08:23:02.000Z     Present   2
8  2017-05-03T08:23:03.000Z     Present   2
...
Chris S.
  • Thanks, hadn't thought of pulling out the size of the parent node. Looks like it should sort the problem, expect an upvote once I've tried it out. – James May 10 '17 at 18:06
  • So your solution works on the extract of data I gave, but it turns out I'd made a foul up when taking the subset. I've added an update to the question, would hugely appreciate it if you could take another look and offer any more advice. Thanks, James – James May 11 '17 at 13:26
  • I would post another question. `xmlToDataFrame` only works with simple XML structures, the workaround I mentioned works in very limited cases, but now you have a very complex file and there are a number of possible solutions. – Chris S. May 11 '17 at 19:47
  • Yep, I think a new question's the way to go. I managed to find a workaround by extracting the times at which each lap was triggered and working out which lap each trackpoint must have been on, but it feels a bit like cheating given the whole point was to teach myself XML. – James May 12 '17 at 21:11
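The workaround in the last comment (assigning each trackpoint to the lap whose start time it follows) can be done in base R with `findInterval`. This is a sketch that assumes the lap `StartTime` attributes and the trackpoint `<Time>` values have already been pulled out as character vectors. The values below are made up for illustration, with lap 2 placed 106 s after lap 1 to match the `TotalTimeSeconds` in the update:

```r
# Made-up inputs: lap StartTime attributes and trackpoint <Time> values.
lap_start  <- c("2017-05-11T08:27:05.000Z", "2017-05-11T08:28:51.000Z")
point_time <- c("2017-05-11T08:27:05.000Z", "2017-05-11T08:27:06.000Z",
                "2017-05-11T08:28:51.000Z", "2017-05-11T08:28:52.000Z")

# Parse the ISO 8601 timestamps as UTC; %OS also reads fractional seconds,
# and the trailing "Z" is ignored by strptime.
parse_tcx_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
}

# findInterval(x, vec) returns i such that vec[i] <= x < vec[i + 1]: the index
# of the most recent lap start at or before each trackpoint (0 for a point
# before the first lap). The lap start times must be sorted.
lap_of_point <- findInterval(parse_tcx_time(point_time), parse_tcx_time(lap_start))
lap_of_point
```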