I'm trying to analyse some GPS data in a .tcx file containing multiple laps. I want to do something very similar to this - basically extract each trackpoint into a dataframe for further analysis. But I need to keep the information defining each lap.
I'm a complete xml novice - my failed attempts are below, along with an extract of my data. Note that the position data is missing as my GPS failed when I was creating the test data. Just pretend that each trackpoint contains lat and long as well.
library(XML)
library(plyr)
doc <- xmlInternalTreeParse("test.tcx")
doc
<Lap>
<Track>
<Trackpoint>
<Time>2017-05-03T08:22:56.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:22:57.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:22:58.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:22:59.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:00.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:01.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
</Track>
</Lap>
<Lap>
<Track>
<Trackpoint>
<Time>2017-05-03T08:23:02.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:03.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:04.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:05.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:06.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-03T08:23:07.000Z</Time>
<SensorState>Present</SensorState>
</Trackpoint>
</Track>
</Lap>
> nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
> ldply(nodes, as.data.frame(xmlToList))
value.Time value.SensorState
1 2017-05-03T08:22:56.000Z Present
2 2017-05-03T08:22:57.000Z Present
3 2017-05-03T08:22:58.000Z Present
4 2017-05-03T08:22:59.000Z Present
5 2017-05-03T08:23:00.000Z Present
6 2017-05-03T08:23:01.000Z Present
7 2017-05-03T08:23:02.000Z Present
...
Following the steps in that answer gets me 90% of the way there, as you can see, but I lose the lap information. I've tried splitting the data by lap/track (I've yet to work out when there's a <Track> that isn't immediately preceded by <Lap>, but that doesn't really matter), but then struggled to make it any further.
nodes2 <- getNodeSet(doc, "//ns:Track", "ns")
successfully breaks the XML up by laps into something that looks similar to a list, but is of the class XMLNodeSet, and I then can't use ldply or getNodeSet on nodes2. I've played around with xmlApply and xmlToList but no luck.
I've also tried a bit of a botch using a loop, but have even had problems there. Seemingly getNodeSet(nodes2[[i]],...) performs the operation on all of the trackpoints contained in nodes2 rather than just those in nodes2[[i]].
test <- nodes2[[1]]
#successfully pulls out just the 6 trackpoints in lap 1
ldply(getNodeSet(test,"//ns:Trackpoint", "ns"), as.data.frame(xmlToList))
#creates a dataframe containing all 18 trackpoints in `nodes`.
So I'm completely confused by that.
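For what it's worth, this behaviour matches how XPath treats a leading //: the path is absolute and starts from the document root, even when the context node is a single <Track>. A leading .// restricts the search to descendants of that node. Below is a minimal sketch of the difference, assuming the XML package and a made-up toy document without a namespace (so no "ns" prefix is needed; with a real .tcx file the analogous relative path would presumably be ".//ns:Trackpoint" with "ns" passed as before).

```r
library(XML)

# Made-up toy document: two laps of two trackpoints each, no namespace
xml_text <- "<Activity>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:22:56.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:22:57.000Z</Time></Trackpoint>
  </Track></Lap>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:23:02.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:23:03.000Z</Time></Trackpoint>
  </Track></Lap>
</Activity>"
doc <- xmlParse(xml_text, asText = TRUE)
tracks <- getNodeSet(doc, "//Track")

# Absolute path: searches from the document root, ignoring the context node
n_absolute <- length(getNodeSet(tracks[[1]], "//Trackpoint"))

# Relative path: searches only the descendants of tracks[[1]]
n_relative <- length(getNodeSet(tracks[[1]], ".//Trackpoint"))
```

Here n_absolute counts every trackpoint in the document, while n_relative counts only those in the first track.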
The other alternative is not to split the data up by laps but to have one big dataframe with a factor variable for lap. The only way I can think of to do that would be such a botch that it makes me gag a bit.
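One possible shape for the "one big dataframe" idea, as a rough sketch rather than a definitive answer: loop over the <Lap> nodes, query each one with a relative XPath, and tag the rows with the lap index. It assumes the XML package and a made-up, namespace-free toy document; the column names lap and time are placeholders of my own.

```r
library(XML)

# Made-up toy document: two laps of two trackpoints each, no namespace
xml_text <- "<Activity>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:22:56.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:22:57.000Z</Time></Trackpoint>
  </Track></Lap>
  <Lap><Track>
    <Trackpoint><Time>2017-05-03T08:23:02.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-05-03T08:23:03.000Z</Time></Trackpoint>
  </Track></Lap>
</Activity>"
doc <- xmlParse(xml_text, asText = TRUE)
laps <- getNodeSet(doc, "//Lap")

# One small data frame per lap; the relative path keeps each query inside its lap
per_lap <- lapply(seq_along(laps), function(i) {
  times <- xpathSApply(laps[[i]], ".//Trackpoint/Time", xmlValue)
  data.frame(lap = i, time = times, stringsAsFactors = FALSE)
})

# Stack the pieces and turn the lap index into a factor
trackpoints <- do.call(rbind, per_lap)
trackpoints$lap <- factor(trackpoints$lap)
```

Other trackpoint fields (position, distance) could be pulled out the same way with one xpathSApply call per column, provided every trackpoint actually contains that field.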
Any suggestions or nudges in the right direction greatly appreciated.
Thanks in advance,
James
Update: turns out I made a hash of simplifying my input data and deleted some information that was needed. Chris S's solution works for the extract of data that I included originally, but there are some higher levels to the XML: <TrainingCenterDatabase>, <Activities> and <Activity>. Like I said, I'm a complete beginner. Here is the very start of another XML doc in the same format as the last.
<TrainingCenterDatabase>
<Activities>
<Activity Sport="Other">
<Id>2017-05-11T08:27:04.000Z</Id>
<Lap StartTime="2017-05-11T08:27:05.000Z">
<TotalTimeSeconds>106.0</TotalTimeSeconds>
<DistanceMeters>157.1999969482422</DistanceMeters>
<MaximumSpeed>1.6944444179534912</MaximumSpeed>
<Calories>20</Calories>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2017-05-11T08:27:05.000Z</Time>
<Position>
<LatitudeDegrees>51.50305517</LatitudeDegrees>
<LongitudeDegrees>-0.09115383</LongitudeDegrees>
</Position>
<DistanceMeters>1.6944444179534912</DistanceMeters>
<SensorState>Present</SensorState>
</Trackpoint>
<Trackpoint>
<Time>2017-05-11T08:27:06.000Z</Time>
<Position>
<LatitudeDegrees>51.50305517</LatitudeDegrees>
<LongitudeDegrees>-0.09115383</LongitudeDegrees>
</Position>
<DistanceMeters>3.3888888359069824</DistanceMeters>
<SensorState>Present</SensorState>
</Trackpoint>
On the plus side, I've got the output to include an attribute under Lap for StartTime, which can be what goes into the final dataframe. I think all that needs adjusting is
xpathSApply(doc, "//Trackpoint/..", xmlSize)
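To illustrate how the StartTime attribute might end up in the final dataframe, here is a hedged sketch: count the trackpoints in each lap, read each lap's StartTime with xmlGetAttr, and repeat each start time once per trackpoint. It assumes the XML package and a made-up toy document without a namespace (the second lap and its times are invented); it counts trackpoints with a relative XPath rather than xmlSize so that sibling elements such as <TotalTimeSeconds> cannot affect the count. The column names lap_start and time are placeholders.

```r
library(XML)

# Made-up toy document mirroring the nesting above; the second lap is invented
xml_text <- "<TrainingCenterDatabase><Activities><Activity Sport='Other'>
  <Lap StartTime='2017-05-11T08:27:05.000Z'>
    <TotalTimeSeconds>106.0</TotalTimeSeconds>
    <Track>
      <Trackpoint><Time>2017-05-11T08:27:05.000Z</Time></Trackpoint>
      <Trackpoint><Time>2017-05-11T08:27:06.000Z</Time></Trackpoint>
    </Track>
  </Lap>
  <Lap StartTime='2017-05-11T08:28:51.000Z'>
    <Track>
      <Trackpoint><Time>2017-05-11T08:28:51.000Z</Time></Trackpoint>
    </Track>
  </Lap>
</Activity></Activities></TrainingCenterDatabase>"
doc <- xmlParse(xml_text, asText = TRUE)

# Number of trackpoints in each lap, in document order
counts <- xpathSApply(doc, "//Lap", function(lap) {
  length(getNodeSet(lap, "./Track/Trackpoint"))
})

# StartTime attribute of each lap, in the same order
starts <- xpathSApply(doc, "//Lap", xmlGetAttr, "StartTime")

# All trackpoint times, also in document order
times <- xpathSApply(doc, "//Trackpoint/Time", xmlValue)

# Repeat each lap's start time once per trackpoint in that lap
trackpoints <- data.frame(lap_start = rep(starts, counts),
                          time = times,
                          stringsAsFactors = FALSE)
```

If the real file declares a default namespace, the paths would presumably need the "ns" prefix as in the earlier getNodeSet calls.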