There is this 8GB CSV file on my disk. It has one "match" per row.
A "match" consists of some data like id
, date
, and winner
. But it also has 10 players in it with all their data. Those are stored in participants.0.stats.visionScore
, participants.1.stats.visionScore
, ..., participants.0.stats.assists
, ..., participants.9.stats.assists
, ... i think you get the pattern. It's just participants.{number}.stats.{variable_name}
. Every participant has literally hundreds of stats; that's why I have about 4,000 columns total.
I read the data in like this:
> d <- fread("Matches.csv") > head(d) participants.1.stats.totalDamageDealt 1: 118504 2: 20934 3: 76639 4: 123932 5: 160561 6: 237046 participants.8.stats.totalDamageTaken participants.9.stats.totalPlayerScore 1: 18218 0 2: 12378 0 3: 46182 0 4: 19340 0 5: 30808 0 6: 36194 0 ... [there are thousands of lines I omit here] ...
Of course, I now want a representation of the data, where one row corresponds to one participant. I imagine a result like this:
> [magic] > head(d) participant stats.totalDamageDealt 1: 1 118504 2: 2 190143 3: 3 46700 4: 4 60787 5: 5 78108 6: 6 124761 stats.totalDamageTaken stats.totalPlayerScore 1: 18218 0 2: 15794 0 3: 34578 0 4: 78771 0 5: 16749 0 6: 11540 0 ...
But all the methods that are out there, like meld
, cast
, and reshape
will need me to name all the columns by hand. Even with patterns
for meld
I end up having to name all my hundreds of columns per participant. Is there no way out there to make this thing long shape in R?