0

I have a dataframe of return values of cryptocurrencies, each sorted into a group of 5 based on price (dataframe has 1,847,012 entries)

2014-01-02  XRP         0.064041998     1
2014-01-02  Dogecoin    -0.220306670    1
2014-01-02  FLO         0.043457042     1
2014-01-02  CasinoCoin  -0.080614279    2
2014-01-02  Nxt         0.048375563     2
2014-01-02  Quark       0.004495892     2
2014-01-02  Feathercoin 0.218078384     3
2014-01-02  Diamond     0.052490183     3
2014-01-02  Unobtanium  -0.107420249    4
2014-01-02  Peercoin    0.263435789     4
2014-01-02  Primecoin   0.230874782     4
2014-01-02  Bitcoin     0.039387728     5
2014-01-02  Litecoin    0.045263780     5
2014-01-02  Namecoin    0.318292245     5
2014-01-02  Ethereum    NA              NA
  ...        ...           ...         ...
2014-01-03  XRP         -0.03900908     1
2014-01-03  Dogecoin    -0.15273525     1
2014-01-03  FLO         0.02397348      1
2014-01-03  CasinoCoin  0.05748349      2
  ...        ...           ...         ...

The date going up to 2019-12-31 (Some values are not available if the coin didnt exist at that time, like Ethereum in 2014).

What i want to do is create a table that takes the mean of each group, for each day, so

        2014-01-02         2014-01-03          2014-01-04     ...
1       mean(group 1)      mean(group 1)
2       mean(group 2)      mean(group 2)
3       mean(group 3)      mean(group 3)
4       mean(group 4)      mean(group 4)
5       mean(group 5)      mean(group 5)

I tried to look up and found Aggregate / summarize multiple variables per group (e.g. sum, mean) What i tried was:

means = as.data.frame(aggregate(d$value, list(d$Group), mean, na.rm = TRUE)) 

But this only does it for each group, over all dates, and gives me 5 values overall, where i need 5 values each day

Group.1    x
1          -4.920999e-03
2          -3.372798e-03
3          -1.548296e-03
4          -5.959693e-05
5          6.303165e-04

I also tried using data.table

df3 <- setDT(d)[, lapply(.SD, mean), by=.(date, Group), .SDcols=c("date","Group")]

But this is not able to read the dates

Lasse.Jen
  • 5
  • 4

1 Answers1

0

About the data.table approach, I think your SDcols are wrong.

library(data.table)
setDT(d)
d[,date := as.Date(date)]
d[, lapply(.SD, mean, na.rm = TRUE), by= c("date","Group"), .SDcols=c("value")]

If I understood correctly, you want to group by date and group and apply mean on value column (change the name if needed)

This gives you a long formatted data. If you want a wide formatted data, you can use dcast. Something in this flavor:

dcast(data_long, Group ~ date)

Example

library(data.table)
d = data.table(date = as.Date(c("2014-01-02", "2014-01-03")),
               Group = c('XRP','bitcoin'),
                value = c(0.064041998, 1))

data_long <- d[, lapply(.SD, mean, na.rm = TRUE), by= c("date","Group"), .SDcols=c("value")]
data_long
         date   Group    value
1: 2014-01-02     XRP 0.064042
2: 2014-01-03 bitcoin 1.000000

And if you want wide formatted data, you do:

dcast(data_long, Group ~ date)
Group 2014-01-02 2014-01-03
1:     XRP   0.064042         NA
2: bitcoin         NA          1
linog
  • 5,786
  • 3
  • 14
  • 28
  • Yes, you are correct. it does compile now. Now there just seems to be a problem that alot of groups just return a NA value, even though there are several coins with values at that date in that group. – Lasse.Jen Apr 16 '20 at 08:45
  • In that case, you need to add `na.rm = TRUE` to your arguments in the `lapply` . See my update. Does it solve your issue ? – linog Apr 16 '20 at 08:53