I have a data.frame of hospital data with 11 million rows.
Columns: ID (chr), outcome (1|0), 20x ICD-10 codes (chr).
Rows: 10.6 million
I want to make the data tidy so that I can model the binary outcome against the diagnostic codes.
I would normally use pivot_longer or the base R aggregate function, but the resulting data.frame is huge and my machine runs short of memory (32 GB RAM, Windows Server running the latest 64-bit R).
I am going to split the data.frame and run pivot_longer on each piece, then either manually add columns so that the resulting data.frames can be bound together afterwards, or model each piece separately.
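
A minimal sketch of the chunked approach I have in mind, on toy data. The column names (ID, outcome, dx1 to dx3), the chunk size, and the output column names are all placeholders I made up for illustration; the real data has 20 code columns. Using the same names_to and values_to in every chunk should give each piece identical columns, so they can be bound without extra work, and values_drop_na = TRUE drops the empty code slots rather than storing them.

```r
library(tidyr)  # pivot_longer(), starts_with()
library(dplyr)  # bind_rows()

# Toy stand-in for the real data; column names are assumptions.
df <- data.frame(
  ID      = c("a", "b", "c", "d"),
  outcome = c(1L, 0L, 1L, 0L),
  dx1     = c("I10", "E11", "I10", "J45"),
  dx2     = c("E11", NA, "N18", NA),
  dx3     = c(NA, NA, "I50", NA),
  stringsAsFactors = FALSE
)

# Hypothetical chunk size; it would be far larger on the real data.
chunk_size <- 2L
chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)

# Pivot each chunk separately so only one chunk is reshaped at a time.
long_chunks <- lapply(split(df, chunk_id), function(chunk) {
  pivot_longer(
    chunk,
    cols = starts_with("dx"),
    names_to = "position",
    values_to = "icd10",
    values_drop_na = TRUE  # skip empty code slots instead of keeping NA rows
  )
})

# Every chunk has the same four columns, so binding is straightforward.
long_df <- bind_rows(long_chunks)
```

On the real data each chunk could be written to disk or modelled directly instead of being bound back together, since the bound result may still not fit in memory.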
Is there a method I am missing that would reduce the data size or achieve a similar objective?