In R, I have a large list of large data frames, each consisting of two columns, value and count. The function I use in the previous step returns the observed value in value; the corresponding count column shows how many times that specific value was observed. The following code produces one data frame as an example. Note that the data frames in the list all have different values and value ranges:
d <- data.frame(
  value = runif(n = 1856, min = 921, max = 4187),
  count = runif(n = 1856, min = 0, max = 20000)
)
Now I would like to aggregate the data so that I can create readable visualizations. The aggregation should be applied to every data frame in the list, each of which has a different value range. I am looking for a function that cuts the data into new values and counts, somewhat like a histogram function. For example, the counts of all rows with a value from 0 to 100 should be summed, and so on for each following interval of a defined width, with the interval borders starting at a clean value such as 0.
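To make the goal concrete, here is a minimal sketch of the kind of aggregation I mean. The helper name aggregate_counts and its parameters width and origin are hypothetical and only for illustration; the bin width of 100 is an assumed example value, and this is one possible way to do it rather than a finished solution.

```r
# Hypothetical helper: sum `count` within fixed-width bins of `value`.
# `width` and `origin` are illustrative parameters, not part of my real code.
aggregate_counts <- function(df, width = 100, origin = 0) {
  # Lower edge of the bin each value falls into, e.g. 921 -> 900 for width 100
  bin_start <- origin + floor((df$value - origin) / width) * width
  # Sum the counts per bin; rowsum() returns a one-column matrix whose
  # row names are the (sorted) bin starts
  sums <- rowsum(df$count, group = bin_start)
  data.frame(
    value = as.numeric(rownames(sums)),
    count = as.numeric(sums[, 1])
  )
}

# Small toy input so the result is easy to check by hand
toy <- data.frame(
  value = c(5, 50, 99.9, 100, 150, 921),
  count = c(1, 2, 3, 10, 20, 7)
)
binned <- aggregate_counts(toy)

# Applied to a whole list of data frames (here a made-up list of two)
binned_list <- lapply(list(toy, toy), aggregate_counts)
```

In this sketch each bin is closed on the left, so a value of exactly 100 lands in the bin starting at 100. Bins without any observations do not appear in the result, which differs from what hist() would return.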
My first attempt was to build a plain vector of values in which each value is repeated as many times as its count field indicates. The next step would have been to call hist() without plotting to obtain the aggregated values and counts, with the bins defined through the arguments of hist(). However, this produces vectors that are too large (several GB each) for R to handle. I appreciate any solutions or hints!
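For reference, the first attempt described above looks roughly like the following sketch. It uses a tiny made-up data frame with integer counts and an assumed bin width of 100, so it runs quickly; on the real data the expanded vector has as many elements as the sum of all counts, which is where the memory problem comes from.

```r
# Toy data; the real data frames have far larger counts
toy <- data.frame(
  value = c(5, 50, 150, 921),
  count = c(2, 3, 4, 1)
)

# Expand every value `count` times: the vector length equals sum(count),
# which is what exhausts memory on the real data
expanded <- rep(toy$value, times = toy$count)

# Bin without plotting; breaks start at 0 with an assumed width of 100
breaks <- seq(0, 1000, by = 100)
h <- hist(expanded, breaks = breaks, plot = FALSE)

# Per-bin totals and bin midpoints, taken from the returned object
h$counts
h$mids
```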