I am trying to select, for each gene_id, the row with the minimum start coordinate (href_pos$start). Why do I get the error below, even though my memory limit is ~16 GB? What am I doing wrong? Here are my data and code:
head(href_pos, 5)
chr region start end strand nu gene_id
1 chr1 start_codon 67000042 67000044 + . NM_032291
2 chr1 CDS 67000042 67000051 + 0 NM_032291
3 chr1 exon 66999825 67000051 + . NM_032291
4 chr1 CDS 67091530 67091593 + 2 NM_032291
5 chr1 exon 67091530 67091593 + . NM_032291
d1 <- ddply(as.data.frame(href_pos), "gene_id", function(href_pos) href_pos[which.min(href_pos$start), ])
Error: cannot allocate vector of size 283 Kb
In addition: Warning messages:
1: In lapply(dfs, function(df) levels(df[[var]])) :
  Reached total allocation of 16383Mb: see help(memory.size)
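For reference, the selection described above (one row per gene_id, the one with the smallest start) can also be written in base R without splitting the data frame into per-gene pieces. This is only a minimal sketch under stated assumptions: the toy data frame below is rebuilt from the five rows printed by head(), and the column types are guesses. Whether it avoids the allocation error on the full data set has not been verified.

```r
# Toy data rebuilt from the head() output in the question.
# Column types are assumptions (characters rather than factors).
href_pos <- data.frame(
  chr     = "chr1",
  region  = c("start_codon", "CDS", "exon", "CDS", "exon"),
  start   = c(67000042, 67000042, 66999825, 67091530, 67091530),
  end     = c(67000044, 67000051, 67000051, 67091593, 67091593),
  strand  = "+",
  nu      = c(".", "0", ".", "2", "."),
  gene_id = "NM_032291",
  stringsAsFactors = FALSE
)

# Sort by gene_id, then by start, so the minimum start comes first
# within each gene_id.
ord    <- order(href_pos$gene_id, href_pos$start)
sorted <- href_pos[ord, ]

# duplicated() flags every row after the first one per gene_id;
# negating it keeps exactly one row per gene, the one with the
# smallest start.
d1 <- sorted[!duplicated(sorted$gene_id), ]
print(d1)
```

With the toy data this keeps the exon row whose start is 66999825. Because it works on the whole data frame with a single sort, it never builds the per-group list that the lapply(dfs, ...) call in the warning iterates over.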