
I've got a large file that looks like this:

SAMPLE1 10
SAMPLE1 10
SAMPLE1 10
SAMPLE1 2
SAMPLE2 10
SAMPLE2 10
SAMPLE2 2
SAMPLE2 2

The file is huge (several gigabytes), and R is killed when I try to read it and then use boxplot. So my idea is to run sort | uniq -c on the file and work with a much smaller file that would look like this (with a third column containing the number of observations):

SAMPLE1 10 3
SAMPLE1 2 1
SAMPLE2 10 2
SAMPLE2 2 2

Is there a way to use base R's boxplot to plot such data?

zx8754
Pierre

2 Answers


The ENmisc package has a function, wtd.boxplot, for weighted boxplots: https://www.rdocumentation.org/packages/ENmisc/versions/1.2-7/topics/wtd.boxplot

Alternatively, calculate the weighted quartiles and then draw the boxplot using those values.
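
A minimal sketch of that second route in base R follows. It assumes the sort | uniq -c output has been rearranged into three columns (sample, value, count) as in the question; the helper name counted_boxplot_stats and the column names are made up for illustration. The helper is meant to reproduce what boxplot.stats() returns for the expanded data (Tukey hinges, whiskers at 1.5 times the IQR) while working only from the counts, and the result is handed to bxp(), the function boxplot() itself uses for drawing.

```r
# Hypothetical helper: boxplot statistics from a (value, count) table,
# intended to match boxplot.stats() on the expanded data without expanding it.
counted_boxplot_stats <- function(value, count, coef = 1.5) {
  o     <- order(value)
  value <- value[o]
  count <- as.numeric(count[o])      # doubles avoid integer overflow in cumsum
  cum   <- cumsum(count)
  n     <- sum(count)

  # k-th smallest observation of the expanded data
  kth <- function(k) value[findInterval(k - 1, cum) + 1]

  # Tukey five-number summary, same index rule as stats::fivenum()
  n4    <- floor((n + 3) / 2) / 2
  d     <- c(1, n4, (n + 1) / 2, n + 1 - n4, n)
  stats <- 0.5 * (kth(floor(d)) + kth(ceiling(d)))

  # Whiskers stop at the most extreme values within coef * IQR of the hinges
  iqr    <- stats[4] - stats[2]
  is_out <- value < stats[2] - coef * iqr | value > stats[4] + coef * iqr
  if (any(is_out)) stats[c(1, 5)] <- range(value[!is_out])

  list(stats = stats, n = n, out = value[is_out])
}

# Toy data from the question: sample, value, number of observations
counts <- data.frame(
  sample = c("SAMPLE1", "SAMPLE1", "SAMPLE2", "SAMPLE2"),
  value  = c(10, 2, 10, 2),
  count  = c(3, 1, 2, 2)
)

per_sample <- lapply(split(counts, counts$sample),
                     function(d) counted_boxplot_stats(d$value, d$count))
outs <- lapply(per_sample, `[[`, "out")

# Same list layout that boxplot(..., plot = FALSE) returns
z <- list(
  stats = sapply(per_sample, `[[`, "stats"),   # 5 rows, one column per sample
  n     = sapply(per_sample, `[[`, "n"),
  out   = unlist(outs, use.names = FALSE),     # each distinct outlier once
  group = rep(seq_along(outs), lengths(outs)),
  names = names(per_sample)
)
bxp(z)   # boxes should match boxplot() on the raw data
```

Because only the distinct values and their counts are held in memory, the size of the original file no longer matters. Outliers are plotted once per distinct value, which looks the same as the overplotted points boxplot() would draw.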

Shaun Jackman

We can pre-compute five numbers per sample (min, low, mid, upper, max) in bash. The data would then be small enough to import into R, where we can draw the boxplot from the summary data.
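
The code that accompanied this answer is not reproduced here. As a rough sketch of the R side only, assuming the pre-computation step wrote one row per sample with columns named sample, min, low, mid, upper and max (these names, and the values below, are illustrative), the summary can be passed to bxp():

```r
# Illustrative summary table; in practice it would come from the
# pre-computed file, e.g. read.table("summary.txt", header = TRUE)
# (file name and column names are assumptions).
summ <- read.table(header = TRUE, text = "
sample  min low mid upper max
SAMPLE1 2   6   10  10    10
SAMPLE2 2   2   6   10    10
")

# bxp() expects a matrix with five rows and one column per box
z <- list(
  stats = t(as.matrix(summ[, c("min", "low", "mid", "upper", "max")])),
  n     = rep(1, nrow(summ)),   # placeholder sizes; should only matter if varwidth = TRUE
  names = as.character(summ$sample)
)
bxp(z, ylab = "value")
```

With only min and max available, the whiskers extend to the extremes and no outliers are drawn, so the picture can differ from what boxplot() would show for the raw data.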

zx8754