dealing with data table with redundant rows

Question

The title is not precisely stated, but I could not come up with other words that summarize exactly what I am going to ask.

I have a table of the following form:

value (0<v<1)        # of events
   0.5677                 100000
   0.5688                   5000
   0.1111                   6000
     ...                     ...
   0.5688                 200000
   0.1111                  35000

Here are some of the things I would like to do with this table: draw the histogram, compute the mean value, fit the distribution, etc. So far, I have only figured out how to do this with vectors like

v=(0.5677,...,0.5688,...,0.1111,...)

but not with tables.

Since the number of possible values is huge (they are almost continuous), I guess building a new table would not be very efficient, so I would much prefer to do this without modifying the original table or creating another one. But if it has to be done that way, that's okay. Thanks in advance.

Appendix: What I want to figure out is how to treat this table as an ordinary data vector. Suppose I had the following vector representing exactly the same data as above:

v = (0.5677, ..., 0.5677, 0.5688, ..., 0.5688, 0.1111, ..., 0.1111, ...)
     -------------------  -------------------  -------------------
       (100000 times)     (5000+200000 times)  (6000+35000 times)

then we would just need to apply basic functions like plot, mean, etc. to get what I want. I hope this makes my question clearer.

what have you tried? `ggplot` can make a histogram with data in this form no problem. what do you mean when you say "the mean value"? have you looked at the various distributions and fitting functions in R? also, can you provide a reproducible sample of your data using `dput(head(yourdata))` or something similar. — Justin, Sep 26 '12 at 20:07
Try this one: "Efficiently compute mean and standard deviation from a frequency table" http://stackoverflow.com/q/10397574/496803 — thelatemail, Sep 26 '12 at 23:51

score 0 · Answer 1 · answered Sep 26 '12 at 21:32

Your data consist of a value and a count for that value, so you are looking for functions that will use the count to weight the value. Type ?weighted.mean to get information on a function that will compute the mean for weighted (grouped) data. For density plots, you want to use the weights= argument in the density() function. For the histogram, you just need to use cut() to combine values into a small number of groups and then use aggregate() to sum the counts for all the values in each group. You will find a variety of weighted statistical measures in the Hmisc package (wtd.mean, wtd.var, wtd.quantile, etc.).
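
A minimal sketch of the steps above, using only base R. The data frame `df`, its column names `value` and `count`, and the ten equal-width bins are assumptions made for illustration; the rows are the ones visible in the question's table. The bandwidth passed to density() is an arbitrary illustrative choice, and the weights are rescaled to sum to 1, which is what density() expects.

```r
# Hypothetical data frame mirroring the visible rows of the question's table.
# The column names `value` and `count` are assumptions.
df <- data.frame(
  value = c(0.5677, 0.5688, 0.1111, 0.5688, 0.1111),
  count = c(100000, 5000, 6000, 200000, 35000)
)

# Weighted mean: each value contributes in proportion to its count.
# Repeated values (redundant rows) need no special handling.
wmean <- weighted.mean(df$value, df$count)

# Weighted kernel density estimate. The weights are normalised to sum to 1;
# bw = 0.02 is an arbitrary bandwidth chosen only for this sketch.
dens <- density(df$value, weights = df$count / sum(df$count), bw = 0.02)
plot(dens, main = "Weighted density")

# Histogram: bin the values with cut(), then sum the counts per bin.
breaks <- seq(0, 1, by = 0.1)  # assumed bin edges
df$bin <- cut(df$value, breaks = breaks)
binned <- aggregate(count ~ bin, data = df, FUN = sum)
barplot(binned$count, names.arg = as.character(binned$bin),
        main = "Counts per bin")
```

The weighted mean computed this way equals the mean of the fully expanded vector, for example `mean(rep(df$value, times = df$count))`, but it never has to build that vector.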
