My question today refers to a data frame I'm working on in R. The header of the data frame looks like the following: String(unique), Integer N[0-23]

Those 24 integer values represent the frequency of the string for each hour of the day. Logically, the integer values in each row sum to the total number of times the string appears in the data.

Thing is, I don't need the real frequency of the string at a certain hour but the percentage this frequency represents in relation to the sum of the integer values in all rows.

My lecturer hinted that table() might be the right R tool for that but I honestly don't understand how that is supposed to help me.

If all else fails, I'll calculate it in Java - although I'd really appreciate your help to do this in R.

Thanks for reading so far and thanks in advance for your help,

Rickyfox

Edit:

With the help I got from James I got the following proptable

Thing is, the percentages sum up to 100 for each row, but they should do so for the whole table. Is there a way to do that?

Paul Hiemstra
deemel

1 Answer

Use prop.table on a matrix containing the values:

x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
prop.table(as.matrix(x[-1]),margin=1)
           val0      val1      val2
[1,] 0.08333333 0.3333333 0.5833333
[2,] 0.13333333 0.3333333 0.5333333
[3,] 0.16666667 0.3333333 0.5000000
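
As a minimal sketch of what the `margin` argument does, the snippet below reuses the toy data from the example above and compares the three common choices. The variable names `m`, `by_row`, `by_col` and `overall` are only illustrative.

```r
# Same toy data as in the example above
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)
m <- as.matrix(x[-1])                  # numeric columns only

by_row  <- prop.table(m, margin = 1)   # each row sums to 1
by_col  <- prop.table(m, margin = 2)   # each column sums to 1
overall <- prop.table(m)               # margin omitted: whole table sums to 1

rowSums(by_row)
colSums(by_col)
sum(overall)
```

With `margin` omitted (or set to `NULL`), every cell is divided by the grand total, which is what a "percentage of the whole table" calls for.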

Edit: A fully working example:

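tt <- read.table("topichitsperhod.csv", sep=",", header=TRUE)
tt <- na.omit(tt[-1])                          # drop the topic column, then rows with NA
pt <- prop.table(as.matrix(tt), margin=NULL)   # proportions of the grand total

The first column is left out because it holds the topic strings. It is dropped only once; dropping it a second time inside prop.table() would also discard the first hour column.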
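
If the topic strings should stay attached to the result, one option (suggested in the comments) is to use them as row names. The following is only a sketch: the small data frame `tt` and its columns `topic`, `h0` and `h1` are hypothetical stand-ins for the CSV, which is not shown here.

```r
# Hypothetical stand-in for the CSV: one topic column plus hourly counts
tt <- data.frame(topic = c("a", "b", "c"),
                 h0 = c(1, 2, 3),
                 h1 = c(4, 5, 6),
                 stringsAsFactors = FALSE)

counts <- as.matrix(tt[-1])        # numeric part only
rownames(counts) <- tt$topic       # keep the topic strings as row labels

pct <- 100 * prop.table(counts)    # percentage of the grand total
round(pct, 2)
sum(pct)                           # all cells together add up to 100
```

Multiplying by 100 turns the proportions into percentages; the matrix keeps the topic names as row labels, so no information from the first column is lost.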

Paul Hiemstra
James
  • Thank you for your answer James! Would using as.matrix() suffice here or do I have to create the matrix specifically with the entries in column 1 being the rows? I'm not that experienced with matrices in R – deemel Sep 25 '12 at 08:31
  • A `matrix` can only contain a single type (and we want numeric for `prop.table`), so in the example above I used `as.matrix` on everything except the first (character) column. If you prefer you can do it separately and use the first column as `rownames`. – James Sep 25 '12 at 08:39
  • updated my question to adapt to the progress I've made with your help – deemel Sep 25 '12 at 10:08
  • @Rickyfox Look at the `margin` argument. For the proportions to be computed for the whole table, either omit it in your call or use `margin=NULL` – James Sep 25 '12 at 10:13
  • when using margin=NULL I get the following error: "Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables" So do I have to set the sum of all columns except the string column as margin? Does the margin represent the value that is the 100%? – deemel Sep 25 '12 at 11:24
  • You have to remove the non-numeric column, which I did with `x[-1]` in the example – James Sep 25 '12 at 11:36
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/17123/discussion-between-rickyfox-and-james) – deemel Sep 25 '12 at 12:30