
I have a table (data.frame) containing numeric columns and factor columns, several of which hold character codes (e.g. 'species', 'Fam_name', 'gear'). I want to calculate the subtotals (sum) of the 'weight' and 'number' variables for each 'ss'.

I have tried using the 'aggregate' function, but I have failed to get it to return the character value for the 'gear' variable.

Below is the head of my table:

   survey station         ss species weight number bdep      lon      lat                       Sci_name       Fam_name gear
1 2012901       1 2012901001 CARSC04  11.20     20   23 37.61650 19.14900        Scomberoides lysan     CARANGIDAE   TB
2 2012901       1 2012901001 SCMGR02   0.98      2   23 37.61650 19.14900 Grammatorcynus bilineatus     SCOMBRIDAE   TB
3 2012901       2 2012901002 NOCATCH   0.00      0    6 38.48333 18.71667                  NO CATCH       NO CATCH   TB
4 2012901       3 2012901003 LUTLU06   5.65      1    6 38.48333 18.71667            Lutjanus bohar     LUTJANIDAE   TB
5 2012901       3 2012901003 SHACAB1   4.00      1    6 38.48333 18.71667         Triaenodon obesus CARCHARHINIDAE   TB
6 2012901       4 2012901004 NOCATCH   0.00      0    9 38.48333 18.71667                  NO CATCH       NO CATCH   TB

I tried the following code, intending to combine the two results using bind:

catch1 <- aggregate(cbind(weight, number) ~ ss, data = catch, FUN = sum)

catch2 <- aggregate(cbind(survey, station, bdep, lon, lat, gear) ~ ss, data = catch, FUN = median)

The first line does what I want: it sums 'weight' and 'number' for each 'ss'. The second, however, returns a numeric median for 'gear', whereas I want it to return the 'gear' code for that particular 'ss'.

Reconstruction of the 'gear' factor (thanks to BrodieG):

catch2$gear <- factor(levels(catch$gear)[catch2$gear], levels=levels(catch$gear))

Problem solved :-)

ErikJS
  • I suggest you provide a slightly larger example data set just for gear and ss if those are the only two variables causing problems. Also provide the answer you want based on that larger example data set. – Mark Miller Feb 15 '14 at 23:30
  • 'gear' is just an example in the code above - I'm trying to output one row per 'ss' with all the factor information in place. I could provide a larger data set, but the header I have included is a good example as both for 'ss'=2012901001 and 2012901003 there are two rows of data. – ErikJS Feb 16 '14 at 02:59

2 Answers


Your problem is that gear is a factor: cbind() converts it to its underlying integer codes, so median returns the median of those codes rather than a gear label. Try:

catch2$gear <- factor(levels(catch$gear)[catch2$gear], levels=levels(catch$gear))

or something like it to reconstruct the factor for catch2.
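
The following is a minimal sketch of that reconstruction on toy data, not the asker's full table. The 'GN' and 'HL' gear codes and the name gear_first are made up for illustration, and it assumes (as the asker later confirms) that each 'ss' has only one kind of 'gear'.

```r
# Toy data with hypothetical values; only 'ss', 'weight' and 'gear' are kept.
catch <- data.frame(
  ss     = c(1, 1, 2, 3, 3),
  weight = c(11.20, 0.98, 0.00, 5.65, 4.00),
  gear   = factor(c("TB", "TB", "GN", "HL", "HL"))
)

# cbind() builds a numeric matrix, so 'gear' enters as its integer codes
# (levels are sorted alphabetically here: GN = 1, HL = 2, TB = 3).
catch2 <- aggregate(cbind(weight, gear) ~ ss, data = catch, FUN = median)

# Map the codes back to labels. This relies on one gear per 'ss';
# otherwise the median of the codes may not be a valid level index.
catch2$gear <- factor(levels(catch$gear)[catch2$gear],
                      levels = levels(catch$gear))

# Alternative sketch: take the first label per 'ss' and never touch the codes.
gear_first <- aggregate(gear ~ ss, data = catch,
                        FUN = function(x) as.character(x[1]))
```

Both routes should give the same label per 'ss'. The second avoids the round trip through integer codes, at the cost of a separate aggregate call for each non-numeric column.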

BrodieG
  • +1. When I tried to solve I assumed two kinds of `gear` were possible within a single `ss` and nothing I tried worked, at least if there were four rows of data and two rows had one `gear` type and two rows had another. – Mark Miller Feb 16 '14 at 04:43
  • Mark: sorry for the confusion. For each 'ss' there is only one kind of 'gear'. – ErikJS Feb 16 '14 at 05:17
  • @ErikJS No problem. I added an answer for the scenario where multiple kinds of `gear` are possible within a single `ss`. – Mark Miller Feb 16 '14 at 05:28

I assumed there could be two kinds of gear for a given ss. In that case the problem boils down to finding the median (or mode) of a character variable. Here is code to find the mode of a character variable (here gear); when several values tie for the mode, all of them are returned.

catch <- read.table(text = '
         ss  gear
          1    AA
          1    AA
          1    BB
          1    BB
          2    CC
          2    CC
          2    CC
          3    BB
          4    AA
          4    CC
', header = TRUE)

gear.mode <- tapply(catch$gear, catch$ss, function(x) { y <- table(x); names(y)[y == max(y)] })
gear.mode <- as.data.frame(gear.mode)
gear.mode

  gear.mode
1    AA, BB
2        CC
3        BB
4    AA, CC

You can also do this with aggregate:

aggregate(gear ~ ss, data = catch, FUN = function(x) {
   y <- table(x); names(y)[y == max(y)]
})

  ss   gear
1  1 AA, BB
2  2     CC
3  3     BB
4  4 AA, CC
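
If a plain character column is easier to work with downstream than a list of tied values, the ties can be pasted into a single string. This is only a sketch of one possible variant; mode_label is a hypothetical helper name, not part of base R.

```r
# Same example data as above, built directly as a data frame.
catch <- data.frame(
  ss   = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4),
  gear = c("AA", "AA", "BB", "BB", "CC", "CC", "CC", "BB", "AA", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most frequent value(s) as one string,
# joining ties with ", " so every group yields exactly one element.
mode_label <- function(x) {
  counts <- table(x)
  paste(names(counts)[counts == max(counts)], collapse = ", ")
}

gear.mode <- aggregate(gear ~ ss, data = catch, FUN = mode_label)
```

Because every group now returns a single string, the gear column is an ordinary character vector and can be merged or written to file without unpacking a list.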
Mark Miller