
I have a database of surf clam lengths that I want to group into length bins. The lengths range from 20 cm to 180 cm, and I want to bin them in 3 cm increments, rounding up: lengths of 1, 2 or 3 go in bin 3, lengths of 4, 5 and 6 go in bin 6, lengths of 7, 8 and 9 go in bin 9, and so on.

The bin categories I want are 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162 165 168 171 174 177 180.

I also need to sum the FREQ values of all lengths that fall into the same bin. For example, if I have lengths of 58 cm (FREQ = 2), 59 cm (FREQ = 1) and 60 cm (FREQ = 5), the result should be a single 60 cm bin with a frequency of 8.

STA     DATE        SPP LENG FREQ
5002    06/12/85    403 82  1
5002    06/12/85    403 90  1
5002    06/12/85    403 94  2
5002    06/12/85    403 98  1
5002    06/12/85    403 99  1
5002    06/12/85    403 102 1
5002    06/12/85    403 105 1
5002    06/12/85    403 106 1
5002    06/12/85    403 107 1
5002    06/12/85    403 111 1
5003    06/12/85    403 75  1
5003    06/12/85    403 76  1
5003    06/12/85    403 92  1
5003    06/12/85    403 93  1
5003    06/12/85    403 95  1
5003    06/12/85    403 151 1
5004    06/12/85    403 130 1
5004    06/12/85    403 140 1
5004    06/12/85    403 143 1
5004    06/12/85    403 144 1
5004    06/12/85    406 145 1
5004    06/12/85    403 146 1
5004    06/12/85    406 147 1
5004    06/12/85    403 153 1
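For anyone who wants to run the answers directly, here is a sketch that rebuilds the sample table above as an R data frame. The object name `dat` and keeping DATE as plain text (rather than a parsed date) are assumptions, not part of the original post:

```r
# Sample surf clam data, typed in from the table above.
# Assumption: DATE is stored as a character string, not parsed as a Date.
dat <- data.frame(
  STA  = c(rep(5002, 10), rep(5003, 6), rep(5004, 8)),
  DATE = rep("06/12/85", 24),
  SPP  = c(rep(403, 20), 406, 403, 406, 403),
  LENG = c(82, 90, 94, 98, 99, 102, 105, 106, 107, 111,
           75, 76, 92, 93, 95, 151,
           130, 140, 143, 144, 145, 146, 147, 153),
  FREQ = c(1, 1, 2, rep(1, 21)),
  stringsAsFactors = FALSE
)
str(dat)
```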

I'm fairly new to R, so I'm not sure how to go about this. Please help!

2 Answers


I believe this answers your question:

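dat$bins <- ceiling(dat$LENG / 3) * 3  # round each length up to the next multiple of 3
ndat <- aggregate(FREQ ~ STA + DATE + SPP + bins, data = dat, FUN = sum)  # total FREQ per station/date/species/bin, with named columns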
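As a quick check, the same two steps applied to the three lengths from the question's example (58, 59 and 60 cm with FREQ 2, 1 and 5) should collapse into one 60 cm bin with FREQ 8. The toy data frame is hypothetical and assumes all three rows share one station, date and species:

```r
# Hypothetical toy data: the question's 58/59/60 cm example, one station/date/species
dat <- data.frame(STA = 5002, DATE = "06/12/85", SPP = 403,
                  LENG = c(58, 59, 60), FREQ = c(2, 1, 5))

# 58/3, 59/3 and 60/3 all round up to 20, so every row lands in bin 60
dat$bins <- ceiling(dat$LENG / 3) * 3

# Sum FREQ within each station/date/species/bin combination
ndat <- aggregate(FREQ ~ STA + DATE + SPP + bins, data = dat, FUN = sum)
ndat  # one row: bins = 60, FREQ = 8
```

Dropping STA, DATE or SPP from the formula pools the frequencies across that variable instead.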
Travis Gaddie

The cut() function turns a numeric vector into a factor of bins.

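cutoff_lengths <- seq(0, 180, by = 3)  # cut points 0, 3, 6, ..., 180
df$BIN <- cut(df$LENG, cutoff_lengths, labels = cutoff_lengths[-1])  # (0,3] -> 3, (3,6] -> 6, ...
xtabs(FREQ ~ BIN, data = df)  # sum FREQ in each bin; table(df$BIN) would only count rows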

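cutoff_lengths[-1] drops the first element, so the labels are every cut point except 0. Each bin lies between two adjacent cut points, so there is one fewer bin than there are cut points. By default cut() uses right-closed intervals, (0,3], (3,6] and so on, so a length of exactly 3 falls in the first bin. Because you want to round up, each bin is labelled with its upper cut point, and the lowest cut point (0) is never used as a label.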
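A small sketch to confirm that the cut() labels agree with rounding up to the next multiple of 3, including lengths that sit exactly on a cut point, and that FREQ is summed rather than counted. The lengths and frequencies below are hypothetical test values:

```r
# Hypothetical lengths/frequencies, including values exactly on cut points (3, 60)
df <- data.frame(LENG = c(1, 3, 4, 58, 59, 60, 61),
                 FREQ = c(1, 1, 1, 2, 1, 5, 1))

cutoff_lengths <- seq(0, 180, by = 3)
# Default right = TRUE: 3 falls in (0,3] (bin 3), 4 in (3,6] (bin 6)
df$BIN <- cut(df$LENG, cutoff_lengths, labels = cutoff_lengths[-1])

# The factor labels match ceiling(LENG / 3) * 3 for every row
stopifnot(as.numeric(as.character(df$BIN)) == ceiling(df$LENG / 3) * 3)

# Total FREQ per bin; empty bins are kept as 0, so show only the non-empty ones
freq_by_bin <- xtabs(FREQ ~ BIN, data = df)
freq_by_bin[freq_by_bin > 0]
```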

Nathan Werth