0

I have a dataframe 'df' that has three columns. Column three contains a series of randomly assigned numbers (1 to X) that often repeat themselves a number of times. Even though the column contains a set of randomly assigned numbers, they are sorted small to large to make this easier. Additionally, there are multiple entries for Site and Date, where each combination of Site and Date has the randomly selected numbers (1 to X). df currently looks like this (shortened for space purposes):

Site Date Minute
BMA 44648 4
BMA 44648 4
BMA 44648 4
BMA 44648 4
BMA 44648 4
BMA 44648 13
BMA 44648 13
BMA 44648 13
BMA 44648 27
BMA 44648 27
BMA 44648 27
BMA 44648 27
BMA 44648 27
BMA 44773 2
BMA 44773 2
BMA 44773 2
BMA 44773 2
BMA 44773 2
BMA 44773 12
BMA 44773 12
BMA 44773 12
BMC 44648 3
BMC 44648 3
BMC 44648 3
BMC 44648 3
BMC 44648 3
BMC 44648 3
BMC 44648 44
BMC 44648 44
BMC 44648 44
BMC 44648 44
BMC 44648 60
BMC 44648 60
BMC 44648 60
BMC 44648 60

I want to convert those random numbers to an ordered set of numbers from 1-X (where X is the number of unique values for Minute, and so that a new ordered factor column is created that looks like this:

Site Date Minute NewMinute
BMA 44648 4 1
BMA 44648 4 1
BMA 44648 4 1
BMA 44648 4 1
BMA 44648 4 1
BMA 44648 13 2
BMA 44648 13 2
BMA 44648 13 2
BMA 44648 27 3
BMA 44648 27 3
BMA 44648 27 3
BMA 44648 27 3
BMA 44648 27 3
BMA 44773 2 1
BMA 44773 2 1
BMA 44773 2 1
BMA 44773 2 1
BMA 44773 2 1
BMA 44773 12 2
BMA 44773 12 2
BMA 44773 12 2
BMC 44648 3 1
BMC 44648 3 1
BMC 44648 3 1
BMC 44648 3 1
BMC 44648 3 1
BMC 44648 3 1
BMC 44648 44 2
BMC 44648 44 2
BMC 44648 44 2
BMC 44648 44 2
BMC 44648 60 3
BMC 44648 60 3
BMC 44648 60 3
BMC 44648 60 3

Thank you for your help!

user438383
  • 5,716
  • 8
  • 28
  • 43
Jacob
  • 329
  • 2
  • 10

1 Answers1

2

base R

ave(dat$Minute, dat[c("Site","Date")], FUN = function(z) match(z, sort(unique(z))))
#  [1] 1 1 1 1 1 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2 2 3 3 3 3
dat$NewMinute <- ave(dat$Minute, dat[c("Site","Date")], FUN = function(z) match(z, sort(unique(z))))
identical(dat, dat2) # dat2 is expected output from the OP
# [1] TRUE

dplyr

library(dplyr)
dat %>%
  group_by(Site, Date) %>%
  mutate(NewMinute = match(Minute, sort(unique(Minute)))) %>%
  ungroup()
# # A tibble: 35 x 4
#    Site   Date Minute NewMinute
#    <chr> <int>  <int>     <int>
#  1 BMA   44648      4         1
#  2 BMA   44648      4         1
#  3 BMA   44648      4         1
#  4 BMA   44648      4         1
#  5 BMA   44648      4         1
#  6 BMA   44648     13         2
#  7 BMA   44648     13         2
#  8 BMA   44648     13         2
#  9 BMA   44648     27         3
# 10 BMA   44648     27         3
# # ... with 25 more rows

Data

dat <- structure(list(Site = c("BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC"), Date = c(44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L,  44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L), Minute = c(4L, 4L, 4L, 4L, 4L, 13L, 13L, 13L, 27L, 27L, 27L, 27L, 27L, 2L, 2L, 2L, 2L, 2L, 12L, 12L, 12L, 3L, 3L, 3L, 3L, 3L, 3L, 44L, 44L, 44L, 44L, 60L, 60L, 60L, 60L)), row.names = c(NA, -35L), class = "data.frame")
dat2 <- structure(list(Site = c("BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC"), Date = c(44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L,  44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L), Minute = c(4L, 4L, 4L, 4L, 4L, 13L, 13L, 13L, 27L, 27L, 27L, 27L, 27L, 2L, 2L, 2L, 2L, 2L, 12L, 12L, 12L, 3L, 3L, 3L, 3L, 3L, 3L, 44L, 44L, 44L, 44L, 60L, 60L, 60L, 60L), NewMinute = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, -35L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    Both answers seem to work just fine. I suspected this might be a dplyr issue and was happy to see your answer reflect that. That solution did the trick prefectly. Thank you! – Jacob Dec 27 '21 at 16:08