Convert random numbers to set of ordered numbers

Question

I have a dataframe 'df' that has three columns. Column three contains a series of randomly assigned numbers (1 to X) that often repeat themselves a number of times. Even though the column contains a set of randomly assigned numbers, they are sorted small to large to make this easier. Additionally, there are multiple entries for Site and Date, where each combination of Site and Date has the randomly selected numbers (1 to X). df currently looks like this (shortened for space purposes):

Site	Date	Minute
BMA	44648	4
BMA	44648	4
BMA	44648	4
BMA	44648	4
BMA	44648	4
BMA	44648	13
BMA	44648	13
BMA	44648	13
BMA	44648	27
BMA	44648	27
BMA	44648	27
BMA	44648	27
BMA	44648	27
BMA	44773	2
BMA	44773	2
BMA	44773	2
BMA	44773	2
BMA	44773	2
BMA	44773	12
BMA	44773	12
BMA	44773	12
BMC	44648	3
BMC	44648	3
BMC	44648	3
BMC	44648	3
BMC	44648	3
BMC	44648	3
BMC	44648	44
BMC	44648	44
BMC	44648	44
BMC	44648	44
BMC	44648	60
BMC	44648	60
BMC	44648	60
BMC	44648	60

I want to convert those random numbers to an ordered set of numbers from 1-X (where X is the number of unique values for Minute, and so that a new ordered factor column is created that looks like this:

Site	Date	Minute	NewMinute
BMA	44648	4	1
BMA	44648	4	1
BMA	44648	4	1
BMA	44648	4	1
BMA	44648	4	1
BMA	44648	13	2
BMA	44648	13	2
BMA	44648	13	2
BMA	44648	27	3
BMA	44648	27	3
BMA	44648	27	3
BMA	44648	27	3
BMA	44648	27	3
BMA	44773	2	1
BMA	44773	2	1
BMA	44773	2	1
BMA	44773	2	1
BMA	44773	2	1
BMA	44773	12	2
BMA	44773	12	2
BMA	44773	12	2
BMC	44648	3	1
BMC	44648	3	1
BMC	44648	3	1
BMC	44648	3	1
BMC	44648	3	1
BMC	44648	3	1
BMC	44648	44	2
BMC	44648	44	2
BMC	44648	44	2
BMC	44648	44	2
BMC	44648	60	3
BMC	44648	60	3
BMC	44648	60	3
BMC	44648	60	3

Thank you for your help!

score 2 · Accepted Answer · answered Dec 27 '21 at 16:00

base R

ave(dat$Minute, dat[c("Site","Date")], FUN = function(z) match(z, sort(unique(z))))
#  [1] 1 1 1 1 1 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 1 1 1 1 1 1 2 2 2 2 3 3 3 3
dat$NewMinute <- ave(dat$Minute, dat[c("Site","Date")], FUN = function(z) match(z, sort(unique(z))))
identical(dat, dat2) # dat2 is expected output from the OP
# [1] TRUE

dplyr

library(dplyr)
dat %>%
  group_by(Site, Date) %>%
  mutate(NewMinute = match(Minute, sort(unique(Minute)))) %>%
  ungroup()
# # A tibble: 35 x 4
#    Site   Date Minute NewMinute
#    <chr> <int>  <int>     <int>
#  1 BMA   44648      4         1
#  2 BMA   44648      4         1
#  3 BMA   44648      4         1
#  4 BMA   44648      4         1
#  5 BMA   44648      4         1
#  6 BMA   44648     13         2
#  7 BMA   44648     13         2
#  8 BMA   44648     13         2
#  9 BMA   44648     27         3
# 10 BMA   44648     27         3
# # ... with 25 more rows

Data

dat <- structure(list(Site = c("BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC"), Date = c(44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L,  44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L), Minute = c(4L, 4L, 4L, 4L, 4L, 13L, 13L, 13L, 27L, 27L, 27L, 27L, 27L, 2L, 2L, 2L, 2L, 2L, 12L, 12L, 12L, 3L, 3L, 3L, 3L, 3L, 3L, 44L, 44L, 44L, 44L, 60L, 60L, 60L, 60L)), row.names = c(NA, -35L), class = "data.frame")
dat2 <- structure(list(Site = c("BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMA", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC", "BMC"), Date = c(44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44773L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L,  44648L, 44648L, 44648L, 44648L, 44648L, 44648L, 44648L), Minute = c(4L, 4L, 4L, 4L, 4L, 13L, 13L, 13L, 27L, 27L, 27L, 27L, 27L, 2L, 2L, 2L, 2L, 2L, 12L, 12L, 12L, 3L, 3L, 3L, 3L, 3L, 3L, 44L, 44L, 44L, 44L, 60L, 60L, 60L, 60L), NewMinute = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, -35L))

Both answers seem to work just fine. I suspected this might be a dplyr issue and was happy to see your answer reflect that. That solution did the trick prefectly. Thank you! — Jacob, Dec 27 '21 at 16:08

Convert random numbers to set of ordered numbers

1 Answers1

base R

dplyr