0

Apologies for the unclear title. Although not effective, I couldn't think of a better way to describe this problem.

Here is a sample dataset I am working with

test = data.frame(
  Value = c(1:5, 5:1),
  Index = c(1:5, 1:5),
  GroupNum = c(rep.int(1, 5), rep.int(2, 5))
)

I want to create a new column (called "Value_Standardized") whose values are calculated by grouping the data by GroupNum and then dividing each Value observation by the Value observation of the group when the Index is 1.

Here's what I've come up with so far.

test2 = test %>% 
  group_by(GroupNum) %>% 
  mutate(Value_Standardized = Value / special_function(Value))

The special_function would represent a way to get value when Index == 1.

That is also precisely the problem - I cannot figure out a way to get the denominator to be the value when index == 1 in that group. Unfortunately, the value when the index is 1 is not necessarily the max or the min of the group.

Thanks in advance.

Edit: Emphasis added for clarity.

Tee
  • 149
  • 8
  • *dividing each Value observation by the Value observation* means `Value / Value`? – Andre Wildberg Oct 18 '22 at 14:45
  • @AndreWildberg I think it means *dividing each Value observation by (the Value observation when the Index is 1).* – Paul Stafford Allen Oct 18 '22 at 14:51
  • Andre - Paul's interpretation is correct. Sorry about the ambiguous wording. I've added an emphasis in an effort to clarify what I mean. – Tee Oct 18 '22 at 14:56
  • If I got it right simply use `ifelse`, e.g. `test %>% group_by(GroupNum) %>% mutate(Value_Standardized = ifelse(Index == 1, Value/Value, NA))` – Andre Wildberg Oct 18 '22 at 15:02
  • Thanks, Andre. This isn't exactly what I was looking for as I didn't *only* want to divide rows with an index of 1. If interested, feel free to take a look at the answers. Thanks for your help. – Tee Oct 18 '22 at 19:26

3 Answers3

1

A quick base R solution:

test = data.frame(
  Value = c(1:5, 5:1),
  Index = c(1:5, 1:5),
  GroupNum = c(rep.int(1, 5), rep.int(2, 5)),
  Value_Standardized = NA
)


groups <- levels(factor(test$GroupNum))
for(currentGroup in groups) {
  test$Value_Standardized[test$GroupNum == currentGroup] <- test$Value[test$GroupNum == currentGroup] / test$Value[test$GroupNum == currentGroup & test$Index == 1]
}

This only works under the assumption that each group will have only one observation with a "1" index though. It's easy to run into trouble...

David
  • 371
  • 2
  • 13
  • Thanks, David. For a clean dataset, this works just fine. If there are multiple "1" indexes, using ```sub``` to change the first one before employing technique would probably work. – Tee Oct 18 '22 at 19:25
1

Not sure if this is what you meant, nor if it's the best way to do this but... Instead of using a group_by I used a nested pipe, filtering and then left_joining the table to itself.

test = data.frame(
  Value = c(1:5, 5:1),
  Index = c(1:5, 1:5),
  GroupNum = c(rep.int(1, 5), rep.int(2, 5))
)

test %>% 
  left_join(test %>% 
              filter(Index == 1) %>% 
              select(Value,GroupNum),
            by = "GroupNum",
            suffix = c('','_Index_1')) %>% 
  mutate(Value = Value/Value_Index_1)

output:

   Value Index GroupNum Value_Index_1
1    1.0     1        1             1
2    2.0     2        1             1
3    3.0     3        1             1
4    4.0     4        1             1
5    5.0     5        1             1
6    1.0     1        2             5
7    0.8     2        2             5
8    0.6     3        2             5
9    0.4     4        2             5
10   0.2     5        2             5
Claudio Paladini
  • 1,000
  • 1
  • 10
  • 20
1

There is a super simple tidyverse way of doing this with the method cur_data() it pulls the tibble for the current subset (group) of data and acts on it

test2 <- test %>% 
group_by(GroupNum) %>% 
mutate(output=Value/cur_data()$Value[1])

enter image description here The cur_data() grabs the tibble, then you extract the Values column as you would normally using $Value and because the denominator is always the first row in this group, you just specify that index with [1]

Nice and neat, there are a whole bunch of cur_... methods you can use, check them out here:

sconfluentus
  • 4,693
  • 1
  • 21
  • 40
  • Thanks for the answer! I didn't think would be a package made for scenarios like this. Thank you. Edit: Wow, I just took a look at the documentation. This is very useful, especially being a part of the `dplyr` package already. Thanks again. – Tee Oct 18 '22 at 19:25
  • @Tee This is a part of the package `dplyr` but I would prefer to install the meta-package `tidyverse` when you load it using `library(tidyverse)` it loads dplyr and a whole bunch of other really useful packages that facilitate this kind of aggregation and data manipulation. The ease of doing meaningful data manipulation is miles simpler in the world of the `tidyverse`, just a little harder to chase down the methods because people did it the hard way for so long! – sconfluentus Oct 20 '22 at 14:01
  • Yes, appreciate the follow up! – Tee Oct 20 '22 at 23:55