I am having some trouble wording my issue, so I am using the mtcars dataset as an example.
Imagine I am a student of social sciences in the Pixar Cars(TM) universe. For a small school project on statistical methods, I am doing a survey amongst my peers. My target is to collect data on a sample of 30 cars, half of them automatic and the other half manual. After my online survey is closed and I have cleaned up my data, it looks like the mtcars dataset.
data(mtcars)
str(mtcars)
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("automatic", "manual") # because anthropomorphic cars prefer factors with levels over binary code
If I use table(mtcars$am), I find out that there were 19 automatic and 13 manual transmission cars in the dataset. Looks like I didn't make the target to have an equal number of manual and automatic cars :(! Luckily, as a car-sociologist, I can fix this by weighting my dataset. I divide the target number by the collected number to get the weight of each observation. Thus, all automatic cars should get a weight of 0.7895 (15/19) and manual cars a weight of 1.1538 (15/13). Assigning the correct weight to each observation is fairly straightforward:
mtcars$weight <- ifelse(mtcars$am == "automatic", 0.7894737, 1.153846)
You can imagine that this method becomes a bit cumbersome with larger datasets with more weight-categories. Is there a way to automate the process of assigning the weights to each observation?
As a car and self-taught R-user who mainly cobbles things together as needed, I don't really know where to start. I've been using the method above, but with a growing number of target groups it's not really sustainable anymore.
I of course did attempt to find the answer elsewhere on the WWW, but not very successfully unfortunately. The following question seemed promising, but doesn't provide a solution for me:
R: new variable values based on factor levels of another variable
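To make the goal concrete, here is a minimal sketch of one possible way to derive the weights instead of typing them in, assuming the targets are kept in a named vector whose names match the factor levels. The names target, collected and group_weights are made up for illustration; only base R is used.

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))

# Assumption: one target count per factor level, named after the level
target <- c(automatic = 15, manual = 15)

# Collected count per level (19 automatic, 13 manual in mtcars)
collected <- table(mtcars$am)

# Weight per group = target / collected, matched by level name
group_weights <- target[names(collected)] / as.vector(collected)

# Look up each observation's weight by its own factor level
mtcars$weight <- unname(group_weights[as.character(mtcars$am)])

# Sanity check: the weights in each group should sum to that group's target
tapply(mtcars$weight, mtcars$am, sum)
```

With more weight categories, only the target vector would need to grow; the lookup by level name stays the same.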