My dataframe is of the following form:

Index  Number of pets owned  Age range
1      10                    30s
2      2                     50s
3      4                     60s
4      6                     <20s
5      9                     70s

etc. The age ranges are <20s, 20s, 30s, 40s, 50s, 60s and 70s. I would like to turn this categorical age range variable into a continuous one by assigning 1, 2, 3, 4, 5, 6, 7 to the age ranges. Any idea how I can do this in R? I think the as.numeric function could be useful, but I've never used it before.

Phil
John
  • Search SO for `extract number` from strings and you'll find a lot of relevant questions and answers to do so. Common methods include `stringr::str_extract_all` and/or `sub`/`gsub`/`gregexpr`. You need this in order to numerically order them by number (since lexicographic sorting can/will fail). – r2evans Jun 19 '20 at 17:58
  • Alternatively, if it is already a `factor` and ordered correctly, then you can use `as.integer` to use just the integer indices within the `factor`. We can know for sure if you provide unambiguous sample data, i.e., `dput(head(x))`. – r2evans Jun 19 '20 at 17:59
  • I'd like to use as.numeric() in order to make it continuous. Otherwise, it is pretty much the same as being discrete (if the only values it can take are integers). – John Jun 19 '20 at 18:07
  • That's perfectly legitimate R code: making a "float" from integers is perfectly fine from a mathematical point of view, though you're misleading yourself to believe that your discrete data is continuous (though that's more a topic of accuracy/precision). Is there a reason you aren't already using `as.numeric`? – r2evans Jun 19 '20 at 18:12

1 Answer

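You can do this with the as.numeric() function, provided the age column is a factor. Using a data frame like yours (stringsAsFactors = TRUE is needed in R 4.0.0 and later, where strings are no longer converted to factors by default):

data_frame <- data.frame(
  pets_owned = c("10", "2", "4", "6", "9"),
  age_rank = c("30", "50", "60", "20", "70"),
  stringsAsFactors = TRUE  # store both columns as factors (the default before R 4.0.0)
)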

This is what the data frame looks like:

> data_frame
  pets_owned age_rank
1         10       30
2          2       50
3          4       60
4          6       20
5          9       70

Checking the class of the age_rank column:

> class(data_frame$age_rank)
[1] "factor"

So using as.numeric():

data_frame$age_rank <- as.numeric(data_frame$age_rank)
# replace the factor column with its integer level codes

The age_rank column now holds the values 1 to 5, one per factor level in sorted order:

> data_frame
  pets_owned age_rank
1         10        2
2          2        3
3          4        4
4          6        1 # note that the value 1
5          9        5 # corresponds to the age of 20.

Checking the column again:

> class(data_frame$age_rank)
[1] "numeric"
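
The approach above depends on the default level order, which comes from sorting the labels, and it skips any range that does not occur in the data (so 70 becomes 5 here, not 7). A minimal sketch of one way to get the 1 to 7 coding from the question, assuming the column keeps its original labels such as "<20s"; the names pets, age_range and age_levels are made up for illustration:

```r
# Hypothetical sample data mirroring the question; the column names are assumptions.
pets <- data.frame(
  pets_owned = c(10, 2, 4, 6, 9),
  age_range  = c("30s", "50s", "60s", "<20s", "70s"),
  stringsAsFactors = FALSE
)

# Spell out the intended order so that "<20s" maps to 1 and "70s" maps to 7,
# including ranges ("20s", "40s") that do not occur in this sample.
age_levels <- c("<20s", "20s", "30s", "40s", "50s", "60s", "70s")

# factor() assigns integer codes by position in `levels`;
# as.numeric() then returns those codes as doubles.
pets$age_rank <- as.numeric(factor(pets$age_range, levels = age_levels))

print(pets)
```

With this sample the age_rank column comes out as 3, 5, 6, 1, 7. Any label missing from age_levels would become NA, which is a handy check for typos in the data.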
rubengavidia0x