
I have a dataset where column names have prefixes (corresponding to panel waves), e.g.

a_age
a_sex
a_jbstat
b_age
b_sex
b_jbstat

I would like to convert the prefixes into suffixes, so that it becomes:

age_a
sex_a
jbstat_a
age_b
sex_b
jbstat_b

I'd be grateful for suggestions on efficient ways of doing this.

aynber
dbartram

2 Answers


One way to do it is to use a regex:

x <- c(
  "a_age",
  "a_sex",
  "a_jbstat",
  "b_age",
  "b_sex",
  "b_jbstat"
)

stringr::str_replace(x, "^([a-z]+)_([a-z]+)$", "\\2_\\1")
#> [1] "age_a"    "sex_a"    "jbstat_a" "age_b"    "sex_b"    "jbstat_b"

Created on 2020-05-25 by the reprex package (v0.3.0)

Edit: Full Example

df <- data.frame(
  a_age = 1,
  a_sex = 1,
  b_age = 2,
  b_sex = 2
)
df
#>   a_age a_sex b_age b_sex
#> 1     1     1     2     2

names(df) <- stringr::str_replace(names(df), "^([a-z]+)_([a-z]+)$", "\\2_\\1")
df
#>   age_a sex_a age_b sex_b
#> 1     1     1     2     2

Created on 2020-05-26 by the reprex package (v0.3.0)
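The pattern above expects lowercase letters on both sides of a single underscore. As a rough sketch for messier data, the base R variant below assumes the wave prefix is always exactly one lowercase letter and that everything after the first underscore is the variable name. The column names used here (`pidp`, `b_hh_size`, `b_sf12`) are hypothetical and not taken from the question.

```r
# Hypothetical column names: an ID column without a wave prefix,
# plus variable names that contain digits or extra underscores.
cols <- c("pidp", "a_age", "a_jbstat", "b_hh_size", "b_sf12")

# ^([a-z])_  : exactly one lowercase letter as the wave prefix (assumption)
# (.+)$      : everything after the first underscore is the variable name
# Names that do not match the pattern are returned unchanged by sub().
swapped <- sub("^([a-z])_(.+)$", "\\2_\\1", cols)
print(swapped)
```

Because `sub` leaves non-matching strings alone, a column such as `pidp` keeps its name. If the prefixes in your data can be longer than one letter, the first group would need adjusting.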

David
  • This one as well -- it turns my dataframe into a "large character". – dbartram May 26 '20 at 06:52
  • See comment / solution of Chris. This solution only changes the values of the names, but does not assign it. See updates... – David May 26 '20 at 07:25
  • My thanks to you as well -- it helps me to see how to create a small example dataframe to help in posing a question of this sort. – dbartram May 26 '20 at 08:20
  • Always a pleasure to help and get someone started with something... :) Please make sure to also vote/accept answers. That motivates us to keep answering – David May 26 '20 at 08:51

You can use sub with backreferences:

sub("([a-z])_([a-z]+)", "\\2_\\1", x)
[1] "age_a"    "sex_a"    "jbstat_a" "age_b"    "sex_b"    "jbstat_b"

The backreferences \\1 and \\2 recall the exact strings matched by the two capturing groups: ([a-z]) is recalled by \\1 and ([a-z]+) by \\2. To obtain the desired change, the two backreferences are simply swapped in the replacement argument to sub.
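To see what each capturing group actually holds, here is a small sketch that pulls the groups out with base R's `regexec` and `regmatches` and rebuilds the names by hand. The pattern is anchored with `^` and `$` for this illustration, which is a small change from the one used in the answer, and the variable names are made up for the example.

```r
x <- c("a_age", "b_jbstat")

# Anchored variant of the pattern, used here only for illustration.
pattern <- "^([a-z])_([a-z]+)$"

# For each string, regmatches() returns: full match, group 1, group 2.
parts <- regmatches(x, regexec(pattern, x))

prefix <- vapply(parts, `[`, character(1), 2)  # what \\1 refers to
stem   <- vapply(parts, `[`, character(1), 3)  # what \\2 refers to

# Gluing the groups back together in reverse order gives the same
# result as using "\\2_\\1" as the replacement in sub().
rebuilt <- paste0(stem, "_", prefix)
print(rebuilt)
```

In practice the one-line `sub` call is all that is needed; this longer form is only meant to make the two groups visible.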

EDIT:

If the elements are column names, you can do this:

names(df) <- sub("([a-z])_([a-z]+)", "\\2_\\1", names(df))
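A possible explanation for the data frame turning into a "large character" is that the output of `sub` was assigned to the data frame itself rather than to its names. That is a guess, since the original code was not posted. The sketch below reuses the small data frame from the other answer to show the difference.

```r
df <- data.frame(a_age = 1, a_sex = 1, b_age = 2, b_sex = 2)

# sub() returns a plain character vector of new names.
new_names <- sub("([a-z])_([a-z]+)", "\\2_\\1", names(df))
print(is.character(new_names))

# Writing `df <- new_names` would replace the data frame with that vector.
# Assigning to names(df) renames the columns and keeps the data intact.
names(df) <- new_names
print(is.data.frame(df))
print(names(df))
```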
Chris Ruehlemann
  • Thank you -- but the result isn't quite what is intended -- it turns my dataframe into a "large character". – dbartram May 26 '20 at 06:51
  • Well, then you should post your data frame in some way so that we can see how it is structured. I guess, if the elements are, as you say, column names, you will need to do something like: `names(df) <- sub("([a-z])_([a-z]+)", "\\2_\\1", names(df))` – Chris Ruehlemann May 26 '20 at 06:54
  • Thank you so much -- that does it. Never would have figured it out myself; this really helps me understand regex. My apologies for not being clear enough at the beginning. – dbartram May 26 '20 at 08:19