I'm using R 4.3.1

I have a data frame with several variables, including years. The years are formatted this way: X1960..YR1960. I would like to rename all the variables following this pattern to a simplified version: Y1960

The dataframe contains variables from X1960..YR1960. to X2022..YR2022.

My current approach:

names(df) <- str_replace(names(df), "X\\d\\d\\d\\d\\.\\.YR*.", "Y")

Result of the current approach: Y960.

I don't understand the following things: Why is the first digit of the year omitted? Is the R a special character? If so, how can I escape it correctly? How do I get rid of the last dot? I tried escaping it too, but that yielded no matches for the regex. How exactly does * work as a placeholder?

jpsmith
  • The unescaped `.` matches the digit after `R` as an unescaped `.` matches any character but line break chars. Did you mean to match and keep what is between `R` and last `.`? Like `str_replace(names(df), "^X\\d{4}\\.{2}YR*(.*)\\.$", "Y\\1")`? – Wiktor Stribiżew Jul 28 '23 at 13:28
  • Yes, that's what I meant, thanks. I think your solution is equivalent to my modification of @Tim 's answer, if I'm not mistaken. – milkyuniverse Jul 31 '23 at 09:50
  • No, it is not the same. My pattern does not care what kind of text comes after `YR`, `\1` only matches the same four digits captured into Group 1. – Wiktor Stribiżew Jul 31 '23 at 19:10
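
To make the comments above concrete, here is a minimal sketch of what the original pattern matches. It uses base R's `sub()` and `regmatches()` rather than stringr (an assumption made to keep the example dependency-free; the behaviour for this pattern should be the same), and the sample name `x` with a trailing dot is assumed from the question. `R*` means "zero or more `R`", and the unescaped `.` after it matches any single character, which here is the first digit of the second year.

```r
x <- "X1960..YR1960."

# The asker's pattern: R* matches the single "R", then the unescaped "."
# consumes the next character, which is the "1" of the second year.
pattern <- "X\\d\\d\\d\\d\\.\\.YR*."

# The substring that gets replaced by "Y"
matched <- regmatches(x, regexpr(pattern, x))
matched
# [1] "X1960..YR1"

broken <- sub(pattern, "Y", x)
broken
# [1] "Y960."

# One possible fix: capture the four digits after "YR" and allow an
# optional literal dot at the end of the name.
fixed <- sub("^X\\d{4}\\.\\.YR(\\d{4})\\.?$", "Y\\1", x)
fixed
# [1] "Y1960"
```

`R` is not a special character and needs no escaping; it is the `*` after it that acts as a quantifier on `R`, not as a wildcard.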

2 Answers

If all your names truly look like this, a complicated regex isn't necessarily needed. A non-regex solution would be to simply pluck the last four digits using substr and prepend "Y" using paste0:

df <- data.frame(`X1960..YR1960` = NA, 
                  `X1970..YR1970` = NA, 
                  `X2022..YR2022` = NA)
paste0("Y", 
       substr(names(df), nchar(names(df))-3, nchar(names(df))))

# [1] "Y1960" "Y1970" "Y2022"

Note: I couldn't tell whether there was truly a . at the end of the name (i.e., X1960..YR1960.). If so, you would simply shift both positions by one:

df <- data.frame(`X1960..YR1960.` = NA, 
                  `X1970..YR1970.` = NA, 
                  `X2022..YR2022.` = NA)
paste0("Y", 
       substr(names(df), nchar(names(df))-4, nchar(names(df))-1))

# [1] "Y1960" "Y1970" "Y2022"
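
Since the question mentions that the data frame also holds other variables, the `substr` approach would rename those too. A hedged sketch of one way to restrict it to the year columns follows; the `Country` column and the `is_year` and `nm` helpers are hypothetical names introduced only for illustration.

```r
# Hypothetical data frame: one non-year column plus two year columns
df <- data.frame(Country = "A",
                 `X1960..YR1960.` = 1,
                 `X2022..YR2022.` = 2)

# Flag only the columns that follow the year pattern (trailing dot optional)
is_year <- grepl("^X\\d{4}\\.\\.YR\\d{4}\\.?$", names(df))

# Drop a trailing dot if present, then keep the last four characters
nm <- sub("\\.$", "", names(df)[is_year])
names(df)[is_year] <- paste0("Y", substr(nm, nchar(nm) - 3, nchar(nm)))

names(df)
# [1] "Country" "Y1960"   "Y2022"
```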
jpsmith
We can use sub() here for a simple base R option:

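x <- "X1960..YR1960"
# \\1 inside the pattern is a backreference: the digits after YR must equal the captured year
x_out <- sub("X(\\d{4})\\.\\.YR\\1", "\\1", x)
x_out
# [1] "1960"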
Tim Biegeleisen
  • Thank you! I made some changes, but it works perfectly: `names(X) <- sub("X(\\d{4})\\.\\.YR\\1\\.", "Y\\1", names(X))` I added an escaped dot `\\.` at the end of the regex and a `Y` at the beginning of the replacement to get the format I wanted. – milkyuniverse Jul 28 '23 at 13:34
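
As a small illustration of how the backreference behaves, the sketch below applies the modified pattern from the comment (with `^` and `$` anchors added here as an assumption) to a few made-up names. The vector `nms` is hypothetical; the mismatched entry is included only to show that `\\1` in the pattern requires both years to be identical.

```r
# Hypothetical names: a matching pair, a mismatched pair, and an unrelated column
nms <- c("X1960..YR1960.", "X1960..YR1961.", "Country")

# \\1 in the pattern must repeat the four digits captured by (\\d{4})
renamed <- sub("^X(\\d{4})\\.\\.YR\\1\\.$", "Y\\1", nms)
renamed
# [1] "Y1960"          "X1960..YR1961." "Country"
```

Names that do not match are returned unchanged, so the call can be applied to `names(df)` as a whole without touching other columns.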