I have a data frame whose column names combine a running number, a short code, and some longer text:

  1. A1. Good day

  2. A1a. Have a nice day

......

  1. Z7d. Some other titles

I want to keep only the codes "A1.", "A1a.", "Z7d.", dropping both the leading number and the trailing text. How can I do this with tidyselect and regex?

Miles N.

2 Answers

You can use this regex with sub(). It captures the code after the leading number and drops everything around it:

names(df) <- sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', names(df))
names(df)
#[1] "A1"  "A1a" "Z7d"
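To make the pattern easier to adapt, here is a minimal sketch that breaks it down on a plain character vector. The vector `nms` is made up for illustration and mirrors the column names in the question. The anchors `^` and `$` and the second variant that keeps the trailing dot are additions, not part of the answer above.

```r
# Hypothetical column names mirroring the question
nms <- c("1. A1. Good day", "2. A1a. Have a nice day", "99. Z7d. Some other titles")

# ^\\d+\\.         leading number and its dot ("1.", "99.")
# \\s+             whitespace after the number
# ([A-Za-z0-9]+)   capture group 1: the code ("A1", "A1a", "Z7d")
# .*$              the rest of the name, which is discarded
codes <- sub("^\\d+\\.\\s+([A-Za-z0-9]+).*$", "\\1", nms)
print(codes)

# Variant (assumption: the dot after the code should be kept, as in "A1."):
# move the literal dot inside the capture group
codes_dot <- sub("^\\d+\\.\\s+([A-Za-z0-9]+\\.).*$", "\\1", nms)
print(codes_dot)
```

Names that do not match the pattern are returned unchanged by sub(), so it is worth checking the result before assigning it back to names(df).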

The same regex can also be used in rename_with if you want a tidyverse answer.

library(dplyr)
df %>% rename_with(~sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', .))

#          A1        A1a        Z7d
#1  0.5755992  0.4147519 -0.1474461
#2  0.1347792 -0.6277678  0.3263348
#3  1.6884930  1.3931306  0.8809109
#4 -0.4269351 -1.2922231 -0.3362182
#5 -2.0032113  0.2619571  0.4496466

data

df <- structure(list(`1. A1. Good day` = c(0.575599213383783, 0.134779160673435, 
1.68849296209512, -0.426935114884432, -2.00321125417319), `2. A1a. Have a nice day` = c(0.414751904860513, 
-0.627767775889949, 1.39313055331098, -1.29222310608057, 0.261957078465535
), `99. Z7d. Some other titles` = c(-0.147446140558093, 0.326334824433201, 
0.880910933597998, -0.336218174873965, 0.449646567320979)), 
class = "data.frame", row.names = c(NA, -5L))
Ronak Shah

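We can use str_extract from stringr. The lookbehind makes it return the first run of non-dot characters that follows a dot and a space: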

library(stringr)
names(df) <- str_extract(names(df), "(?<=\\.\\s)[^.]+")
names(df)
#[1] "A1"  "A1a" "Z7d"
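If stringr is not available, the same lookbehind pattern can be sketched in base R with regexpr() and regmatches(). This is an illustrative alternative, not part of the answer above. The vector `nms` is a made-up stand-in for names(df), and `perl = TRUE` is required because the default regex engine does not support lookbehind.

```r
# Hypothetical column names mirroring the question
nms <- c("1. A1. Good day", "2. A1a. Have a nice day", "99. Z7d. Some other titles")

# (?<=\\.\\s)  lookbehind: the match must be preceded by a dot and a whitespace
# [^.]+        one or more characters that are not a dot (stops at the next dot)
# regexpr() finds only the first match per string, like str_extract()
m <- regexpr("(?<=\\.\\s)[^.]+", nms, perl = TRUE)
codes <- regmatches(nms, m)
print(codes)
```

One difference to keep in mind: regmatches() drops elements that have no match, so the result can be shorter than the input, whereas str_extract() returns NA for them.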

akrun