This problem is about writing a regex to edit a column of industry names that I have in a data frame.

Here is some code to create an example data frame:

Industries <- c("LEISURE - Restaurants",
                "FINANCIAL SERVICES - Closed End Fund - Equity",
                "AEROSPACE/DEFENSE - Aerospace/Defense Products & Services",
                "METALS & MINING -  Industrial Metals & Minerals")
Industries <- data.frame(Industries)

I have a column that is populated with strings such as:

LEISURE - Restaurants
FINANCIAL SERVICES - Closed End Fund - Equity
AEROSPACE/DEFENSE - Aerospace/Defense Products & Services
METALS & MINING -  Industrial Metals & Minerals

I want to preserve everything to the left of the first hyphen while discarding everything else. Desired output:

LEISURE
FINANCIAL SERVICES
AEROSPACE/DEFENSE
METALS & MINING

I have tried:

stringi::stri_trim_right(Industries[,1], pattern = "[-]")

[1] "LEISURE -" "FINANCIAL SERVICES - Closed End Fund -"
[3] "AEROSPACE/DEFENSE -" "METALS & MINING -"

stringi::stri_trim_right(Industries[,1], pattern = "[A-Z]")

[1] "LEISURE - R" "FINANCIAL SERVICES - Closed End Fund - E"
[3] "AEROSPACE/DEFENSE - Aerospace/Defense Products & S" "METALS & MINING - Industrial Metals & M"

gsub("[^A-Z]","", Industries[,1])

[1] "LEISURER" "FINANCIALSERVICESCEFE" "AEROSPACEDEFENSEADPS" "METALSMININGIMM"

All of these come close, but none gives what I need. Any suggestions, or a pointer to a relevant post?

Kirsten
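
One possible approach, offered as a sketch rather than a definitive answer: base R's `sub()` replaces only the first match in each string, so a pattern that matches any whitespace before the first hyphen, the hyphen itself, and everything after it should leave just the leading part. This assumes the separator is always a plain ASCII hyphen and that the part you want to keep never contains a hyphen itself. The `Sector` column name is made up for illustration.

```r
# Rebuild the example data from the question (kept as character, not factor).
Industries <- data.frame(
  Industries = c("LEISURE - Restaurants",
                 "FINANCIAL SERVICES - Closed End Fund - Equity",
                 "AEROSPACE/DEFENSE - Aerospace/Defense Products & Services",
                 "METALS & MINING -  Industrial Metals & Minerals"),
  stringsAsFactors = FALSE
)

# "\\s*-.*$" matches optional whitespace, the first hyphen, and the rest of
# the string; sub() replaces only that first match with the empty string.
# "Sector" is a hypothetical column name, not something from the question.
Industries$Sector <- sub("\\s*-.*$", "", Industries$Industries)

print(Industries$Sector)
```

Using `sub()` rather than `gsub()` is a small design choice: the pattern already consumes everything to the end of the string, so one replacement per element is enough. The earlier `stri_trim_right()` attempts behave differently because that function only strips trailing characters until it hits one that matches the pattern; it does not split on a delimiter.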