I need a regex to edit a column of industry names in a data frame.
Here is some code to create an example data frame:
Industries <- c("LEISURE - Restaurants", "FINANCIAL SERVICES - Closed End Fund - Equity", "AEROSPACE/DEFENSE - Aerospace/Defense Products & Services", "METALS & MINING - Industrial Metals & Minerals")
Industries <- data.frame(Industries)
The column is populated with strings such as:
LEISURE - Restaurants
FINANCIAL SERVICES - Closed End Fund - Equity
AEROSPACE/DEFENSE - Aerospace/Defense Products & Services
METALS & MINING - Industrial Metals & Minerals
I want to preserve everything to the left of the first hyphen while discarding everything else. Desired output:
LEISURE
FINANCIAL SERVICES
AEROSPACE/DEFENSE
METALS & MINING
I have tried:
stringi::stri_trim_right(Industries[,1], pattern = "[-]")
[1] "LEISURE -" "FINANCIAL SERVICES - Closed End Fund -"
[3] "AEROSPACE/DEFENSE -" "METALS & MINING -"
stringi::stri_trim_right(Industries[,1], pattern = "[A-Z]")
[1] "LEISURE - R" "FINANCIAL SERVICES - Closed End Fund - E"
[3] "AEROSPACE/DEFENSE - Aerospace/Defense Products & S" "METALS & MINING - Industrial Metals & M"
gsub("[^A-Z]","", Industries[,1])
[1] "LEISURER" "FINANCIALSERVICESCEFE" "AEROSPACEDEFENSEADPS" "METALSMININGIMM"
None of these gives the output I need. Any suggestions, or a pointer to a relevant post?
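For reference, here is a minimal sketch of the kind of thing I am after, using base R's sub(). It assumes the separator is always a hyphen with optional whitespace before it, and that no industry name contains a hyphen before that separator (a name like "CO-OP - Retail" would be cut at "CO"). The variable name sector is just a placeholder. The stri_trim_right() attempts seem to fail because its pattern describes the characters to keep, so trimming stops at the last match from the right rather than the first hyphen from the left.

```r
# Example data from the question
Industries <- data.frame(
  Industries = c(
    "LEISURE - Restaurants",
    "FINANCIAL SERVICES - Closed End Fund - Equity",
    "AEROSPACE/DEFENSE - Aerospace/Defense Products & Services",
    "METALS & MINING - Industrial Metals & Minerals"
  ),
  stringsAsFactors = FALSE
)

# sub() replaces only the first match in each string.
# "\\s*-.*$" matches any whitespace before the first hyphen, the hyphen
# itself, and everything after it up to the end of the string.
# Assumption: no hyphen appears inside the part that should be kept.
sector <- sub("\\s*-.*$", "", Industries$Industries)

print(sector)
```

If the separator is known to always be exactly " - " (space, hyphen, space), the stricter pattern " - .*$" may be safer, since it would leave hyphenated names intact.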