
I have hospital ward data that needs to be made consistent. The leading digit is the floor number, the letters that follow are the ward acronym, and the trailing digits (usually two) are the bed number.

So 2EA 28 would be floor 2, ward East (EA), bed 28.

The locations have been entered with inconsistent spacing, so I have the following:

   library(data.table)
   toyraw <- data.table(incident_no = 1:6, location = c("2EA17", "2EA 17", "1ED1", "1ED23", "1ED 34", "ICU24"))

I would like it to look like the following:

   toyideal <- data.table(incident_no = 1:6, location = c("2EA 17", "2EA 17", "1ED 1", "1ED 23", "1ED 34", "ICU 24"))

If there were no number at the front, I would just substitute out the digits and the letters one at a time, but because the pattern is number, letters, number, that approach breaks down. There are 1,462 rows.

A further complication: ground-floor wards such as the ICU have no preceding floor number.
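Given that structure, one way to pull each location apart is a single anchored regex whose floor group is allowed to be empty. This is a sketch assuming every code is optional floor digits, then letters, then optional whitespace, then bed digits; `parse_location` is a hypothetical helper name, and codes that do not fit the pattern come back as `NA`.

```r
# Hypothetical helper: split a location code into floor, ward and bed.
# Assumes: optional leading digits (floor), letters (ward acronym),
# optional whitespace, then digits (bed) at the end of the string.
parse_location <- function(location) {
  pattern <- "^([0-9]*)([A-Za-z]+)[[:space:]]*([0-9]+)$"
  parts <- regmatches(location, regexec(pattern, location))
  # Non-matching codes give character(0); replace them with a row of NAs
  parts <- lapply(parts, function(p) if (length(p) == 4) p else rep(NA_character_, 4))
  m <- do.call(rbind, parts)
  data.frame(
    location = location,
    # Ground-floor wards such as ICU have an empty floor group -> NA
    floor = as.integer(ifelse(m[, 2] == "", NA, m[, 2])),
    ward  = m[, 3],
    bed   = as.integer(m[, 4]),
    stringsAsFactors = FALSE
  )
}

parse_location(c("2EA17", "2EA 17", "1ED1", "1ED23", "1ED 34", "ICU24"))
```

Once the parts are separate columns, the spacing problem disappears: the clean code can be rebuilt from `floor`, `ward` and `bed` in whatever format is needed.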

Added as requested, the human-readable names:

   additional <- data.table(incident_no = 1:6,
                            location = c("2EA 17", "2EA 17", "1ED 1", "1ED 23", "1ED 34", "ICU 24"),
                            human_Readable = c("Ward 2 East Bed 17", "Ward 2 East Bed 17",
                                               "Ward 1 Emergency Department Bed 1",
                                               "Ward 1 Emergency Department Bed 23",
                                               "Ward 1 Emergency Department Bed 34",
                                               "Ward ICU Bed 24"))
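If the locations are already in the clean `"<floor><ward> <bed>"` form, the human-readable column could be generated rather than typed. This is a sketch only: `ward_names` is a hypothetical lookup table (the real acronym list would have to come from the hospital), `to_human_readable` is a hypothetical name, and unknown acronyms are kept as-is.

```r
# Hypothetical lookup from ward acronym to full ward name
ward_names <- c(EA = "East", ED = "Emergency Department", ICU = "ICU")

to_human_readable <- function(location) {
  # Expects exactly one space before the bed number, e.g. "2EA 17" or "ICU 24"
  m <- regmatches(location, regexec("^([0-9]*)([A-Za-z]+) ([0-9]+)$", location))
  vapply(m, function(p) {
    if (length(p) != 4) return(NA_character_)  # not in the expected form
    ward <- unname(ward_names[p[3]])
    if (is.na(ward)) ward <- p[3]              # unknown acronym: keep it as-is
    # Ground-floor wards have no floor digit, so leave that part out
    prefix <- if (p[2] == "") "Ward" else paste("Ward", p[2])
    paste(prefix, ward, "Bed", p[4])
  }, character(1))
}

to_human_readable(c("2EA 17", "1ED 1", "ICU 24"))
```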
monkeyshines

1 Answer


You can use gsub() for this:

> gsub("(\\d*)(\\D*)\\s*(\\d*)",
       "Floor \\1 Ward \\2 and Bed \\3.",
       gsub(" ", "", "1ED 34"))

[1] "Floor 1 Ward ED and Bed 34."

Here is the regex I used:

(\\d*)(\\D*)\\s*(\\d*)

Regex101
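The same capture groups can rebuild the location code itself instead of a sentence. The sketch below is a variation, not the answer's exact regex: it anchors the pattern with `^` and `$` and assumes the ward acronym is letters only, so every value is matched as a whole and gets exactly one space before the bed number. `normalise_location` is a hypothetical name; values that do not fit are returned unchanged by `sub()`.

```r
# ^(\\d*)       optional floor digits (empty for ground-floor wards such as ICU)
# ([A-Za-z]+)   the ward acronym
# \\s*          whatever spacing was (or was not) typed
# (\\d+)$       the bed number at the end
normalise_location <- function(location) {
  sub("^(\\d*)([A-Za-z]+)\\s*(\\d+)$", "\\1\\2 \\3", location)
}

normalise_location(c("2EA17", "2EA 17", "1ED1", "1ED23", "1ED 34", "ICU24"))
```

Using `sub()` rather than `gsub()` is enough here because the anchored pattern can match at most once per string.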

Tim Biegeleisen
  • This would work if the hospital's ward naming protocol were consistent, but it is causing some problems because the ward names themselves are not entirely consistent and some wards have odd acronyms. Ideally I need to ensure that there is exactly one space before the final number, so ICU24 becomes ICU 24. Basically the values need to be consistent so I can perform counts with dplyr. – monkeyshines Jul 21 '16 at 07:59
  • I updated the regex so that it also works for `ICU24`, i.e. in case there is no preceding number. Regex works well for fixed structures. If there are really no rules for the names, regex might not be the best tool. – Tim Biegeleisen Jul 21 '16 at 08:01
  • That works well; all of the 'outliers' are ground floor. I will make sure to limit regex to clearly structured expressions. – monkeyshines Jul 21 '16 at 08:08
  • The following code puts a single space before the trailing digits, replacing any whitespace that is already there. It uses the `$` anchor so that only the final digits are affected: `sub("\\s*(\\d+)$", " \\1", toyraw$location)`. I get the following results from your input data: `"2EA 17" "2EA 17" "1ED 1" "1ED 23" "1ED 34" "ICU 24"` – Choubi Jul 21 '16 at 08:09
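Building on the trailing-digits idea in the last comment, here is a sketch of the whole clean-up for the stated goal of counting incidents per location. It assumes only that the bed number is the final run of digits; `"Theatre"` is a made-up value standing in for an entry with no bed number. It uses base R's `table()` so it has no dependencies; `dplyr::count()` would do the same job on a data frame.

```r
locations <- c("2EA17", "2EA 17", "1ED1", "1ED23", "1ED 34", "ICU24", "Theatre")

# Exactly one space before the trailing bed number; any existing whitespace
# there is absorbed by \\s*, so "2EA17" and "2EA 17" end up identical
clean <- sub("\\s*(\\d+)$", " \\1", locations)

# Flag anything that still does not look like <floor?><letters> <bed>;
# these are better checked by hand than with another regex
ok <- grepl("^\\d*[A-Za-z]+ \\d+$", clean)
clean[!ok]

# Count incidents per cleaned location
counts <- table(clean[ok])
counts
```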