
I have a dataset like the one below, in which the telephone numbers have different digit groupings and formats.

Could you help me convert them into one standard format using R?

TelephoneData <- data.frame(
  FIRST = c("STAN", "FIONA", "JOHN", "VERA", "ROBERT", "ANGIE", "PAUL", "GEORGE", "JUDITH", "TREVOR", "KEN", "BRIAN", "GLADYS", "MARY", "MARY", "JOSHUA", 
            "BRIAN", "PHILLIP", "KATE", "BRIAN"),
  PHONE = c("+44 1152 195298", "07366 602865", "01160 979447", "01597 501161", "01232 637283", "01296 230679", "(07183) 151418", "(07995) 376450", 
            "(0208) 0511522", "+44 208 3960687", "(01544) 668176", "(07540) 940315", "0208 4137611", "(01472) 119737", "(0208) 6494623", 
            "(01156) 145807", "07731 566115", "(0207) 7270589", "(0207) 7542812", "(01205) 835056")
  )
Patrik_P
dido
  • Can you edit the question to include data we can copy-paste into R? Try using dput(data) and pasting that into the question. Make sure to indent all code lines by 5 spaces so they appear as code as well, this will make your answers more readable – morgan121 Nov 20 '18 at 06:42
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Nov 20 '18 at 06:43
  • So, what have you tried? – Andreas Nov 20 '18 at 06:44
  • @user10626943 I edited the data but is it in a form that you can paste it into a normal form? – dido Nov 20 '18 at 07:07
  • Not really, I'm just bored so I pasted it into Notepad, pressed enter before every 'IDxxx', saved it as a csv and read it in :P. Tbh your original data was better than what you have now... What you should do in future is `dput()` the whole dataframe and paste the output here. Also remember to indent, as then it gets that nice grey background – morgan121 Nov 20 '18 at 07:09
  • what do you want really? What is your expected output? What do you want to delete and what do you want to maintain? – Onyambu Nov 20 '18 at 07:16
  • I want to re-organize the PHONENUM column. I have telephone numbers in different formats. I want to convert them into one format, such as 01142 262574 or 01119 963864 (the first 5 digits, one space, then the last 6 digits) – dido Nov 20 '18 at 07:25
  • @Onyambu Do you have any idea or solution that I can apply? – dido Nov 20 '18 at 07:53
  • Since I cannot copy-paste this into R, I will roughly guide you: `sub('(\\d{5})', '\\1 ', gsub('\\D', '', sub('\\+44', '0', your_data)))` – Onyambu Nov 20 '18 at 07:58

2 Answers


Assuming your data frame is called `data`, you can clean up the phone numbers like this:

 library(stringi)
 data$PHONENUM <- stri_replace_all_fixed(data$PHONENUM, '+44', '0') # changes +44 to 0
 data$PHONENUM <- gsub("[^0-9.]", "", data$PHONENUM) # removes everything except digits and periods (spaces, brackets, ...)

Then you can order the phone numbers like this:

 data[order(data$PHONENUM), ]

Does that do what you need?

EDIT: You don't need `lapply` at all; these functions are vectorized and process the whole column anyway.
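
The cleanup above leaves 11-digit strings without a separator. As a minimal sketch, assuming the target format requested in the comments (five digits, a space, six digits), the same steps plus the missing space can be done in base R only. The vector `phones` is a hypothetical sample copied from the question's `PHONE` column, not part of the original answer:

```r
# Hypothetical sample values, copied from the question's PHONE column
phones <- c("+44 1152 195298", "(07183) 151418", "0208 4137611")

# Step 1: replace the literal "+44" prefix with "0"
# (fixed = TRUE makes "+" a plain character instead of a regex quantifier)
digits <- sub("+44", "0", phones, fixed = TRUE)

# Step 2: drop everything that is not a digit (spaces, brackets, ...)
digits <- gsub("[^0-9]", "", digits)

# Step 3: insert a space after the first five digits of an 11-digit number
formatted <- sub("^([0-9]{5})([0-9]{6})$", "\\1 \\2", digits)

print(formatted)
```

Values that do not end up as exactly 11 digits are left unchanged by step 3, so they are easy to spot afterwards.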

morgan121
  • Well I should fix the phone numbers' format as below; from this format: (01119) 963864 into this: 01119 963864, from this format: (01142) 262574 into this: 01142 262574 – dido Nov 20 '18 at 07:18
  • The first code changed the numbers, but when I enter the second code it doesn't give me any results as a list. – dido Nov 20 '18 at 07:31
  • It should just remove the white space and the brackets. It works for me, what version of R are you running? (use `sessionInfo()` to help get that) – morgan121 Nov 20 '18 at 07:34
  • I use R version 3.5.1 – dido Nov 20 '18 at 07:41
  • hmm, me too. Maybe I've just misunderstood what you want as it seems to all work the way I would want it to – morgan121 Nov 20 '18 at 07:42

This might be useful as well:

TelephoneData$TelNr <- gsub("\\+44", "0", gsub("[() ]", "", TelephoneData$PHONE))   # replace +44 with 0, remove spaces and brackets
TelephoneData$TelNr <- gsub("([0-9]{5})(.*)", "\\1 \\2", TelephoneData$TelNr) # insert a space after the first 5 digits
TelephoneData <- TelephoneData[order(TelephoneData$TelNr), ] # sort by the column TelNr

This gives the result:

#     FIRST           PHONE        TelNr
#1     STAN +44 1152 195298 01152 195298
#16  JOSHUA  (01156) 145807 01156 145807
#3     JOHN    01160 979447 01160 979447
#20   BRIAN  (01205) 835056 01205 835056
#5   ROBERT    01232 637283 01232 637283
#6    ANGIE    01296 230679 01296 230679
#14    MARY  (01472) 119737 01472 119737
#11     KEN  (01544) 668176 01544 668176
#4     VERA    01597 501161 01597 501161
#18 PHILLIP  (0207) 7270589 02077 270589
#19    KATE  (0207) 7542812 02077 542812
#9   JUDITH  (0208) 0511522 02080 511522
#10  TREVOR +44 208 3960687 02083 960687
#13  GLADYS    0208 4137611 02084 137611
#15    MARY  (0208) 6494623 02086 494623
#7     PAUL  (07183) 151418 07183 151418
#2    FIONA    07366 602865 07366 602865
#12   BRIAN  (07540) 940315 07540 940315
#17   BRIAN    07731 566115 07731 566115
#8   GEORGE  (07995) 376450 07995 376450
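
The two `gsub` steps above can also be wrapped in a small reusable function. This is only a sketch: the name `format_uk_phone` and the rule that a valid number has exactly 11 digits starting with 0 are assumptions for illustration, not something stated in the question.

```r
# Hypothetical helper; the name and the 11-digit check are assumptions
format_uk_phone <- function(x) {
  digits <- gsub("[^0-9+]", "", x)       # keep only digits and "+"
  digits <- sub("^\\+44", "0", digits)   # leading "+44" becomes "0"
  ok <- grepl("^0[0-9]{10}$", digits)    # exactly 11 digits, starting with 0
  out <- rep(NA_character_, length(x))   # anything else is returned as NA
  out[ok] <- sub("^([0-9]{5})([0-9]{6})$", "\\1 \\2", digits[ok])
  out
}

res <- format_uk_phone(c("+44 208 3960687", "(01544) 668176", "12345"))
print(res)
```

Returning `NA` for values that do not fit the pattern makes malformed entries visible instead of silently passing them through; whether that is the right behavior depends on the data.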

Hope this helps!

Patrik_P
  • I tried your code by adding TelephoneData$ in front of PHONENUM, as `sortedTelNr <- sort(gsub("\\+44\\s*", "0", gsub("[()]", "", TelephoneData$PHONENUM)))`, but the result is \+44\s* – dido Nov 20 '18 at 07:46
  • And did you get what you wanted? – Patrik_P Nov 20 '18 at 07:47
  • Unfortunately, no. It doesn't give me list of telephone numbers. – dido Nov 20 '18 at 07:50
  • Let's say you have a data.frame `data` with several columns, including `PHONENUM`. Then you can do `data$TelNr <- gsub("\\+44\\s*", "0", gsub("[()]", "", data$PHONENUM))` and then sort it like `data <- data[order(data$TelNr ),]` – Patrik_P Nov 20 '18 at 07:55
  • I entered it as follows (my data.frame is called TelephoneData): `TelephoneData$TelNr <- gsub("\\+44\\s*", "0", gsub("[()]", "", TelephoneData$PHONENUM))`. I received this warning message: `In base::gsub(pattern = pattern, replacement = replacement, x = x, : argument 'replacement' has length > 1 and only the first element will be used`. What should I do? The elements in all rows of the PHONENUM column turned into \+44\s . – dido Nov 20 '18 at 08:01
  • make `dput(TelephoneData[1:100,])` and copy paste the output into an edit in your question – Patrik_P Nov 20 '18 at 08:06
  • I edited as you mentioned. I have 157 rows, so I edited as 1:157 – dido Nov 20 '18 at 08:14
  • I took the liberty of making your question more readable and adjusted the answer accordingly. Hopefully it will now work as you expect – Patrik_P Nov 20 '18 at 08:39
  • First I replaced +44 with 0: `TelephoneData$PHONENUM <- stri_replace_all_fixed(TelephoneData$PHONENUM, '+44', '0')`. Then I removed the blanks and parentheses with gsub: `TelephoneData$PHONENUM <- gsub(x = TelephoneData$PHONENUM, pattern = "\\(", replacement = "")`, `TelephoneData$PHONENUM <- gsub(x = TelephoneData$PHONENUM, pattern = "\\)", replacement = "")`, `TelephoneData$PHONENUM <- gsub(x = TelephoneData$PHONENUM, pattern = " ", replacement = "")`. Now all rows in the PHONENUM column have 11 digits. But how do I put a blank after the first 5 digits in every row? – dido Nov 20 '18 at 09:18
  • See the edit, I included code for inserting a space after the first 5 digits – Patrik_P Nov 20 '18 at 09:28