
I am cleaning up a dataset with a character variable like this:

df <- c("2015  000808", "2013  000041", "2015  000005", "2015  301585", "2015  311585", "2014  380096", "2013  100041")

I want to achieve this result, where the leading 0s of the second number are removed and the two numbers are pasted together:

"2015808"
"201341"
"20155"
"2015301585"
"2015311585"
"2014380096"
"2013100041"

I am stuck trying to find the best way to remove the 0s that occur at the start of the second part of the string. I have looked at gsub and substring, but I am a bit confused about how to remove a pattern of zeros based on their position as well as on conditions. Something along the lines of "remove one or more zeros only if they precede a digit 1-9 and are in positions 7-11".
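The position-based rule can be sketched with base R's substr(), assuming every element has exactly the fixed layout shown above: a 4-digit year in characters 1-4, two spaces, and a 6-digit number in characters 7-12. Converting that second slice with as.integer() drops its leading zeros, so no explicit zero-matching condition is needed:

```r
df <- c("2015  000808", "2013  000041", "2015  000005", "2015  301585",
        "2015  311585", "2014  380096", "2013  100041")

# Assumes a fixed-width layout: year in chars 1-4, number in chars 7-12
year   <- substr(df, 1, 4)
number <- as.integer(substr(df, 7, 12))  # e.g. "000808" becomes 808

# Paste the year and the zero-stripped number back together
result <- paste0(year, number)
print(result)
```

This only works while the widths never change; the answers that split on whitespace do not depend on fixed positions.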

lenlen

2 Answers


We may use read.table to read the strings as two columns. With the default whitespace separator, both columns are automatically read as numeric, and a numeric value can't keep a 0 prefix, so the leading 0s are stripped. Then we paste the data.frame columns together row by row with do.call:

do.call(paste0, read.table(text = df, header = FALSE))
[1] "2015808"    "201341"     "20155"      "2015301585" "2015311585" "2014380096" "2013100041"
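To see why this works, it can help to inspect the intermediate data frame (the name tbl is just for illustration). With header = FALSE, read.table names the columns V1 and V2 and type-converts both, so the zeros are already gone before paste0 sees the values:

```r
df <- c("2015  000808", "2013  000041", "2015  000005", "2015  301585",
        "2015  311585", "2014  380096", "2013  100041")

# Both whitespace-separated fields are converted to numbers
tbl <- read.table(text = df, header = FALSE)
str(tbl)

# do.call(paste0, tbl) calls paste0(tbl$V1, tbl$V2)
result <- do.call(paste0, tbl)
print(result)
```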

Or, with sub: match one or more whitespace characters (\\s+) followed by zero or more (*) 0s, and replace the match with a blank (""):

sub("\\s+0*", "", df)
[1] "2015808"    "201341"     "20155"      "2015301585" "2015311585" "2014380096" "2013100041"
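If you want the pattern to follow the question's condition more literally, one option is a sketch using a Perl-style lookahead, which sub() supports with perl = TRUE: strip the spaces and leading zeros only while a digit still follows them. This gives the same result as \\s+0* on the data above and differs only when the second number is all zeros (a hypothetical input here), where the lookahead keeps a final 0, matching what the read.table approach produces:

```r
df <- c("2015  000808", "2013  000041", "2015  000005", "2015  301585",
        "2015  311585", "2014  380096", "2013  100041")

# \\s+ : the run of spaces, 0* : leading zeros, (?=\\d) : a digit must follow
strict <- sub("\\s+0*(?=\\d)", "", df, perl = TRUE)
print(strict)

# Hypothetical edge case: a second number made only of zeros
sub("\\s+0*", "", "2015  000000")                      # "2015"
sub("\\s+0*(?=\\d)", "", "2015  000000", perl = TRUE)  # "20150"
```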
akrun

akrun's approach is the one that should be used, but here is a stringr composition:

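  1. With word(df, 1) we take the left part of the string.
  2. With word(df, -1) we take the right part, i.e. the last word (word(df, 2) would return "" here, because the default separator is a single space and the strings contain two), and we use str_remove_all with the regex ^0+ to remove its leading zeros.
  3. Finally, we use str_c to combine both parts: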
library(stringr)
str_c(word(df,1), str_remove_all(word(df, -1), '^0+'))
[1] "2015808"    "201341"     "20155"      "2015301585" "2015311585" "2014380096" "2013100041"
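The same three steps can be sketched in base R without stringr, assuming the two parts are always separated by whitespace. Splitting on \\s+ rather than a single space avoids the empty "word" that two consecutive spaces would otherwise produce:

```r
df <- c("2015  000808", "2013  000041", "2015  000005", "2015  301585",
        "2015  311585", "2014  380096", "2013  100041")

parts <- strsplit(df, "\\s+")  # one character vector per element

# Step 1: left part; step 2: right part with leading zeros removed
left  <- vapply(parts, `[`, character(1), 1)
right <- vapply(parts, function(p) p[length(p)], character(1))
right <- sub("^0+", "", right)

# Step 3: combine both parts
result <- paste0(left, right)
print(result)
```

Because ^0+ is anchored to the start of the string, it can match at most once, so sub() (or stringr's str_remove) is enough here; the "all" variants give the same result.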
TarJae