I am making progress in cleaning data like this:
df1 <- data.frame(ID=(c("18.1010-2.570322","171114-238509","140808-3481906
","18055656193","180625-378224","190903-2793831 / -9311442 / -6810125","190808-625-6692","190 807 - 7941125","1807298087721Roland","19060881t1676")),
True_ID = c("181010-2570322","171114-2385039","190808-4381906","180556-5619343","180625-3782242", "190903-2793831 190903-9311442
190903-6810125", "190808-6256692","190807-7941125","180729-8087721","190608-8112676"))
The true value looks like this: 190312-4184811. So there is a pattern: the first six digits are a date (19 = 2019, 03 = March, 12 = day), and the other seven digits are random. I have already cleaned a lot of non-informative patterns, but here I don't know exactly how to deal with so many different variants.
I tried something like this, but I think there must be a better way:
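To make the target format concrete, here is a minimal sketch of a check for it in base R. The helper name `is_valid_id` is hypothetical (it is not part of my real code), and it assumes the first six digits always use a two-digit year in `YYMMDD` order.

```r
# Hypothetical helper: TRUE when a string has the shape YYMMDD-NNNNNNN
# and the first six digits parse as a calendar date.
is_valid_id <- function(x) {
  # Exactly six digits, a hyphen, seven digits, and nothing else
  shape_ok <- grepl("^[0-9]{6}-[0-9]{7}$", x)
  # Parse the first six characters as a two-digit-year date;
  # strings that are not a valid date become NA
  date_part <- as.Date(substr(x, 1, 6), format = "%y%m%d")
  shape_ok & !is.na(date_part)
}

# One clean value and one raw value from df1$ID
ids <- c("190312-4184811", "18.1010-2.570322")
is_valid_id(ids)
```

A check like this only flags which values already match; it does not repair the ones that do not.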
library(stringr)  # str_extract()
library(tidyr)    # unite()
library(dplyr)    # %>%

# Try each combination of digit counts around the hyphen separately
a = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{7}([ ]|$)")
b = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{5}\\-[:digit:]{7}([ ]|$)")
c = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{4}\\-[:digit:]{7}([ ]|$)")
d = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{6}([ ]|$)")
e = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{5}([ ]|$)")
f = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{4}([ ]|$)")
g = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{8}([ ]|$)")
h = str_extract(data_file$IP_P,"(^|[ ])[:digit:]{6}\\-[:digit:]{9}([ ]|$)")
data_file["Extracted_i"] = NA
data1 <- data.frame(a,b,c,d,e,f,g,h)
# Paste all candidates into one string per row (misses appear as the text "NA")
data1 <- data1 %>% unite("z", a:h, remove = FALSE)
# Keep only digits, dots and hyphens, which also strips the "NA" and "_" filler
data_file["Extracted_i"] =gsub("[^0-9\\.\\-]", "", data1$z)
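For comparison, the eight patterns can probably be collapsed into a single one with counted ranges. This is only a sketch in base R on a few values from `df1$ID`, not a full solution: the names `ids`, `pattern`, `m` and `extracted` are illustrative, and the sketch covers the same digit-count combinations as `a` to `h` but does nothing for values without a hyphen or with extra separators.

```r
# Sample values taken from the df1$ID column above
ids <- c("171114-238509", "180625-378224", "190808-625-6692", "18055656193")

# One alternation covering the same cases as patterns a to h:
#   4 to 6 digits, hyphen, 7 digits   OR   6 digits, hyphen, 4 to 9 digits,
# delimited by start/end of string or a space
pattern <- "(^| )([0-9]{4,6}-[0-9]{7}|[0-9]{6}-[0-9]{4,9})( |$)"

# regmatches() returns only the matches, so start from an NA-filled vector
m <- regexpr(pattern, ids)
extracted <- rep(NA_character_, length(ids))
extracted[m != -1] <- trimws(regmatches(ids, m))
extracted
```

Values that do not match stay `NA`, so no `unite()` and `gsub()` step is needed to merge the separate results.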