R: How to split string into pieces

Question

I'm trying to split a large number of strings like the one below:

x = "�\001�\001�\001�\001�\001\002CN�\001\bShandong�\001\004Zibo�\002$ABCDEFGHIJK�\002\aIMG_HAS�\002�\002�\002�\002�\002�\002�\002�\002\02413165537405763268743�\002\001�\002�\002�\002�\003�\003�\003����\005�\003�\003�\003�\003"

into four pieces

'CN', 'Shandong', 'Zibo', 'ABCDEFGHIJK'

I've tried

stringr::str_split(x, '\\00.')

which returns the original x unchanged. I also tried

trimws(gsub("�\\00?", "", x, perl = T))

which only removes the unknown character �.

Could someone help me with this? Thanks for doing so.

Ronak Shah · Accepted Answer · 2020-12-28T07:23:49.933

2

You can try str_extract_all:

stringr::str_extract_all(x, '[A-Za-z_]+')[[1]]
[1] "CN"          "Shandong"    "Zibo"        "ABCDEFGHIJK" "IMG_HAS"

With base R:

regmatches(x, gregexpr('[A-Za-z_]+', x))[[1]]

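Here we extract every run of uppercase letters, lowercase letters, or underscores. Everything else is ignored, so characters like �\\00? do not appear in the final output. Note that "IMG_HAS" also matches the pattern, so the result has five elements rather than the four asked for.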
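As a rough sketch of the same idea on a self-contained input: the x below is a shortened stand-in for the string in the question (control characters only, without the undecodable bytes), and keeping the first four matches with head() assumes the wanted fields always come first, which the question does not confirm.

```r
# Shortened stand-in for the question's x (an assumption, not the real data).
# Escapes like "\001" are single control characters, not backslash-digit text.
x <- "\001\002CN\001\bShandong\001\004Zibo\002$ABCDEFGHIJK\002\aIMG_HAS\002\02413165537405763268743"

# gregexpr() locates every run of letters/underscores;
# regmatches() pulls those runs out of x.
words <- regmatches(x, gregexpr("[A-Za-z_]+", x))[[1]]

# Keep only the first four pieces (assumes they always come first).
first_four <- head(words, 4)
print(first_four)
```

If the unwanted trailing token is always "IMG_HAS", dropping it by name, e.g. words[words != "IMG_HAS"], would be an alternative to relying on position.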

edited Dec 28 '20 at 07:23

answered Dec 28 '20 at 06:19

Ronak Shah


You are amazing! It works. Could you explain a little bit about the code? – Zhenyu Wu Dec 28 '20 at 07:11
I added some explanation of the code. Hope that helps. – Ronak Shah Dec 28 '20 at 07:25

score 0 · Answer 2 · answered Dec 28 '20 at 16:37


We can use strsplit from base R, splitting on every run of non-letter characters:

setdiff(strsplit(x, "[^A-Za-z]+")[[1]], "")
#[1] "CN"          "Shandong"    "Zibo"        "ABCDEFGHIJK" "IMG"         "HAS"
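Two details of this approach may be worth checking against your data. The pattern "[^A-Za-z]+" treats the underscore as a separator, which is why "IMG_HAS" comes back as two pieces, and setdiff() removes duplicates as well as empty strings. A small sketch of a variant, using a made-up input (not the question's data) in which "CN" appears twice:

```r
# Made-up example string (hypothetical), with a repeated "CN".
x <- "\001CN\001\bShandong\002Zibo\002CN\002IMG_HAS"

# Adding "_" to the negated class keeps "IMG_HAS" in one piece.
parts <- strsplit(x, "[^A-Za-z_]+")[[1]]

# A leading separator produces an empty first element;
# nzchar() drops empty strings but keeps repeated values.
kept <- parts[nzchar(parts)]
print(kept)

# setdiff() would also collapse the repeated "CN".
deduped <- setdiff(parts, "")
print(deduped)
```

Which of the two is preferable depends on whether repeated values in a string are meaningful.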


akrun

