Given a ;-delimited file with the following structure:
colA; colB; colC
1;A; 10
2;B; 11
3;C"; 12
4;D""; 15
5;"F";20
6;K"""; 21
7;""M";22
8; \""O;23
I would like to ensure that colB is always imported verbatim as a character string. In particular, I would like to preserve all values, including ""M" and \""O.
Attempt
I'm currently trying:
library(readr)

tst_dta <- read_delim(
  file = "test_file.csv",
  escape_double = FALSE,  # do not treat "" as an escaped quote
  delim = ";",
  col_types = cols(
    colA = col_integer(),
    colB = col_character(),  # colB should be read as text
    colC = col_integer()
  )
)
but this returns:
> tst_dta
# A tibble: 8 x 3
colA colB colC
<int> <chr> <int>
1 1 A 10
2 2 B NA
3 3 "C\"" 12
4 4 "D\"\"" 15
5 5 F 20
6 6 "K\"\"\"" 21
7 7 "\"\"M\"" 22
8 8 " \\\"\"O" 23
Desired results
The result should look like this:
colA colB colC
<int> <chr> <int>
1 A 10
2 B 11
3 C" 12
4 D"" 15
5 "F" 20
6 K""" 21
7 ""M" 22
8 \""O 23
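One possible workaround, sketched in base R rather than readr (so it is not a fix for read_delim itself): read.table() with quote = "" treats double quotes as ordinary characters, and because its allowEscapes argument defaults to FALSE, backslashes are kept as well. The sample is embedded as a raw string, which assumes R >= 4.0.0; for the real file, pass file = "test_file.csv" instead of text =. This is a sketch checked only against the sample above.

```r
# Sketch: read the sample with base R, switching off quote handling so
# colB is kept exactly as written. Assumes R >= 4.0.0 for the raw string.
raw_text <- r"(colA; colB; colC
1;A; 10
2;B; 11
3;C"; 12
4;D""; 15
5;"F";20
6;K"""; 21
7;""M";22
8; \""O;23)"

tst_dta <- read.table(
  text = raw_text,       # for the real file: file = "test_file.csv"
  sep = ";",
  header = TRUE,
  quote = "",            # no quote characters: " is ordinary data
  comment.char = "",     # do not treat '#' as the start of a comment
  strip.white = TRUE,    # trim spaces around fields, e.g. " \""O" -> "\""O"
  colClasses = c("integer", "character", "integer")
  # allowEscapes defaults to FALSE, so backslashes are kept verbatim
)
print(tst_dta)
```

Because nothing is treated as a quote, a field like "F" keeps its surrounding quotes, which matches row 5 of the desired output.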
Other points:
- Ideally, I would also like non-ASCII characters to be removed, so that the value \""[Non-ASCII-Character]O appears in the resulting data frame as the string \""O.
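For the non-ASCII point, a minimal sketch after reading: iconv() with sub = "" replaces every byte that cannot be represented in ASCII with the empty string, which removes it. drop_non_ascii() is a hypothetical helper name, and the sketch assumes the text is UTF-8 encoded; for a file in another encoding, the from argument would need to change.

```r
# Hypothetical helper: remove every non-ASCII character from a character vector.
# iconv() replaces each non-convertible byte with `sub`; here that is "".
drop_non_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# \""éO, with é standing in for the non-ASCII character
value <- "\\\"\"\u00e9O"
cleaned <- drop_non_ascii(value)
writeLines(cleaned)  # prints \""O
```

Applied to the data frame, this would be tst_dta$colB <- drop_non_ascii(tst_dta$colB); ASCII-only values pass through unchanged.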
Updates
Following the comments, here are more examples:
Input:
colA; colB; colC
1; text \" text; 2
Expected output:
colA;colB;colC
1;text text;2
Input:
colA; colB; colC
1; text \;" text; 2
Expected output:
colA;colB;colC
1;text text;2
Input:
colA; colB; colC
1; [non-ASCII] text something \;" text; 2
Expected output:
colA;colB;colC
1;text something;2
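For the update examples, one hedged approach is to pre-clean each line before splitting: remove the \" and \;" sequences (the latter contains an escaped delimiter, so it has to go before the line is split on ;), collapse the double spaces left behind, then parse with quote handling off. clean_and_read() is a hypothetical helper. It covers the first two examples only: it does not reproduce the third example's expected output, and it would also strip the \" from \""O in the original sample, so it only fits if those sequences are meant to be discarded.

```r
# Hypothetical helper: strip \" and \;" sequences, squeeze spaces, then parse.
clean_and_read <- function(raw_text) {
  lines <- strsplit(raw_text, "\n", fixed = TRUE)[[1]]
  # Regex \\;?" : a backslash, an optional ';', then a double quote.
  lines <- gsub("\\\\;?\"", "", lines)
  # Collapse runs of spaces left behind by the removal.
  lines <- gsub(" {2,}", " ", lines)
  read.table(text = lines, sep = ";", header = TRUE, quote = "",
             comment.char = "", strip.white = TRUE,
             colClasses = c("integer", "character", "integer"))
}

# First and second update examples (raw strings need R >= 4.0.0).
ex1 <- clean_and_read(r"(colA; colB; colC
1; text \" text; 2)")
ex2 <- clean_and_read(r"(colA; colB; colC
1; text \;" text; 2)")
print(ex1)
print(ex2)
```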