I have a corpus containing two text files that I imported as:
temp = list.files(pattern = "\\.txt$")
mydata = lapply(temp, read.delim, sep ="\t", quote = "")
mydata
The resulting object was a list, so I converted it to character:
class(mydata)
[1] "list"
mydata <- as.character(mydata)
The texts are now of class character:
class(mydata)
[1] "character"
But they seem to be character strings rather than a character vector, as the first part of the output shows:
[[1]]ï..We.give.the.observer.as.much.time.as.he.wants.to.make.his.response..we.simply.increase.the.number.of.alternative.stimuli.among.which.he.must.
(The line above is just an example from one of the texts.) The output then prints the actual texts as they are, with each sentence on a separate line, e.g.:
ï..this.is.just.a.bunch.of.crab.to.analyse.
1 I need to understand how this R package works.
2 lexical diversity needs to be analysed for two texts for now.
3 In this document I am typing each sentence on a separate line.
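A note on the dotted first line: read.delim treats the first line of each file as a header by default and turns it into a syntactically valid column name, which is why spaces become dots. The sketch below illustrates this with a hypothetical two-line file (the file and its contents are made up for the example) and shows that header = FALSE keeps every line as data.

```r
# Hypothetical sample file standing in for one of the corpus texts.
path <- tempfile(fileext = ".txt")
writeLines(c("This is the first line.", "This is the second line."), path)

# Default call: the first line is consumed as the column name.
with_header <- read.delim(path, sep = "\t", quote = "")
names(with_header)  # the first sentence, with spaces replaced by dots
nrow(with_header)   # only the remaining line is data

# header = FALSE keeps all lines as data; stringsAsFactors = FALSE
# makes sure the column is character rather than factor.
no_header <- read.delim(path, sep = "\t", quote = "", header = FALSE,
                        stringsAsFactors = FALSE)
lines_vec <- no_header[[1]]  # a plain character vector, one element per line
```

The leading "ï.." in my output probably comes from a byte order mark at the start of the file, which suggests the files are UTF-8 with a BOM, but that is an assumption on my part.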
I need these texts converted to a character vector for the next step of the analysis, which is converting them to ASCII with the stringi package in R, e.g.:
stri_enc_toascii(mydata)
This function only converts a character vector to ASCII encoding. So the question is:
How do I convert a corpus of character strings to a character vector?
P.S.: I have already reviewed the other questions on Stack Overflow to avoid posting a duplicate. Thanks for your help!
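For reference, one possible way to get a character vector directly is to skip the data frame step and read each file with readLines. This is only a sketch: the temporary directory and sample files are hypothetical, read_text is a helper name invented for the example, and the "UTF-8-BOM" encoding is an assumption about how the files were saved.

```r
# Hypothetical corpus of two small text files in a temporary directory.
tmp_dir <- tempfile("corpus_")
dir.create(tmp_dir)
writeLines(c("First sentence.", "Second sentence."), file.path(tmp_dir, "a.txt"))
writeLines("Only one sentence here.", file.path(tmp_dir, "b.txt"))

# "\\.txt$" matches only file names that end in .txt.
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file as a character vector, one element per line.
# Assumption: the files are UTF-8; "UTF-8-BOM" also drops a leading BOM.
read_text <- function(path) {
  con <- file(path, open = "r", encoding = "UTF-8-BOM")
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

texts <- lapply(files, read_text)            # list: one character vector per file
names(texts) <- basename(files)
mydata_vec <- unlist(texts, use.names = FALSE)  # single flat character vector
```

Keeping texts as a named list preserves which lines belong to which file, while mydata_vec is the flat character vector that functions such as stri_enc_toascii expect.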
Thanks, everyone, for your help! I simply used as.vector to convert the character string to a character vector:
as.vector(mydata)
is.vector(mydata)
TRUE
But the main problem remains: I wanted a character vector as input for the stringi package and the stri_enc_toascii(mydata) function to convert mydata to ASCII encoding (check here), but the encoding still shows as unknown. Is there any straightforward way to convert an "unknown" encoding to "ascii"?
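A possibly relevant detail: in base R, Encoding() never marks pure ASCII strings, so it reports "unknown" for them even after a successful conversion. The sketch below uses base iconv rather than stringi to stay self-contained. The sample strings are made up, and dropping non-ASCII bytes with sub = "" is just one choice among several.

```r
# Hypothetical input: one string with non-ASCII characters, one without.
x <- c("na\u00efve caf\u00e9", "plain ascii")

# Convert to ASCII; bytes that cannot be represented are replaced by sub ("" drops them).
ascii <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

# Encoding() only distinguishes "latin1", "UTF-8", "bytes" and "unknown";
# ASCII-only strings are always reported as "unknown".
Encoding(ascii)

# A direct check that every byte is in the ASCII range (below 128).
is_ascii <- vapply(ascii,
                   function(s) all(as.integer(charToRaw(s)) < 128L),
                   logical(1), USE.NAMES = FALSE)
is_ascii
```

So "unknown" here does not necessarily mean the conversion failed. If stringi is already loaded, stri_enc_isascii should give the same kind of check as is_ascii above.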