
I am having trouble reading a txt file back into R, which I presume is caused by end-of-line issues. I have a data frame with two columns: the first column contains a label in the form __label__1234 and the second column is a string of text (I can't share the actual text for privacy reasons, but it would be something like I WORK AT MACDONALDS). I first use write.table to store this information in a text file, like so:

write.table(test,"test.txt",sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE)

Later I attempt to read that text file back into R and get undesirable results.

data<-read.table("test.txt",header=FALSE,sep="\t")

The data comes out looking similar to

           V1                V2
1 __label__001 I WORK AT WENDYS
2 __label__002 I WORK AT BK
3 __label__001 I WORK AT WENDYS\n__label__002\tI WORK AT BK\n__label__003\tI WORK AT FIVE GUYS

When what I desire is

           V1                V2
1 __label__001 I WORK AT WENDYS
2 __label__002 I WORK AT BK
3 __label__001 I WORK AT WENDYS
4 __label__002 I WORK AT BK
5 __label__003 I WORK AT FIVE GUYS

Any idea what I can change either in the read or the write to fix this?

astel
  • I don't understand why, so maybe someone with more knowledge can post it as an answer, but adding quote="" to my read.table command gave me the desired result. – astel Feb 16 '21 at 03:26

1 Answer


The cause must lie in your data. I simulated your example dataset

V1 = paste0("__label__00",c(1,2,1,2,3))
V2 = paste("I WORK AT",c("WENDYS","BK","WENDYS","BK","FIVE GUYS"))
test = data.frame(V1,V2)

and executed your write/read commands

write.table(test,"test.txt",sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE)
data = read.table("test.txt",header=FALSE,sep="\t")

which gave me your desired output, so I was not able to reproduce the problem. I suggest that you look for differences between your data and mine.
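One difference that would produce this symptom is a quote character in the text column. read.table defaults to quote = "\"'", so an apostrophe or double quote opens a quoted field, and everything up to the next matching quote character, tabs and newlines included, is read as a single value. The sketch below assumes that is what is happening in the real data; the apostrophes in V2 are hypothetical and were not in the original example.

```r
# Hypothetical data: rows 3 and 5 contain an apostrophe (assumed culprit)
V1 <- paste0("__label__00", c(1, 2, 1, 2, 3))
V2 <- c("I WORK AT WENDYS", "I WORK AT BK", "I WORK AT WENDY'S",
        "I WORK AT BK", "I WORK AT FIVE GUY'S")
test <- data.frame(V1, V2)

# Write to a temporary file so nothing in the working directory is touched
path <- tempfile(fileext = ".txt")
write.table(test, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Default quote handling: the apostrophe in row 3 opens a quoted field that
# runs until the apostrophe in row 5, so several lines collapse into one value
bad <- read.table(path, header = FALSE, sep = "\t")
nrow(bad)   # fewer rows than were written

# quote = "" switches quote handling off, so each line is read as one row
good <- read.table(path, header = FALSE, sep = "\t", quote = "")
nrow(good)
```

If the quote characters in the real file are unbalanced, read.table will usually also warn about an EOF within a quoted string.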

  • Thanks for giving it a go. Definitely has to do with the data. The actual data is a lot messier, different characters, spacing etc. but it all looks fine, in that I can't see anything in the data that would cause it. And again, due to the sensitivity of the data I can't share the actual data. – astel Feb 16 '21 at 07:45
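
When the data cannot be shared or inspected by eye, the file can still be checked for the characters that read.table treats specially by default: the quote characters ' and " and the comment character #. The following is a rough sketch that reports only line numbers and field counts, not the text itself. The three sample lines are hypothetical stand-ins for the real file.

```r
# Hypothetical stand-in for the real file
path <- tempfile(fileext = ".txt")
writeLines(c("__label__001\tI WORK AT WENDYS",
             "__label__002\tI WORK AT WENDY'S",
             "__label__003\tI WORK AT #1 BURGER"), path)

lines <- readLines(path)

# Line numbers that contain a quote character or the default comment character
suspect <- grep("['\"#]", lines)
suspect

# Fields per line with quote and comment handling switched off;
# every entry should be 2 for a two-column, tab-separated file
fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
table(fields)
```

If suspect is non-empty, reading with quote = "" and comment.char = "" is a reasonable thing to try.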