I've run into a simple issue with a dataset I'm working on. It contains free text written the way you would see it on social media, where people use lots of commas. The whole text is in the first column of the dataset, followed by a date column and so on. The data are in .xls format, with values separated by commas and each cell wrapped in double quotes. A line looks like this:
"Come and get around, we have ice cream!", "2021-02-02", "lorem ipsum"
Using a comma as the separator yields one more column than there should be, because the comma inside the text is treated as a separator too.
I used the standard read.table function and couldn't work out whether I needed a regex, or where I would put it.
Any tips are appreciated!
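On the regex question: R's CSV readers are designed to treat commas inside double quotes as part of the field, so a regex usually isn't needed. If you do want to split lines yourself, one common approach is to split only on commas that are followed by an even number of double quotes up to the end of the line, i.e. commas outside any quoted field. This is a minimal sketch: `split_outside_quotes` is a hypothetical helper name, and it assumes the quotes on each line are balanced and does not unescape doubled (`""`) quotes.

```r
# Hypothetical helper: split one CSV-style line only on commas that sit
# outside double quotes. Assumes quotes are balanced on the line.
split_outside_quotes <- function(line) {
  # A comma counts as a separator only if an even number of '"' characters
  # follows it up to the end of the line (so it is not inside quotes).
  pattern <- ',(?=(?:[^"]*"[^"]*")*[^"]*$)'
  fields <- strsplit(line, pattern, perl = TRUE)[[1]]
  # Trim the space after each separator, then drop the enclosing quotes.
  fields <- trimws(fields)
  gsub('^"|"$', "", fields)
}

line <- '"Come and get around, we have ice cream!", "2021-02-02", "lorem ipsum"'
split_outside_quotes(line)
```

For a whole file you would have to loop this over the output of `readLines()`, which is why a quote-aware reader is normally the simpler route.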
EDIT:
Here's an example of the dataset and the simple code I ran:
These are the first two lines of the raw .xls file:
"Text","Time of posting","Reach","Comments"
"Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?","2020-11-15T18:23:32","28360","5689"
The RStudio import tool for .xls files gave me no separator options, so I used read.table on the same dataset saved as .csv. The code was as follows:
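```r
# The first line of the call was cut off in the post; "data.csv" stands in for the actual file path.
read.table("data.csv",
           header = TRUE,
           sep = ",",
           skip = 5)
```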
This resulted in every single comma creating a new column, when what I actually want is for only the commas outside the double quotes to create new columns.
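For comparison, a minimal sketch assuming the file's contents really are plain comma-separated text like the two sample lines: `read.csv()` uses `sep = ","` and `quote = "\""`, so a comma inside a double-quoted field stays part of that field. The sample lines are passed through the `text` argument instead of a file path; `sample_lines` and `posts` are illustrative names. With the real file you would pass its path instead, and keep `skip` only if the file has extra lines before the header.

```r
# The two raw lines from the sample above.
sample_lines <- c(
  '"Text","Time of posting","Reach","Comments"',
  '"Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?","2020-11-15T18:23:32","28360","5689"'
)

# read.csv() treats '"' as the quote character, so the commas inside the
# quoted Text field are not used as separators.
posts <- read.csv(text = sample_lines, stringsAsFactors = FALSE)

ncol(posts)   # Text, Time.of.posting, Reach, Comments
posts$Text
```

One difference from `read.table()`'s defaults that may matter for social-media text: `read.csv()` also sets `comment.char = ""`, so a `#` in a post is not treated as the start of a comment.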