Read it in using readLines
and remove the first (^"
) and last ("$
) double quote and also any double quote followed by another double quote ("(?=")
) creating L
. Then use read.table
to read L
specifying as.is=TRUE
to get "character"
and "numeric"
columns.
L <- gsub('^"|"$|"(?=")', '', readLines("albania_+.csv"), perl = TRUE)
DF <- read.csv(text = L, as.is = TRUE)
giving:
> str(DF)
'data.frame': 544 obs. of 10 variables:
$ Country.or.Area: chr "Albania" "Albania" "Albania" "Albania" ...
$ Year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ Area : chr "Urban" "Urban" "Urban" "Urban" ...
$ Sex : chr "Female" "Female" "Female" "Female" ...
$ Age : chr "Total" "0 - 4" "5 - 9" "10 - 14" ...
$ Record.Type : chr "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" "Estimate - de facto" ...
$ Reliability : chr "Final figure, complete" "Final figure, complete" "Final figure, complete" "Final figure, complete" ...
$ Source.Year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
$ Value : num 763925 39796 42761 55894 68627 ...
$ Value.Footnotes: logi NA NA NA NA NA NA ...
Here is a visualization of the regular expression:
^"|"$|"(?=")

Debuggex Demo