I'm trying to extract data from these PDFs.
I'm running into a problem where location names with multiple words ("Northern Island", for example) are split across different columns.
The "sep" argument of "read.table" seems to only accept a single space as a delimiter. Ideally, I'd like any run of two or more spaces to act as a delimiter. Is this possible?
library(pdftools)
# Path to the local PDF file
url <- "C:/Users/files/PSSS Weekly Bulletin - W1 2019 (Dec 31-Jan 06).pdf"
# Convert the PDF to text (one character string per page)
txt <- pdf_text(url)
# Get the working directory
wd <- getwd()
# Write the text to a file in the working directory
file_name <- file.path(wd, "temp.txt")
write(txt, file = file_name, sep = "\t")
# Convert to a table. The table starts at line 25 of the file and spans 25 lines
# P.S.: I've tried this code with and without the "sep" argument. No change.
dtaPCF <- read.table(file_name, skip = 24, nrows = 25, fill = TRUE, header = TRUE)
# Here is the text that I'd like to read with read.table. Ideally, I'd like to keep the headers, but it's not a dealbreaker if that doesn't work.
Country/Area No. sites No. reported % reported AFR Diarrhoea ILI PF DLI
American Samoa 0 0 0% 0 0 0 0 0
Cook Islands 13 11 85% 0 3 3 0 0
FSM 4 3 75% 0 21 74 0 3
Fiji 0 0 0% 0 0 0 0 0
French Polynesia 31 16 52% 3 9 11 3 3
Guam 0 0 0% 0 0 0 0 0
Kiribati 7 7 100% 0 172 609 22 0
Marshall Islands 2 2 100% 0 4 0 2 0
N Mariana Is 7 7 100% 4 13 60 17 0
Nauru 0 0 0% 0 0 0 0 0
New Caledonia 0 0 0% 0 0 0 0 0
New Zealand 0 0 0% 0 0 0 0 0
Niue 0 0 0% 0 0 0 0 0
PNG 0 0 0% 0 0 0 0 0
Palau 0 0 0% 0 0 0 0 0
Pitcairn Islands 1 1 100% 0 0 0 0 0
Samoa 13 6 46% 0 262 606 18 4
Solomon Islands 13 4 31% 0 75 59 4 1
Tokelau 2 2 100% 0 2 9 0 0
Tonga 11 11 100% 0 17 73 0 0
Tuvalu 0 0 0% 0 0 0 0 0
Vanuatu 11 7 64% 0 49 171 0 1
Wallis & Futuna 0 0 0% 0 0 0 0 0
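
For illustration, here is a minimal sketch of the behaviour I'm after. It uses base R's strsplit with a regular expression instead of read.table. It assumes the extracted text separates columns with two or more spaces and uses only single spaces inside a name; the sample lines and their spacing below are made up from the table above and are not the actual pdf_text output.

```r
# Illustrative lines only: spacing is an assumption, not real pdf_text() output.
lines <- c(
  "Country/Area     No. sites  No. reported  % reported  AFR  Diarrhoea  ILI  PF  DLI",
  "American Samoa   0          0             0%          0    0          0    0   0",
  "Cook Islands     13         11            85%         0    3          3    0   0",
  "Wallis & Futuna  0          0             0%          0    0          0    0   0"
)

# Split each line wherever two or more whitespace characters occur,
# so single spaces inside names such as "Cook Islands" are kept.
parts <- strsplit(trimws(lines), "\\s{2,}")

# First element holds the headers; the rest are data rows.
dta <- as.data.frame(do.call(rbind, parts[-1]), stringsAsFactors = FALSE)
names(dta) <- parts[[1]]

# Convert columns that look numeric; "% reported" stays character ("85%").
dta[] <- lapply(dta, type.convert, as.is = TRUE)

print(dta)
```

With the real file, the lines would presumably come from something like strsplit(txt[1], "\n")[[1]] restricted to the table rows. Which page holds the table is an assumption, and rows that split into a different number of fields would need handling before rbind.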