I'm cleaning the following web-scraped data and getting character vectors that are missing spaces, though always in consistent places:

" SharePriceNAVPremium/Discount" "Current$21.26$20.901.72%" "52 Wk Avg$24.41$23.245.05%" "52 Wk High$28.00$25.0518.09%"
"52 Wk Low$18.52$19.11-4.92%" ""

I'm trying to get the data to look like this:

"SharePrice NAV Premium/Discount" "Current $21.26 $20.90 1.72%" "52WkAvg $24.41 $23.24 5.05%" "52WkHigh $28.00 $25.05 18.09%"
"52WkLow $18.52 $19.11 -4.92%"

The issue I'm encountering is how to conditionally add whitespace after the "$" plus four digits (since that appears to be the consistent price convention used here).
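For example, something along these lines is the kind of replacement I have in mind (just a sketch; it assumes every price is "$" followed by two digits, a decimal point, and two more digits, and it still doesn't give me the spacing everywhere I need it):

library(stringr)

x <- "Current$21.26$20.901.72%"

# Insert a space after anything that looks like "$" + dd.dd
str_replace_all(x, "(\\$\\d{2}\\.\\d{2})", "\\1 ")
# [1] "Current$21.26 $20.90 1.72%"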

I've tried str_pad and str_replace_all without universal success. Any help is appreciated!

Here is my script:

library(rvest)
library(stringr)

# Scrape the fund summary page
CEF_Page <- read_html("https://www.cefconnect.com/fund/JLS")

# Pull the summary grid, split on newlines, and drop blank elements
test9 <- CEF_Page %>%
  html_nodes("#ContentPlaceHolder1_cph_main_cph_main_SummaryGrid") %>%
  html_text() %>%
  strsplit(split = "\n") %>%
  unlist() %>%
  .[. != " "]

# Strip stray tabs and carriage returns
test9 <- str_replace_all(test9, pattern = "\t", replacement = "")
test9 <- str_replace_all(test9, pattern = "\r", replacement = "")
  • Perhaps, create a data.frame `out <- read.csv(text= sub('-', ',', gsub("([$])", ",\\1", test9[-1])), header = FALSE, strip.white = TRUE, stringsAsFactors = FALSE)` – akrun Jan 10 '19 at 18:09
  • This works: `test9 <- sub("\\s+$", "", gsub('(\\$.{5})', '\\1 ', test9))` – js80 Jan 10 '19 at 21:07
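Expanding js80's comment into a standalone sketch (the sample strings are copied from the question; the pattern appends a space after "$" plus the next five characters, then trims any trailing whitespace):

# Cleaned-but-unspaced strings, as produced by the script above
test9 <- c("Current$21.26$20.901.72%",
           "52 Wk Avg$24.41$23.245.05%",
           "52 Wk High$28.00$25.0518.09%",
           "52 Wk Low$18.52$19.11-4.92%")

# Add a space after each "$" plus five characters (e.g. "$21.26"),
# then strip any whitespace left at the end of a string
test9 <- sub("\\s+$", "", gsub("(\\$.{5})", "\\1 ", test9))

test9
# [1] "Current$21.26 $20.90 1.72%"     "52 Wk Avg$24.41 $23.24 5.05%"
# [3] "52 Wk High$28.00 $25.05 18.09%" "52 Wk Low$18.52 $19.11 -4.92%"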
