I have the following output from data that I have downloaded from the Wall Street Journal.
> Search(MySymList, " Net Income")
Fiscal year is July-June. All values AUD Millions. 2018 2017 2016 2015 2014 5-year trend
82 Consolidated Net Income 949 814 376 850 769
86 Net Income 934 792 335 817 737
88 Net Income Growth 18.04% 135.99% -58.93% 10.83% -
103 Net Income After Extraordinaries 934 792 335 817 909
107 Net Income Available to Common 934 792 335 817 565
I want to capture Net Income, but as there is no consistency in where Net Income will be in the data (as in line number), I tried using library qdap, and Search in particular. It does a wonderful job of finding most information, but I am stumped as to how to remove the other lines. I thought that exclude might be helpful, but it just doesn't seem to work.
Search(MySymList, " Net Income", exclude = "Common")
Error in agrep(term, x, ignore.case = TRUE, max.distance = max.distance, :
unused argument (exclude = "Common")
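Judging by the error message, the extra argument seems to be handed straight on to agrep, which has no exclude parameter. For comparison, here is a minimal sketch of an exact match done in base R rather than qdap. It assumes the row labels sit in the first column of the table; the column name Item and the toy data frame are stand-ins built from the output above, not the real scraped structure.

```r
# Toy data frame reusing the labels and the 2018/2017 values printed above.
# `Item` is a hypothetical name for the real first-column header.
MySymList <- data.frame(
  Item = c("Consolidated Net Income", "Net Income", "Net Income Growth",
           "Net Income After Extraordinaries", "Net Income Available to Common"),
  `2018` = c("949", "934", "18.04%", "934", "934"),
  `2017` = c("814", "792", "135.99%", "792", "792"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Trim stray whitespace from the labels, then keep only the row whose
# label is exactly "Net Income" (no fuzzy matching involved).
labels <- trimws(as.character(MySymList[[1]]))
NetIncome <- MySymList[labels == "Net Income", , drop = FALSE]
print(NetIncome)
```

The same exact-match filter could presumably be applied to whatever Search returns, since the row position no longer matters once the label is compared directly.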
I can get the Net Income by other means, but I would prefer to do it with just one function, that being Search or anything else that the library qdap might offer.
Any guidance would be most welcome.
EDIT!!
The cut-down code is below, as it is easier to run it than to provide the data for it. The symbol is different from the original, so the line numbers will have changed.
library(httr)
library(XML)
library(data.table)
library(qdap)
library(Hmisc)
getwsj.quotes <- function(Symbol)
{
  MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/financials/annual/income-statement", Symbol)
  Symbol.Data <- GET(MyUrl)
  x <- content(Symbol.Data, as = 'text')
  wsj.tables <- sub('cr_dataTable cr_sub_capital', '\\1', x)
  SymData <- readHTMLTable(wsj.tables)
  return(SymData)
}
TickerList <- c("AMC")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
MySymList <- data.frame()
MySymList <- SymbolDataList[[1]][[2]]
Search(MySymList, " Net Income")
Regards Stephen