
I have the following output from data that I have downloaded from the Wall Street Journal.

> Search(MySymList, " Net Income")
    Fiscal year is July-June. All values AUD Millions.   2018    2017    2016   2015 2014 5-year trend
82                             Consolidated Net Income    949     814     376    850  769             
86                                          Net Income    934     792     335    817  737             
88                                   Net Income Growth 18.04% 135.99% -58.93% 10.83%    -             
103                   Net Income After Extraordinaries    934     792     335    817  909             
107                     Net Income Available to Common    934     792     335    817  565      

I want to capture the Net Income row, but its position in the data (its row number) is not consistent from one symbol to the next, so I tried the qdap library, and its Search function in particular. It does a wonderful job of finding most information, but I am stumped on how to remove the other lines it returns.

I thought that exclude might be helpful but it just doesn't seem to work.

Search(MySymList, " Net Income", exclude = "Common")
Error in agrep(term, x, ignore.case = TRUE, max.distance = max.distance,  : 
  unused argument (exclude = "Common")

I can get the Net Income by other means but I would prefer to do it with just one function, that being Search or anything that the library qdap might offer.

Any guidance would be most welcome.

EDIT!!

The cut-down code is below, as it is easier to run it than to provide the data. The symbol is different from the original, so the line numbers will have changed.

library(httr)
library(XML)
library(data.table)
library(qdap)
library(Hmisc)
getwsj.quotes <- function(Symbol) 
{
    MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/financials/annual/income-statement", Symbol)
    Symbol.Data <- GET(MyUrl)
    x <- content(Symbol.Data, as = 'text')
    wsj.tables <- sub('cr_dataTable cr_sub_capital', '\\1', x)
    SymData <- readHTMLTable(wsj.tables)
    return(SymData)
}
TickerList <- c("AMC")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
MySymList <- data.frame()
MySymList <- SymbolDataList[[1]][[2]]
Search(MySymList, " Net Income")

Regards Stephen

  • Won't be able to do much unless you give us the output of `dput(MySymList[125:135, ])`. The text in the close vote says it all: "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself." (Sorry, 2 out of 3 does not satisfy the exacting requirements of SO.) – IRTFM Mar 02 '19 at 08:54
  • My question is about using exclude, which is part of the qdap library Search function. The error I posted says it all and I thought I made that clear. However, if you are trying to offer your guidance by using something else, I wish you would say so, as I dropped out of my law degree's clairvoyancy unit. The dput output is a lot of information and will easily go beyond the characters available: it is too long by 3700 characters. – Stephen Mar 02 '19 at 09:32
  • Mr 42, I don't mean to be disrespectful. If you would like to look at that dput output, please run the snippet of code and have a look. Somewhere in the library qdap, there is probably some method of doing what I want. – Stephen Mar 02 '19 at 09:57
  • Actually `exclude` is NOT "part of the `qdap::Search` function". The help page describes 2 different functions and the Usage listing for Search has no `exclude` parameter. – IRTFM Mar 02 '19 at 19:33
  • Right you are @42. I somehow got Search and boolean_search melded into one. In a UNIX shell, I could easily get what I wanted with grep but in R, it is for some reason not as easy. I need to read up on them. – Stephen Mar 02 '19 at 22:10
  • As I can get some output with my Search function, I was wondering if I could capture the line number of the row containing only `Net Income`. E.g., in the example above, Net Income is on line 86. `MySymList[86:86, ]` is what I am after. The question is: how do I read a line number? – Stephen Mar 02 '19 at 22:40
  • I would try R's grep: `grep("^Net Income$", MySymList[[1]])`. See `?regex` and `?grep`. The thing to realize (although it doesn't affect this application) is that regex's escape character is the same as R's escape character, so it often needs to be doubled in the pattern. – IRTFM Mar 02 '19 at 23:52
  • Looks as though two methods provide the same answer which is not unusual. Thanks @42. – Stephen Mar 03 '19 at 00:03
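
For anyone reading along, here is a minimal sketch of the anchored `grep` suggested in the comments. It runs on a small made-up data frame (`toy`, an assumption standing in for `MySymList`), with labels copied from the output in the question, so it does not need the WSJ download.

```r
# Toy stand-in for MySymList; the labels are copied from the question's output.
labels <- c("Consolidated Net Income", "Net Income", "Net Income Growth",
            "Net Income After Extraordinaries", "Net Income Available to Common")
toy <- data.frame(FinElement = labels,
                  `2018` = c("949", "934", "18.04%", "934", "934"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Anchored pattern: ^ and $ force the whole cell to be exactly "Net Income".
exact_row <- grep("^Net Income$", toy[[1]])
toy[exact_row, ]

# Unanchored pattern for comparison: matches every label containing the phrase.
loose_rows <- grep("Net Income", toy[[1]])
```

The anchors do the job that `exclude` was expected to do: there is no need to list the unwanted labels when the pattern only accepts the exact one.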

1 Answer


I have made a breakthrough, although it might not be the most efficient code. Giving a short name to the first column helped a lot. The `which` function then provides exact matching when searching that column. Alas, I cannot answer my own question about the qdap library's Search function.

library(httr)
library(XML)
library(data.table)
library(qdap)
library(Hmisc)
getwsj.quotes <- function(Symbol) 
{
    MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/financials/annual/income-statement", Symbol)
    Symbol.Data <- GET(MyUrl)
    x <- content(Symbol.Data, as = 'text')
    wsj.tables <- sub('cr_dataTable cr_sub_capital', '\\1', x)
    SymData <- readHTMLTable(wsj.tables)
    return(SymData)
}
TickerList <- c("BHP")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
MySymList <- data.frame()
MySymList <- SymbolDataList[[1]][[2]]
Search(MySymList, " Net Income") # purely for testing what is available.
names(MySymList) <- c("FinElement", "2018", "2017", "2016", "2015", "2014", "5-year trend")
lineNo <- which(MySymList$FinElement == "Net Income")
MySymList[lineNo, ] # index with lineNo directly; lineNo:lineNo fails when there is no match or more than one

The output is:

> Ratio  2018  2017    2016  2015   2014 5-year trend
91 Net Income 8,585 8,453 (8,774) 4,109 14,775 
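
A slightly more defensive variant might look like the sketch below. It assumes the labels sit in the first column and may arrive as factors or with stray whitespace; `find_exact_row` is a hypothetical helper of my own, not something qdap provides, and `toy` is made-up data standing in for `MySymList`.

```r
# Hypothetical helper (not part of qdap): rows whose first column equals `label` exactly.
find_exact_row <- function(df, label) {
  # as.character() handles factor columns; trimws() drops leading/trailing spaces.
  first_col <- trimws(as.character(df[[1]]))
  df[which(first_col == label), , drop = FALSE]
}

# Made-up data; the long header mimics the first column name in the question.
toy <- data.frame(
  `Fiscal year is July-June. All values AUD Millions.` =
    c("Consolidated Net Income", " Net Income ", "Net Income Growth"),
  `2018` = c("949", "934", "18.04%"),
  check.names = FALSE, stringsAsFactors = TRUE
)

net_income  <- find_exact_row(toy, "Net Income")     # one row
missing_row <- find_exact_row(toy, "No Such Label")  # zero rows, no error
```

Selecting the column by position avoids renaming every column, and a missing label simply returns an empty data frame instead of raising an error.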

Thanks to everyone who considered this problem. Regards Stephen
