
I am using `tq_get` from the tidyquant package. The same call returns data covering different date ranges on different runs. Why is this?

Here is a reprex:

library(tidyquant)

> tq_get("AACG")
# A tibble: 2,666 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 AACG   2010-01-04  4.65  4.75  4.65  4.7    1100     4.7 
 2 AACG   2010-01-05  4.74  4.74  4.65  4.65    300     4.65
 3 AACG   2010-01-06  4.51  4.51  4.51  4.51    100     4.51
 4 AACG   2010-01-07  4.52  4.52  4.52  4.52    200     4.52
 5 AACG   2010-01-08  4.74  4.75  4.53  4.75   1700     4.75
 6 AACG   2010-01-11  4.75  4.75  4.25  4.3    8800     4.3 
 7 AACG   2010-01-12  4.3   4.3   4.3   4.3       0     4.3 
 8 AACG   2010-01-13  4.52  4.52  4.49  4.49    400     4.49
 9 AACG   2010-01-14  4.32  4.32  4.32  4.32    200     4.32
10 AACG   2010-01-15  4.42  4.5   4.42  4.45   2800     4.45
# … with 2,656 more rows
> tq_get("AACG")
# A tibble: 356 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 AACG   2019-03-11  1.08  1.1   1.08  1.08  32100     1.08
 2 AACG   2019-03-12  1.08  1.09  1.05  1.05  20200     1.05
 3 AACG   2019-03-13  1.06  1.08  1.04  1.07  23100     1.07
 4 AACG   2019-03-14  1.06  1.11  1.06  1.08  29900     1.08
 5 AACG   2019-03-15  1.06  1.08  1.04  1.04  30900     1.04
 6 AACG   2019-03-18  1.07  1.07  1.03  1.07  48800     1.07
 7 AACG   2019-03-19  1.08  1.08  1     1.06 122800     1.06
 8 AACG   2019-03-20  1.04  1.04  1     1.01  41000     1.01
 9 AACG   2019-03-21  1.01  1.03  1.01  1.03  10200     1.03
10 AACG   2019-03-22  1.03  1.03  1     1.03  29100     1.03
# … with 346 more rows
> 

Why does `tq_get` return data starting in 2010 on one run and in 2019 on another, for the same call? Can someone please help me?

@akrun, here is my sessionInfo. Is this a version issue? Should I try upgrading tidyquant or getting the latest R?

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
 [4] LC_COLLATE=en_IN     LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
 [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.3.1               forcats_0.4.0             
 [3] stringr_1.4.0              dplyr_1.0.1               
 [5] purrr_0.3.3                readr_1.3.1               
 [7] tidyr_1.1.1                tibble_3.0.3              
 [9] ggplot2_3.2.1              tidyverse_1.3.0           
[11] tidyquant_1.0.1            quantmod_0.4-15           
[13] TTR_0.23-6                 PerformanceAnalytics_2.0.4
[15] xts_0.12-0                 zoo_1.8-7                 
[17] lubridate_1.7.4           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         lattice_0.20-38    class_7.3-15       utf8_1.1.4        
 [5] assertthat_0.2.1   ipred_0.9-9        R6_2.4.1           cellranger_1.1.0  
 [9] backports_1.1.5    reprex_0.3.0       httr_1.4.1         pillar_1.4.6      
[13] rlang_0.4.7        lazyeval_0.2.2     curl_4.3           rstudioapi_0.10   
[17] rpart_4.1-15       Matrix_1.2-18      splines_3.6.2      gower_0.2.2       
[21] timetk_2.2.0       munsell_0.5.0      broom_0.5.3        compiler_3.6.2    
[25] modelr_0.1.5       pkgconfig_2.0.3    nnet_7.3-12        tidyselect_1.1.0  
[29] prodlim_2019.11.13 quadprog_1.5-8     fansi_0.4.1        crayon_1.3.4      
[33] dbplyr_1.4.2       withr_2.1.2        MASS_7.3-51.5      recipes_0.1.13    
[37] grid_3.6.2         Quandl_2.10.0      nlme_3.1-143       jsonlite_1.6      
[41] gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0          magrittr_1.5      
[45] scales_1.1.0       cli_2.0.1          stringi_1.4.6      fs_1.3.1          
[49] timeDate_3043.102  xml2_1.2.2         ellipsis_0.3.0     generics_0.0.2    
[53] vctrs_0.3.2        lava_1.6.7         tools_3.6.2           glue_1.4.1        
[57] hms_0.5.2          survival_3.1-8     colorspace_1.4-1   rvest_0.3.5       
[61] haven_2.2.0       
> 

@Matt, the `from` argument also seems not to work consistently on repeated calls. Please see this (one call gives me data from 2018, the next from 2019):

> tq_get("AACG",from = "2018-01-01")
# A tibble: 653 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 AACG   2018-01-02  4.65  5.5   4.65  5.43  44800     5.43
 2 AACG   2018-01-03  5.33  5.34  5.15  5.34   3000     5.34
 3 AACG   2018-01-04  5.22  5.45  5.22  5.4   11100     5.4 
 4 AACG   2018-01-05  5.39  5.72  5.21  5.4   17400     5.4 
 5 AACG   2018-01-08  5.49  5.5   5.27  5.27   2600     5.27
 6 AACG   2018-01-09  5.3   5.33  5.3   5.32   2500     5.32
 7 AACG   2018-01-10  5.22  5.51  5.22  5.31   1800     5.31
 8 AACG   2018-01-11  5.39  5.49  5.15  5.15   3400     5.15
 9 AACG   2018-01-12  5.25  5.5   5.25  5.49   6300     5.49
10 AACG   2018-01-16  5.55  5.65  5.55  5.62  17000     5.62
# … with 643 more rows
> tq_get("AACG",from = "2018-01-01")
# A tibble: 356 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 AACG   2019-03-11  1.08  1.1   1.08  1.08  32100     1.08
 2 AACG   2019-03-12  1.08  1.09  1.05  1.05  20200     1.05
 3 AACG   2019-03-13  1.06  1.08  1.04  1.07  23100     1.07
 4 AACG   2019-03-14  1.06  1.11  1.06  1.08  29900     1.08
 5 AACG   2019-03-15  1.06  1.08  1.04  1.04  30900     1.04
 6 AACG   2019-03-18  1.07  1.07  1.03  1.07  48800     1.07
 7 AACG   2019-03-19  1.08  1.08  1     1.06 122800     1.06
 8 AACG   2019-03-20  1.04  1.04  1     1.01  41000     1.01
 9 AACG   2019-03-21  1.01  1.03  1.01  1.03  10200     1.03
10 AACG   2019-03-22  1.03  1.03  1     1.03  29100     1.03
# … with 346 more rows
> 
user2338823
  • I get only `356 * 8` on multiple calls i.e couldn't reproduce the first one – akrun Aug 06 '20 at 06:59
  • Can't reproduce. I only get the first result. By default `tq_get` picks up 10 years worth of data. In this case from 2010-01-01 until today. – phiver Aug 06 '20 at 07:23
  • So if I understand correctly, akrun is getting the second result and phiver is getting the first one ? – user2338823 Aug 06 '20 at 07:26
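
To compare what different people (or repeated calls) actually receive, it may help to reduce each result to its row count and date range. The following is a minimal base-R sketch, not part of tidyquant: `pull_range` is a hypothetical helper name, and `pull_1` / `pull_2` are stand-ins that contain only the first two dates from each of the two outputs in the question.

```r
# Hypothetical helper: summarise one tq_get() result as
# row count plus first and last date.
pull_range <- function(df) {
  data.frame(
    rows       = nrow(df),
    first_date = min(df$date),
    last_date  = max(df$date)
  )
}

# Stand-ins for two calls; dates copied from the two outputs in the question.
pull_1 <- data.frame(symbol = "AACG",
                     date = as.Date(c("2010-01-04", "2010-01-05")))
pull_2 <- data.frame(symbol = "AACG",
                     date = as.Date(c("2019-03-11", "2019-03-12")))

# One row per pull, so differing start dates are easy to spot.
ranges <- rbind(pull_range(pull_1), pull_range(pull_2))
print(ranges)

# TRUE only if every pull starts on the same date.
same_start <- length(unique(ranges$first_date)) == 1
print(same_start)
```

With live data, the same helper could be applied to something like `tq_get("AACG", from = "2010-01-01", to = "2020-08-06")`, pinning both ends of the range so that the default window does not affect the comparison.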

1 Answer


Something has changed with this ticker. Yahoo Finance is the source, and it appears that only data from 2019-03-11 to the present is available. You can test this by passing `from = "2011-01-01"` to set the start of the date range. Yahoo Finance does not appear to have data before 2019-03-11.


> tq_get("AACG", from = "2011-01-01")
# A tibble: 356 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 AACG   2019-03-11  1.08  1.1   1.08  1.08  32100     1.08
 2 AACG   2019-03-12  1.08  1.09  1.05  1.05  20200     1.05
 3 AACG   2019-03-13  1.06  1.08  1.04  1.07  23100     1.07
 4 AACG   2019-03-14  1.06  1.11  1.06  1.08  29900     1.08
 5 AACG   2019-03-15  1.06  1.08  1.04  1.04  30900     1.04
 6 AACG   2019-03-18  1.07  1.07  1.03  1.07  48800     1.07
 7 AACG   2019-03-19  1.08  1.08  1     1.06 122800     1.06
 8 AACG   2019-03-20  1.04  1.04  1     1.01  41000     1.01
 9 AACG   2019-03-21  1.01  1.03  1.01  1.03  10200     1.03
10 AACG   2019-03-22  1.03  1.03  1     1.03  29100     1.03
# ... with 346 more rows
Matt Dancho
  • `tq_get("AACG",from = "2019-04-01")` on repeated incantations DOES return the same response. So I guess you want me to check Yahoo finance for the starting date for each ticker before I use the API. Unfortunately I have to scale this to the entire exchange. So I'll come to another query, related to this, how can I find the first date available for each symbol in the exchange programmatically? – user2338823 Aug 06 '20 at 11:52
  • You can use something like `data %>% group_by(symbol) %>% summarize(first_date = first(date))` – Matt Dancho Aug 06 '20 at 12:17
  • No we can't. Suppose I do `tq_get("AACG")` and try looking for the first day, it may give me 2010-01-04 OR 2019-03-11. Please see the original query. We need to eyeball the yahoo finance page to see when it starts. Only then I can be certain I don't have a spurious value. Am I mistaken? – user2338823 Aug 06 '20 at 12:21
  • This is a simple data wrangling exercise. Take 5 stocks, 4 of which start on "2011-01-01", one of which starts on "2019-03-11". Pull in the data with `c("stock1", "stock2", ...) %>% tq_get(from = "2011-01-01")`. Store these values as `stock_data_tbl`. Then take your `stock_data_tbl %>% group_by(symbol) %>% summarize(first_date = first(date))`. This gives you the first date for each symbol. Just remove the ones that don't start on the date you are expecting. – Matt Dancho Aug 06 '20 at 12:24
  • The "from" functionality is also broken. Please see the bottom of my original query. On one incantation it starts from the expected date, on the second it does not. On the first incantation it will yield 2018-01-02 and on the second incantation it will yield 2019-03-11. The only way left is to do multiple incantations for the same symbol and hope that both of the different dates show up. That's the only way of detecting the true first date of a symbol. But then we are counting on the bad behavior of the software. – user2338823 Aug 06 '20 at 13:09
  • I don't understand. If the data is available, it returns it. If no data is available, what should happen? Should the function error out? My preference is to return all data that is available through `from = "2011-01-01"` - when Yahoo Finance does not have data, it returns the maximum data that Yahoo has. For your example, this is "2019-03-11" through today. So from my perspective, it's working as expected. – Matt Dancho Aug 06 '20 at 21:21
  • Dear Matt, this ticker has data only from 2019-03-11. At the bottom of my original query, on the first incantation of tq_get with from = "2018-01-01", it returns some data starting from 2018-01-02. Is that fake data? What am I supposed to think about this data when it should only start from 2019-03-11? On the second (SAME) incantation with a from argument, it returns data from 2019-03-11 (which to me is correct behavior). But if I do ONE incantation with a from, how do I decide whether the starting date is correct? – user2338823 Aug 07 '20 at 04:47
  • This problem is on YAHOO'S end. Not tidyquant. I don't know why you received different data, but the code has not changed. And I get the same data (from "2019-03-11") every time I run it. It takes 2 to tango and the data comes from Yahoo. Tidyquant just puts it into a data frame. So your issue is with Yahoo, not tidyquant. If you want more reliability, you might try Quandl, Alphavantage, Tiingo, or one of the other data sources that tidyquant delivers data from. – Matt Dancho Aug 07 '20 at 21:34
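
The group-and-summarize approach suggested in the comments can be sketched as below. This is an illustration under assumptions, not a fix for the inconsistent responses: the AACG dates are copied from the outputs above, while "STOCK2" and its dates are hypothetical and stand in for a symbol with full history. The 7-day tolerance is also an assumption, added so that weekends and holidays right after the requested date are not flagged. The sketch uses `min(date)` rather than `first(date)` so the result does not depend on row order.

```r
library(dplyr)

# Toy input. AACG dates come from the outputs above; "STOCK2" is a
# hypothetical symbol with made-up dates, included for contrast only.
stock_data_tbl <- tibble(
  symbol = c("AACG", "AACG", "STOCK2", "STOCK2"),
  date   = as.Date(c("2019-03-11", "2019-03-12", "2011-01-03", "2011-01-04"))
)

# The start date that was passed as `from` when pulling the data.
requested_from <- as.Date("2011-01-01")

# First available date per symbol, plus a flag for symbols whose data
# begins more than 7 days (assumed tolerance) after the requested start.
first_dates <- stock_data_tbl %>%
  group_by(symbol) %>%
  summarize(first_date = min(date), .groups = "drop") %>%
  mutate(late_start = first_date > requested_from + 7)

print(first_dates)
```

Symbols with `late_start == TRUE` are the ones whose history begins later than requested. This only reports what a given pull returned; if the source returns different ranges on different calls, as described in the question, the flag can differ between pulls as well.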