1

I've had a ton of help recently from the SO community and I'd first just like to say thank you to everyone!

My latest Google Sheet pursuit is querying sec.gov for the latest filing for a given ticker. I'm not trying to scrape the site, I just want to pull in the latest filing so I can alert myself as to when a company has filed something new with the SEC.

I'm currently doing this for each ticker by way of importhtml and index:

index(
importhtml("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="&A2&"&owner=include&count=100",
"table",3),2)

... where the ticker is in cell A2. However this has been working inconsistently because I'm continually trying to do this for over 2500 tickers. I've noticed that importhtml runs into issues when there are so many calls at once.

Is there some way to automate this via Google Scripts so that I can call in the latest filing (or latest 5 filings or so) on a nightly basis? I'm plenty familiar with Google Scripts and triggers I just don't know how to get around that importhtml limitation, and how to limit my script to only the latest ~5 filings so as to not overwhelm my spreadsheet. Just need a gentle nudge in the right direction.

Thank you!

  • Can I ask you about the detail of `the latest filing (or latest 5 filings or so)`? – Tanaike Apr 25 '20 at 01:26
  • Hi again Tanaike! First I should say, even if I could only pull the first row/filing, that would already be a huge help because then I could immediately tell that something new was filed ("date" is one of the columns on the SEC page) and then I can go investigate it... and this "alert" aspect is really my primary reason for posting this question! But I was thinking it could be helpful if I could pull the latest 3-5 filngs (i.e., the first 3-5 rows of the table on the corresponding SEC page) because sometimes companies file multiple documents on the same day. – SOtoTheRescue Apr 25 '20 at 02:57
  • Thank you for replying. I have to apologize for my poor English skill. From your replying, I couldn't understand about `the latest filing (or latest 5 filings or so)`. – Tanaike Apr 25 '20 at 03:10
  • No need to apologize! I'll try to clarify. At this URL: https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&owner=include&count=100 I'm trying to pull into Google Sheets the very first row of table 3... which would be: "4" "Document" "Statement..." "2020-04-17" And I would like to execute that formula for a series of 2500 companies. – SOtoTheRescue Apr 25 '20 at 03:31
  • Thank you for replying. From your replying, I could understand that you want to retrieve the values of `"4" "Document" "Statement..." "2020-04-17"` the URL. From `I would like to execute that formula for a series of 2500 companies.`, I cannot still understand about `the latest filing (or latest 5 filings or so)`. Can I ask you about the detail of the goal you expect? – Tanaike Apr 25 '20 at 03:32
  • Latest filing = row 1 of the table. At the very least I’d like to pull row 1 into my Google Sheet. It might also be helpful, but not necessary, to pull in the first 5 rows of the table. – SOtoTheRescue Apr 25 '20 at 03:41
  • Thank you for replying. The formulas like `=index(importhtml(URL,"table",3),2)` are put to the cells "A1:A", and you want to retrieve the values from row 1 to row 5 using Google Apps Script. Is my understanding correct? If it's so, is it required to refresh the values of formulas, when the values are retrieved? – Tanaike Apr 25 '20 at 03:43
  • Yes that’s correct! Yes the values of the formulas would need to be refreshed when the values are retrieved. – SOtoTheRescue Apr 25 '20 at 03:52
  • Thank you for replying. I propsed a sample script as an answer. Could you please confirm it? If that was not the direction you want, I apologize. – Tanaike Apr 25 '20 at 04:07
  • Thank you Tanaike! I’m not at my computer right now but I’ll be sure to test your script as soon as I sit down at my desk. Cheers! Have a nice night (or day depending on where you are!) – SOtoTheRescue Apr 25 '20 at 04:11
  • Thank you for replying. From your replying, I could understand that my proposals cannot resolve your issue. So I have to delete my answer. This is due to my poor skill. I deeply apologize for this. – Tanaike Apr 26 '20 at 01:24
  • @Tanaike you have tremendous skill! Thank you for your help - I truly appreciate it! – SOtoTheRescue Apr 26 '20 at 04:02
  • Thank you for replying. I think that I have to study more and more for thinking various situations. – Tanaike Apr 26 '20 at 08:25
  • I don't understand. Did @Tanaike managed to solve your problem or not? It seems like a miscommunication problem. Did Tanaike deleted his answer? – Raserhin Apr 27 '20 at 14:10
  • Yes it seemed to be a miscommunication. The solution didn't solve the problem so Tanaike deleted the answer. I believe my problem is rooted in Sheet's importhtml limitations, so a script that staggers the queries would probably be the solution but I don't know how to approach it. – SOtoTheRescue Apr 27 '20 at 17:46
  • Could I ask what was failing about @Tanaike 's code? Because maybe we both are having the same approach. – Raserhin May 01 '20 at 13:49
  • @Raserhin The flow of 1st sample script is as follows. 1. Replace the formula in the cells "A1:A5" of the source sheet. By this, the formulas are refreshed. 2. Retrieve the values from the refreshed formulas. 3. Put the values to the destination sheet. The flow of 2nd sample script is as follows. 1. 5 formulas are retrieved from the source sheet. 2. The formulas are put to a temporal sheet. 3. The result values are retrieved from the temporal sheet. – Tanaike May 01 '20 at 22:07
  • @Raserhin I appreciate the help here! The problem with refreshing the formula is that the formula isn't successfully querying the data in the first place. Most of the cells remain as "Loading..." and never load the data. See Sheet7 of this spreadsheet: https://docs.google.com/spreadsheets/d/1EoOIQxWyKWOvtlCrmJNI76FAxGhzgXrE4s0F05tw2MY/edit#gid=1598325639 – SOtoTheRescue May 03 '20 at 01:35
  • The problem with Apps Script is that there is now way to easily retrieve the information from the DOM. So doing the request there is kind of hard. You could actually set the formulas from apps script, but int he end you will still have the same problems. Have you taken a look at [this page](https://www.sec.gov/developer) to see if they have a tool to this purpose? So maybe you don' t need to do web scrapping. – Raserhin May 05 '20 at 09:20
  • @Raserhin Yes I posted that SEC resources link on Tanaike's original answer before it was deleted. Specifically I felt the RSS feed may be my best bet via importfeed but alas I ran into similar issues where only some data was called... plus for some bizarre reason the SEC doesn't separate out the 'date' field which I would need. See this image: https://i.imgur.com/NvIziIR.png. From that SEC link, do you see any alternative solution for a noob like me? I don't know how to go about tackling the index files. – SOtoTheRescue May 06 '20 at 01:57
  • @Raserhin Would it be possible to import the RSS feeds in bulk? – SOtoTheRescue May 06 '20 at 01:59
  • I guess, the thing is that you actually want to retrieve a lot of data in Apps Script. And there are some limitations there in the amount of time that an Script can run and the amount of request you can make a day. Have you managed to retrieve a company through RSS? I'm not that very well versed using this kind of feeds. – Raserhin May 11 '20 at 14:32
  • I have retrieved a company through RSS but I run into similar issues when dealing with many companies. It's increasingly seeming like there is no solution for this issue. I appreciate the help from the both of you. – SOtoTheRescue May 11 '20 at 15:01

0 Answers0