
I am trying to get the 'Earnings Announcements' table from: https://www.zacks.com/stock/research/amzn/earnings-announcements

I have tried different BeautifulSoup options, but none of them gets the table:

table = soup.find('table', attrs={'class': 'earnings_announcements_earnings_table'})

table = soup.find_all('table')
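
For context, a minimal version of my attempt looks like this (the urllib2 fetch and the User-Agent header are filled in here just to make the snippet self-contained; the find calls are the ones above):

import urllib2
from bs4 import BeautifulSoup

# Fetch the page with a browser-like User-Agent header
user_agent = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request('https://www.zacks.com/stock/research/amzn/earnings-announcements', None, user_agent)
soup = BeautifulSoup(urllib2.urlopen(req).read(), 'html.parser')

# Neither lookup finds the populated table, since its rows are filled in by JavaScript
print soup.find('table', attrs={'class': 'earnings_announcements_earnings_table'})
print soup.find_all('table')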

When I inspect the page in the browser, the table elements are there.

I am pasting a portion of the page source I am getting for the table (JS? JSON?):

document.obj_data = {
"earnings_announcements_earnings_table"   : 
         [  [ "10/26/2017", "9/2017", "$0.06", "--", "--", "--", "--" ] ,  [ "7/27/2017", "6/2017", "$1.40", "$0.40", "<div class=\"right neg negative neg_icon showinline down\">-1.00</div>", "<div class=\"right neg negative neg_icon showinline down\">-71.43%</div>", "After Close" ] ,  [ "4/27/2017", "3/2017", "$1.03", "$1.48", "<div class=\"right pos positive pos_icon showinline up\">+0.45</div>", "<div class=\"right pos positive pos_icon showinline up\">+43.69%</div>", "After Close" ] ,  [ "2/2/2017", "12/2016", "$1.40", "$1.54", "<div class=\"right pos positive pos_icon showinline up\">+0.14</div>", "<div class=\"right pos positive pos_icon showinline up\">+10.00%</div>", "After Close" ] ,  [ "10/27/2016", "9/2016", "$0.85", "$0.52", "<div class=\"right neg negative neg_icon showinline down\">-0.33</div>", "<div class=\"right neg negative neg_icon showinline down\">-38.82%</div>", "After Close" ] ,  [ "7/28/2016", "6/2016", "$1.14", "$1.78", "<div class=\"right pos positive pos_icon showinline up\">+0.64</div>", "<div class=\"right pos positive pos_icon showinline up\">+56.14%</div>", "After Close" ] ,  [ "4/28/2016", "3/2016", "$0.61", "$1.07", "<div class=\"right pos positive pos_icon showinline up\">+0.46</div>", "<div class=\"right pos positive pos_icon showinline up\">+75.41%</div>", "After Close" ] ,  [ "1/28/2016", "12/2015", "$1.61", "$1.00", "<div class=\"right neg negative neg_icon showinline down\">-0.61</div>", "<div class=\"right neg negative neg_icon showinline down\">-37.89%</div>", "After Close" ] ,  [ "10/22/2015", "9/2015", "-$0.1", "$0.17", "<div class=\"right pos positive pos_icon showinline up\">+0.27</div>", "<div class=\"right pos positive pos_icon showinline up\">+270.00%</div>", "After Close" ] ,  [ "7/23/2015", "6/2015", "-$0.15", "$0.19", "<div class=\"right pos positive pos_icon showinline up\">+0.34</div>", "<div class=\"right pos positive pos_icon showinline up\">+226.67%</div>", "After Close" ] ,  [ "4/23/2015", "3/2015", "-$0.13", "-$0.12", "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", "<div class=\"right pos positive pos_icon showinline up\">+7.69%</div>", "After Close" ] ,  [ "1/29/2015", "12/2014", "$0.24", "$0.45", "<div class=\"right pos positive pos_icon showinline up\">+0.21</div>", "<div class=\"right pos positive pos_icon showinline up\">+87.50%</div>", "After Close" ] ,  [ "10/23/2014", "9/2014", "-$0.73", "-$0.95", "<div class=\"right neg negative neg_icon showinline down\">-0.22</div>", "<div class=\"right neg negative neg_icon showinline down\">-30.14%</div>", "After Close" ] ,  [ "7/24/2014", "6/2014", "-$0.13", "-$0.27", "<div class=\"right neg negative neg_icon showinline down\">-0.14</div>", "<div class=\"right neg negative neg_icon showinline down\">-107.69%</div>", "After Close" ] ,  [ "4/24/2014", "3/2014", "$0.22", "$0.23", "<div class=\"right pos positive pos_icon showinline up\">+0.01</div>", "<div class=\"right pos positive pos_icon showinline up\">+4.55%</div>", "After Close" ] ,  [ "1/30/2014", "12/2013", "$0.68", "$0.51", "<div class=\"right neg negative neg_icon showinline down\">-0.17</div>", "<div class=\"right neg negative neg_icon showinline down\">-25.00%</div>", "After Close" ] ,  [ "10/24/2013", "9/2013", "-$0.09", "-$0.09", "<div class=\"right pos_na showinline\">0.00</div>", "<div class=\"right pos_na showinline\">0.00%</div>", "After Close" ] ,  [ "7/25/2013", "6/2013", "$0.04", "-$0.02", "<div class=\"right neg negative neg_icon showinline down\">-0.06</div>", "<div class=\"right neg 
negative neg_icon showinline down\">-150.00%</div>", "After Close" ] ,  [ "4/25/2013", "3/2013", "$0.10", "$0.18", "<div class=\"right pos positive pos_icon showinline up\">+0.08</div>", "<div class=\"right pos positive pos_icon showinline up\">+80.00%</div>", "After Close" ] ,  [ "1/29/2013", "12/2012", "$0.28", "$0.21", "<div class=\"right neg negative neg_icon showinline down\">-0.07</div>", "<div class=\"right neg negative neg_icon showinline down\">-25.00%</div>", "After Close" ] ,  [ "10/25/2012", "9/2012", "-$0.08", "-$0.23", "<div class=\"right neg negative neg_icon showinline down\">-0.15</div>", "<div class=\"right neg negative neg_icon showinline down\">-187.50%</div>", "After Close" ] ,  [ "7/26/2012", "6/2012", "--", "--", "--", "--", "After Close" ] ,  [ "4/26/2012", "3/2012", "--", "--", "--", "--", "After Close" ] ,  [ "1/31/2012", "12/2011", "--", "--", "--", "--", "After Close" ] ,  [ "10/25/2011", "9/2011", "--", "--", "--", "--", "After Close" ] ,  [ "7/26/2011", "6/2011", "--", "--", "--", "--", "After Close" ] ,  [ "4/26/2011", "3/2011", "--", "--", "--", "--", "--" ] ,  [ "1/27/2011", "12/2010", "--", "--", "--", "--", "After Close" ] ,  [ "10/21/2010", "9/2010", "--", "--", "--", "--", "After Close" ] ,  [ "7/22/2010", "6/2010", "--", "--", "--", "--", "After Close" ] ,  [ "4/22/2010", "3/2010", "--", "--", "--", "--", "After Close" ] ,  [ "1/28/2010", "12/2009", "--", "--", "--", "--", "After Close" ] ,  [ "10/22/2009", "9/2009", "--", "--", "--", "--", "After Close" ] ,  [ "7/23/2009", "6/2009", "--", "--", "--", "--", "After Close" ]  ]

How could I get this table? Thanks!

Diego

1 Answer


The solution is to parse the raw HTML with Python's string and regex functions instead of BeautifulSoup, because the data does not live in HTML tags; it is embedded in JavaScript code.

The code below extracts the JS array assigned to "earnings_announcements_earnings_table", and since a JS array literal is also valid Python list syntax, I just parse it with ast.literal_eval. The result is a list you can loop over, and it contains the data from all pages of the table.

import urllib2
import re
import ast

# Request the page with a browser-like User-Agent header
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'}
req = urllib2.Request('https://www.zacks.com/stock/research/amzn/earnings-announcements', None, user_agent)
source = urllib2.urlopen(req).read()

# Drop everything up to and including the earnings table key
compiled = re.compile(r'"earnings_announcements_earnings_table"\s+:', flags=re.IGNORECASE | re.DOTALL)
match = re.search(compiled, source)
if match:
    source = source[match.end():]

# Drop everything from the next key onward, leaving only the earnings array
compiled = re.compile(r'"earnings_announcements_webcasts_table"', flags=re.IGNORECASE | re.DOTALL)
match = re.search(compiled, source)
if match:
    source = source[:match.start()]

# What remains is a JS array literal, which is also valid Python list syntax
result = ast.literal_eval(source.strip('\r\n\t, '))
print result
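
Note that the surprise columns in each row still contain raw <div> markup, so you will probably want to strip the tags before using the data. A minimal follow-up sketch (the column names are my guess from the page layout, and the regex tag-stripping is just one simple option):

import re

def strip_tags(value):
    # Drop any embedded <div ...>...</div> markup, keeping only the inner text
    return re.sub(r'<[^>]+>', '', value).strip()

for row in result:
    date, period, estimate, reported, surprise, surprise_pct, time = [strip_tags(c) for c in row]
    print([date, period, estimate, reported, surprise, surprise_pct, time])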

Let me know if you need clarifications.

chad