I've been trying to scrape the portfolio holdings for the USO fund here: http://www.uscfinvestments.com/holdings/uso

So far I've only gotten as far as this:

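import requests
from bs4 import BeautifulSoup

# Download the page and parse the HTML
page = requests.get("http://www.uscfinvestments.com/holdings/uso")
soup = BeautifulSoup(page.content, 'html.parser')

# Look up the wrapper divs that should contain the holdings tables
print(soup.find_all('div', id='holdingsTableWrapper'))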

But the divs that should hold the tables come back empty:

[<div id="holdingsTableWrapper">
<div id="portfolioTableDiv"></div>
</div>, <div id="holdingsTableWrapper">
<div id="holdingsTableDiv"></div>
</div>]

Does anyone know how to work around this?

EDIT:

I'm trying to scrape the contents of the two tables shown on that page.

moron

1 Answer

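The page fills those tables with JavaScript after it loads, so the HTML that requests receives contains only the empty divs. Call the JSON endpoint behind the page's XHR request directly instead: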

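import pandas as pd

# JSON endpoint the page requests via XHR to fill the holdings tables
url = "http://www.uscfinvestments.com/uscfinvestments-template/assets/charts/portfolioHoldings-uso.json"

# Drop the first row (index 0) and the 'asofdate' column
df = pd.read_json(url).drop(0).drop(columns=['asofdate'])

print(df)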
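If you would rather not depend on pandas, the same idea can be sketched with the standard library alone. This is only a rough sketch: it assumes the endpoint returns a top-level JSON array of objects (one per row), which I have not verified, and the helper names clean_holdings and fetch_holdings are made up for illustration.

```python
import json
from urllib.request import urlopen

# Same endpoint as in the pandas snippet
URL = ("http://www.uscfinvestments.com/uscfinvestments-template"
       "/assets/charts/portfolioHoldings-uso.json")


def clean_holdings(records):
    """Skip the first row and remove the 'asofdate' key from the rest.

    Mirrors .drop(0).drop(columns=['asofdate']) for a list of dicts.
    Assumption: `records` is a list of dicts, one per table row.
    """
    return [
        {key: value for key, value in row.items() if key != "asofdate"}
        for row in records[1:]
    ]


def fetch_holdings(url=URL, timeout=10):
    """Download the JSON payload and return the cleaned rows."""
    with urlopen(url, timeout=timeout) as resp:
        return clean_holdings(json.load(resp))


# Usage (performs a network request):
# for row in fetch_holdings():
#     print(row)
```

Keeping the cleanup in its own function means you can check it on a small hand-written list before touching the network.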


Note: to find the XHR call yourself, open your browser's developer tools, go to the Network tab, filter by XHR, and reload the page.