How to scrape United States Oil fund holdings?

Question

I've been trying to scrape the portfolio holdings for the USO fund here: http://www.uscfinvestments.com/holdings/uso

So far I could only get as far as this:

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.uscfinvestments.com/holdings/uso")
soup = BeautifulSoup(page.content, 'html.parser')
soup.find_all('div', id='holdingsTableWrapper')

Then I basically get nothing:

    [<div id="holdingsTableWrapper">
 <div id="portfolioTableDiv"></div>
 </div>, <div id="holdingsTableWrapper">
 <div id="holdingsTableDiv"></div>
 </div>]

Anyone know how to work around this?

EDIT:

I'm trying to scrape the contents of these tables:

score 0 · Accepted Answer · answered May 08 '20 at 07:51

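The tables on that page are filled in by JavaScript after it loads, so the HTML that requests receives only contains the empty placeholder divs. Instead, call the JSON endpoint that the page itself requests via XHR: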

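import pandas as pd

# JSON endpoint that the page requests via XHR
url = "http://www.uscfinvestments.com/uscfinvestments-template/assets/charts/portfolioHoldings-uso.json"

# Drop the row labelled 0 and the 'asofdate' column
df = pd.read_json(url).drop(0).drop(columns=['asofdate'])

print(df)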
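
If it helps to look at the raw payload before handing it to pandas, the same steps can be sketched with the standard library alone. This is only a sketch: it assumes the endpoint returns a JSON array of objects, one per row, which is consistent with the `.drop(0)` call above but not confirmed here. `fetch_text` and `clean_holdings` are hypothetical helper names, and every field in the sample except `asofdate` is made up for illustration.

```python
import json
from urllib.request import urlopen

URL = ("http://www.uscfinvestments.com/uscfinvestments-template/"
       "assets/charts/portfolioHoldings-uso.json")


def fetch_text(url, timeout=10):
    """Download the response body and decode it as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def clean_holdings(payload_text, skip_rows=1, drop_keys=("asofdate",)):
    """Parse a JSON array of row objects, skip the leading rows and
    remove the unwanted keys (mirrors .drop(0).drop(columns=[...]))."""
    rows = json.loads(payload_text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of row objects")
    return [
        {key: value for key, value in row.items() if key not in drop_keys}
        for row in rows[skip_rows:]
    ]


# Made-up sample payload, only to show the shape assumed above.
sample = json.dumps([
    {"asofdate": "YYYY-MM-DD", "name": "header row", "quantity": None},
    {"asofdate": "YYYY-MM-DD", "name": "Contract A", "quantity": 100},
    {"asofdate": "YYYY-MM-DD", "name": "Contract B", "quantity": 250},
])

holdings = clean_holdings(sample)
for row in holdings:
    print(row)

# Against the live endpoint (requires network access):
# holdings = clean_holdings(fetch_text(URL))
```

If the real payload turns out to be an object rather than an array, `clean_holdings` raises a `ValueError` instead of silently returning the wrong rows.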

Note: you can have a look here to understand how to get the XHR call.
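
Before looking for the XHR call, it can be worth confirming that the page really is filled in by JavaScript. A rough check, assuming BeautifulSoup is installed, is to list the `div` elements that carry an `id` but have no child tags and no text in the raw HTML. `empty_placeholders` is a hypothetical helper name, and the sample HTML is the output shown in the question.

```python
from bs4 import BeautifulSoup


def empty_placeholders(html):
    """Return the ids of <div> elements that have no child tags and no text."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        div["id"]
        for div in soup.find_all("div", id=True)
        if div.find(True) is None and not div.get_text(strip=True)
    ]


raw_html = """
<div id="holdingsTableWrapper">
 <div id="portfolioTableDiv"></div>
</div>
<div id="holdingsTableWrapper">
 <div id="holdingsTableDiv"></div>
</div>
"""

print(empty_placeholders(raw_html))
```

Empty containers like these are not conclusive on their own, but they suggest the data arrives in a later request, which typically shows up in the browser developer tools' Network tab when filtered to XHR/Fetch.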
