I am trying to extract financial figures such as profits and revenues, along with their corresponding dates and quarters, from unstructured text about the stock market, and convert them into a report in table form. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there an unsupervised way to link entities with their corresponding dates, values and quarters?

desertnaut

starter
- See if this resource can help you: https://www.deepset.ai/blog/automating-information-extraction-with-question-answering It is about automatic information extraction from texts using a set of questions defined by the user. – Stefano Fiorucci - anakin87 Aug 01 '22 at 13:32
- I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in https://stackoverflow.com/tags/machine-learning/info – desertnaut Aug 01 '22 at 14:13
1 Answer
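Financial data is highly structured. I'm not sure exactly what you are after, but if the figures you need are also published in structured form, pulling them directly may be easier than parsing free text. Maybe this will help.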
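import pandas_datareader as web
import pandas as pd
# daily price history for AAPL from Yahoo Finance
df = web.DataReader('AAPL', data_source='yahoo', start='2011-01-01', end='2021-01-12')
df.head()
import yfinance as yf
# Ticker object that exposes company info, prices and financial statements
aapl = yf.Ticker("AAPL")
aapl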
# get stock info
aapl.info
Result:
{'zip': '95014',
'sector': 'Technology',
'fullTimeEmployees': 154000,
'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. In addition, the company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; AirPods Max, an over-ear wireless headphone; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, and iPod touch. Further, it provides AppleCare support services; cloud services store services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. Additionally, the company offers various services, such as Apple Arcade, a game subscription service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers, wholesalers, retailers, and resellers. Apple Inc. was incorporated in 1977 and is headquartered in Cupertino, California.',
'city': 'Cupertino',
'phone': '408 996 1010',
'state': 'CA',
'country': 'United States',
'companyOfficers': [],
'website': 'https://www.apple.com',
'maxAge': 1,
'address1': 'One Apple Park Way',
'industry': 'Consumer Electronics',
'ebitdaMargins': 0.3343,
'profitMargins': 0.25709,
'grossMargins': 0.43313998,
'operatingCashflow': 118224003072,
'revenueGrowth': 0.019,
'operatingMargins': 0.30533,
'ebitda': 129556996096,
'targetLowPrice': 130,
'recommendationKey': 'buy',
'grossProfits': 152836000000,
etc., etc., etc.
# get historical market data
hist = aapl.history(period="max")
# show actions (dividends, splits)
aapl.actions
# show dividends
aapl.dividends
# show splits
aapl.splits
# show financials
aapl.financials
aapl.quarterly_financials
Result:
                                            2022-06-25      2022-03-26      2021-12-25      2021-09-25
Research Development                      6797000000.0    6387000000.0    6306000000.0    5772000000.0
Effect Of Accounting Charges                      None            None            None            None
Income Before Tax                        23066000000.0   30139000000.0   41241000000.0   23248000000.0
Minority Interest                                 None            None            None            None
Net Income                               19442000000.0   25010000000.0   34630000000.0   20551000000.0
Selling General Administrative            6012000000.0    6193000000.0    6449000000.0    5616000000.0
Gross Profit                             35885000000.0   42559000000.0   54243000000.0   35174000000.0
Ebit                                     23076000000.0   29979000000.0   41488000000.0   23786000000.0
Operating Income                         23076000000.0   29979000000.0   41488000000.0   23786000000.0
Other Operating Expenses                          None            None            None            None
Interest Expense                          -719000000.0    -691000000.0    -694000000.0    -672000000.0
Extraordinary Items                               None            None            None            None
Non Recurring                                     None            None            None            None
Other Items                                       None            None            None            None
Income Tax Expense                        3624000000.0    5129000000.0    6611000000.0    2697000000.0
Total Revenue                            82959000000.0   97278000000.0  123945000000.0   83360000000.0
Total Operating Expenses                 59883000000.0   67299000000.0   82457000000.0   59574000000.0
Cost Of Revenue                          47074000000.0   54719000000.0   69702000000.0   48186000000.0
Total Other Income Expense Net             -10000000.0     160000000.0    -247000000.0    -538000000.0
Discontinued Operations                           None            None            None            None
Net Income From Continuing Ops           19442000000.0   25010000000.0   34630000000.0   20551000000.0
Net Income Applicable To Common Shares   19442000000.0   25010000000.0   34630000000.0   20551000000.0
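To get closer to the report-style table the question asks for, a wide frame like the one above can be reshaped so that each row is one (metric, date, quarter, value) record. This is only a minimal sketch, assuming pandas is installed: the small frame is rebuilt by hand from three rows of the output above rather than fetched, and the `quarter` column is the calendar quarter of the period-end date, which is not necessarily the company's fiscal quarter.

```python
import pandas as pd

# Hand-copied subset of the quarterly_financials output shown above
# (columns are period-end dates, rows are line items).
wide = pd.DataFrame(
    {
        "2022-06-25": [82959000000.0, 35885000000.0, 19442000000.0],
        "2022-03-26": [97278000000.0, 42559000000.0, 25010000000.0],
        "2021-12-25": [123945000000.0, 54243000000.0, 34630000000.0],
        "2021-09-25": [83360000000.0, 35174000000.0, 20551000000.0],
    },
    index=["Total Revenue", "Gross Profit", "Net Income"],
)

# Wide -> long: one row per (metric, date) pair.
report = (
    wide.rename_axis("metric")
    .reset_index()
    .melt(id_vars="metric", var_name="date", value_name="value")
)

# Parse the dates and derive the calendar quarter (assumption: calendar,
# not fiscal, quarters are what the report needs).
report["date"] = pd.to_datetime(report["date"])
report["quarter"] = report["date"].dt.to_period("Q").astype(str)
report = report[["metric", "date", "quarter", "value"]]

print(report.to_string(index=False))
```

The long layout is easy to filter or pivot, for example `report[report["metric"] == "Net Income"]`.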
Documentation Here:
https://medium.com/codestorm/how-to-get-data-from-yahoo-finance-using-python-8d087fe42b10
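If the figures really are only available as free text, one simple baseline is a rule-based proximity heuristic (not a learned unsupervised model): split the text into sentences, pair each metric keyword with the nearest monetary amount in the same sentence, and attach the quarter mentioned there, falling back to the most recent quarter seen earlier. The sketch below uses only the standard library. The keyword list, the regular expressions, the function name `link_metrics` and the sample text about a made-up company are all illustrative assumptions, and it will miss values that sit in a different sentence from their metric.

```python
import re

# Hypothetical keyword list; extend it for the metrics you care about.
METRICS = ("revenue", "net income", "gross profit")

# Dollar amounts with an optional scale word, e.g. "$12.5 million".
AMOUNT = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:billion|million|bn|m)\b)?", re.I
)
# Quarter mentions such as "Q2 2022" or "second quarter of 2022".
QUARTER = re.compile(
    r"\bQ[1-4]\s?\d{4}\b|\b(?:first|second|third|fourth) quarter(?: of \d{4})?",
    re.I,
)


def link_metrics(text):
    """Pair each metric keyword with the nearest amount in its sentence."""
    rows = []
    last_quarter = None  # carried forward when a sentence names no quarter
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        quarter_match = QUARTER.search(sentence)
        if quarter_match:
            last_quarter = quarter_match.group()
        amounts = [(m.start(), m.group()) for m in AMOUNT.finditer(sentence)]
        if not amounts:
            continue
        for metric in METRICS:
            for hit in re.finditer(re.escape(metric), sentence, re.I):
                # Nearest amount by character distance to the keyword.
                _, value = min(amounts, key=lambda a: abs(a[0] - hit.start()))
                rows.append(
                    {"metric": metric, "value": value, "quarter": last_quarter}
                )
    return rows


# Invented sample text, not real company data.
sample = (
    "ACME reported revenue of $12.5 million for Q2 2022. "
    "Net income was $1.2 million in the same quarter."
)
rows = link_metrics(sample)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` to get a table. Character distance is a crude stand-in for syntactic attachment, so a dependency parser or the question-answering approach linked in the comments may do better on longer sentences.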

ASH