
I am trying to extract figures such as profits and revenues, along with their corresponding dates and quarters, from unstructured text about the stock market and convert them into a report in table form. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there an unsupervised way to link entities with their corresponding dates, values, and quarters?

desertnaut
starter
  • See if this resource can help you: https://www.deepset.ai/blog/automating-information-extraction-with-question-answering It is about automatic information extraction from texts using a set of questions defined by the user. – Stefano Fiorucci - anakin87 Aug 01 '22 at 13:32
  • I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in https://stackoverflow.com/tags/machine-learning/info – desertnaut Aug 01 '22 at 14:13

1 Answer


Financial data is highly structured. I'm not sure exactly what you are after, but maybe this will help.

import pandas_datareader as web
import pandas as pd

# daily price history for AAPL from Yahoo Finance, as a DataFrame
df = web.DataReader('AAPL', data_source='yahoo', start='2011-01-01', end='2021-01-12')
df.head()

import yfinance as yf

# create a Ticker object for Apple
aapl = yf.Ticker("AAPL")
aapl

# get stock info
aapl.info

Result:

{'zip': '95014',
 'sector': 'Technology',
 'fullTimeEmployees': 154000,
 'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. In addition, the company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; AirPods Max, an over-ear wireless headphone; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, and iPod touch. Further, it provides AppleCare support services; cloud services store services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. Additionally, the company offers various services, such as Apple Arcade, a game subscription service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers, wholesalers, retailers, and resellers. Apple Inc. was incorporated in 1977 and is headquartered in Cupertino, California.',
 'city': 'Cupertino',
 'phone': '408 996 1010',
 'state': 'CA',
 'country': 'United States',
 'companyOfficers': [],
 'website': 'https://www.apple.com',
 'maxAge': 1,
 'address1': 'One Apple Park Way',
 'industry': 'Consumer Electronics',
 'ebitdaMargins': 0.3343,
 'profitMargins': 0.25709,
 'grossMargins': 0.43313998,
 'operatingCashflow': 118224003072,
 'revenueGrowth': 0.019,
 'operatingMargins': 0.30533,
 'ebitda': 129556996096,
 'targetLowPrice': 130,
 'recommendationKey': 'buy',
 'grossProfits': 152836000000,
 ...}
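
If you only need a handful of these fields for a report, you can pick them out of the result by key. This is a minimal sketch, not something from the yfinance docs: it assumes the result can be treated as a plain dict, and it uses a small hand-copied subset of the output above so it runs offline. The `summarize` helper and the `WANTED` list are hypothetical names, and `totalRevenue` is included only to show what happens when a key is missing from the data.

```python
import pandas as pd

# Subset of the `aapl.info` output shown above, copied by hand so the
# sketch runs without network access.
info = {
    "sector": "Technology",
    "industry": "Consumer Electronics",
    "profitMargins": 0.25709,
    "grossMargins": 0.43313998,
    "revenueGrowth": 0.019,
    "ebitda": 129556996096,
    "grossProfits": 152836000000,
}

# Hypothetical list of fields for the report; "totalRevenue" is not in the
# dict above and is here only to exercise the missing-key path.
WANTED = ["sector", "profitMargins", "grossProfits", "ebitda", "totalRevenue"]


def summarize(info: dict, keys: list) -> pd.Series:
    """Return the requested fields, with None for any that are absent."""
    return pd.Series({key: info.get(key) for key in keys}, name="AAPL")


summary = summarize(info, WANTED)
print(summary)
```

Using `dict.get` rather than indexing keeps the report from failing when a field is not present for a given ticker.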

# get historical market data
hist = aapl.history(period="max")

# show actions (dividends, splits)
aapl.actions

# show dividends
aapl.dividends

# show splits
aapl.splits

# show annual and quarterly financials
aapl.financials
aapl.quarterly_financials

Result:

                                           2022-06-25     2022-03-26  \
Research Development                     6797000000.0   6387000000.0   
Effect Of Accounting Charges                     None           None   
Income Before Tax                       23066000000.0  30139000000.0   
Minority Interest                                None           None   
Net Income                              19442000000.0  25010000000.0   
Selling General Administrative           6012000000.0   6193000000.0   
Gross Profit                            35885000000.0  42559000000.0   
Ebit                                    23076000000.0  29979000000.0   
Operating Income                        23076000000.0  29979000000.0   
Other Operating Expenses                         None           None   
Interest Expense                         -719000000.0   -691000000.0   
Extraordinary Items                              None           None   
Non Recurring                                    None           None   
Other Items                                      None           None   
Income Tax Expense                       3624000000.0   5129000000.0   
Total Revenue                           82959000000.0  97278000000.0   
Total Operating Expenses                59883000000.0  67299000000.0   
Cost Of Revenue                         47074000000.0  54719000000.0   
Total Other Income Expense Net            -10000000.0    160000000.0   
Discontinued Operations                          None           None   
Net Income From Continuing Ops          19442000000.0  25010000000.0   
Net Income Applicable To Common Shares  19442000000.0  25010000000.0   

                                            2021-12-25     2021-09-25  
Research Development                      6306000000.0   5772000000.0  
Effect Of Accounting Charges                      None           None  
Income Before Tax                        41241000000.0  23248000000.0  
Minority Interest                                 None           None  
Net Income                               34630000000.0  20551000000.0  
Selling General Administrative            6449000000.0   5616000000.0  
Gross Profit                             54243000000.0  35174000000.0  
Ebit                                     41488000000.0  23786000000.0  
Operating Income                         41488000000.0  23786000000.0  
Other Operating Expenses                          None           None  
Interest Expense                          -694000000.0   -672000000.0  
Extraordinary Items                               None           None  
Non Recurring                                     None           None  
Other Items                                       None           None  
Income Tax Expense                        6611000000.0   2697000000.0  
Total Revenue                           123945000000.0  83360000000.0  
Total Operating Expenses                 82457000000.0  59574000000.0  
Cost Of Revenue                          69702000000.0  48186000000.0  
Total Other Income Expense Net            -247000000.0   -538000000.0  
Discontinued Operations                           None           None  
Net Income From Continuing Ops           34630000000.0  20551000000.0  
Net Income Applicable To Common Shares   34630000000.0  20551000000.0  
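
Since you want each value tied to its metric, date, and quarter, the wide table above can be reshaped into one row per (metric, period). The following is a rough sketch under a few assumptions: it rebuilds three rows of the output above by hand so it runs offline (with yfinance you would pass `aapl.quarterly_financials` instead), `to_long_report` is a hypothetical helper name, and the quarter labels are plain calendar quarters derived from the period-end dates, which will not match a company's fiscal quarters.

```python
import pandas as pd

# Three rows copied by hand from the quarterly_financials output above.
wide = pd.DataFrame(
    {
        pd.Timestamp("2022-06-25"): [82959000000.0, 35885000000.0, 19442000000.0],
        pd.Timestamp("2022-03-26"): [97278000000.0, 42559000000.0, 25010000000.0],
        pd.Timestamp("2021-12-25"): [123945000000.0, 54243000000.0, 34630000000.0],
        pd.Timestamp("2021-09-25"): [83360000000.0, 35174000000.0, 20551000000.0],
    },
    index=["Total Revenue", "Gross Profit", "Net Income"],
)


def to_long_report(wide: pd.DataFrame) -> pd.DataFrame:
    """Turn a metric-by-date table into one row per (metric, period end)."""
    long = (
        wide.rename_axis("metric")
        .reset_index()
        .melt(id_vars="metric", var_name="period_end", value_name="value")
    )
    long["period_end"] = pd.to_datetime(long["period_end"])
    # Calendar quarter of the period-end date (an assumption, not the
    # company's fiscal quarter).
    long["quarter"] = long["period_end"].dt.to_period("Q").astype(str)
    long = long.sort_values(["metric", "period_end"]).reset_index(drop=True)
    return long[["metric", "quarter", "period_end", "value"]]


report = to_long_report(wide)
print(report)
```

Each row of `report` now carries the metric name, the quarter label, the period-end date, and the value, which is the shape you would want for a tabular report.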

Documentation here:

https://medium.com/codestorm/how-to-get-data-from-yahoo-finance-using-python-8d087fe42b10

ASH