
I'm loading data from a CSV into a data frame and then looping through the rows to do web queries. All my code is shown below.

import pandas as pd
from bs4 import BeautifulSoup
import requests


df = pd.read_csv('C:\\Users\\ryans\\OneDrive\\Desktop\\Briefcase\\NY Times Dates\\exchanges.csv')
print(df)

for index, row in df.iterrows():
    passin = 'https://markets.on.nytimes.com/research/markets/holidays/holidays.asp?display=market&exchange='+row["Symbol"]
    dfs = pd.read_html(passin)
    df = dfs[0]
    print(df)

My last step is to append data frame #2 under data frame #1, data frame #3 under data frame #2, and so on. I Googled for a solution and found several techniques for appending one data frame under another, but nothing that repeats the step for every table produced by the loop.

I'm not sure how to append data frame #n under data frame #n-1. How can I do that? I'm guessing it's an incremental process, but I can't get it working here.

halfer
ASH

1 Answer


You can use a generator expression with pd.concat:

url = 'https://markets.on.nytimes.com/research/markets/holidays/holidays.asp?display=market&exchange='

res = pd.concat(pd.read_html(f'{url}{symbol}')[0] for symbol in df['Symbol'])

If you wish to discard the original row indices when concatenating, so the result is numbered 0 to n-1, use the argument ignore_index=True:

gen = (pd.read_html(f'{url}{symbol}')[0] for symbol in df['Symbol'])
res = pd.concat(gen, ignore_index=True)
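
For readers who prefer an explicit loop, the same idea can be sketched by collecting each table in a list and concatenating once at the end. This is only a minimal sketch: the helper names `fetch_first_table` and `collect_tables` and the `fetch` parameter are hypothetical and not part of pandas. Note that the loop in the question reassigns `df` on every pass, so after the loop `df` refers to the last scraped table rather than the exchanges file; that may be one cause of the `KeyError: 'Symbol'` reported in the comments.

```python
import pandas as pd

URL = 'https://markets.on.nytimes.com/research/markets/holidays/holidays.asp?display=market&exchange='


def fetch_first_table(symbol):
    """Return the first HTML table found on the holidays page for one symbol."""
    return pd.read_html(f'{URL}{symbol}')[0]


def collect_tables(symbols, fetch=fetch_first_table):
    """Fetch one table per symbol and stack them all into a single frame.

    `fetch` is a hypothetical hook so the network call can be swapped out.
    """
    frames = []
    for symbol in symbols:
        # Use a fresh name here so the frame holding the symbols is never overwritten.
        table = fetch(symbol)
        frames.append(table)
    # One concat at the end is cheaper than appending inside the loop.
    return pd.concat(frames, ignore_index=True)
```

Assuming the CSV was read into a frame named `exchanges` (a hypothetical name chosen to avoid reusing `df`), the call would be `res = collect_tables(exchanges['Symbol'])`.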
jpp
  • Hello. I have followed you, and your comments, for the past 2+ years. I up-voted so many of your comments!! I know you know your stuff!! No question about it!! I followed your recommendation on this post, but I can't make this work. I must have made a mistake. I'm getting this error: KeyError: 'Symbol' Thoughts??? – ASH Dec 21 '18 at 01:40
  • @ryguy72, First check `print(df.columns)` and see if `'Symbol'` exists there, then check `print(df['Symbol'])` to see if it works. Your current solution implies there is a series labelled `'Symbol'`. – jpp Dec 21 '18 at 01:41
  • Well, when I do 'print(df['Symbol'])', I get the same result. Here are the first 3 rows in 'exchanges.csv' Name of Exchange Symbol Australian Stock Exchange Ltd - Australia ASX Euronext Paris - France PAR – ASH Dec 21 '18 at 01:46
  • By same result, I assume you mean `KeyError`. Then check `print(df.columns)`. You might have some unwanted whitespace (see [here](https://stackoverflow.com/questions/41476150/removing-space-from-dataframe-columns-in-pandas) to correct). – jpp Dec 21 '18 at 01:47
  • When I run the code I get this: KeyError: 'Symbol'. I just updated my original post with the addition of your code. – ASH Dec 21 '18 at 14:07
  • 1
    @ryguy72, My solution doesn't involve an explicit `for` loop. Don't use `iterrows`. – jpp Dec 21 '18 at 14:24
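
The whitespace check suggested in the comments above can be sketched as follows. The CSV text here is a made-up stand-in for `exchanges.csv` with a stray space before the second header, which is an assumption about what the real file might contain; the name `exchanges` is also hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for exchanges.csv: note the space before "Symbol".
csv_text = (
    "Name of Exchange, Symbol\n"
    "Australian Stock Exchange Ltd - Australia,ASX\n"
    "Euronext Paris - France,PAR\n"
)

exchanges = pd.read_csv(io.StringIO(csv_text))

# Inspect the header names exactly as pandas parsed them.
print(exchanges.columns.tolist())

# Strip leading and trailing whitespace from every column name.
exchanges.columns = exchanges.columns.str.strip()

# The column can now be looked up by its clean name; strip the values too.
symbols = exchanges['Symbol'].str.strip()
print(symbols.tolist())
```

If the printed header list already shows a clean `'Symbol'`, whitespace is not the problem and the cause lies elsewhere, for example in `df` having been reassigned before the lookup.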