
I would like to load data into a pandas DataFrame through StringIO instead of a text file. Some files will be really big, and I'd rather not write large text files only to delete them afterwards. Using StringIO seems like a much nicer solution.

If I do this, the DataFrame is created without any problem:

import pandas as pd
import requests

indIndustryDataURL='https://download.bls.gov/pub/time.series/pc/pc.industry'  #0.04MB

# Download the file; requests.get() returns a Response object
indIndustryData=requests.get(indIndustryDataURL, allow_redirects=True)

# Store column names and data rows from requests.get() in separate list objects
industryDataColNames=indIndustryData.text.split('\r\n')[0].split('\t')
industryDataRowData=indIndustryData.text.split('\r\n')[1:-1]


# Write the row data to a tab-separated text file that pandas can read like a CSV
with open('industryDataRowData.txt','w') as f:
    f.writelines('%s\n' % row for row in industryDataRowData)

# This works fine
df1=pd.read_csv('industryDataRowData.txt',sep='\t', names=industryDataColNames)

But when I try the StringIO() version below in the same Jupyter notebook, it fails with an error saying the path is too long:

import io
from datetime import datetime

io = io.StringIO()
start_time = datetime.now()
io.writelines(industryDataRowData)
io.seek(0)

df2=pd.read_csv(io.getvalue(),sep='\t', names=industryDataColNames)

ValueError: stat: path too long for Windows
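The error can be reproduced without the download. The sketch below uses a small made-up tab-separated string (the names `text`, `code` and `name` are illustrative, not taken from the BLS file) and shows that `pd.read_csv` does not parse a raw `str` as data but reads the same text once it is wrapped in `io.StringIO`. The exact exception depends on the platform (on Windows with a long string it is the `ValueError` above), so the sketch catches both `OSError` and `ValueError`.

```python
import io

import pandas as pd

# Made-up sample data standing in for the downloaded file
text = "code\tname\nA1\tFirst industry\nB2\tSecond industry\n"

# A plain str is treated as a file path (or URL), not as CSV content,
# so pandas tries to open a file whose name is the whole text
try:
    pd.read_csv(text, sep="\t")
except (OSError, ValueError) as exc:
    print("str rejected:", type(exc).__name__)

# The same text wrapped in a StringIO is read as file contents
df = pd.read_csv(io.StringIO(text), sep="\t")
print(df.shape)  # (2, 2)
```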

Any advice?

costa rica

2 Answers


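Pass the buffer itself, not its contents. pd.read_csv treats a str argument as a file path (or URL), so passing io.getvalue() makes pandas look for a file whose name is the entire data string; that is why Windows complains that the path is too long.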

df2=pd.read_csv(io, sep='\t', ...)
  • This seems to work, but when I look inside the DataFrame it's empty. The column names are there, but df2.head() shows no rows and len(df2) returns 0. io does have data, because when I run io.getvalue()[0:40] I can see the string of data. – costa rica Sep 12 '20 at 14:36
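Two details of the question's buffer code are worth checking here: `writelines()` does not add line separators, so the rows from `industryDataRowData` are written as one long line, and `read_csv` reads from the buffer's current position, so a buffer that has already been read (or was not rewound with `seek(0)`) has nothing left to parse. The sketch below is a minimal, offline version of the buffer approach with both points handled, assuming a made-up sample string in place of the BLS download; it names the buffer `buf` rather than `io` so the `io` module is not shadowed.

```python
import io

import pandas as pd

# Made-up stand-in for indIndustryData.text: tab-separated, CRLF line endings
sample_text = "code\tname\r\nA1\tFirst industry\r\nB2\tSecond industry\r\n"

lines = sample_text.split("\r\n")
col_names = lines[0].split("\t")
row_data = lines[1:-1]  # skip the header and the trailing empty string

buf = io.StringIO()
# writelines() adds no separators, so append "\n" to each row
buf.writelines(f"{row}\n" for row in row_data)
# Rewind so read_csv starts at the beginning of the buffer
buf.seek(0)

df2 = pd.read_csv(buf, sep="\t", names=col_names)
print(len(df2))  # 2
```

If the same buffer is passed to `read_csv` a second time, call `buf.seek(0)` again first.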

Try:

df2=pd.read_csv(io.getvalue().strip(),sep='\t', names=industryDataColNames)
gtomer
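Note that `io.getvalue().strip()` is still a plain `str`, so `read_csv` still treats it as a path. If the goal is to keep working with a string, wrapping it in `io.StringIO` gives `read_csv` a file-like object. The same idea also removes the manual split, because the downloaded text already starts with the header row. A minimal sketch, again using a made-up sample string in place of `indIndustryData.text`:

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded text, header row included
downloaded_text = "code\tname\r\nA1\tFirst industry\r\nB2\tSecond industry\r\n"

# Wrap the (stripped) string in StringIO so read_csv sees a file-like object;
# the first line supplies the column names, so no manual split is needed
df3 = pd.read_csv(io.StringIO(downloaded_text.strip()), sep="\t")
print(df3.columns.tolist())  # ['code', 'name']
print(len(df3))  # 2
```

With the real download this would be `pd.read_csv(io.StringIO(indIndustryData.text), sep='\t')`, assuming the file's first line is the tab-separated header, as the question's own split implies.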