
I guess this is a combination of two questions: how to read an online text file, and how to parse the result into lists. I tried the following code, which reads the file as bytes and decodes it into a string, but I am not able to convert the result into a list.

import urllib.request
CFTC_URL = r"http://www.cftc.gov/dea/newcot/FinFutWk.txt"
CFTC_url = urllib.request.urlopen(CFTC_URL)
output = CFTC_url.read().decode('utf-8')
S.Wang
  • Possible duplicate of [How do I split a multi-line string into multiple lines?](http://stackoverflow.com/questions/172439/how-do-i-split-a-multi-line-string-into-multiple-lines) – SiHa Sep 26 '16 at 13:49

3 Answers


You can use the standard csv module with an io.StringIO wrapper around the file content (this example uses the requests library to fetch the data):

import requests, io, csv

CFTC_URL = r"http://www.cftc.gov/dea/newcot/FinFutWk.txt"
data = io.StringIO(requests.get(CFTC_URL).text)

dialect = csv.Sniffer().sniff(data.read(1024))
data.seek(0)
reader = csv.reader(data, dialect)
for row in reader:
    print(row)
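
If the goal is a list of lists rather than printed rows, the reader can be collected directly. The following is only a sketch on an invented two-line sample (not real CFTC data), so it runs without network access; `skipinitialspace=True` is a standard csv formatting option that drops whitespace immediately following a delimiter.

```python
import csv
import io

# Invented sample standing in for the downloaded text; not real CFTC data.
sample = '"NAME ONE",  10,  20\n"NAME, TWO",  30,  40\n'

# skipinitialspace=True ignores the padding that follows each comma.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)

# Collect every parsed row into a list of lists.
rows = [row for row in reader]
print(rows)
```

Because the csv module understands quoting, a comma inside a quoted name stays within one field, which a plain `split(',')` would not guarantee.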
Stanislav Ivanov

Rather than attempting to parse every line from the URL and put it into specific rows for a csv file, you can just push it all into a text file to clean up the formatting, and then read it back. It may seem like a bit more work, but this is generally my approach to comma-delimited information from a URL.

import requests
URL = "http://www.cftc.gov/dea/newcot/FinFutWk.txt"
r = requests.get(URL,stream=True)
with open('file.txt','w') as W:
    W.write(r.text)
with open('file.txt', 'r') as f:
    lines = f.readlines()

for line in lines:
    print(line.split(','))

You can take the body of that for loop and change it to save the lists into a list of lists, so you can use them rather than just print them.

content = []
for line in lines:
    content.append(line.split(','))

Also note that after splitting, some of the content is still followed by quite a large amount of white space. You could run through each list in the array and remove all white space, but that would ruin the first element in the list. Alternatively, you could just convert the numeric values that carry the white space into actual integers, since they were read in as strings. That is up to your preference. If you have any questions, feel free to add a comment below.
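
One way to sketch that cleanup is to strip only the leading and trailing padding of each field (which leaves spaces inside a name intact) and convert whatever parses as an integer. The helper name `clean_row` and the sample line are made up for illustration; the sample is not real CFTC data, and the sketch assumes numeric fields are plain integers padded with spaces.

```python
def clean_row(line):
    """Split a comma-delimited line, strip padding, convert integer fields.

    Hypothetical helper: assumes no commas occur inside quoted fields.
    """
    cleaned = []
    for field in line.rstrip("\r\n").split(","):
        value = field.strip()          # drop padding around the field only
        try:
            cleaned.append(int(value)) # numeric field: store as an integer
        except ValueError:
            cleaned.append(value)      # anything else stays a string
    return cleaned


# Invented sample line; not real CFTC data.
sample = '"EXAMPLE MARKET - EXAMPLE EXCHANGE",160920,   12345,    -67 ,ABC\n'
row = clean_row(sample)
print(row)
```

The same helper could replace `line.split(',')` in the loop that builds `content`.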

EDIT 1: On a side note, if you do not wish to keep the file that was saved with the content, import the os library and, after you have read the lines into the lines list, remove the file.

import os
os.remove('file.txt')
Xavid Ramirez

Assuming you want to interpret the file as a table, first get the rows by splitting the text on line breaks. Then you can get the columns by splitting each row again on commas.

import urllib.request
CFTC_URL = r"http://www.cftc.gov/dea/newcot/FinFutWk.txt"
CFTC_url = urllib.request.urlopen(CFTC_URL)
output = CFTC_url.read().decode('utf-8')
lines = output.splitlines() # split on line breaks (handles both \r\n and \n)
print(lines[0]) # first line "CANADIAN DOLLAR ..."
columns_0 = lines[0].split(",") # split on ,
print(columns_0[0]) # first column of first line

You can then iterate through the list of lines and for each entry in lines you can iterate through the columns.
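
A minimal sketch of that iteration, using a short made-up string in place of the downloaded `output` so that it runs without network access:

```python
# Stand-in for the decoded download; the values are invented for illustration.
output = "A,1,2\r\nB,3,4\r\n"

table = []
for line in output.splitlines():  # one entry per row
    if not line:                  # skip blank lines
        continue
    columns = line.split(",")     # one entry per column
    table.append(columns)

# Walk the table row by row, then column by column.
for row in table:
    for column in row:
        print(column)
```

After the first loop, `table` is a list of lists that can be indexed as `table[row_index][column_index]`.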

merl