Using urllib in Python to pull data from an online CSV file

Question

So, in Python, I'm trying to pull data from a CSV file using the csv module (to handle data in a CSV file, of course). I have this:

import csv

# newline='' is what the csv docs recommend when opening a file for csv.reader
with open('GDMTH_CSV.csv', newline='') as csv_file:
    csv_file_read = csv.reader(csv_file)
    for line in csv_file_read:
        print(line)

That prints the raw rows, which I then process to extract the specific data I need.

But GDMTH_CSV.csv is actually an online file, so I use the urllib.request module to try the same thing:

import urllib.request
import csv

url = 'http://www.cre.gob.mx/da/TarifasFinalesdeSuministroBasico.csv'
x = urllib.request.urlopen(url)
# csv.reader() receives lines of bytes here, so the loop below raises an error
csv_read = csv.reader(x)
for line in csv_read:
    print(line)

(By the way, the file's name is actually "TarifasfinalesdeSuministroBasico.csv".) But that gives me the error "iterator should return strings, not bytes (did you open the file in text mode?)", so I reasoned: "Oh, that's bytes; I should just decode it", and changed

x = urllib.request.urlopen(url)

to

x = urllib.request.urlopen(str(url))

But then every single character of the file comes out in its own square brackets. I guess I still don't understand data types and lists in Python. How do I get a result similar to the first code?
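Both symptoms can be reproduced offline. In this sketch a short made-up byte string stands in for the downloaded file, and `io.BytesIO` stands in for the HTTP response (an assumption: the object `urlopen()` returns iterates over byte lines the same way). Handing a single whole string to `csv.reader()` produces exactly the "one character per row" output described above.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded file (not its real contents)
raw = 'Mes,Tarifa\r\nEne,GDMTH\r\n'.encode('iso-8859-1')

# 1) Iterating a binary stream yields lines of bytes, which csv.reader rejects
bytes_rejected = False
try:
    list(csv.reader(io.BytesIO(raw)))
except csv.Error as exc:
    bytes_rejected = True
    print('csv.Error:', exc)

# 2) Iterating a single str yields one character at a time, so csv.reader
#    treats every character as its own "line" and emits a tiny row for each
per_char_rows = list(csv.reader(raw.decode('iso-8859-1')))
print(per_char_rows[:3])  # [['M'], ['e'], ['s']]

# 3) Decoding and then splitting into lines gives the expected rows
rows = list(csv.reader(raw.decode('iso-8859-1').splitlines()))
print(rows)  # [['Mes', 'Tarifa'], ['Ene', 'GDMTH']]
```

The key point is that `csv.reader()` wants an iterable whose items are whole lines of text: bytes are rejected, and a plain `str` is iterated character by character.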

I suppose that solves the issue, but I don't understand why `str()` doesn't give the same result as `codecs.iterdecode()`. I get the feeling it's something very basic. I did search for similar questions before posting my own; I suppose I didn't search enough. Thanks anyway. — ZainZeus, Jan 03 '19 at 20:55
Calling `str()` on bytes without an encoding doesn't decode them, so you should use the `decode` method instead. `codecs.iterdecode` is good for reading large files when memory consumption is a concern. — taras, Jan 04 '19 at 07:37
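The difference described in the comment above can be checked directly. This sketch uses a made-up ISO-8859-1 byte string containing an "ó", with `io.BytesIO` standing in for the HTTP response. It shows that `str(data)` returns the bytes' repr, while `.decode()` (or `str(data, encoding)`) returns the actual text, and that `codecs.iterdecode()` decodes a stream lazily, line by line.

```python
import codecs
import csv
import io

# Made-up sample bytes in ISO-8859-1 ('ó' is the single byte 0xF3)
data = 'Mes,Descripción\r\n'.encode('iso-8859-1')

# str() without an encoding gives the repr of the bytes, not the text
as_repr = str(data)
print(as_repr)  # b'Mes,Descripci\xf3n\r\n'

# .decode() and str(data, encoding) both actually decode the bytes
as_text = data.decode('iso-8859-1')
print(as_text == str(data, 'iso-8859-1'))  # True

# codecs.iterdecode() decodes an iterable of byte chunks lazily; iterating a
# binary stream yields whole lines, so csv.reader gets complete str lines
stream = io.BytesIO(data + b'1,GDMTH\r\n')  # stand-in for an HTTP response
rows = list(csv.reader(codecs.iterdecode(stream, 'iso-8859-1')))
print(rows)  # [['Mes', 'Descripción'], ['1', 'GDMTH']]
```

For a genuinely large file you would loop over the `csv.reader` directly instead of wrapping it in `list()`, so only one line is held in memory at a time.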

cody · Accepted Answer · 2019-01-03T19:50:48.263

The object returned by urllib.request.urlopen is not suitable to pass directly to csv.reader(): iterating over it yields lines of bytes, while csv.reader() expects an iterable of strings. Additionally, I would recommend using the simpler requests library for higher-level HTTP interactions. The following should successfully get the data:

(Note that I'm decoding the byte sequence as iso-8859-1, as that is the encoding of this particular csv file)

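import csv
import requests

url = 'http://www.cre.gob.mx/da/TarifasFinalesdeSuministroBasico.csv'

res = requests.get(url)
# res.content is the raw response body as bytes; decode it with the file's encoding
content = res.content.decode('iso-8859-1')

# csv.reader() needs an iterable of lines, so split the decoded text first
for line in csv.reader(content.splitlines()):
    print(line)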
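If you would rather stay with the standard library as in the question, one option is to wrap the `urlopen()` response in `io.TextIOWrapper`, which decodes bytes to text on the fly so `csv.reader()` receives strings. This is a sketch under stated assumptions: `read_csv_rows` and `fetch_rows` are hypothetical helper names, the ISO-8859-1 encoding is taken from the answer above, and it assumes the URL is still reachable.

```python
import csv
import io
import urllib.request

URL = 'http://www.cre.gob.mx/da/TarifasFinalesdeSuministroBasico.csv'


def read_csv_rows(binary_stream, encoding='iso-8859-1'):
    """Hypothetical helper: parse CSV rows from any binary file-like object."""
    # TextIOWrapper decodes the bytes as they are read; newline='' leaves
    # line-ending handling to the csv module, as the csv docs recommend
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline='')
    return list(csv.reader(text_stream))


def fetch_rows(url=URL):
    """Hypothetical helper: download the CSV and return its rows."""
    # urlopen() returns a binary file-like response that TextIOWrapper can wrap
    with urllib.request.urlopen(url) as response:
        return read_csv_rows(response)
```

Calling `fetch_rows()` performs the download, while `read_csv_rows()` can be tried offline by passing it an `io.BytesIO`. Unlike `content.splitlines()`, the response is decoded incrementally, although `list()` still collects every row in memory.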

Output:

['', 'División', '', '', '', '', 'Baja California', '', '', '', '', '', '', '', '', 'Baja California Sur', '', '', '', '', '', '', '', '', 'Bajio', '', '', '', '', '', '', '', '', 'Centro Occidente', '', '', '', '', '', '', '', '', 'Centro Oriente', '', '', '', '', '', '', '', '', 'Centro Sur', '', '', '', '', '', '', '', '', 'Golfo Centro', '', '', '', '', '', '', '', '', 'Golfo Norte', '', '', '', '', '', '', '', '', 'Jalisco', '', '', '', '', '', '', '', '', 'Valle de México Centro', '', '', '', '', '', '', '', '', 'Valle de Mexico Norte', '', '', '', '', '', '', '', '', 'Valle de México Sur', '', '', '', '', '', '', '', '', 'Noroeste', '', '', '', '', '', '', '', '', 'Norte', '', '', '', '', '', '', '', '', 'Oriente', '', '', '', '', '', '', '', '', 'Peninsular', '', '', '', '', '', '', '', '', 'Sureste', '', '', '', '', '', '', '', '']
['Mes', 'Tarifa', 'Descripción', 'Int. Horario', 'Cargo', 'Unidades', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL', 'Transmisión', 'Distribución', 'CENACE', 'Suministro', 'SCnMEM', 'Generación', 'Capacidad', 'Pérdidas', 'TOTAL']
...

I marked my question as a duplicate before seeing your answer; this is more elegant and simpler. By the way, I decoded using the 'latin_1' codec; I suppose it's the same? @cody — ZainZeus, Jan 03 '19 at 21:09
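On the `latin_1` question: in Python's codec registry, `latin_1` and `iso-8859-1` are aliases for the same codec, so decoding with either name gives identical text. A quick check, using a made-up sample word:

```python
import codecs

# Both names resolve to the same entry in the codec registry
same_codec = codecs.lookup('latin_1').name == codecs.lookup('iso-8859-1').name
print(same_codec)  # True

# So decoding the same bytes with either name yields identical text
sample = 'Distribución'.encode('iso-8859-1')
print(sample.decode('latin_1') == sample.decode('iso-8859-1'))  # True
```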
