
I'm quite new to Python and am trying to learn as much as I can by watching videos/reading tutorials.

I was following this video on how to pull data from Quandl. I know there is already a dedicated Quandl module for Python, but I wanted to learn how to fetch the data from the website directly if necessary. My issue is that when I try to emulate the code around 9:50 and print the result, Python doesn't split the lines of the CSV file. I understand he's using Python 2.x, while I'm using 3.4.

Here's the code I use:

import urllib
from urllib.request import urlopen

def grabQuandl(ticker):
    endLink = 'sort_order=desc'#without authtoken

    try:
        salesRev = urllib.request.urlopen('https://www.quandl.com/api/v1/datasets/SEC/'+ticker+'_SALESREVENUENET_Q.csv?&'+endLink).read()

        print (salesRev)   

    except Exception as e:
        print ('failed the main quandl loop for reason of', str(e))

grabQuandl('AAPL')

And this is what gets printed:

b'Date,Value\n2009-06-27,8337000000.0\n2009-12-26,15683000000.0\n2010-03-27,13499000000.0\n2010-06-26,15700000000.0\n2010-09-25,20343000000.0\n2010-12-25,26741000000.0\n2011-03-26,24667000000.0\n2011-06-25,28571000000.0\n2011-09-24,28270000000.0\n2011-12-31,46333000000.0\n2012-03-31,39186000000.0\n2012-06-30,35023000000.0\n2012-09-29,35966000000.0\n2012-12-29,54512000000.0\n2013-03-30,43603000000.0\n2013-06-29,35323000000.0\n2013-09-28,37472000000.0\n2013-12-28,57594000000.0\n2014-03-29,45646000000.0\n2014-06-28,37432000000.0\n2014-09-27,42123000000.0\n2014-12-27,74599000000.0\n2015-03-28,58010000000.0\n'

I get that the \n is some sort of line splitter, but it's not working like in the video. I've googled for possible solutions, such as doing a for loop, using read().split(), but at best they simply remove the \n. I can't get the output into a table like in the video. What am I doing wrong?

Jayesh Goyani
pyman
  • Sorry the import csv part was my own attempt to find a solution by trying the csv reader. I've edited it out. – pyman Aug 06 '15 at 06:28
  • So "\n" is a newline. You can `split("\n")` and get strings of whole rows, and then get away with `split(',')` maybe to get fields. CSV data in the wild is richer than that, though, and can do "bad, things" (which is 1 field, not 2). – Paul Aug 06 '15 at 06:31
  • I'm sorry, I don't fully understand. I get that "\n" is a new line, but what do you mean I can use split(',')? Apologies for my ignorance. – pyman Aug 06 '15 at 06:40
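
To illustrate the comment above about `split(',')` versus real CSV parsing, here is a minimal sketch. The sample text and its `Note` column are made up for the demonstration (the Quandl data in the question has no quoted fields); only `csv.reader` and `io.StringIO` from the standard library are used.

```python
import csv
import io

# Hypothetical sample: same shape as the Quandl response, plus one
# made-up row with a quoted field that contains a comma.
text = 'Date,Note\n2015-03-28,"bad, things"\n'

# Naive approach: split into lines, then split each line on commas.
naive_rows = [line.split(',') for line in text.splitlines()]

# csv.reader understands quoting, so the quoted comma stays in one field.
csv_rows = list(csv.reader(io.StringIO(text)))

print(naive_rows[1])  # ['2015-03-28', '"bad', ' things"']  (3 pieces, wrong)
print(csv_rows[1])    # ['2015-03-28', 'bad, things']       (2 fields, right)
```

For data as simple as the output shown in the question, both approaches give the same rows; the `csv` module only starts to matter once fields can contain commas, quotes, or newlines.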

1 Answer


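.read() returns a byte string (bytes). When you print a bytes object directly, Python shows its repr, which is the output you got: the b before the opening quote marks it as a byte string, and newlines are displayed as the escape sequence \n instead of being rendered as line breaks.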

You should decode the bytes into a string before printing, either as a separate step or directly on the result of .read(). An example:

import urllib.request

def grabQuandl(ticker):
    endLink = 'sort_order=desc'  # without authtoken

    try:
        # .read() returns bytes; .decode('utf-8') turns them into a str
        salesRev = urllib.request.urlopen('https://www.quandl.com/api/v1/datasets/SEC/'+ticker+'_SALESREVENUENET_Q.csv?&'+endLink).read().decode('utf-8')

        print(salesRev)

    except Exception as e:
        print('failed the main quandl loop for reason of', str(e))

grabQuandl('AAPL')

The above decodes the returned data as UTF-8. Use whatever encoding the data was actually encoded with; decoding with the wrong one can raise UnicodeDecodeError or produce garbled characters.
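
If you would rather not hard-code the encoding, one option is to use the charset the server declares in its Content-Type header and fall back to UTF-8 when none is given. The sketch below is only an illustration: `decode_body` is a hypothetical helper name, and whether the Quandl endpoint actually declares a charset is an assumption that has not been checked. `response.headers.get_content_charset()` is the standard-library call that returns the declared charset, or `None` if there is none.

```python
def decode_body(raw, charset=None, fallback='utf-8'):
    """Decode raw bytes with the declared charset, or fall back to UTF-8.

    Hypothetical helper for illustration; not part of urllib.
    """
    return raw.decode(charset or fallback)

# With a live response it might be used like this (not executed here):
#     import urllib.request
#     with urllib.request.urlopen(url) as response:
#         charset = response.headers.get_content_charset()  # may be None
#         text = decode_body(response.read(), charset)

# Offline demonstration with bytes shaped like the data in the question.
sample = b'Date,Value\n2015-03-28,58010000000.0\n'
print(decode_body(sample))

# Bytes in another encoding decode correctly once the charset is passed in.
print(decode_body('caf\u00e9'.encode('latin-1'), 'latin-1'))
```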


Example showing the print behavior:

>>> s = b'asd\nbcd\n'
>>> print(s)
b'asd\nbcd\n'
>>> print(s.decode('utf-8'))
asd
bcd

>>> type(s)
<class 'bytes'>
Anand S Kumar
  • Ah! That does it! Thank you! So how did you know you needed to use utf-8 encoding to decode it? – pyman Aug 06 '15 at 06:44
  • The most common encoding is `utf-8`, so it was an educated guess :). Anyway, judging from the data you posted above, even ASCII can decode it, so it should not be a problem. – Anand S Kumar Aug 06 '15 at 06:49
  • Thank you! I just did. As an aside, if I am going to use this data (such as graphing it) do I still need to decode it, or should I leave it as the original and only decode if I wish to print it? – pyman Aug 06 '15 at 06:56
  • Decoding would be good. Very few functions require a byte string as input, so you are probably better off keeping the decoded version; if you do need to pass it to a function that expects bytes, use `.encode()` to convert it back. – Anand S Kumar Aug 06 '15 at 06:58
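
As a follow-up to the question about using the data rather than just printing it, here is a minimal sketch of turning the decoded text into typed rows that could be tabulated or graphed. `parse_sales` is a hypothetical helper name, and the sample string reuses two rows from the output shown in the question; the column names `Date` and `Value` are taken from that output's header line.

```python
import csv
import io
from datetime import datetime

def parse_sales(text):
    """Parse decoded CSV text into a list of (date, value) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        date = datetime.strptime(record['Date'], '%Y-%m-%d').date()
        rows.append((date, float(record['Value'])))
    return rows

# Two rows copied from the output in the question, already decoded to str.
sample = 'Date,Value\n2014-12-27,74599000000.0\n2015-03-28,58010000000.0\n'
rows = parse_sales(sample)

# Print a simple aligned table: ISO date, then a right-aligned value.
for date, value in rows:
    print('{}  {:>18,.0f}'.format(date, value))
```

Because the dates become `datetime.date` objects and the values become floats, the two columns can be handed to a plotting library as they are.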