Not getting txt into Python

Question

I am trying to grab key financial data for specific companies (the variable stock in the code below) with this code:

        netIncomeAr = []

        endLink = 'order=asc'   # order=asc&
        try:

            netIncome = urllib.request.urlopen('https://www.quandl.com/api/v3/datasets/RAYMOND/'+stock.upper()+'_NET_INCOME_A.csv?'+endLink).read()

            splitNI = netIncome.split('\n')
            print('Net Income:')
            for eachNI in splitNI[1:-1]:
                print(eachNI)
                netIncomeAr.append(eachNI)


            incomeDate, income = np.loadtxt(netIncomeAr, delimiter=',',unpack=True,
                                            converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

        except Exception as e:
            print('failed in the Quandl grab')
            print(str(e))
            time.sleep(555)

But all I get is my own error message, 'failed in the Quandl grab'. I assume the error is in the first lines, where urllib.request fetches the data from Quandl.

Does anyone see why this code will not work?

OK - Thanks Roland,

I have changed my code to this limited proof-of-concept snippet:

import urllib.request, urllib.error, urllib.parse
import time
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates

evenBetter = ['GOOG','AAPL']


def graphData(stock, MA1, MA2):
    #######################################
    #######################################
    '''
        Use this to dynamically pull a stock from Quandl:
    '''
    print('Currently Pulling',stock)

    netIncomeAr = []
#    revAr = []
#    ROCAr = []

    endLink = 'order=asc'

    netIncome = str(urllib.request.urlopen('https://www.quandl.com/api/v3/datasets/RAYMOND/'+stock.upper()+'_NET_INCOME_A.csv?'+endLink).read())[2:-1]
    # convert to string, remove leading "b'" and trailing "'" characters.
    # netIncome = 'head\\ndata\\ndata\\n...'


    splitNI = netIncome.split('\\')[1:-1]
    # data segments still have leading 'n' character.
    # the [1:-1] drops the header row and the trailing 'n' segment.
    for i in range (len(splitNI)):
        splitNI[i] = splitNI[i][1:]
    # data segments are now converted.

    print('Net Income:')
    for eachNI in splitNI:
        print(eachNI)
        netIncomeAr.append(eachNI)


    incomeDate, income = np.loadtxt(netIncomeAr, delimiter=',',unpack=True,
                                    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

for stock in evenBetter:
    graphData(stock,25,50)

I am now past the urllib.request problem and have hit another one. The error is below:

Currently Pulling GOOG
Net Income:
2009-12-31,6520448000.0
2010-12-31,8505000000.0
2011-12-31,9737000000.0
2012-12-31,10737000000.0
2013-12-31,12920000000.0
Traceback (most recent call last):

  File "<ipython-input-3-5ce0b8405254>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 57, in <module>
    graphData(stock,25,50)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 54, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 860, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 860, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\matplotlib\dates.py", line 261, in __call__
    return date2num(datetime.datetime(*time.strptime(s, self.fmt)[:6]))

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\_strptime.py", line 494, in _strptime_time
    tt = _strptime(data_string, format)[0]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\_strptime.py", line 306, in _strptime
    raise TypeError(msg.format(index, type(arg)))

TypeError: strptime() argument 0 must be str, not <class 'bytes'>

With Davse Bamse's suggestion I see the following traceback (it is a tough one):

Currently Pulling GOOG
Net Income:
Traceback (most recent call last):

  File "<ipython-input-3-c3f1db0f3995>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py", line 59, in <module>
    graphData(stock)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py", line 56, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 845, in loadtxt
    converters[i] = conv

IndexError: list assignment index out of range

Following Davse Bamse's new suggestion, I put a list in the converter line like this:

[incomeDate, income] = np.loadtxt(netIncomeAr, delimiter=',',unpack=True,
                                converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

I see this error:

Currently Pulling GOOG
Net Income:
C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py:823: UserWarning: loadtxt: Empty input file: "[]"
  warnings.warn('loadtxt: Empty input file: "%s"' % fname)
Traceback (most recent call last):

  File "<ipython-input-1-c3f1db0f3995>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py", line 60, in <module>
    graphData(stock)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/sentdex_Test_comp_screener_own_webscraper2.py", line 56, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 845, in loadtxt
    converters[i] = conv

IndexError: list assignment index out of range

Thanks for your input from 12 oct. 2015 Davse Bamse,

However I am unsure of where to insert the .join as you say...

Could you please copy this snippet and post your (edited) proposal of it. I need to see the light! This is what I have now after all edits until 12 oct.

import urllib.request, urllib.error, urllib.parse
import numpy as np
import matplotlib.dates as mdates

stocklist = ['GOOG']


def graphData(stock, MA1, MA2):
    #######################################
    #######################################
    '''
        Use this to dynamically pull a stock from Quandl:
    '''
    print('Currently Pulling',stock)

    netIncomeAr = []

    endLink = 'order=asc'   # order=asc&

    netIncome = str(urllib.request.urlopen('https://www.quandl.com/api/v3/datasets/RAYMOND/'+stock.upper()+'_NET_INCOME_A.csv?'+endLink).read())[2:-1]
    # convert to string, remove leading "b'" and trailing "'" characters.
    # netIncome = 'head\\ndata\\ndata\\n...'


    splitNI = netIncome.split('\\')[1:-1]
    # data segments still have leading 'n' character.
    # the [1:-1] drops the header row and the trailing 'n' segment.
    for i in range (len(splitNI)):
        splitNI[i] = splitNI[i][1:]
    # data segments are now converted.

    print('Net Income:')
    for eachNI in splitNI:
        print(eachNI)
        netIncomeAr.append(eachNI)


    incomeDate, income = np.loadtxt(netIncomeAr, delimiter=',',unpack=True,
                                    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

for stock in stocklist:
    graphData(stock,25,50)

With today's (13-10-2015) input from Davse Bamse, I get the following error:

Currently Pulling GOOG
Net Income:
2009-12-31,6520448000.0
2010-12-31,8505000000.0
2011-12-31,9737000000.0
2012-12-31,10737000000.0
2013-12-31,12920000000.0
Traceback (most recent call last):

  File "<ipython-input-13-5ce0b8405254>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 54, in <module>
    graphData(stock,25,50)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 51, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 740, in loadtxt
    fh = iter(open(fname))

OSError: [Errno 22] Invalid argument: '2009-12-31,6520448000.0\n2010-12-31,8505000000.0\n2011-12-31,9737000000.0\n2012-12-31,10737000000.0\n2013-12-31,12920000000.0'

Davse Bamse suggested that I use io.StringIO like so:

incomeDate, income = StringIO(np.loadtxt('\n'.join(netIncomeAr), delimiter=',',unpack=True,
                                converters={ 0: mdates.strpdate2num('%Y-%m-%d')}))

But this gives me the same error as before... Any thoughts???

Changing the converter line to this:

incomeDate, income = np.loadtxt(StringIO('\n'.join(netIncomeAr)), delimiter=',',unpack=True,
                                converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

gives the following stack trace:

Currently Pulling GOOG
Net Income:
2009-12-31,6520448000.0
2010-12-31,8505000000.0
2011-12-31,9737000000.0
2012-12-31,10737000000.0
2013-12-31,12920000000.0
Traceback (most recent call last):

  File "<ipython-input-26-5ce0b8405254>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 60, in <module>
    graphData(stock,25,50)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 57, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 860, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 860, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\matplotlib\dates.py", line 261, in __call__
    return date2num(datetime.datetime(*time.strptime(s, self.fmt)[:6]))

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\_strptime.py", line 494, in _strptime_time
    tt = _strptime(data_string, format)[0]

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\_strptime.py", line 306, in _strptime
    raise TypeError(msg.format(index, type(arg)))

TypeError: strptime() argument 0 must be str, not <class 'bytes'>

Instead of NumPy's loadtxt (I am on NumPy 1.9.2), I found another function, np.genfromtxt, that can apparently do this, as described in the answer to "numpy.loadtxt does not read file with complex numbers".

So using this converter-line instead

incomeDate, income = np.genfromtxt('\n'.join(netIncomeAr), delimiter=',',unpack=True,
                                converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

Output

Currently Pulling GOOG
Net Income:
2009-12-31,6520448000.0
2010-12-31,8505000000.0
2011-12-31,9737000000.0
2012-12-31,10737000000.0
2013-12-31,12920000000.0
Traceback (most recent call last):

  File "<ipython-input-10-5ce0b8405254>", line 1, in <module>
    runfile('C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py', wdir='C:/Users/Morten/Google Drev/SpyderProject/test')

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 50, in <module>
    graphData(stock,25,50)

  File "C:/Users/Morten/Google Drev/SpyderProject/test/Test_sentdex_comp_screener_own_webscraper2.py", line 47, in graphData
    converters={ 0: mdates.strpdate2num('%Y-%m-%d')})

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\npyio.py", line 1366, in genfromtxt
    fhd = iter(np.lib._datasource.open(fname, 'rb'))

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
    return ds.open(path, mode)

  File "C:\Program Files\WinPython-64bit-3.3.5.7\python-3.3.5.amd64\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
    raise IOError("%s not found." % path)

OSError: 2009-12-31,6520448000.0
2010-12-31,8505000000.0
2011-12-31,9737000000.0
2012-12-31,10737000000.0
2013-12-31,12920000000.0 not found.

I don't know if this thing is better, or worse...

Remove the try/except surrounding your code block, run it again, and post the stack trace. The "except Exception as e:" line is masking the fault. – rask004, Oct 02 '15 at 07:48
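
To illustrate the point about the masked fault, here is a minimal sketch. It uses a hard-coded bytes literal as a stand-in for the Quandl response (an assumption, so that it runs offline; the header text is made up). It shows how the traceback module exposes the exception type that a bare print of a custom message hides.

```python
import traceback

# Stand-in for urllib.request.urlopen(...).read(); in Python 3 that
# call returns bytes. The header 'Date,Value' is a hypothetical example.
raw = b'Date,Value\n2009-12-31,6520448000.0\n'

try:
    rows = raw.split('\n')  # str separator on a bytes object: TypeError
except Exception:
    # format_exc() returns the full traceback text, including the
    # exception type, rather than only a custom message.
    report = traceback.format_exc()
    print(report)
```

Simply deleting the try/except has the same effect: the interpreter prints the traceback itself and stops.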

score 0 · Answer 1 · answered Oct 02 '15 at 08:14

In Python 3.x, urllib.request.urlopen(...).read() returns a bytes object when it succeeds, not a str.

One way to convert the bytes to a str is as follows:

...
netIncome = str(urllib.request.urlopen('https://www.quandl.com/api/v3/datasets/RAYMOND/'+stock.upper()+'_NET_INCOME_A.csv?'+endLink).read())[2:-1]
# convert to string, remove leading "b'" and trailing "'" characters.
# netIncome = 'head\\ndata\\ndata\\n...'
...

splitNI = netIncome.split('\\')[1:-1]
# data segments still have leading 'n' character.
# the [1:-1] drops the header row and the trailing 'n' segment.
for i in range (len(splitNI)):
    splitNI[i] = splitNI[i][1:]
# data segments are now converted.

print('Net Income:')
for eachNI in splitNI:
    print(eachNI)
    netIncomeAr.append(eachNI)

score 0 · Answer 2 · 2015-10-05T12:12:47.547

As Roland points out, the problem is that a bytes object is returned, not a string.

The code should, however, look like this:

netIncomeBytes = urllib.request.urlopen('https://www.quandl.com/api/v3/datasets/RAYMOND/'+stock.upper()+'_NET_INCOME_A.csv?'+endLink).read()
netIncome = netIncomeBytes.decode("utf-8")

This decodes the bytes as UTF-8 and gives you a string.
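
A small sketch of the difference between the two conversions, using a hard-coded bytes literal in place of the downloaded CSV. The data values are taken from the output in the question; the header text 'Date,Value' is an assumption.

```python
# Stand-in for the bytes returned by urlopen(...).read().
raw = b'Date,Value\n2009-12-31,6520448000.0\n2010-12-31,8505000000.0\n'

# str() on bytes returns the repr: it starts with b' and every newline
# becomes the two characters backslash + 'n'.
as_repr = str(raw)
assert as_repr.startswith("b'") and '\\n' in as_repr

# decode() returns the actual text, so the lines split normally.
text = raw.decode('utf-8')
lines = text.splitlines()  # no empty trailing element, unlike split('\n')
data_rows = lines[1:]      # drop the header row
print(data_rows)
```

With the decoded text there is no need to strip a leading "b'" or a stray 'n' from each row.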

edited Oct 05 '15 at 12:12

answered Oct 05 '15 at 10:41

Thanks David, Not an easy one to crack I suppose. Now I get another error, and do not see any data at all. See my edit in above question. – Morten Oct 05 '15 at 15:21
That did indeed solve the first problem. The problem is now in the line where you create the converters. Try passing them as a list instead (surround with [ and ]). The error says it cannot assign the i'th converter, so I am guessing that it has to be a list. Can you find the docs for the function you are calling? – Oct 09 '15 at 18:47
It seems to work here without a list [ ]. http://stackoverflow.com/questions/22582691/text-file-mdates-strpdate2num-error Or I'm not sure where you would make a list...? – Morten Oct 10 '15 at 14:32
OK added a list in the converter. See edits in my initial question. Still see an error. ??? – Morten Oct 10 '15 at 15:10
I have read the docs for the loadtxt function. The converters argument is a dict and not a list as I thought. No need to use [ and ]. – Oct 11 '15 at 16:12
netIncomeAr must be a string and not a list. Use ''.join(netIncomeAr) to convert to string. – Oct 11 '15 at 16:15
So you mean that this line should look different: netIncomeAr = [ ] How should it look? I cannot create an empty string. Should I delete the line altogether??? – Morten Oct 13 '15 at 11:10
Change `incomeDate, income = np.loadtxt(netIncomeAr, delimiter=',',unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d')})` to `incomeDate, income = np.loadtxt('\n'.join(netIncomeAr), delimiter=',',unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d')})` – Oct 13 '15 at 11:40
Thanks David, See in my question the Stacktrace error – Morten Oct 14 '15 at 07:57
The documentation for loadtxt states that the first argument must be a filename. If you want to use a string, you must do something like the first example here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html Using StringIO will make a string that behaves like a file, so loadtxt will accept it. – Oct 14 '15 at 08:25
Thanks David, This gives me no change in the error... See question. http://stackoverflow.com/questions/28200366/python-3-4-0-email-package-install-importerror-no-module-named-cstringio – Morten Oct 14 '15 at 10:40
I did not suggest using StringIO like this: `incomeDate, income = StringIO(np.loadtxt('\n'.join(netIncomeAr), delimiter=',',unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d')}))` but like this: `incomeDate, income = np.loadtxt(StringIO('\n'.join(netIncomeAr)), delimiter=',',unpack=True, converters={ 0: mdates.strpdate2num('%Y-%m-%d')})` Please see the first example here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html and note that it is used as the first argument for loadtxt – Oct 14 '15 at 11:42
Raschgu! With your latest change to the converter line, we are back to the error: TypeError: strptime() argument 0 must be str, not <class 'bytes'>. Hmmm, I do not understand, but I am trying to???? – Morten Oct 14 '15 at 12:30
Please see the following call in the stack trace: `exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)`. This indicates that the file is opened in binary mode. Try to see if you can get it not to do that. It must be in the documentation for the loadtxt method. – Oct 15 '15 at 07:04
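
For reference, a minimal sketch of a converter that sidesteps the bytes/str mismatch shown in the tracebacks, where loadtxt hands each field to the converter as bytes and strptime rejects it. parse_date is a hypothetical helper, not part of NumPy or matplotlib. As a simplification it returns the proleptic Gregorian ordinal day as a float instead of a matplotlib date number; matplotlib.dates.date2num could be applied to the parsed datetime instead if that scale is needed.

```python
from datetime import datetime
from io import StringIO

import numpy as np


def parse_date(field, fmt='%Y-%m-%d'):
    """Hypothetical converter: accept bytes or str, return the day as a float."""
    if isinstance(field, bytes):
        # Some NumPy versions pass raw bytes to converters; decode first.
        field = field.decode('ascii')
    return float(datetime.strptime(field, fmt).toordinal())


# Sample rows copied from the output in the question.
netIncomeAr = ['2009-12-31,6520448000.0',
               '2010-12-31,8505000000.0',
               '2011-12-31,9737000000.0']

# StringIO wraps the joined text in a file-like object for loadtxt.
incomeDate, income = np.loadtxt(StringIO('\n'.join(netIncomeAr)),
                                delimiter=',', unpack=True,
                                converters={0: parse_date})
print(incomeDate, income)
```

Because the converter checks the type before decoding, the same function should work whether the installed NumPy passes bytes or str to converters.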
