Maybe this has been answered before, but I'm having a hard time searching for it. Assume I have the following data in a file:

date, id, int1, int2, int3
02/03/2015, 2, 23, 65, 99
10/06/2016, 4, 84, 12, 35
10/01/2017, 6, 53, 6, 78

I can quickly read it with a numpy snippet like this:

import StringIO
import numpy as np

hdr = 'date, id, int1, int2, int3'
# The data rows; the leading newline separates them from the header.
data = '''
02/03/2015, 2, 23, 65, 99
10/06/2016, 4, 84, 12, 35
10/01/2017, 6, 53, 6, 78
'''
lines = '%s%s' % (hdr, data)
pseudo_file = StringIO.StringIO(lines)
np_dtypes = 'S10,%s' % ','.join(['i4' for x in hdr.split(',')[1:]])

np1 = np.genfromtxt(pseudo_file, delimiter=',', names=True, dtype=np_dtypes)

print np1
print np1.dtype.names
print np1.shape
print np1['date']
print np1['int3']

This will give me the following output:

[('02/03/2015', 2, 23, 65, 99) ('10/06/2016', 4, 84, 12, 35)
 ('10/01/2017', 6, 53, 6, 78)]
('date', 'id', 'int1', 'int2', 'int3')
(3L,)
['02/03/2015' '10/06/2016' '10/01/2017']
[99 35 78]

As you can see, numpy parsed the file successfully into a structured array. However, how do I split it into two parts:

  1. A 1D array with only the strings (the dates column);
  2. Another 1D array with only the integers.

The split should keep the field name of each column.

Evan Carslake

1 Answer

Haven't you already split off the strings with np1['date']? To keep the column name as well, put the 'date' column name in a list (thanks @hpaulj):

dates=np1[['date']]
dates
#array([('02/03/2015',), ('10/06/2016',), ('10/01/2017',)], 
#      dtype=[('date', 'S10')])
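
The difference between the two indexing forms can be checked directly. This is a minimal sketch, assuming Python 3 and a reasonably recent NumPy. The array is built by hand here, with a `U10` field in place of `S10`, instead of going through `genfromtxt`, purely so that the snippet runs on its own:

```python
import numpy as np

# Hand-built stand-in for the array that genfromtxt returns in the question.
np1 = np.array(
    [('02/03/2015', 2, 23, 65, 99),
     ('10/06/2016', 4, 84, 12, 35),
     ('10/01/2017', 6, 53, 6, 78)],
    dtype=[('date', 'U10'), ('id', 'i4'), ('int1', 'i4'),
           ('int2', 'i4'), ('int3', 'i4')])

col = np1['date']      # a single name: plain 1D string array, field name dropped
named = np1[['date']]  # a list of names: structured 1D array, 'date' field kept

print(col.dtype.names)    # None
print(named.dtype.names)  # ('date',)
```

Both results have shape `(3,)`; only the second one still carries the field name.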

And to get the ints:

ints=np1[['int1','int2','int3']]
ints
#array([(23, 65, 99), (84, 12, 35), (53, 6, 78)], 
#      dtype=[('int1', '<i4'), ('int2', '<i4'), ('int3', '<i4')])
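
If the list of integer columns should not be typed out by hand, it can be derived from the dtype. The sketch below assumes Python 3 and NumPy 1.16 or later, where `numpy.lib.recfunctions` provides `repack_fields` and `structured_to_unstructured`. The array is again built by hand as a stand-in for the `genfromtxt` result, and the name `int_names` is only illustrative. Note that this selection also picks up `id`, which the listing above leaves out:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hand-built stand-in for the array that genfromtxt returns in the question.
np1 = np.array(
    [('02/03/2015', 2, 23, 65, 99),
     ('10/06/2016', 4, 84, 12, 35),
     ('10/01/2017', 6, 53, 6, 78)],
    dtype=[('date', 'U10'), ('id', 'i4'), ('int1', 'i4'),
           ('int2', 'i4'), ('int3', 'i4')])

# Every field except 'date' (illustrative helper name; includes 'id').
int_names = [name for name in np1.dtype.names if name != 'date']

ints = np1[int_names]  # structured array, field names preserved

# On NumPy 1.16+ the selection above keeps the original record layout,
# so repack_fields gives a compact version holding only the chosen fields.
packed = rfn.repack_fields(ints)

# If the names are not needed, this gives an ordinary 2D integer array.
plain = rfn.structured_to_unstructured(ints)

print(ints.dtype.names)
print(plain.shape)
```

Whether the packed or the unstructured form is more useful depends on whether later code addresses columns by name or by position.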
tmdavison
  • `dates = np1[['date']]` should work for the date column(s), just like the 'int' list did. – hpaulj Aug 25 '15 at 21:08
  • Thanks @tom. I was lacking very basic numpy usage (and still am). It would have taken me ages to come up with this solution, mainly because I was searching for numpy methods. lol. ;-) Cheers. – Fabio Kasper Aug 25 '15 at 22:25
  • @hpaulj that doesn't keep the name of the column, which the question requested – tmdavison Aug 26 '15 at 06:14
  • http://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields-at-once - it's a matter of accessing multiple fields at once. – hpaulj Aug 26 '15 at 06:41
  • If `'date'` is in a list, like your int case, the full dtype, with name, will be returned. That list indexing works whether there is one item in the list or 3. – hpaulj Aug 26 '15 at 06:45
  • ah, sorry, I missed the extra set of brackets in your comment :) Edited my answer with your suggestion – tmdavison Aug 26 '15 at 06:57