0

I am trying to create MX2 numpy matrix or array from the following file contents:

shell: head WORLD#America.csv
"2013-04-17 12","3","WORLD","#America"
"2013-04-17 13","9","WORLD","#America"
"2013-04-17 14","4","WORLD","#America"
"2013-04-17 15","3","WORLD","#America"
"2013-04-17 16","7","WORLD","#America"
"2013-04-17 17","8","WORLD","#America"
"2013-04-17 18","6","WORLD","#America"
"2013-04-17 19","6","WORLD","#America"
"2013-04-17 20","6","WORLD","#America"
"2013-04-17 21","2","WORLD","#America"

I have come across the genfromtxt() function but have been unsuccessful in extracting my data. With a file called f I tried the following: ts = genfromtxt(f, delimiter=",") and got an array filled all with nan. This was only a first attempt, so I read the documentation about the dtype parameter which specifies the data-type of the array. It appears that to get an MX2 matrix with entries of the form (datetime, int) I would have the following: dtype=[('f1', datetime64), ('f2', uint)]. When I did this, I got the following assigned to variable ts:

(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L)],
dtype=[('f1', ('<M8[us]', {})), ('f2', '<u8')])

Every value I got for the matrix is some constant... Why did it not read from my file? Obviously this is not the output that I should get.

How do I get the desired MX2 matrix or array with the first column being the datetime and the second column being an integer as shown with the head command?

askewchan
  • 45,161
  • 17
  • 118
  • 134
CodeKingPlusPlus
  • 15,383
  • 51
  • 135
  • 216

1 Answers1

0

As pointed out in the comments, one difficulty in reading this file with genfromtxt is the presence of quote characters. Maybe it's best to just (programmatically) remove the quotes, but it's also possible to cheat around the issue: Specify the quote character as the delimiter:

np.genfromtxt(filename, delimiter='"', dtype=str, comments=None)[0]
# array(['', '2013-04-17 12', ',', '3', ',', 'WORLD', ',', '#America', ''], 
#       dtype='|S13')

Now the file is interpreted as having 9 columns, where the second and fourth column contain the data of interest.

Another problem is to specify the dtype for the date-time column. In more recent(?) versions of Numpy you have to specify the time/date unit or genfromtxt throws an error. In this case apparently you need to use M8[h] as the dtype, to specify an hourly unit.

All in all, I was able to load the file with:

ts = np.genfromtxt(filename, 
                   delimiter='"', 
                   dtype='M8[h], uint', 
                   usecols=[1,3])

Alternatively, you could look into using a converter or try the CSV reader from Pandas.