
I need to read data from CSV files and convert it into a NumPy array whose first row holds the column titles of the CSV file. I have been trying different approaches. The headers need to have the Unicode string dtype <U30. When I read the file with pandas, convert the values to a list, insert the column names at index 0 and convert the result back to an ndarray, the dtype is <U32 instead. Any help with reading the files would be really appreciated.

Here is the sample data in the CSV file:

created_at,user_ID,review_ID,latitude,longitude,location_ID,friend_count,follower_count,sad,happy,surprise,disgust,joy
Thu Jan 30 06:58:27 +0000 2020,98675,1,30.23590912,-97.79513958,22847,#####,############,0.421,0.442,0.452,0.397,0.357
Thu Jan 30 06:03:24 +0000 2020,67730,2,30.26910295,-97.74939537,420315,#####,############,0.469,0.408,0.488,0.377,0.35
Thu Jan 30 06:19:25 +0000 2020,11576,3,30.25573099,-97.76338577,316637,#####,##########,0.542,0.361,0.276,0.27,0.424
Thu Jan 30 06:16:38 +0000 2020,87911,4,30.26341812,-97.75759667,16516,#####,############,0.418,0.499,0.352,0.367,0.291
Thu Jan 30 06:08:09 +0000 2020,148147,5,30.27429186,-97.74052262,553587,#####,###########,0.242,0.632,0.532,0.501,0.199
Thu Jan 30 05:57:19 +0000 2020,13092,6,30.2615994,-97.7585806,15372,######,######################,0.477,0.451,0.497,0.48,0.271
Thu Jan 30 05:54:49 +0000 2020,29392,7,30.26790958,-97.74931242,21714,#####,###########,0.356,0.503,0.517,0.437,0.227
Thu Jan 30 05:58:35 +0000 2020,159882,8,30.26910295,-97.74939537,420315,#####,#####################,0.439,0.481,0.51,0.402,0.285
Thu Jan 30 05:56:41 +0000 2020,30580,9,30.28112041,-97.74521112,153505,#####,#####################,0.218,0.672,0.599,0.595,0.179
Thu Jan 30 05:58:50 +0000 2020,151421,10,30.26910295,-97.74939537,420315,#####,############,0.398,0.355,0.311,0.372,0.265
Thu Jan 30 06:20:11 +0000 2020,149730,11,40.64388454,-73.7828064,23261,#####,######################,0.542,0.344,0.282,0.334,0.389
Thu Jan 30 05:59:59 +0000 2020,12487,12,40.74137425,-73.98810522,16907,#####,###########,0.408,0.478,0.515,0.48,0.253
Thu Jan 30 05:57:33 +0000 2020,145831,13,40.7413882,-73.98945451,12973,#####,####################,0.55,0.311,0.347,0.338,0.342
Thu Jan 30 06:21:18 +0000 2020,29261,14,40.72491033,-73.99462075,341255,#####,############,0.673,0.33,0.309,0.27,0.547
Thu Jan 30 06:02:29 +0000 2020,8633,15,40.72976831,-73.99853533,260957,#####,#####################,0.404,0.576,0.467,0.414,0.338

This is for a test. All the other checks pass except the one on the string dtype of ndarray[0], which is supposed to hold the column names: the required dtype is <U30, but mine displays <U32.
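A possible explanation, offered as an assumption rather than a confirmed diagnosis: NumPy sizes a Unicode dtype to the longest string in the array, and the longest field in the sample is the 30-character timestamp. If the list passed to np.array mixes strings with floats (as happens when pandas has parsed the numeric columns), NumPy has to pick a string width wide enough for the floats as well, which would account for a wider dtype. The sketch below keeps every field as text by reading with the standard csv module. The two-column in-memory sample and the names sample, rows, as_text and mixed are hypothetical stand-ins for the real file and code.

```python
import csv
import io

import numpy as np

# Hypothetical in-memory stand-in for the real CSV file (two columns only).
sample = io.StringIO(
    "created_at,sad\n"
    "Thu Jan 30 06:58:27 +0000 2020,0.421\n"
    "Thu Jan 30 06:03:24 +0000 2020,0.469\n"
)

# csv.reader yields every field as str, header row included,
# so the header naturally ends up as row 0 of the array.
rows = list(csv.reader(sample))
as_text = np.array(rows)
print(as_text.dtype)  # width follows the longest field (the 30-char timestamp)

# For comparison: a list that mixes str and float values.
# NumPy then chooses a string width that also fits the floats.
mixed = np.array(
    [["created_at", "sad"], ["Thu Jan 30 06:58:27 +0000 2020", 0.421]]
)
print(mixed.dtype)
```

With a real file, the io.StringIO object would be replaced by something like open(filename, newline=""). Asking pandas to read every column as text (for example with dtype=str) should have a similar effect, though that is untested against the actual data.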

Andrew
  • What's so important about 'U30' as opposed to 'U32'? Without a sample of the file, or maybe the dataframe, I don't think we can help. The description is pretty vague. – hpaulj Oct 10 '21 at 16:54
  • Is there a reason you have to have an array of strings? It's pretty rare that an array of strings isn't a terrible idea. Maybe try a dataframe instead. – CJR Oct 10 '21 at 17:47
  • I have edited the question to include a sample CSV file. I need to include the column headers in the array, and they are supposed to be – Andrew Oct 10 '21 at 23:39
  • When I try data = np.genfromtxt(filename, dtype=None, delimiter=',', names=True), I get the error: *** Line #2 (got 7 columns instead of 13) *** – Andrew Oct 10 '21 at 23:48
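A likely cause of the "got 7 columns instead of 13" error, inferred from the sample data: np.genfromtxt treats '#' as the start of a comment by default, so each data line is cut off at the first '#' in the friend_count column. Passing comments=None turns that off. The sketch below also drops names=True and reads everything with dtype=str so that the header stays as row 0. The four-column in-memory sample and the names sample and data are hypothetical stand-ins for the real file.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for the real file, keeping a '#' column.
sample = io.StringIO(
    "created_at,user_ID,friend_count,sad\n"
    "Thu Jan 30 06:58:27 +0000 2020,98675,#####,0.421\n"
    "Thu Jan 30 06:03:24 +0000 2020,67730,#####,0.469\n"
)

# comments=None disables '#' comment handling, so no line is truncated.
# dtype=str keeps every field as text; without names=True the header
# is read as an ordinary first row.
data = np.genfromtxt(sample, dtype=str, delimiter=",", comments=None)

print(data.shape)
print(data[0])
```

With the real file, sample would be replaced by the filename.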
