I am trying to convert my csv file into a numpy array so I can manipulate the numbers and then graph them. I printed my csv file and got:

               ra              dec
0       15:09:11.8     -34:13:44.9
1       09:19:46.8   +33:44:58.452
2     05:15:43.488   +19:21:46.692
3     04:19:12.096    +55:52:43.32

There is more data (101 rows x 2 columns), but it is all numbers. I want to convert the ra and dec values from their current units to degrees, and I thought I could do this by turning each column into a numpy array. But when I ran:

import numpy as np
np_array = np.genfromtxt(r'C:\Users\nstev\Downloads\S190930t.csv',delimiter=".", skip_header=1, usecols=(4))
print(np_array)

I get:

[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan]

I keep changing my delimiter. With a colon I got the same result, and with a semicolon or a plus sign I got an error saying that it got 2 columns instead of 1. I do not know what to change so that I stop getting an array of nan values. Someone help please!

  • That is not a valid csv, as far as I can tell. It looks like some sort of fixed-width format, but really, that looks like someone pretty-printed a pandas data frame and put that string in a file... is that what happened? – juanpa.arrivillaga Jan 16 '20 at 00:18
  • I am not sure whether or not it is a pandas data frame because my prof emailed the csv file through an excel sheet. Any recommendations on how to go about this problem? – Stevia Ndoe Jan 16 '20 at 00:22
  • "because my prof emailed the csv file through an excel sheet" - is the file extension `.csv`? If so, don't open it through excel! Even if Windows tries to do so by default. It's not actually "an excel sheet". – user2357112 Jan 16 '20 at 00:24
  • yes! it is a `.csv` file. So should I open the file without opening it in excel? Is that the problem? – Stevia Ndoe Jan 16 '20 at 00:35
  • `np.genfromtxt` uses a default `float` dtype. If an element of a csv is not a valid number, it puts `nan` in that slot of the array. – hpaulj Jan 16 '20 at 00:45
  • First load as two columns of strings, and then look into splitting each string on the colon to get the 3 numbers (deg, min, sec?). One call to `genfromtxt` won't do it. You can't split white space and colon at the same time. – hpaulj Jan 16 '20 at 00:52
  • okay, thank you @hpaulj and others! I will try that. – Stevia Ndoe Jan 16 '20 at 00:58
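
The behaviour described in the comments above can be checked on a tiny sample. This is only a minimal sketch: the two-row `sample` string is hypothetical (whitespace-separated, no header or index column) and `io.StringIO` stands in for the real file.

```python
import io
import numpy as np

# Hypothetical two-row sample: whitespace-separated, no header, no index column.
sample = "15:09:11.8 -34:13:44.9\n09:19:46.8 +33:44:58.452\n"

# Default dtype is float: '15:09:11.8' is not a valid float, so each slot becomes nan.
as_float = np.genfromtxt(io.StringIO(sample))

# Requesting strings keeps the raw text so it can be split on ':' afterwards.
as_text = np.genfromtxt(io.StringIO(sample), dtype=str)

print(as_float)
print(as_text)
```

Loading as text first and splitting afterwards is the two-step route suggested in the comments; a single `genfromtxt` call cannot split on whitespace and colons at once.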

2 Answers

With a copy-n-paste of your file sample:

In [208]: data = np.genfromtxt('stack59761369.csv',encoding=None,dtype=None,names=True)          
In [209]: data                                                                                   
Out[209]: 
array([('15:09:11.8', '-34:13:44.9'), ('09:19:46.8', '+33:44:58.452'),
       ('05:15:43.488', '+19:21:46.692'),
       ('04:19:12.096', '+55:52:43.32')],
      dtype=[('ra', '<U12'), ('dec', '<U13')])

with this dtype and names I get a structured array, 1d, with 2 fields.

In [210]: data['ra']                                                                             
Out[210]: 
array(['15:09:11.8', '09:19:46.8', '05:15:43.488', '04:19:12.096'],
      dtype='<U12')
In [211]: np.char.split(data['ra'],':')                                                          
Out[211]: 
array([list(['15', '09', '11.8']), list(['09', '19', '46.8']),
       list(['05', '15', '43.488']), list(['04', '19', '12.096'])],
      dtype=object)

this split gives an object dtype array with lists. They can be joined into one 2d array with vstack:

In [212]: np.vstack(np.char.split(data['ra'],':'))                                               
Out[212]: 
array([['15', '09', '11.8'],
       ['09', '19', '46.8'],
       ['05', '15', '43.488'],
       ['04', '19', '12.096']], dtype='<U6')

and converted to floats with:

In [213]: np.vstack(np.char.split(data['ra'],':')).astype(float)                                 
Out[213]: 
array([[15.   ,  9.   , 11.8  ],
       [ 9.   , 19.   , 46.8  ],
       [ 5.   , 15.   , 43.488],
       [ 4.   , 19.   , 12.096]])
In [214]: np.vstack(np.char.split(data['dec'],':')).astype(float)                                
Out[214]: 
array([[-34.   ,  13.   ,  44.9  ],
       [ 33.   ,  44.   ,  58.452],
       [ 19.   ,  21.   ,  46.692],
       [ 55.   ,  52.   ,  43.32 ]])
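
As a self-contained sketch of the same steps, the session above can be collapsed into one small script. It assumes the file looks exactly like the printout in the question (a header line plus a leading index column), so it uses `skip_header=1` and `usecols=(1, 2)` rather than `names=True`; `split_sexagesimal` is a hypothetical helper name, and `io.StringIO` stands in for the real file.

```python
import io
import numpy as np

# Sample copied from the question; io.StringIO stands in for the real file.
sample = """\
               ra              dec
0       15:09:11.8     -34:13:44.9
1       09:19:46.8   +33:44:58.452
2     05:15:43.488   +19:21:46.692
3     04:19:12.096    +55:52:43.32
"""


def split_sexagesimal(column):
    """Turn a 1d array of 'a:b:c' strings into an (n, 3) float array."""
    # np.char.split gives an object array of lists; vstack joins them into 2d.
    return np.vstack(np.char.split(column, ":")).astype(float)


# Skip the header line and keep only the ra and dec columns, as strings.
text = np.genfromtxt(io.StringIO(sample), dtype=str, skip_header=1, usecols=(1, 2))

ra_parts = split_sexagesimal(text[:, 0])
dec_parts = split_sexagesimal(text[:, 1])

print(ra_parts)
print(dec_parts)
```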

pandas

In [256]: df =  pd.read_csv('stack59761369.csv',delim_whitespace=True)                           
In [257]: df                                                                                     
Out[257]: 
             ra            dec
0    15:09:11.8    -34:13:44.9
1    09:19:46.8  +33:44:58.452
2  05:15:43.488  +19:21:46.692
3  04:19:12.096   +55:52:43.32
In [258]: df['ra'].str.split(':',expand=True).astype(float)                                      
Out[258]: 
      0     1       2
0  15.0   9.0  11.800
1   9.0  19.0  46.800
2   5.0  15.0  43.488
3   4.0  19.0  12.096
In [259]: df['dec'].str.split(':',expand=True).astype(float)                                     
Out[259]: 
      0     1       2
0 -34.0  13.0  44.900
1  33.0  44.0  58.452
2  19.0  21.0  46.692
3  55.0  52.0  43.320
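
Recent pandas releases deprecate `delim_whitespace` in favour of `sep=r"\s+"`, so a version of the same idea for newer installs might look like the sketch below. It assumes the file matches the printout in the question, where each data row has one more field than the header, so pandas treats the first column as the index. The `h`, `m`, `s` column labels are hypothetical.

```python
import io
import pandas as pd

# Sample copied from the question; io.StringIO stands in for the real file.
sample = """\
               ra              dec
0       15:09:11.8     -34:13:44.9
1       09:19:46.8   +33:44:58.452
2     05:15:43.488   +19:21:46.692
3     04:19:12.096    +55:52:43.32
"""

# sep=r"\s+" splits on runs of whitespace, like delim_whitespace=True did.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Split each 'a:b:c' string into three float columns.
ra_parts = df["ra"].str.split(":", expand=True).astype(float)
ra_parts.columns = ["h", "m", "s"]  # hypothetical labels, for readability only

print(ra_parts)
```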

direct line read

In [279]: lines = []                                                                             
In [280]: with open('stack59761369.csv') as f: 
     ...:     header=f.readline() 
     ...:     for row in f: 
     ...:         alist = row.split() 
     ...:         alist = [[float(i) for i in astr.split(':')] for astr in alist] 
     ...:         lines.append(alist) 
     ...:                                                                                        
In [281]: lines                                                                                  
Out[281]: 
[[[15.0, 9.0, 11.8], [-34.0, 13.0, 44.9]],
 [[9.0, 19.0, 46.8], [33.0, 44.0, 58.452]],
 [[5.0, 15.0, 43.488], [19.0, 21.0, 46.692]],
 [[4.0, 19.0, 12.096], [55.0, 52.0, 43.32]]]
In [282]: np.array(lines)                                                                        
Out[282]: 
array([[[ 15.   ,   9.   ,  11.8  ],
        [-34.   ,  13.   ,  44.9  ]],

       [[  9.   ,  19.   ,  46.8  ],
        [ 33.   ,  44.   ,  58.452]],

       [[  5.   ,  15.   ,  43.488],
        [ 19.   ,  21.   ,  46.692]],

       [[  4.   ,  19.   ,  12.096],
        [ 55.   ,  52.   ,  43.32 ]]])
In [283]: _.shape                                                                                
Out[283]: (4, 2, 3)

The first dimension is the number of rows, the second is the 2 columns, and the third is the 3 values in each column.

conversion to degree

In [285]: _282@[1,1/60,1/3600]
Out[285]: 
array([[ 15.15327778, -33.77086111],
       [  9.32966667,  33.74957   ],
       [  5.26208   ,  19.36297   ],
       [  4.32002667,  55.8787    ]])

oops, that -34 deg value is wrong; all terms of an element have to have the same sign.

correction

Identify the elements with a negative degree:

In [296]: mask = np.sign(_282[:,:,0])
In [297]: mask
Out[297]: 
array([[ 1., -1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

adjust all 3 terms accordingly:

In [298]: x = np.abs(_282)*mask[:,:,None]
In [299]: x
Out[299]: 
array([[[ 15.   ,   9.   ,  11.8  ],
        [-34.   , -13.   , -44.9  ]],

       [[  9.   ,  19.   ,  46.8  ],
        [ 33.   ,  44.   ,  58.452]],

       [[  5.   ,  15.   ,  43.488],
        [ 19.   ,  21.   ,  46.692]],

       [[  4.   ,  19.   ,  12.096],
        [ 55.   ,  52.   ,  43.32 ]]])
In [300]: x@[1, 1/60, 1/3600]
Out[300]: 
array([[ 15.15327778, -34.22913889],
       [  9.32966667,  33.74957   ],
       [  5.26208   ,  19.36297   ],
       [  4.32002667,  55.8787    ]])
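
Two caveats are worth a sketch. First, a mask built with `np.sign` on the leading field is 0 whenever that field is `00`, so a value such as `-00:30:00` or `+00:30:00` would be zeroed out; reading the sign from the string avoids that. Second, if ra is in hours:minutes:seconds (the usual astronomical convention, but an assumption about this file), it needs a further factor of 15 to become degrees. The helper name `sexagesimal_to_decimal` is hypothetical.

```python
import numpy as np


def sexagesimal_to_decimal(text):
    """Convert one '[+-]a:b:c' string to a signed decimal value in the unit of 'a'."""
    text = text.strip()
    # Take the sign from the string itself so '-00:30:00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    a, b, c = (abs(float(part)) for part in text.split(":"))
    return sign * (a + b / 60.0 + c / 3600.0)


# The four sample rows from the question.
ra_strings = ["15:09:11.8", "09:19:46.8", "05:15:43.488", "04:19:12.096"]
dec_strings = ["-34:13:44.9", "+33:44:58.452", "+19:21:46.692", "+55:52:43.32"]

# Assumption: ra is hours:minutes:seconds, and 1 hour of ra equals 15 degrees.
ra_deg = 15.0 * np.array([sexagesimal_to_decimal(s) for s in ra_strings])

# Assumption: dec is already degrees:arcminutes:arcseconds.
dec_deg = np.array([sexagesimal_to_decimal(s) for s in dec_strings])

print(ra_deg)
print(dec_deg)
```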
hpaulj
  • Thank you so much! I do have a question though, what does the `_282` line do? – Stevia Ndoe Jan 17 '20 at 01:07
  • `_282` is `ipython` short hand for `Out[282]`, the value produced by the `In[282]` command. It's an extension of the `_` history of a regular interactive session. – hpaulj Jan 17 '20 at 01:21

The nan in your output is NaN (Not a Number), which probably means the fields could not be read as numbers. Try setting the data type to None (dtype=None).

Also, try omitting delimiter. By default, any run of consecutive whitespace acts as the delimiter.

Not sure what you're expecting, but maybe this will be a better starting point...

import numpy as np

np_array = np.genfromtxt(r"C:\Users\nstev\Downloads\S190930t.csv", skip_header=1, dtype=None, encoding="utf-8", usecols=(1, 2))
print(np_array)

printed output...

[['15:09:11.8' '-34:13:44.9']
 ['09:19:46.8' '+33:44:58.452']
 ['05:15:43.488' '+19:21:46.692']
 ['04:19:12.096' '+55:52:43.32']]

Disclaimer: I don't use numpy. I based my answer on https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

Daniel Haley
  • I plugged in your code and I am getting the error `Line #2 (got 3 columns instead of 2)`, but for every line of my file. I do have colons, plus signs and one minus sign in my data, so maybe that is the problem? The errors only start at line two, where all the remaining values are positive; line one holds the only negative number in the set. – Stevia Ndoe Jan 16 '20 at 00:44
  • I copied your input directly from your question and didn't have any issues. I don't see how the colons or plus/minus signs would cause a column error. There must be some difference between your actual file contents and what's shown in your question. Hopefully someone with numpy experience can chime in and we will both learn something. ;-) – Daniel Haley Jan 16 '20 at 00:50
  • @SteviaNdoe, if you are having problems processing the column strings after loading, you need to show the relevant code. What Daniel has shown is the best you'll get directly from `genfromtxt`. – hpaulj Jan 16 '20 at 00:54
  • this is my code: ```import csv from collections import defaultdict columns = defaultdict(list) import pandas as pd io = pd.read_csv(r'C:\Users\nstev\Downloads\S190930t.csv',sep=",",usecols=(4,5)) print(io)``` – Stevia Ndoe Jan 16 '20 at 00:57