How can I replace specific values when loading data from a CSV file whose columns mix strings and numerical values?
In the example below, suppose you have a number of geographical positions with known latitudes and longitudes, a set of properties (P1-P5), and a class (included just to add a string component to the problem). Some values are missing, and genfromtxt replaces them correctly with the missing-value flag (-999 in this case). In addition, some values are not valid (fake entries or other kinds of flags), such as 0.0. How can I replace 0.0 with -999?
Data:
Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
And the code so far:
import numpy as np

dfile = "data.csv"
missing_value = -999

# One dtype per column: Name, lat, long, P1-P5, Class (9 columns in total)
data = np.genfromtxt(dfile, unpack=True, comments='#', names=True,
                     autostrip=True, filling_values=missing_value,
                     dtype=('S5', 'float', 'float', 'float', 'float',
                            'float', 'float', 'float', 'S1'),
                     delimiter=',')
new_data = np.where(data!=0.0 ,data, -999)
I used np.where as in np.where(data != 0.0, data, -999), but I got an error:
TypeError: invalid type promotion
I do not know what I am missing...
PS 1. Perhaps this is solvable with pandas, but I am looking for a solution that does not depend on it.
PS 2. I know that a dirty workaround would be to replace the incorrect values (the 0.0s) with my missing flag in the original file, but what if there are multiple values that I would like to exclude (or data combined from sources with different flags)?
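To make the goal concrete, here is a minimal NumPy-only sketch of the kind of result I am after. It is one possible direction under stated assumptions, not a confirmed fix. It assumes the comparison has to be done one numeric field at a time, because the structured array as a whole cannot be compared with a float. The name bad_flags is hypothetical, io.StringIO stands in for data.csv, and the field names are spelled out in the dtype (with skip_header=1) instead of being read via names=True.

```python
import io
import numpy as np

# Sample data copied from the question; StringIO stands in for data.csv.
csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999
bad_flags = [0.0]  # hypothetical list; add further flag values here if needed

# Explicit structured dtype: one (name, type) pair per CSV column.
dtype = [("Name", "U5"), ("lat", float), ("long", float),
         ("P1", float), ("P2", float), ("P3", float),
         ("P4", float), ("P5", float), ("Class", "U1")]

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,            # header row is skipped; names come from dtype
    autostrip=True,
    filling_values=missing_value,
    dtype=dtype,
    encoding="utf-8",
)

# Replace the flag values one numeric field at a time; string fields
# (Name, Class) are left untouched.
for name in data.dtype.names:
    column = data[name]       # view into the structured array
    if np.issubdtype(column.dtype, np.number):
        column[np.isin(column, bad_flags)] = missing_value
```

Because data[name] is a view, the assignment changes data in place. Note that this sketch replaces 0.0 in every numeric field, including lat and long; restricting the loop to P1-P5 would be a small change.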