I have a DataFrame called `low` whose columns contain both floats and strings. I need to remove the `*` while preserving the exact numbers, and then cast everything to float.
Li Be Sc
0 12.1737 0.1045 0.03365
1 12.67865 0.1287 *0.0022
2 11.5056 0.10595 0.03115
3 11.79965 0.0945 0.0313
4 12.1499 0.10585 0.03535
5 12.35725 *0.11115 0.06515
6 *10.8227 *0.9693 *0.05815
7 11.63235 0.10275 0.02485
8 12.68735 *0.28995 0.0513
9 11.65235 0.10385 0.0188
import pandas as pd

# Sample of the numeric columns; starred values are stored as strings
data = {
'Li': [12.1737, 12.67865, 11.5056, 11.79965, 12.1499, 12.35725, '*10.8227', 11.63235, 12.68735, 11.65235],
'Be': [0.1045, 0.1287, 0.10595, 0.0945, 0.10585, '*0.11115', '*0.9693', 0.10275, '*0.28995', 0.10385],
'Sc': [0.03365, '*0.0022', 0.03115, 0.0313, 0.03535, 0.06515, '*0.05815', 0.02485, 0.0513, 0.0188]}
low = pd.DataFrame(data)
I was able to remove the `*` while keeping the numbers intact by building a mask that selects only the values containing `*`, casting those values to `str`, and then using `str.replace` to replace the `*` with an empty string. The line is
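low.iloc[:,2:-1] = low.iloc[:,2:-1].mask(
    # True wherever the value cannot be parsed as a number (i.e. it contains "*")
    low.iloc[:,2:-1].apply(pd.to_numeric, errors='coerce').isnull(),
    # regex=False so that "*" is treated as a literal character, not a regex quantifier
    low.iloc[:,2:-1].astype(str).apply(lambda x: x.str.replace("*", "", regex=False)))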
That produces columns which look like this
Li Be Sc
0 12.1737 0.1045 0.03365
1 12.67865 0.1287 0.0022
2 11.5056 0.10595 0.03115
3 11.79965 0.0945 0.0313
4 12.1499 0.10585 0.03535
5 12.35725 0.11115 0.06515
6 10.8227 0.9693 0.05815
7 11.63235 0.10275 0.02485
8 12.68735 0.28995 0.0513
9 11.65235 0.10385 0.0188
and `low.info()` returns
Data columns (total 44 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null object
1 Sample Name 3 non-null object
2 Li 3 non-null object
3 Be 3 non-null object
4 Sc 3 non-null object
5 V 3 non-null object
..................
dtypes: object(44)
memory usage: 1.2+ KB
Now I need to convert the columns to float. However if I try
low.iloc[:,2:-1] = low.iloc[:,2:-1].astype(float)
That returns
Data columns (total 44 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 10 non-null object
1 Sample Name 10 non-null object
2 Li 10 non-null object
3 Be 10 non-null object
4 Sc 10 non-null object
5 V 10 non-null object
...................
dtypes: object(44)
memory usage: 3.6+ KB
Similarly
low.iloc[:,2:-1] = low.iloc[:,2:-1].apply(pd.to_numeric, errors='coerce')
low.iloc[:,2:-1] = low.iloc[:,2:-1].apply(lambda x: x.astype(float))
and other variations of the above all return a DataFrame whose columns are still of dtype object. However,
for c in low.columns[2:-1]:
    low[c] = low[c].astype(float)
gives a DataFrame with float columns:
Data columns (total 44 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 10 non-null object
1 Sample Name 10 non-null object
2 Li 10 non-null float64
3 Be 10 non-null float64
4 Sc 10 non-null float64
5 V 10 non-null float64
........
dtypes: float64(41), object(3)
memory usage: 3.6+ KB
Why does the loop work when the `astype` and `apply` versions do not? How can I convert without looping through the columns?
]` vs setting *a range of values* using `low.iloc` (or for that matter, `low.loc` or `low[:]`).
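For what it's worth, here is a minimal loop-free sketch of the whole cleanup on a cut-down version of the sample data. It assumes the numeric columns can be picked out by name (the `cols` list and the `Sample Name` values are made up for illustration). The key point is that the final step uses `DataFrame.astype` with a dict, which returns a new frame with new dtypes rather than writing values into the existing object columns through `.iloc`. Exactly how `.iloc` assignment treats dtypes has varied between pandas versions, so the sketch avoids relying on it.

```python
import pandas as pd

# Cut-down sample: starred values are strings, the rest are floats
low = pd.DataFrame({
    'Sample Name': ['a', 'b', 'c'],        # hypothetical non-numeric column
    'Li': [12.1737, 12.67865, '*10.8227'],
    'Be': [0.1045, '*0.11115', '*0.9693'],
    'Sc': ['*0.0022', 0.03115, '*0.05815'],
})

# Assumption: the columns to convert are known by name
cols = ['Li', 'Be', 'Sc']

# Step 1: strip the literal "*" (regex=False treats it as a plain character).
# The columns now hold strings only, so they are still not float.
low[cols] = low[cols].astype(str).apply(
    lambda s: s.str.replace('*', '', regex=False)
)

# Step 2: astype with a {column: dtype} mapping builds a NEW DataFrame,
# so the converted columns really come back as float64.
low = low.astype({c: 'float64' for c in cols})

print(low.dtypes)
```

If the columns are easier to select by position, `cols = low.columns[2:-1]` can be used in the same way, since only the selection changes and the conversion still goes through `astype`.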