So I have an Excel file that looks like this:
Name R s l2 max_amplitude ref_amplitude
R_0.3_s_0.5_l2_0.1 0.3 0.5 0.1 1.45131445 1.45131445
R_0.3_s_0.5_l2_0.6 0.3 0.5 0.6 3.52145743 3.52145743
...
R_1.1_s_2.0_l2_1.6 1.1 2.0 1.6 5.07415199 5.07415199
R_1.1_s_2.0_l2_2.1 1.1 2.0 2.1 5.78820419 5.78820419
R_1.1_s_2.0_l2_2.6 1.1 2.0 2.6 5.84488964 5.84488964
R_1.1_s_2.0_l2_3.1 1.1 2.0 3.1 6.35387516 6.35387516
Using the pandas module I import the data into a DataFrame:
import pandas as pd
df = pd.read_excel("output_var.xlsx", header=0)
Everything seems to be ok:
df
in the command line produces:
R s l2 max_amplitude ref_amplitude
0 0.3 0.5 0.1 1.451314 1.451314
1 0.3 0.5 0.6 3.521457 3.521457
2 0.3 0.5 1.1 4.770226 4.770226
...
207 1.1 2.0 2.1 5.788204 5.788204
208 1.1 2.0 2.6 5.844890 5.844890
209 1.1 2.0 3.1 6.353875 6.353875
[210 rows x 5 columns]
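For what it's worth, one extra check that could go here (not something I originally ran, just a sketch on the same df) is the column dtypes; I would expect all numeric columns to come in as float64, which is exactly what makes the comparisons below tricky:
df.dtypes  # presumably float64 for R, s, l2, max_amplitude and ref_amplitude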
Now I need to do some calculations based on the value of R, so I need to slice the DataFrame. Column R contains 5 different values: 0.3, 0.5, 0.7, 0.9 and 1.1. Each of these 5 values has 42 rows (5 x 42 = 210). To remove the duplicates from "R" I try
set(df.R)
which returns:
{0.29999999999999999,
0.5,
0.69999999999999996,
0.89999999999999991,
0.90000000000000002,
1.1000000000000001}
Apart from 0.3 being represented as 0.29999... etc., there are 6 (instead of 5) different values for R. It seems that 0.9 sometimes gets interpreted as 0.89999999999999991 and sometimes as 0.90000000000000002. This can be (partially) solved with:
set(round(df.R,1))
which (at least) returns 5 values:
{0.29999999999999999,
0.5,
0.69999999999999996,
0.90000000000000002,
1.1000000000000001}
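As a diagnostic (again just a sketch on the same df, not output from the original file), counting how many rows sit on each exact stored value of R makes the split visible; mapping through repr shows the two binary versions of 0.9 as different strings:
# count rows per exact float stored in R; the two variants of 0.9
# should show up as separate entries (presumably 41 rows on one and 1 on the other)
df.R.map(repr).value_counts()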
But now I come to the dangerous part. If I want to slice according to the known values of R (0.3, 0.5, 0.7, 0.9 and 1.1), then
len(df[df.R==0.3])
returns
42
and
len(df[df.R==0.9])
returns
41
One row gets lost! (Remember, there are 42 rows for each of the 5 values of R, giving 210 rows in total.) How can I deal with this problem?
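The only workaround I can come up with so far is a tolerance-based comparison with numpy (a rough sketch below, using numpy.isclose on the same df; I am not sure this is the proper pandas way):
import numpy as np
# compare with a tolerance instead of exact equality, so that both
# binary representations of 0.9 are matched
len(df[np.isclose(df.R, 0.9)])  # should hopefully give 42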