I am having an issue converting a multidimensional list into a pandas DataFrame. The problem is related to the numeric fields: some numbers are in a non-standard format, as you can see from this table (scraped using tabula.py):
[                Unnamed: 0    0    Stück          kg  € / kg  0.1  Stück.1  \
0           Region Nord-Ost  NaN   64.852   6.269.400    1,60  0.0   37.408
1    Niedersachsen / Bremen  NaN  164.424  15.993.570    1,59  0.0   88.625
2       Nordrhein-Westfalen  NaN  179.692  17.422.749    1,59  0.0   73.199
3  Hessen / Rheinland-Pfalz  NaN    6.322     610.099    1,61  NaN   10.281
4         Baden-Württemberg  NaN   21.924   2.135.045    1,62  0.0   22.661
5                    Bayern  NaN   21.105   2.062.882    1,62  0.0   18.188
6        Deutschland gesamt  NaN  458.319  44.493.745    1,59  NaN  250.362

         kg.1  € / kg.1
0   3.632.427      1,56
1   8.683.864      1,56
2   7.155.988      1,55
3   1.004.925      1,60
4   2.220.986      1,63
5   1.798.013      1,58
6  24.496.203      1,57  ]
In this case the dot is the thousands separator. When I convert the table to a DataFrame, those numbers become floats (I think), so that 10.281, for example, turns into 10.280999999999999. The result is the following:
                 Unnamed: 0    0    Stück          kg  € / kg  0.1  \
0           Region Nord-Ost  nan   64.852   6.269.400    1,60  0.0
1    Niedersachsen / Bremen  nan  164.424  15.993.570    1,59  0.0
2       Nordrhein-Westfalen  nan  179.692  17.422.749    1,59  0.0
3  Hessen / Rheinland-Pfalz  nan    6.322     610.099    1,61  nan
4         Baden-Württemberg  nan   21.924   2.135.045    1,62  0.0
5                    Bayern  nan   21.105   2.062.882    1,62  0.0
6        Deutschland gesamt  nan  458.319  44.493.745    1,59  nan

              Stück.1        kg.1  € / kg.1
0              37.408   3.632.427      1,56
1              88.625   8.683.864      1,56
2              73.199   7.155.988      1,55
3  10.280999999999999   1.004.925      1,60
4  22.660999999999998   2.220.986      1,63
5              18.188   1.798.013      1,58
6             250.362  24.496.203      1,57
I would like to treat those numbers as strings and then remove the dots, converting each number to a standard integer, but I cannot find a way to do that.
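To make the goal concrete, here is a minimal sketch in plain Python of the conversion I am after. The helper names `parse_de_int` and `parse_de_decimal` are hypothetical, and the sketch assumes the cell values are still available as the original strings, which is exactly the part I cannot get to work.

```python
def parse_de_int(text: str) -> int:
    """Parse an integer written with dots as thousands separators, e.g. '6.269.400'."""
    return int(text.replace(".", ""))


def parse_de_decimal(text: str) -> float:
    """Parse a decimal written with a comma as decimal separator, e.g. '1,60'."""
    # Drop any thousands dots first, then turn the decimal comma into a dot.
    return float(text.replace(".", "").replace(",", "."))


# Sample values copied from the table above.
print(parse_de_int("6.269.400"))  # 6269400
print(parse_de_int("10.281"))     # 10281
print(parse_de_decimal("1,60"))   # 1.6
```

If the values arrived as strings, helpers like these could presumably be applied to each numeric column.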
I already tried setting the dtype of the DataFrame to string, like this:
df = pd.DataFrame(table[0], dtype=str)
But the problem is still there. Any suggestions?