
I am having trouble converting a multidimensional list into a Pandas DataFrame. The problem concerns the numeric fields: some numbers are in a non-standard format, as you can see from this table (scraped using tabula-py):

[                 Unnamed: 0   0    Stück          kg € / kg  0.1  Stück.1  \
0           Region Nord-Ost NaN   64.852   6.269.400   1,60  0.0   37.408   
1    Niedersachsen / Bremen NaN  164.424  15.993.570   1,59  0.0   88.625   
2       Nordrhein-Westfalen NaN  179.692  17.422.749   1,59  0.0   73.199   
3  Hessen / Rheinland-Pfalz NaN    6.322     610.099   1,61  NaN   10.281   
4         Baden-Württemberg NaN   21.924   2.135.045   1,62  0.0   22.661   
5                    Bayern NaN   21.105   2.062.882   1,62  0.0   18.188   
6        Deutschland gesamt NaN  458.319  44.493.745   1,59  NaN  250.362   

         kg.1 € / kg.1  
0   3.632.427     1,56  
1   8.683.864     1,56  
2   7.155.988     1,55  
3   1.004.925     1,60  
4   2.220.986     1,63  
5   1.798.013     1,58  
6  24.496.203     1,57  ]

In this case the dot is the thousands separator. When I convert the table to a DataFrame, those numbers become floats (I think), and the result is the following:

                 Unnamed: 0    0    Stück          kg € / kg  0.1  \
0           Region Nord-Ost  nan   64.852   6.269.400   1,60  0.0   
1    Niedersachsen / Bremen  nan  164.424  15.993.570   1,59  0.0   
2       Nordrhein-Westfalen  nan  179.692  17.422.749   1,59  0.0   
3  Hessen / Rheinland-Pfalz  nan    6.322     610.099   1,61  nan   
4         Baden-Württemberg  nan   21.924   2.135.045   1,62  0.0   
5                    Bayern  nan   21.105   2.062.882   1,62  0.0   
6        Deutschland gesamt  nan  458.319  44.493.745   1,59  nan   

              Stück.1        kg.1 € / kg.1  
0              37.408   3.632.427     1,56  
1              88.625   8.683.864     1,56  
2              73.199   7.155.988     1,55  
3  10.280999999999999   1.004.925     1,60  
4  22.660999999999998   2.220.986     1,63  
5              18.188   1.798.013     1,58  
6             250.362  24.496.203     1,57

I would like to treat those numbers as strings, remove the dots, and then convert them to standard integers, but I cannot find a way to do that.
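To make the intended conversion concrete, here is a minimal sketch of it, assuming every cell is still plain text. The sample frame and the helper name `german_to_number` are made up for illustration; only the column names come from the table above.

```python
import pandas as pd

# Hypothetical sample mirroring two rows of the scraped table, all cells as text.
df = pd.DataFrame({
    "Unnamed: 0": ["Region Nord-Ost", "Hessen / Rheinland-Pfalz"],
    "Stück": ["64.852", "6.322"],
    "kg": ["6.269.400", "610.099"],
    "€ / kg": ["1,60", "1,61"],
})

def german_to_number(col: pd.Series) -> pd.Series:
    """Drop '.' thousands separators, turn the ',' decimal mark into '.', then parse."""
    cleaned = (
        col.astype(str)
        .str.replace(".", "", regex=False)   # literal dot, not a regex wildcard
        .str.replace(",", ".", regex=False)
    )
    # Cells that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")

numeric_cols = ["Stück", "kg", "€ / kg"]
df[numeric_cols] = df[numeric_cols].apply(german_to_number)
print(df.dtypes)
```

Note that this only gives the right answer if the dots are still present as text. A value that has already been parsed as the float `6.322` cannot be told apart from a genuine decimal.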

I already tried to set the dtype of the df to string, like this:

df = pd.DataFrame(table[0], dtype=str)

But the problem is still there. Any suggestions?
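One possible explanation, offered as a guess rather than a diagnosis: the values may already be floats by the time `pd.DataFrame(..., dtype=str)` runs, so the cast only stringifies numbers that were misread earlier. The sketch below reproduces that effect with `pd.read_csv` on made-up semicolon-separated text (the `raw` string and the `region` column name are hypothetical). If the table comes from `tabula.read_pdf`, its `pandas_options` argument might be usable to pass `{"dtype": str}` at parse time, but that is an assumption about the setup and is not tested here.

```python
import io
import pandas as pd

# Hypothetical text standing in for what the parser sees for two table rows.
raw = (
    "region;Stück.1;kg.1\n"
    "Hessen / Rheinland-Pfalz;10.281;1.004.925\n"
    "Bayern;18.188;1.798.013\n"
)

# Default parsing: "10.281" looks like a float, so the thousands dot is misread.
as_parsed = pd.read_csv(io.StringIO(raw), sep=";")

# Asking for strings at parse time keeps the original text intact.
as_text = pd.read_csv(io.StringIO(raw), sep=";", dtype=str)

print(as_parsed["Stück.1"].dtype)    # float64
print(as_text["Stück.1"].tolist())   # ['10.281', '18.188']
```

With the text preserved, the dots can be stripped as plain characters before any numeric conversion.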

You can use `Series.str.replace` and replace `.` by `,` for the columns you want, but you may need to first convert them to strings – ThePyGuy Jun 09 '21 at 09:58
