30

I'm trying to import a large .csv file containing text and numbers using genfromtxt in numpy. I'm only interested in two columns. I have most of the import sorted out with:

import numpy as np

def importfile(root):
    data = root.entry.get()
    atw = np.genfromtxt(data, delimiter=",",
                        skip_header=1,
                        skip_footer=2,
                        autostrip=True,
                        usecols=(25,26),
                        dtype=("|S10"))
    elem = atw[:,0]
    concs = atw[:,1]
        
    print(elem)
    print(concs)

With output for elem and concs respectively:

['Na2O' 'MgO' 'Al2O3' 'SiO2' 'P2O5' 'SO3' 'Cl' 'K2O' 'CaO' 'TiO2' 'Cr2O3'
'MnO' 'FeO' 'NiO' 'Cu2O' 'ZnO' 'Ga2O3' 'SrO' 'Y2O3']

['3.76E+00' '1.31E+01' '1.14E+01' '4.04E+01' '1.24E+00' '5.89E-02'
'2.43E-02' '1.53E+00' '1.49E+01' '2.87E+00' '6.05E-02' '1.96E-01'
'1.17E+01' '3.69E-02' '8.73E-03' '1.39E-02' '1.93E-03' '1.88E-01'
'5.58E-03']

I have tried many different things for converting the concs string into a float, but it doesn't seem to like the fact that the concs are in scientific notation... is there a way to turn the concs values into a float?

mkrieger1
Dr. Toboggan

4 Answers

38

The float function can do this:

>>> float('1.31E+01')
13.1

or for a list:

>>> list(map(float, ['3.76E+00', '1.31E+01', '1.14E+01']))
[3.76, 13.1, 11.4]
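
For the NumPy array in the question, the same conversion can probably be applied to the whole column at once. This is a minimal sketch, assuming the concentrations sit in a string array like the one printed above; the variable name `concs_float` is illustrative and only the first three values are used:

```python
import numpy as np

# Sample values copied from the question's output; the real column
# would come from the genfromtxt call.
concs = np.array(['3.76E+00', '1.31E+01', '1.14E+01'], dtype='U10')

# astype(float) parses each string, scientific notation included,
# and returns a float64 array.
concs_float = concs.astype(float)

print(concs_float.tolist())
```

The same call should also work on a byte-string array such as the `|S10` one in the question, though that is worth checking against the actual data.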
That1Guy
RichieHindle
  • 3
    Obligatory list comprehension approach: `n = ['3.76E+00', '1.31E+01', '1.14E+01'] [float(i) for i in n]` – Jason Sperske May 13 '14 at 16:23
  • 1
    float(i) won't work for me. I have a mixed list and I want to convert it. Not sure what should I use if I don't want to split it. – Reihan_amn Feb 10 '18 at 22:04
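
For a mixed list like the one mentioned in the comment above, one possible approach is to convert only the items that parse as numbers and leave the rest alone. This is a sketch under that assumption; `to_float_or_keep` is a hypothetical helper, not part of any library:

```python
def to_float_or_keep(value):
    """Return value as a float if it parses as one, otherwise return it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric strings raise ValueError; None and similar raise TypeError.
        return value

mixed = ['3.76E+00', 'Na2O', '1.31E+01', '', None]
converted = [to_float_or_keep(item) for item in mixed]

print(converted)
```

Note that strings such as 'nan' and 'inf' also parse as floats, so they would be converted too.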
0
with open(datafile, 'r') as inData:
    for line in inData:
        # Split on commas, drop empty fields, and convert the rest to float
        j = list(map(float, filter(None, line.strip().split(','))))

I mention this for general reference, since it solved a similar problem that brought me to this page.

mist42nz
0

Maybe this will be helpful to someone. I had a similar problem and found a Stack Overflow answer about applying pandas `to_numeric` to DataFrame columns, which includes replacing commas with dots:

import re
import pandas as pd

# atw is a DataFrame and cc is the label of the column to convert
atw[cc] = pd.to_numeric(atw[cc].apply(lambda x: re.sub(',', '.', str(x))))
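
A self-contained version of the same idea might look like the sketch below. The DataFrame contents and the column label "conc" are made up for illustration, and it uses the pandas string method `str.replace` in place of `re.sub`:

```python
import pandas as pd

# Hypothetical data with decimal commas; "conc" stands in for the column label cc.
atw = pd.DataFrame({"elem": ["Na2O", "MgO"],
                    "conc": ["3,76E+00", "1,31E+01"]})

# Swap the decimal comma for a dot, then parse the strings as numbers.
atw["conc"] = pd.to_numeric(atw["conc"].str.replace(",", ".", regex=False))

print(atw["conc"].tolist())
```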
Yury Wallet
-3

Given such a list of numbers in scientific notation, you can also do this:

a = [9.0181446e-01, 1.3179450e-02, 4.3021311e-04, 2.3546994e-03, 3.6531375e-03, 7.8567989e-02]
max(a)

The output will be: 0.90181446