
Condensed code:

# attempt to condense code while preserving the parts
# relevant to the question

from xml.sax import handler, make_parser

class pdosHandler(handler.ContentHandler):
    def __init__(self, data):
        self.data   = data
        self.parts  = { 'energy_values': 0 }
        self.energy_values = []

    def startDocument(self):
        print "Reading started"

    def startElement(self, name, attrs):
        for key, val in self.parts.iteritems():
            if name == key:
                self.parts[key] = 1

    def characters(self, ch):
        if self.parts['energy_values']:
            if ch != '\n':
                self.data.energy_values.append(float(ch.strip()))

def pdosreader(inp, data):
    handler = pdosHandler(data)
    parser = make_parser()
    parser.setContentHandler(handler)
    inFile = open(inp)
    parser.parse(inFile)

    inFile.close()

Lines 153-155 of the full script (siesta_pdos.py):

if( self.parts['energy_values'] ):
    if( ch != '\n' ):
        self.data.energy_values.append( string.atof(normalize_whitespace( ch ) ) )

Error:

Traceback (most recent call last):
  File "siesta_pdos.py", line 286, in <module>
    main()
  File "siesta_pdos.py", line 278, in main
    pdosreader( args[0], data )
  File "siesta_pdos.py", line 262, in pdosreader
    parser.parse( inFile )
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "siesta_pdos.py", line 155, in characters
    self.data.energy_values.append( string.atof(normalize_whitespace( ch ) ) )
  File "/usr/lib/python2.7/string.py", line 388, in atof
    return _float(s)
ValueError: could not convert string to float:

Input file:

<pdos>
<nspin>2</nspin>
<norbitals>7748</norbitals>
<energy_values>
           -29.99997
           -29.98997
           -29.97996
           ...
           ... (3494 lines skipped)
           ...
             4.97999
             4.98999
             4.99999
</energy_values>
</pdos>

Full input at: http://dl.dropbox.com/u/10405722/inputfile.dat

Full code at: http://dl.dropbox.com/u/10405722/siesta_pdos.py


The code correctly reads the first 3116 values and then exits with the error above. The same code works fine with a shorter input (e.g. 3000 lines), so this looks to me like a buffer-related error that has nothing to do with atof.

Any ideas?
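To make the buffer-related suspicion concrete: a SAX parser may deliver the text of one element through several `characters()` calls, and where it splits depends on how the input was buffered. The sketch below only simulates that; the chunk boundaries are hypothetical (not captured from the real parser), and `naive_collect` is a hypothetical stand-in that applies the same filter as the condensed handler. A chunk made only of spaces passes the `ch != '\n'` test, strips to an empty string, and `float('')` raises the same `ValueError` as in the traceback, while a number cut in two is silently stored as two wrong values.

```python
from __future__ import print_function


def naive_collect(chunks):
    """Hypothetical stand-in for the condensed characters() logic."""
    values = []
    for ch in chunks:
        if ch != '\n':                         # same filter as the handler
            values.append(float(ch.strip()))   # fails if ch is only spaces
    return values


# Text split only at line ends: works.
print(naive_collect(['\n', '           -29.99997', '\n']))

# Hypothetical split inside the leading spaces: '      '.strip() == ''.
try:
    naive_collect(['\n', '      ', '     -29.98997', '\n'])
except ValueError as exc:
    print('ValueError:', exc)

# Hypothetical split inside the number: no error, but two wrong values.
print(naive_collect(['\n', '           -29.9', '7996', '\n']))
```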

cipper
  • Why don't you use float() instead of string.atof? Also, please copy into your question the value that threw the exception, as well as the relevant source code; linking it from the net only is not acceptable. – Antti Haapala -- Слава Україні Aug 16 '13 at 12:16
  • What is the 3117th value? – doctorlove Aug 16 '13 at 12:21
  • Why not test for `if ch.strip():` instead? – Martijn Pieters Aug 16 '13 at 12:22
  • Thanks for the comments. The exception always occurs at line 3116, independently of the value on that line (if you remove some lines at the top, the code still stops at line 3116). There are 3500 lines in the input file, each containing a single float as shown above. – cipper Aug 19 '13 at 07:28
  • float() doesn't solve it, and normalize_whitespace() is just a call to string.strip(). – cipper Aug 19 '13 at 07:35
  • I'm having trouble figuring out whether this parser works at all. All you tell us is that it fails on line 3116 of the input, which means it may not be doing anything correctly and you never see it because the exception keeps anything from being reported. – msw Aug 19 '13 at 10:23
  • As written in the main post, the code works fine with shorter inputs. I have attached a link to an example input file. – cipper Aug 19 '13 at 10:30
  • **SOLVED!** The problem was indeed related to the **bufsize** parameter defined in /usr/lib/python2.7/xml/sax/expatreader.py and in /usr/lib/python2.7/xml/sax/xmlreader.py. Increasing it from `2**16` to `2**17` makes the code work. **Is there any way to change the bufsize during the call, without manually editing the file? (I'm clearly not a Python expert.)** – cipper Aug 20 '13 at 10:15
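Rather than patching `bufsize` in the standard library, a common approach is to make the handler independent of how the text is chunked: collect every `characters()` chunk while inside `<energy_values>` and only split and convert the text in `endElement()`. This is a sketch assuming the element layout shown in the question; the class name `EnergyHandler` and its attributes are hypothetical, and it stores the floats on the handler instead of on the question's `data` object. The usage part feeds the document one byte at a time to force the worst possible chunking.

```python
from __future__ import print_function
import xml.sax


class EnergyHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the floats inside <energy_values>."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.inside = False  # True between <energy_values> and </energy_values>
        self.chunks = []     # raw text pieces exactly as characters() delivers them
        self.values = []     # parsed floats, filled in by endElement()

    def startElement(self, name, attrs):
        if name == 'energy_values':
            self.inside = True
            self.chunks = []

    def characters(self, ch):
        if self.inside:
            self.chunks.append(ch)  # ch may be a fragment: do not convert yet

    def endElement(self, name):
        if name == 'energy_values':
            self.inside = False
            # Join first, then split on any whitespace: no empty tokens, no halves.
            text = ''.join(self.chunks)
            self.values = [float(tok) for tok in text.split()]


# Usage: feed the document one byte at a time to force the worst chunking.
sample = (b"<pdos><nspin>2</nspin><energy_values>\n"
          b"           -29.99997\n           -29.98997\n"
          b"</energy_values></pdos>")
handler = EnergyHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
for i in range(len(sample)):
    parser.feed(sample[i:i + 1])
parser.close()
print(handler.values)
```

Because the conversion happens only after the closing tag, the result does not depend on the buffer size. Checking `if ch.strip():` would avoid the empty string but could still store a number that happens to be cut in two as two values. If you only want a bigger buffer, CPython's `xml.sax.expatreader.ExpatParser` accepts a `bufsize` argument in its constructor, so you could create it directly instead of calling `make_parser()`; that postpones the problem to larger files rather than removing it.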

1 Answer


The documentation for string.atof says: "Deprecated since version 2.0: Use the float() built-in function."

You say that float() doesn't work either, which probably means that your input is invalid. When something doesn't work as you expect, it is very easy to use print to find out why:

if( ch != '\n' ):
    print repr(ch), repr(ch.strip())
    print repr(normalize_whitespace(ch))
    print repr(float(ch.strip()))
    self.data.energy_values.append(string.atof(normalize_whitespace(ch)))

The fact that you had to explain normalize_whitespace suggests it is a poor name; if you had just called it strip, every reader would know what it does without having to look it up.

In case you don't know, repr is intended to reduce ambiguity. For example:

>>> x = '1.234'
>>> print x
1.234
>>> print repr(x)
'1.234'
>>> print repr(float(x))
1.234

With the first print, it is unclear whether x is a number or a string. With repr, no guessing is involved.
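Applied to this bug, the same trick shows why the failing value looked empty: a chunk made only of spaces prints as an apparently blank line, while repr makes the spaces visible and, after strip(), shows the empty string that float() actually receives. The value of `chunk` below is a made-up example, not taken from the real input.

```python
from __future__ import print_function

chunk = '      '               # hypothetical whitespace-only chunk
print(chunk)                   # output looks blank
print(repr(chunk))             # '      '
print(repr(chunk.strip()))     # ''
```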

msw
  • Hello, the input is valid. I had already tried printing the value to understand the error, and from that I could tell that the error is related to the number of lines, not to their content. I've included the whole input in the main post, so you can verify it yourself. – cipper Aug 19 '13 at 09:08
  • So are you saying that `float(u'')` raises a ValueError? Yep, it does. – msw Aug 19 '13 at 09:12
  • I agree; the problem now is to understand why the string is empty when the input is not. – cipper Aug 19 '13 at 09:17
  • Please take a look at the revised post. Thank you very much for the hints. – cipper Aug 19 '13 at 10:37