
I am new to Python. For an optimizing subroutine I am testing in Python, I need to parse a CSV file of numbers.

The format of the csv file is thus:

Support load summary for anchor at node 5,

Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,

Sustained,-3,-2679,120,2012,164,69,,
Operating1,1472,2710,-672,-4520,8743,-2047,,
Maximum,1472,2710,120,2012,8743,69,,
Minimum,-3,-2679,-672,-4520,164,-2047,,

Support load summary for anchor at node 40,

Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,

Sustained,9,-3872,-196,-91,854,-3914,,
Operating1,-2027,-8027,3834,-7573,-9102,-6323,,
Maximum,9,-3872,3834,-91,854,-3914,,
Minimum,-2027,-8027,-196,-7573,-9102,-6323,,

Support load summary for anchor at node 125,

Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,

Sustained,-7,-2448,76,264,83,1320,,
Operating1,556,-3771,-3162,-6948,-1367,1272,,
Maximum,556,-2448,76,264,83,1320,,
Minimum,-7,-3771,-3162,-6948,-1367,1272,,

Support load summary for Hanger at node 10,

Load combination,Load (N),,

Sustained,-3668,,
Operating1,-13876,,
Maximum,-3668,,
Minimum,-13876,,

Support load summary for Hanger at node 20B,

Load combination,Load (N),,

Sustained,-14305,,
Operating1,-13359,,
Maximum,-13359,,
Minimum,-14305,,

Support load summary for restraint at node 115B,

Load combination,FX (N),FY (N),FZ (N),,

Sustained,,-5655,,,
Operating1,3696,,
Maximum,,3696,,,
Minimum,,-5655,,,

My code works mainly on the lines starting with

Operating1,
Maximum, 
Minimum,

The job (cost function) is to total (algebraically) all the numbers following one of these keywords. Sometimes, as you can see in the data file above, there is only one number, in the 2nd or 3rd column (see the end of the data file). Sometimes there is no number at all, as in the following file fragment (see the Operating1 line below).

Support load summary for Hanger at node 115B,

Load combination,Load (N),,

Sustained,-5188,,
Operating1,,,
Maximum,,,
Minimum,-5188,,

I am using np.genfromtxt(). It works great except when I run into lines that have fewer than 4 values, or sometimes none at all.

I am using sum() on the result of genfromtxt() (see code). When there is only one value, I used float(). When there is none, I tried to detect that and assign zero to the total. I can customize for each case, but I am wondering if there is a general, more abstract method of reading and totaling the numbers in unpredictable cases.

Also, I tried the "missing_values" and "filling_values" arguments, but they do not seem to work. How do I count the number of non-zero columns in a file?

Here is part of the code so far:

def optimize(fn, optflag):

    modeltotals = []
    i = 0
    csv1 = []
    j = 1 # line # count

    for line in csv.reader(filelist):
        temp = repr(line)

        if "Support load summary" in temp:
            csv1.append(line) # just making another list of actionable lines for future use
            if (d): print "\n", line
            continue
        if (optflag == "ope"): # optimize on Operating loads
            if "Operating1" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of OPE Loads:", modeltotals[i], "\n"
                elif (len(line) > 0 and len(line) <= 4):
                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=",", missing_values=[0,0,0,0]))
                    if (d): print "OPE Max:", modeltotals[i], "\n"

                i += 1
        elif (optflag == "minmax"): # optimize on all loads, min and max.
            #print  "i=", i
            if "Maximum" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of Maxs:", modeltotals[i]
                elif (len(line) <= 4):
                    #line1 = np.genfromtxt(line[1:], delimiter=",", filling_values = 0)
                    #modeltotals.append(sum(line1))

                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    print "line1 =", line1
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=",", filling_values = 0))
                    if (d): print "Max:", modeltotals[i]

                i += 1
            elif "Minimum" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    #print "#", j, "line", line
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of Mins:", modeltotals[i]

                elif (len(line) > 0 and len(line) <= 4):
                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=","))
                    if (d): print "Min:", modeltotals[i]

                i += 1
        j += 1
    if len(modeltotals) > 0:
        print modeltotals
        average = float(sum(modeltotals))/len(modeltotals)  #sometimes error here
    else:
        return "000"  # error, seems like no file was analyzed
    if (d):
        print "Current model mean =", average

    del csv1[:]
    return abs(average)

The several errors I run into in different files are similar:

['Support load summary for restraint at node 20B', '']
Traceback (most recent call last):
  File "sor4.py", line 190, in <module>
    modelmean[filename] = optimize(filename, args.optimizeon)
  File "sor4.py", line 107, in optimize
    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
TypeError: iteration over a 0-d array

The other error is a "Cannot convert to a scalar."

I understand the errors but do not know enough Python to deal with them cleverly. Sorry for the long post; I will get better at presenting information succinctly. As another poster here said, I will gratefully accept your answers. Thank you.

bmu

1 Answer

I reduced your problem to the following code. It drops the nan values that genfromtxt produces for empty fields and handles an empty input string.

from StringIO import StringIO
import numpy as np

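def getnumbers(s):
    # Parse comma-separated numbers; empty fields come back as nan.
    try:
        res = np.genfromtxt(s, delimiter=",")
        # Boolean mask: False where nan, True elsewhere, so only real numbers are kept.
        return res[np.where(np.isnan(res), False, True)]
    except IOError as ioe:
        # No data could be read: return a 1-d zero so that sum() can iterate over it.
        return np.array([0.])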

print(sum(getnumbers(StringIO('1., 2., , '))))
print(sum(getnumbers(StringIO(''))))

It gives the result

3.0
0.0
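
If the rows already come from csv.reader, as in the question, each row is a list of strings, so the same filtering idea can be sketched without genfromtxt. This is only an illustration under that assumption: row_total is a hypothetical helper name, the sample rows are copied from the question's data, and the code is written for Python 3 (io.StringIO, print as a function).

```python
import csv
from io import StringIO


def row_total(fields):
    """Sum the numeric fields of one CSV row and count the non-empty ones.

    Blank fields (empty or whitespace only) are skipped, so a row with no
    numbers at all yields a total of 0.0 and a count of 0.
    """
    total = 0.0
    count = 0
    for field in fields:
        text = field.strip()
        if not text:
            continue  # empty column: nothing to add
        total += float(text)
        count += 1
    return total, count


# Sample rows taken from the question's data, including an all-empty one.
sample = StringIO(
    "Sustained,,-5655,,,\n"
    "Operating1,3696,,\n"
    "Maximum,,,\n"
)

for row in csv.reader(sample):
    # row[0] is the keyword (Sustained, Operating1, ...); the rest are values.
    total, count = row_total(row[1:])
    print(row[0], total, count)
```

The count returned alongside the total is the number of non-empty fields in that row, which may help with the question about counting columns.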
Holger
  • Sorry, pressed the wrong button! That looks so clean. So, this is the part that replaces existing IF/ELSE blocks under the outer IF Operating/Max/Min? I will give it a try and report back. Thanks again. – user1858112 Nov 28 '12 at 20:29
  • That WORKS and is cleverly elegant. I superficially understand what np.where is doing. It seems to return numbers at indices where the number in "res" is not "nan" and of course, passed to sum(). What does the last "True" in np.where() do? I assume that the first "False" applies to the isnan()? Sorry for more questions. I do not want to copy a solution blindly. – user1858112 Nov 28 '12 at 22:01
  • In this case `where` is called with 3 arguments. The first is a condition, here `isnan(res)`. The second argument is returned for the elements where the condition is fulfilled, and the third for the elements where it is not. So `where` returns an array of booleans here, which is then used to index `res` by boolean indexing. – Holger Nov 29 '12 at 06:53
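
The explanation in the last comment can be checked on a small array. A minimal sketch, assuming the four-element array below stands in for what genfromtxt returns for a line with two numbers and two empty fields; the variable names are illustrative only.

```python
import numpy as np

# Stand-in for a parsed line: two numbers followed by two empty fields.
res = np.array([1.0, 2.0, np.nan, np.nan])

# Three-argument where: False where the value is nan, True elsewhere.
mask = np.where(np.isnan(res), False, True)

# The same mask written as a plain negation of isnan.
same_mask = ~np.isnan(res)

# Boolean indexing keeps only the positions where the mask is True.
kept = res[mask]
total = kept.sum()

# np.nansum treats nan as zero, so it gives the same total in one step.
nan_total = np.nansum(res)

print(mask, kept, total, nan_total)
```

Both routes give the same total; the boolean mask is useful when the surviving values themselves are needed, for example to count them with len(kept).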