
I am working with an Oracle EPM Product called Financial Data Quality Management Enterprise Edition (FDMEE). I have written a Jython script to parse a data file and push it to a custom table within the FDMEE product schema.

It works fine when I push a subset of the data file, but when I parse the entire data file it fails with the error `IndexError: index out of range: 7`.

Following is the error message I receive:

File "\\vmhodvesip4\D$\SVESI7\Custom\FDMEEApps\BFRVN/data/scripts/event/BefImport.py", line 5, in <module>

    if row[7]=='JAN':

IndexError: index out of range: 7

Following is the code I use:

import csv

recReader = csv.reader(open('D:/SVESI7/Custom/FDMEEApps/BFRVN/inbox/BF_Reven_Load/Test03big.txt'), delimiter='!')
for row in recReader:
    if row[7]=='JAN':
        period_num = '1'
    elif row[7]=='FEB':
        period_num = '2'
    elif row[7]=='MAR':
        period_num = '3'
    elif row[7]=='APR':
        period_num = 4
    elif row[7]=='MAY':
        period_num = 5
    elif row[7]=='JUN':
        period_num = 6
    elif row[7]=='JUL':
        period_num = 7
    elif row[7]=='AUG':
        period_num = 8
    elif row[7]=='SEP':
        period_num = 9
    elif row[7]=='OCT':
        period_num = 10
    elif row[7]=='NOV':
        period_num = 11
    elif row[7]=='DEC':
        period_num = 12
    else:
        period_num = 'skip'

    if period_num != 'skip':
        params1 = ['batch_plnapps_oi',row[7],period_num,'20' + row[1][-2:],row[2], row[3], row[4], row[5], row[6], row[8], row[9], row[10], row[11], round(row[12],12)]
        ins_stmt1 = "insert into aif_open_interface(batch_name,period,period_num,year,col03,col04,col05,col06,col07,col09,col10,col11,col12,amount) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
        fdmAPI.executeDML(ins_stmt1,params1,False)

fdmAPI.commitTransaction()
Gautam TP
  • This error means the list doesn't have an 8th element. What is the expected output? Check it and trace back. – Ricardo Silveira Sep 17 '15 at 02:19
  • Also, read the pandas documentation [here](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and work with DataFrames; they are much better for this purpose. – Ricardo Silveira Sep 17 '15 at 02:21
  • I haven't had much time to look at your code, but I would suggest using a DictReader from the csv module so that you can look up columns by name rather than by index (a sketch of this follows the comments). Try this instead and you may find a solution without actually knowing the problem. – Connor Sep 17 '15 at 02:21
  • Hi @RicardoSilveira, thanks for the comment. The expected output is that it pushes the data from the file to the tables in a custom schema within the product database. The code executes fine if I use a subset of the data, but gives the IndexError with a larger dataset (of a similar data structure/format). Also going through pandas, thanks for the reference. – Gautam TP Sep 17 '15 at 02:33
  • @Connor Thanks for the reference Connor, going through it now – Gautam TP Sep 17 '15 at 02:33
  • What is the value of `row` when it gets the error? – Barmar Sep 17 '15 at 02:40
  • @Barmar The affected row holds the information on the various periods (JAN to DEC). There are also data rows with values of more than 39 digits of decimal precision, which I am rounding off to 12 decimals; not sure if that is too large to support. – Gautam TP Sep 17 '15 at 02:45
  • That doesn't answer my question. You need to look at `row` when it gets the error, and figure out why it doesn't have the expected number of fields. – Barmar Sep 17 '15 at 02:55
  • @Barmar Thanks Barmar, I am running some tests now with more data files, will update you. – Gautam TP Sep 17 '15 at 03:00
  • Just add a check: `if len(row) > 7:`. – luoluo Sep 17 '15 at 03:14
  • possible duplicate of [I am getting an Index error as list out of range. I have to scan through many lines](http://stackoverflow.com/questions/32332134/i-am-getting-an-index-error-as-list-out-of-range-i-have-to-scan-through-many-li) – luoluo Sep 17 '15 at 03:26
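
A minimal sketch of the DictReader approach suggested in the comments above; the field names here are hypothetical placeholders (the file appears to have no header row, so they would have to be supplied explicitly, in the same order as the columns in Test03big.txt):

import csv

# Hypothetical field names -- adjust them to the real column order of the file.
fieldnames = ['col00', 'year', 'col02', 'col03', 'col04', 'col05', 'col06',
              'period', 'col08', 'col09', 'col10', 'col11', 'amount']

recReader = csv.DictReader(
    open('D:/SVESI7/Custom/FDMEEApps/BFRVN/inbox/BF_Reven_Load/Test03big.txt'),
    fieldnames=fieldnames, delimiter='!')

for row in recReader:
    # Rows that are too short come back with None for the missing keys
    # instead of raising IndexError, so they can be spotted and skipped.
    if row['period'] is None:
        print('Skipping short row: %s' % row)
        continue
    if row['period'] == 'JAN':
        period_num = '1'
    # ... rest of the period mapping and the insert as in the original script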

2 Answers


There are obviously fewer than 8 columns in the affected row. Debug using a try/except block:

for n, row in enumerate(recReader, start=1):
    try:
        month = row[7]
    except IndexError:
        print('Row {0}: {1}'.format(n, row))

As a bonus, here is a more efficient way to write your code:

months = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}
for row in recReader:
    month = row[7]
    period_num = months.get(month, None)

    if period_num:
        params1 = ['batch_plnapps_oi', row[7], period_num, '20' + row[1][-2:], row[2], row[3], row[4], row[5], row[6], row[8], row[9], row[10], row[11], round(row[12], 12)]
        ins_stmt1 = "INSERT INTO aif_open_interface(batch_name, period, period_num, year, col03, col04, col05, col06, col07, col09, col10, col11, col12, amount) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
        fdmAPI.executeDML(ins_stmt1, params1, False)

fdmAPI.commitTransaction()
Alexander
  • Thanks Alexander for the edit! Testing the code, will let you know the update. – Gautam TP Sep 17 '15 at 02:49
  • You should be using a dict for the months instead of `index()`. Something like `months = {'JAN': 1, 'FEB': 2,...}` then you can just do `for row in recReader: period_num = months.get(row[7], None)`. `index()` needs to traverse the list each time to get the index while dicts have O(1) lookup. – kylieCatt Sep 17 '15 at 02:49
  • @IanAuld Thanks for the information, Ian. Haven't used a dict before; let me try a dict for this approach, will let you know the update. – Gautam TP Sep 17 '15 at 02:55
  • Instead of doing a query for each row, there is also a more efficient way to batch the information and then just do one write (assuming you have enough memory for the file read). But that's another question... – Alexander Sep 17 '15 at 03:07

Without seeing your .csv we can't really help you too much, but...

  1. Ensure that every line in your csv has the correct format (a quick way to check this is sketched just after this list).
  2. Make sure that the last line in your csv is not just whitespace.
  3. Look at the optional parameters in the documentation for csv.reader, and at how the file should be opened (binary mode in Python 2/Jython; `newline=''` with Python 3's open()).
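
For point 1, a minimal sketch that only scans the file and reports lines with fewer than 8 fields, using the same path and `!` delimiter as in the question; nothing is written to the database here:

import csv

# Pure format check: report every line that does not have at least 8 fields.
f = open('D:/SVESI7/Custom/FDMEEApps/BFRVN/inbox/BF_Reven_Load/Test03big.txt', 'rb')
try:
    for line_no, row in enumerate(csv.reader(f, delimiter='!')):
        if len(row) < 8:
            print('Line %d has only %d fields: %s' % (line_no + 1, len(row), row))
finally:
    f.close()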
Vikram Saran
  • Hi Vikram, thanks for the suggestion. The file has the correct format and does not end with whitespace (just verified). It does load the data from the file to the product table if the dataset is small, but fails to load a bigger dataset of the same format. – Gautam TP Sep 17 '15 at 02:34
  • The file obviously doesn't have the correct format or you wouldn't be getting this error. The error indicates that there's a line with less than 8 fields. – Barmar Sep 17 '15 at 02:56
  • @Barmar Thanks Barmar/Vikram, it's a fairly large file. I have created around 7-8 separate files from this big file to do the tests; let me see a way to verify whether the entire file has rows with an incorrect format. This data file is a direct extract from a tool called Ab Initio, which transforms the file after validation. Will let you know the update. – Gautam TP Sep 17 '15 at 03:04
  • Why don't you just add `if len(row) < 8: print(row)` to the script so you see the erroneous row? – Barmar Sep 17 '15 at 03:08