0

I am working with Python 2.7 to extract data from a JSON API and push it into a SQL-Server table. I am having trouble with inserting data into the database where some of the entries returned from the JSON response are missing a section of the dictionary. IE, "CustomFields": 90% of the entries have information, however 10% don't therefore I get an index error eg

"CustomFields":[
],

vs

"CustomFields":[
{
"Type":"foo",
"Name":"foo",
"Value":"foo"
},
{
"Type":"foo",
"Name":"foo",
"Value":"foo"
},
{
"Type":"foo",
"Name":"foo",
"Value":"foo"
},

What would I change so that if I get a missing index, replace those with 'NULL' entries into the database.

response = '*API URL*'
json_response = json.loads(urllib2.urlopen(response).read())

conn = pypyodbc.connect(r'Driver={SQL Server};Server=*Address*;Database=*DataBase*;Trusted_Connection=yes;')
conn.autocommit = False
c = conn.cursor()

c.executemany("INSERT INTO phil_targetproccess (ResourceType, Id, Name, StartDate, EndDate, TimeSpent, CreateDate, ModifyDate, LastStateChangeDate, ProjectName, EntityStateName, RequestTypeName, AssignedTeamMember#1, Area, SubArea, BusinessTeam) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" ,
             [(x['ResourceType'],
               x['Id'],
               x['Name'],
               (parse_date(x['StartDate'])),
               (parse_date(x['EndDate'])),
               x['TimeSpent'],
               (parse_date(x['CreateDate'])),
               (parse_date(x['ModifyDate'])),
               (parse_date(x['LastStateChangeDate'])),
               x['Project']['Name'],
               x['EntityState']['Name'],
               x['RequestType']['Name'],
               y['GeneralUser']['FirstName']+' '+y['GeneralUser']['LastName'],
               x['CustomFields'][0]['Value'],
               x['CustomFields'][1]['Value'],
               x['CustomFields'][2]['Value'])
               for x in json_response['Items']
                  for y in x['Assignments']['Items']])

Many thanks.

Phil Baines
  • 437
  • 1
  • 6
  • 18
  • 1
    try `x.get('StartDate')` instead of `x['StartDate']`. You can specify a fallback value as the second argument to .get, by default it is None. – bananafish Sep 18 '17 at 03:10
  • @bananafish for x in json_response['Items'] for y in x['Assignments']['Items'] It is not field defining values that are causing the issues, it is the for loop that is at the end. If I use a .get for the 'for y in ...''' I still get list index out of range – Phil Baines Sep 18 '17 at 04:10

3 Answers3

3

I think your issue is here

 x['CustomFields'][0]['Value'],
 x['CustomFields'][1]['Value'],
 x['CustomFields'][2]['Value']

When CustomFields has no elements

Try

 x['CustomFields'][0]['Value'] if len(x['CustomFields']) > 0 else '',
 x['CustomFields'][1]['Value'] if len(x['CustomFields']) > 1 else '',
 x['CustomFields'][2]['Value'] if len(x['CustomFields']) > 2 else '',
BA.
  • 924
  • 7
  • 10
  • Thanks heaps, that worked perfect! Now the final problem to solve, is that in the loop of "for y in x['Assignments']['Items']" If there are no keys under ['Assignments']['Items'] then a record will not be inserted into the database as it will ignore it. – Phil Baines Sep 18 '17 at 04:45
  • What's the behavior you expect? What do you want to insert if there are no items? – BA. Sep 18 '17 at 04:57
  • I expect that a null entry will be added if no value is provided – Phil Baines Sep 18 '17 at 07:39
  • I meant to say that all except 'GeneralUser' are retrieved from Items (x). For example, if there're no items, what are you inserting in the "ResourceType" and "ID"? – BA. Sep 18 '17 at 09:49
  • The other fields are auto generated and cannot be blank, but I will put in a statement to make it null if input is null. For now just trying to look into the "for y in x['Assignments']['Items']" so that, if that for loop is None, it will insert a null. Currently if there are no names assigned under that dict, it will just skip that record completely. – Phil Baines Sep 18 '17 at 10:04
  • OK try this for the last line "for y in x...." here : for y in x['Assignments']['Items'] + ([{'GeneralUser': {'FirstName': 'N/A', 'LastName': 'N/A'}} ] if len(x['Assignments']['Items']) == 0 else []) ]) – BA. Sep 18 '17 at 12:58
  • That is perfect! Thank you so much!! One final hurdle, I have a columns, AssignedTeamMember#1 and AssignedTeamMember#2 (generaluser 1 and 2). Sometimes there are two users under ['Assignments']['Items']. If there are two, it enters a new row with the other first and last name, I would like the second name to be stored in a teammembertwo column on the row. If there is no second user, the field will be null. Basially I would like one record per ID, if there is a second user it will put it in the AssignedTeamMember#2 column – Phil Baines Sep 18 '17 at 21:41
  • Then, you need to remove the second loop on y, The last line. And then, you use the same technique you used for CustomField: (x['Assignments']['Items'][0]['FirstName']+' '+x['Assignments']['Items'][0]['LastName']) if len(x['Assignments']['Items']) > 0 else None and another similar statement for AssignedUser#2 with index: 1 and condition > 1 – BA. Sep 19 '17 at 03:52
0

You can use get method to check whether that value in CustomFields

is available if so check the length of the list and then get the value of the dictionary in that list using the same get method.

For example:

customfield_value = (x['CustomFields'][0]).get("Value",None) if len(x['CustomFields'])>0 else None

This will return None if the value is not present in the index 0. you can follow the same for getting values from other 2 indices. If you didn't understand please comment it 'll explain further.

Mani
  • 5,401
  • 1
  • 30
  • 51
0

Final Script. Thanks for the help!

   c.executemany("INSERT INTO phil_targetproccess (ResourceType, Id, Name, StartDate, EndDate, TimeSpent, CreateDate, "
          "ModifyDate, LastStateChangeDate, ProjectName, EntityStateName, RequestTypeName, AssignedTeamMember1, "
          "AssignedTeamMember2, AssignedTeamMember3, AssignedTeamMember4, Area, SubArea, BusinessTeam) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
          [(x['ResourceType'],
            x['Id'],
            x['Name'],
            (parse_date(x['StartDate'])),
            (parse_date(x['EndDate'])),
            x['TimeSpent'],
            (parse_date(x['CreateDate'])),
            (parse_date(x['ModifyDate'])),
            (parse_date(x['LastStateChangeDate'])),
            x['Project']['Name'],
            x['EntityState']['Name'],
            x['RequestType']['Name'],
            x['Assignments']['Items'][0]['GeneralUser']['FirstName'] + ' ' + x['Assignments']['Items'][0]['GeneralUser']['LastName'] if len(x['Assignments']['Items']) > 0 else None,
            x['Assignments']['Items'][1]['GeneralUser']['FirstName'] + ' ' + x['Assignments']['Items'][1]['GeneralUser']['LastName'] if len(x['Assignments']['Items']) > 1 else None,
            x['Assignments']['Items'][2]['GeneralUser']['FirstName'] + ' ' + x['Assignments']['Items'][2]['GeneralUser']['LastName'] if len(x['Assignments']['Items']) > 2 else None,
            x['Assignments']['Items'][3]['GeneralUser']['FirstName'] + ' ' + x['Assignments']['Items'][3]['GeneralUser']['LastName'] if len(x['Assignments']['Items']) > 3 else None,
            x['CustomFields'][0]['Value'] if len(x['CustomFields']) > 0 else '',
            x['CustomFields'][1]['Value'] if len(x['CustomFields']) > 1 else '',
            x['CustomFields'][2]['Value'] if len(x['CustomFields']) > 2 else '')
           for x in json_response['Items']])
Phil Baines
  • 437
  • 1
  • 6
  • 18