
My program streaming data from Twython generates this error:

longitude=data['coordinates'][0]
KeyError: 0

This occurs in the following code:

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            if data['place']!=None:
                if 'coordinates' in data and data['coordinates'] is not None:
                    longitude=data['coordinates'][0]

I then inserted a print(data['coordinates']) on the line before the longitude statement. The error is intermittent, and the most recent time it happened the call printed {'coordinates': [-73.971836, 40.798598], 'type': 'Point'}. Sometimes the keys appear in the opposite order, like this: {'type': 'Point', 'coordinates': [-73.97189946, 40.79853829]}

I then added print calls for type(data) and type(data['coordinates']) and got dict as the result for both when the error happened.

I also now realize this has only happened (and happens every time) when data['place']!=None. So I am now printing data['place'], type(data['place']) and repr(data['place']).

What else can I put in here to trap for the error/figure out what is going on?

If it helps, here is the 200-line Python file that includes the TwythonStreamer class definition.

zondo
Jeff Winchell
  • What exactly is your problem? Do you want your program to run through these errors or do you want to debug it? I would suggest try and except clauses for this kind of problem. – Ulf Aslak Apr 02 '16 at 00:56
  • @UlfAslak: Catching the exception doesn't stop it from being thrown. – Lightness Races in Orbit Apr 02 '16 at 01:43
  • What kind of object is `data`? It looks like a dictionary, but clearly it sometimes doesn't behave like one. (Or, I suppose it could be an issue with a list-like object stored as `data['coordinates']` that's not really a list.) – Blckknght Apr 02 '16 at 01:44
  • what is `type(data)` and `type(data['coordinates'])` when error occurs? – Łukasz Rogalski Apr 02 '16 at 16:09
  • @Rogaliski - I just put those in and will report when the error triggers again. – Jeff Winchell Apr 03 '16 at 03:41
  • @Blckknght - I assumed this is a dictionary. I haven't dug entirely into Twython source or Twitter API to know exactly. – Jeff Winchell Apr 03 '16 at 03:42
  • @UlfAslak Ideally I want to debug it to fix it. If I can't figure it out soon I will trap, log how frequently this happens and decide if that data loss is acceptable. This is for a product I am developing. – Jeff Winchell Apr 03 '16 at 03:42
  • I can't help you much because I'm not behind your screen, but I've worked with Twitter data before and here's how I deal with it: I either debug my code in PyCharm, where I can run it step by step and check that my data structure makes sense at all times, or I just flood my code with print statements. – Ulf Aslak Apr 03 '16 at 04:46
  • For debugging purposes, try wrapping the `longitude=` line with a `try` and `except`, and in the `except` block, print out the `type` and the `repr` of both `data` and `data['coordinates']`. Or run in a debugger, so you can start examining the data right when the exception gets raised. – Blckknght Apr 03 '16 at 08:32
  • @Blckknght I added the `type` printouts. I will now add `repr`. FYI, it often takes hours before I get an error. – Jeff Winchell Apr 03 '16 at 17:03
  • @JeffWinchell. I see you changed your if statement, but you've now introduced another potential bug. The first part of the if statement will fail when `"coordinates"` is not in `data`, so it should instead be: `if 'coordinates' in data and data['coordinates'] is not None:`. As to your main problem: it's obvious that the data sometimes contains a dict rather than a list - but is this data coming *directly* from Twython? If so, which **specific** Twython api are you getting this data from? You could be wasting your time trying to debug this if the bug is actually in Twython... – ekhumoro Apr 03 '16 at 17:43
  • @ekhumoro I changed the code per your suggestion. I also added contextual code to indicate where I am getting the data from in the Twython API. Twython is open source so finding a bug there is still useful, no? – Jeff Winchell Apr 03 '16 at 18:49
  • @JeffWinchell. Yes, of course - all I meant was that if the bug is in Twython, there's no point in only debugging your own code. – ekhumoro Apr 03 '16 at 19:39
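
The try/except diagnostic suggested in the comments above could be sketched roughly as follows. This is only an illustration: `read_longitude` and the logger setup are hypothetical names, and it assumes `data` is a plain dict as delivered to `on_success`.

```python
import logging

logger = logging.getLogger(__name__)


def read_longitude(data):
    """Return the longitude, or None after logging diagnostics on failure.

    Hypothetical helper: assumes `data` is a dict-like tweet payload.
    """
    try:
        # Same access as the failing line in the question.
        return data['coordinates'][0]
    except (KeyError, IndexError, TypeError):
        coords = data.get('coordinates')
        # Record the type and repr so the offending payload can be inspected
        # later, even if the error only shows up after hours of streaming.
        logger.exception(
            "Unexpected coordinates: type=%s repr=%r place=%r",
            type(coords).__name__, coords, data.get('place'),
        )
        return None
```

Because the failure is logged with its traceback instead of being printed once, the stream can keep running while the evidence accumulates.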

1 Answer


Now that you've added more realistic code to your question, it seems obvious where the problem lies. The Twython streamer doesn't always send coordinate data, and it can be None - but when it does send it, the lat/long values may be nested two layers deep.

So the data structure is this:

{
    'coordinates': {
        'coordinates': [-73.971836, 40.798598],
        'type': 'Point'
    },
    ...
}
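
Indexing that inner dict with `0` is what raises `KeyError: 0`, since `0` is treated as a dictionary key rather than a list position. A minimal reproduction, using the sample values printed in the question (the `tweet` variable and the two function names are only for illustration):

```python
# Sample payload shaped like the one printed in the question.
tweet = {
    'coordinates': {
        'coordinates': [-73.971836, 40.798598],
        'type': 'Point',
    },
}


def first_coordinate_wrong(data):
    # data['coordinates'] is a dict here, so [0] is a key lookup, not an index.
    return data['coordinates'][0]


def first_coordinate_right(data):
    # The [longitude, latitude] list lives one level deeper.
    return data['coordinates']['coordinates'][0]


try:
    first_coordinate_wrong(tweet)
    error = None
except KeyError as exc:
    error = exc  # the missing key is the integer 0

longitude = first_coordinate_right(tweet)
```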

Which means your code needs to look like this:

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            if 'place' in data and data['place'] is not None:
                if 'coordinates' in data and data['coordinates'] is not None:
                    longitude, latitude = data['coordinates']['coordinates']

Or more simply:

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            place = data.get('place')
            if place is not None:
                coords = data.get('coordinates')
                if coords is not None:
                    longitude, latitude = coords['coordinates']
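
If the payload shape might vary, the lookup could also be pulled into a small defensive helper. This is a sketch under assumptions, not something Twython provides: `extract_lon_lat` is a hypothetical name, and accepting a bare two-element list is a precaution rather than documented behaviour.

```python
def extract_lon_lat(data):
    """Return (longitude, latitude), or None if no usable point is present.

    Hypothetical helper: `data` is assumed to be a dict-like tweet payload.
    """
    coords = data.get('coordinates')
    # Unwrap the {'type': 'Point', 'coordinates': [...]} wrapper if present.
    if isinstance(coords, dict):
        coords = coords.get('coordinates')
    # Accept only a two-element sequence; anything else counts as missing.
    if isinstance(coords, (list, tuple)) and len(coords) == 2:
        longitude, latitude = coords
        return longitude, latitude
    return None
```

Inside `on_success`, the result can be checked with `if point is not None:` before unpacking, which keeps the nesting shallow.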
ekhumoro
  • All that you wrote is correct and useful, but do note that the error is rare. If I understood correctly that the rest of the time the code works as is, it seems the data source is inconsistent; most of the time `coordinates` is at the root of the dictionary, but on rare occasions it is nested. Jeff will have to check the structure on every call and branch accordingly (and hope that it is the only inconsistency in the data). – Paulo Almeida Apr 03 '16 at 20:31
  • @Ekhumoro Actually, it appears to be odder than that since I do get longitude/latitude values in another branch of my code. It appears that it is nested 2 levels deep ONLY when there is a place value. If there is no place value, it is 1 level deep. I will change my code and if after a long time of running without error (or I see place data and longitude data in my database which means the same thing) then I'll consider the problem solved and not a python bug, but just a confusion in the Twitter API. – Jeff Winchell Apr 03 '16 at 20:37
  • @Ekhumoro So can I write my if tests more compactly (no and) by writing `if mydict.get('mykey') is not None:` – Jeff Winchell Apr 03 '16 at 20:41
  • @JeffWinchell. There's nothing odd or confusing here at all. You just haven't fully learned how the APIs work yet. If you'd included all the relevant information and code in your question to start with, this would have been an easy problem to solve. As for using `get()`: it depends on how the rest of your code is structured. If you need to access `place` and `coords` several times, it may result in slightly more efficient and more readable code. – ekhumoro Apr 03 '16 at 20:58
  • @Ekhumoro First of all, thanks for taking the time to help and for finding the solution. As for the other assumptions: 1. I've read all the API docs from Twython and Twitter and I still can't find where it says that coordinates is nested two deep when place is used. I think it's an undocumented feature. 2. It is hard to know what is relevant code. 3. I had the print(data['coordinates']) and its result in the very first post, so a lot of people including me overlooked that it was clear I needed a data['coordinates']['coordinates']. Just shows we're all imperfect humans. :-) – Jeff Winchell Apr 03 '16 at 23:23
  • @JeffWinchell. I was referring to your first comment above, where you mention another branch of your code which, crucially, shows different behaviour. If you put that together with the code showing the specific Twython API you are using, it represents a lot of vital information missing from your original question - all of which you knew about from the start. This is why your question got downvoted (but not by me), and why grumpy old-timers like me sometimes express their frustration in the comments ;-) – ekhumoro Apr 04 '16 at 00:00
  • @Ekhumoro The title (SOMETIMES) could mislead, and the branch could help undo that, but the necessary data (the print result) was sufficient from the start. So lots of smart people overlooked that. It happens. After a couple years of reading StackOverflow (and 3 decades of software and bbs/compuserve/internet/twitter) this was my first post because too frequently Stackoverflow posters intimidate/denigrate people for not meeting their perfection standards. I'd bet 99% of readers feel the same way. People are trying to be helpful, but the methods sometimes fail. – Jeff Winchell Apr 04 '16 at 04:56
  • @JeffWinchell. I think it can only be seen as sufficient with the benefit of hindsight. The relatively large number of edits and comments is testimony to that. I think your other remarks are quite unfair. Remember that the people answering the questions here are all *volunteers*, and they have to deal with a high volume of questions (most of which you will never see) which are of much lower quality than the one you've asked here. – ekhumoro Apr 04 '16 at 15:56
  • @Ekhumoro Fair enough. Thank you. – Jeff Winchell Apr 04 '16 at 18:08
  • The solution works. The branch of code this problem is in has been triggered with no errors. Thank you! – Jeff Winchell Apr 04 '16 at 18:10