I'm trying to track down a Python UnicodeDecodeError in the following log line:
10.210.141.123 - - [09/Nov/2011:14:41:04 -0800] "gfR\x15¢\x09ì|Äbk\x0F[×ÐÖà\x11CEÐÌy\x5C¿DÌj\x08Ï ®At\x07å!;f>\x08éPW¤\x1C\x02ö*6+\x5C\x15{,ªIkCRA\x22 xþP9â\x13h\x01¢è´\x1DzõWiË\x5C\x10sòʨR)¶²\x1F8äl¾¢{ÆNw\x08÷@ï" 400 166 0.000 "-" "-"
I opened the entire log file in Vim, and then yanked the line into a new file so I could test just the one line. However, my parsing script works OK with the new file - it doesn't throw a UnicodeDecodeError. I don't understand why the one file would generate an error and the other one would not, when they are (on the surface) identical.
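Since the two files only look identical in an editor, one check would be to compare them at the byte level. This is a minimal sketch, not something from my script: the function name `first_difference_offset` and the file paths are placeholders made up for illustration.

```python
def first_difference_offset(path_a, path_b):
    """Return the offset of the first differing byte, or None if the files are identical."""
    # Read both files as raw bytes so no decoding happens at all.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = bytearray(fa.read()), bytearray(fb.read())
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One file may simply be a prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Calling it as `first_difference_offset('original.log', 'yanked.log')` (both names hypothetical) would return an offset if the editor re-encoded anything while the line was copied, and `None` if the files really are byte-for-byte the same.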
Here's what I tried: running enca to determine the file encoding, which complained that it "Cannot determine (or understand) your language preferences." Running file -i reports both files as "Regular file". I also deleted all of the other lines in the original log file and still got the error in one file and no error in the other. I tried deleting set encoding=utf-8 from my .vimrc and writing the file again, and I still got the error in one file and not in the other.
The logs are nginx logs. Nginx has this note in their release notes:
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX
in an access_log.
Thanks to Maxim Dounin.
My Python script has with open('log_file') as f, and the error comes up when I try to call json.dumps on a dict.
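To narrow down which bytes are involved, the file could be read in binary mode and each line decoded explicitly. The following is only a sketch under assumptions: it assumes UTF-8 is the encoding the failing step expects, and `find_undecodable_lines` is a hypothetical helper name, not part of my script.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, error_offset, offending_bytes) for each line that fails to decode."""
    problems = []
    # Binary mode: the lines come back as raw bytes, exactly as stored on disk.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, 1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start and exc.end delimit the bytes the codec rejected.
                problems.append((lineno, exc.start, raw[exc.start:exc.end]))
    return problems
```

Running this on both the original log and the yanked copy should show whether the offending bytes are present in one file and absent from the other.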
How can I track this down?