3

import os

for root, dirs, files in os.walk('Path'):
    for file in files:
        if file.endswith('.c'):
            with open(os.path.join(root, file)) as f:
                for line in f:
                    if 'word' in line:
                        print(line)

I am getting this error:

UnicodeDecodeError: 'cp932' codec can't decode byte 0xfc in position 6616: illegal multibyte sequence

I think the file needs Shift JIS encoding. Can I set the encoding once at the start? I tried with open(os.path.join(root, file), 'r', encoding='cp932') as f: but got the same error.

Chetan.B
  • Can you add the full stacktrace, to see whether the Exception is thrown on the "print(line)", or on the "for line in f"? You probably will have to open the files in binary mode as you won't know the encoding for all of them. – cbodt Aug 24 '17 at 09:20

4 Answers

6

You could pass errors='ignore', but make sure you check which encoding your files actually use.

open(os.path.join(root, file),'r', encoding='cp932', errors='ignore')
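To make the effect of the errors argument concrete, here is a minimal sketch comparing errors='ignore' with errors='replace'. It decodes as UTF-8 rather than cp932 purely as an illustrative assumption, because the byte 0xfc is guaranteed to be invalid there; the sample bytes are made up.

```python
# Hypothetical sample: one stray 0xfc byte, which is never valid in UTF-8.
raw = b'int w\xfcrd = 1;\n'

# 'ignore' silently drops every byte that cannot be decoded.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' keeps a visible U+FFFD marker where decoding failed.
replaced = raw.decode('utf-8', errors='replace')

print(repr(ignored))
print(repr(replaced))
```

With 'ignore' the damage leaves no trace in the output, so 'replace' is usually the safer choice when lossy decoding is acceptable at all.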
cbodt
  • Will it ignore the error and skip that file? Is that how it works? – Chetan.B Aug 24 '17 at 09:56
  • It will not skip the file completely, only the characters inside the file that cannot be decoded. Maybe only some files or lines are incorrectly encoded. You could check how many of these errors you have by catching the exception and printing the filename. – cbodt Aug 24 '17 at 10:32
  • And will corrupt your data @Chetan.B - terrible idea – Mr_and_Mrs_D Feb 20 '18 at 09:44
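The check suggested in the comments above (catch the exception and print the filename) could be sketched roughly as follows. The function name find_undecodable_files and its parameters are hypothetical; 'Path' is the placeholder directory from the question.

```python
import os

def find_undecodable_files(top, encoding='cp932', suffix='.c'):
    """Return (path, error) pairs for files that fail to decode."""
    failures = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, encoding=encoding) as f:
                    for _ in f:
                        pass  # iterating forces every line to be decoded
            except UnicodeDecodeError as exc:
                failures.append((path, exc))
    return failures

# 'Path' is the placeholder directory used in the question.
for path, exc in find_undecodable_files('Path'):
    print(path, exc)
```

If only a handful of files show up, it is probably better to fix or special-case those files than to suppress errors everywhere.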
5

Ended up here because I got the same error.

I'm just learning, but fortunately I found a solution.

If it says:

UnicodeDecodeError: 'cp932' codec can't decode

it means that the file that you are using is not encoded in cp932, so you actually need to change the encoding.

In my case, I was trying to read a file encoded in UTF-8, so the solution was to include that when I opened my file:

open("file.txt","r",encoding='utf-8')

I hope that this helps anybody who comes here because of the same error.
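When a directory tree mixes encodings, one possible approach (not part of the answer above) is to try a short list of candidate encodings per file. This is a minimal sketch; read_with_fallback and the default candidate list are assumptions, and a clean decode does not prove that the guessed encoding is the right one.

```python
def read_with_fallback(path, encodings=('utf-8', 'cp932')):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    with open(path, 'rb') as f:
        raw = f.read()  # read bytes once, then try each candidate
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError(f'none of {encodings} could decode {path}')
```

Listing the stricter encoding first (UTF-8 here) matters, because permissive encodings tend to accept bytes that were never meant for them.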

Gabu
0

Try using the io library:

io.open(os.path.join(root, file), mode='r', encoding='cp932')
Žilvinas Rudžionis
0

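You need to change the reading mode from 'r' to 'rb'. In binary mode Python returns bytes and does no decoding, so no UnicodeDecodeError can occur; the search string must then be bytes as well, for example b'word'.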
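Applied to the loop from the question, a binary-mode search might look like the sketch below. The helper name grep_bytes and its parameters are hypothetical. It assumes the search term is plain ASCII; in Shift JIS the second byte of a two-byte character can fall in the ASCII range, so rare false matches are possible.

```python
import os

def grep_bytes(top, needle=b'word', suffix='.c'):
    """Yield (path, line_number, raw_line) for each line containing needle."""
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                # 'rb' yields bytes, so nothing is decoded and nothing can fail
                with open(path, 'rb') as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno, line

# 'Path' is the placeholder directory used in the question.
for path, lineno, line in grep_bytes('Path'):
    print(path, lineno, line)
```

The matching lines come back as bytes; decode them with errors='replace' only at the point where they need to be displayed.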

Tomerikoo