
I use Python 2.7 on Win 7 Pro SP1.

I tried this code:

import os
path = "E:/data/keyword"
os.chdir(path)

files = os.listdir(path)
query = "{keyword} AND NOT("
result = open("query.txt", "w")

for file in files:
   if file.endswith(".txt"):
      file_path = file.name
      dane = open(file_path, "r")
      query.append(dane)
      result.append(" OR ")

result.write(query)
result.write(")")
result.close()

I get this error:

file_path = file.name
AttributeError: 'str' object has no attribute 'name'

I can't figure out why.
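The error can be reproduced in isolation: os.listdir returns a list of plain strings (the file names), not file objects, so there is no .name attribute to read. A minimal sketch, assuming a throwaway temporary directory (demo_dir and a.txt are made-up names used only for illustration):

```python
import os
import tempfile

# Hypothetical demo directory so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as handle:
    handle.write("data1 data2")

# os.listdir gives back names as strings, not open file objects.
names = os.listdir(demo_dir)
first = names[0]
has_name_attribute = hasattr(first, "name")  # a str has no .name

# The string itself is the file name: join it with the directory and open it.
with open(os.path.join(demo_dir, first), "r") as handle:
    content = handle.read()
```

In the loop above this would mean passing the name from os.listdir (joined with the directory) straight to open, instead of reading file.name.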

I also get a second error when the path contains Polish diacritical characters such as "ąęłńóżć". It fails for:

path = "E:/Bieżące projekty/keyword"

I tried changing it to:

path =u"E:/Bieżące projekty/keyword"

but that did not help. I am just starting with Python and I can't work out why this code does not work.

What I want:

  1. Find all text files in the directory.
  2. Join all the text files into a single text file named "query.txt".

For example:

file 1: data1 data2

file 2: data 3 data 4

Output in "query.txt": data1 data2 data 3 data 4
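The goal described above could be sketched roughly as follows. This is only an illustration under assumptions: the files are assumed to be UTF-8 text, and join_text_files with its parameters is a hypothetical helper name, not something from the original code.

```python
import io
import os


def join_text_files(directory, output_name=u"query.txt", separator=u" "):
    """Concatenate every .txt file in `directory` into `output_name`."""
    parts = []
    # Sort the names so the output order is predictable.
    for name in sorted(os.listdir(directory)):
        # Skip non-text files and the output file itself.
        if not name.endswith(".txt") or name == output_name:
            continue
        full_path = os.path.join(directory, name)
        # Assumption: the input files are encoded as UTF-8.
        with io.open(full_path, "r", encoding="utf-8") as handle:
            parts.append(handle.read().strip())
    joined = separator.join(parts)
    with io.open(os.path.join(directory, output_name), "w", encoding="utf-8") as out:
        out.write(joined)
    return joined
```

Calling join_text_files(u"E:/data/keyword") would then write the combined text to query.txt in that directory; passing separator=u" OR " gives the OR-joined form used in the query string.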

Pramod Gharu

1 Answer


The code above works fine when the path variable contains no Polish diacritical characters. When I change the path, I get this error:

SyntaxError: Non-ASCII character '\xc5' in file query.py on line 9, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

In PEP 263 I found the magic comment for declaring the source encoding. The standard encoding for Polish characters like "ąęłńóźżć" is ISO-8859-2, so I tried adding that encoding declaration to the code. I tried UTF-8 too and got the same error. My whole code is (without the first 5 lines, which are comments describing what the code does):

import os
#path = r"E:/data"
# -*- coding: iso-8859-2 -*-
path = r"E:/Bieżące przedsięwzięcia"
os.chdir(path)

files = os.listdir(path)
query = "{keyword} AND NOT("

for file in files:
    if file.endswith(".txt"):
        dane = open(file, "r")
        text = dane.read()
        query += text
        print(query)
        dane.close()
        query.join(" OR ")
result = open("query.txt", "w")
result.write(query)
result.write(")")
result.close()

On a Unicode/UTF-8 character table I found that the Polish character "ż" is encoded in UTF-8 as "\xc5\xbc". Commenting out the path line that contains "ż" with # still produces the error. When I remove the line containing this character:

path = r"E:/Bieżące przedsięwzięcia"

the code works fine and I get the result I want.

For editing I use Notepad++ with default settings. The only thing I changed is that, for Python code, tabs are replaced by four spaces.
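One likely cause, going by PEP 263: the encoding declaration is only recognised on the first or second line of the file, and in the code above it sits after five comment lines and the import. Python 2 also rejects non-ASCII bytes anywhere in an undeclared file, comments included, which would explain why commenting out the path line does not help. The sketch below only illustrates the "first two lines" rule with the pattern PEP 263 describes; declared_encoding is a hypothetical helper, not a standard library function, and it simplifies what the interpreter really does.

```python
import re

# Pattern described in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding declared on line 1 or 2, or None if there is none."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Declaration too far down: it is not seen.
late = ["import os", '#path = r"E:/data"', "# -*- coding: iso-8859-2 -*-"]
# Declaration on the first line: it is seen.
early = ["# -*- coding: utf-8 -*-", "import os"]

late_result = declared_encoding(late)
early_result = declared_encoding(early)
```

The declared encoding also has to match how the editor actually saved the file. The '\xc5' byte in the error message suggests the file is stored as UTF-8, so "# -*- coding: utf-8 -*-" on the very first line would be the declaration to try.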

*

Second Question

I tried to find out in the Python documentation what the r in front of the path value means. I can't find it in the Python 2.7 string documentation. Could someone tell me what this part of Python (a prefix like u or r before a string value) is called? For example:

path = u"somedata"

path = r"somedata"

I would like to read the documentation about it.
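These are called string literal prefixes. They are described in the Python Language Reference, in the "String literals" section of the "Lexical analysis" chapter, not in the string methods documentation. A short sketch of what the two prefixes do (the path and variable names are made up for illustration):

```python
# Raw string (r prefix): backslashes are kept literally, not treated as escapes.
raw = r"E:\new\table"
plain = "E:\\new\\table"  # the same text written with escaped backslashes
same_text = (raw == plain)

raw_len = len(r"\n")      # a backslash followed by the letter n
escaped_len = len("\n")   # a single newline character

# Unicode literal (u prefix): in Python 2 this creates a unicode object
# instead of a byte str. The escape \u017c is the Polish letter
# "z with dot above", written this way to keep the source file pure ASCII.
letter = u"\u017c"
utf8_bytes = letter.encode("utf-8")
```

Here utf8_bytes holds the two bytes \xc5\xbc, the same ones that appear in the SyntaxError message above.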