
I have a folder called "pads" in which I have created six Notepad documents (1.txt, 2.txt, ..., 6.txt). I am trying to execute the code below and am getting the following error:

import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
import sys
from whoosh.qparser import QueryParser
from whoosh import scoring
from whoosh.index import open_dir

def createSearchableData(root):
    '''
    Schema definition: title (name of file), path (as ID),
    content (indexed but not stored), textdata (stored text content)
    '''
    schema = Schema(title=TEXT(stored=True), path=ID(stored=True),
                    content=TEXT, textdata=TEXT(stored=True))
    if not os.path.exists("indexdir"):
        os.mkdir("indexdir")

    # Create an index writer to add documents according to the schema
    ix = create_in("indexdir", schema)
    writer = ix.writer()

    filepaths = [os.path.join(root, i) for i in os.listdir(root)]
    for path in filepaths:
        fp = open(path, 'r')
        print(path)
        text = fp.read()
        writer.add_document(title=path.split("\\")[1], path=path,
                            content=text, textdata=text)
        fp.close()
    writer.commit()

root = "pads"
createSearchableData(root)
Output and error:

pads/5.txt
IndexError: list index out of range

Why does it read only one of the documents (5.txt) and not the rest of the files?

Question by ravijprs, edited by Jules G.M.

2 Answers


writer.add_document(title=path.split("\\")[1], path=path,

As the printed path shows, there is no backslash in your path. split("\\") therefore returns a one-element list, and since Python lists are indexed from 0, index [1] does not exist.
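A minimal sketch reproducing the failure with the path the script printed, assuming "/" is the separator (as on Linux or macOS, which the printed path suggests):

```python
# The path printed by the script uses "/" as the separator.
path = "pads/5.txt"

# "\\" in Python source is a single backslash character.
parts = path.split("\\")
print(parts)       # ['pads/5.txt']  (no backslash, so nothing is split)
print(len(parts))  # 1

try:
    parts[1]  # only index 0 exists in a one-element list
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

This also explains why only one file name is printed: the exception is raised while handling the very first entry of filepaths, so the loop never reaches the other files. 5.txt was simply the first name returned by os.listdir, which returns entries in arbitrary order.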

Answered by Jules G.M.

Comment: Thanks Jules, it worked. Can you help with this problem? This is my actual problem (https://stackoverflow.com/questions/57839279/my-output-is-not-giving-the-documents-matched-for-the-query) – ravijprs Sep 08 '19 at 04:53

First, make sure that all the paths to your files contain a backslash ('\\'). Also, if you want to obtain the title of the document, I recommend using the last element of the list returned by split(), which gives the file name no matter how many directories the path contains. It would be something like the following:

writer.add_document(title=path.split("\\")[-1], path=path,
                    content=text, textdata=text)
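A sketch comparing split("\\")[-1] with os.path.basename, a standard-library function that neither answer mentions and that is offered here only as an alternative. The helper name title_from_path is hypothetical. Indexing with [-1] never raises, but on a "/"-separated path it returns the whole path rather than the file name:

```python
import ntpath
import os
import posixpath


def title_from_path(path):
    """Hypothetical helper: return the file-name part of path using the host OS's rules."""
    return os.path.basename(path)


# split("\\")[-1] works on Windows-style paths...
print("pads\\5.txt".split("\\")[-1])  # 5.txt
# ...but on a "/"-separated path it returns the whole path unchanged.
print("pads/5.txt".split("\\")[-1])   # pads/5.txt

# basename applies the separator rules of the given platform module.
print(ntpath.basename("pads\\5.txt"))    # 5.txt
print(posixpath.basename("pads/5.txt"))  # 5.txt

# Paths built with os.path.join (as in the question) work on either OS.
print(title_from_path(os.path.join("pads", "5.txt")))  # 5.txt
```

In the question's loop this would amount to passing title=os.path.basename(path) to writer.add_document, so the title no longer depends on which separator the operating system uses.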