I want to create a small search engine for tweets. I have a txt file with 20000 tweets. The file format looks like this:

TommyFrench1
851
85170333395811123
Lurgan, Moira, Armagh. Derry
This week we are double delight on first goalscorers on the four Champions League matches in shop. ChampionsLeague

Im_Aarkay
175
851703414300037122
Paris
@ChampionsLeague @AS_Monaco @AS_Monaco_EN Nopes, it's when City knocked outta Champions league. .
.
etc

Each record has five lines: the username, the number of followers, the tweet id, the location, and finally the text of the tweet.

I think that every tweet should be a document, so I should end up with 20000 documents, each with 5 fields (username, followers, id, location, text).

How can I do the indexing?

I have seen some tutorials, but I didn't find anything similar.

EDIT: Here is my code.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyProgram {

    public static void main(String[] args) throws IOException, ParseException {
        FileReader fileReader = new FileReader(new File("myfile.txt"));
        BufferedReader br = new BufferedReader(fileReader);
        String line = null;

        String indexPath = "C:\\Desktop\\myfolder";
        Directory dir = FSDirectory.open(Paths.get(indexPath));

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);

        IndexWriter writer = new IndexWriter(dir, iwc);


        while ((line = br.readLine()) != null) {
            // reading lines until the end of the file
            Document doc = new Document();
            String username = br.readLine();
            doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED));  // adding title field
            String followers = br.readLine();
            doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED));
            String id = br.readLine();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));
            String location = br.readLine();
            doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED));
            String text = br.readLine();
            doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);  // writing new document to the index


            br.readLine();
         }

    }
}

I'm getting the following error: Index cannot be resolved or is not a field.

How can I fix this?

Sabir Khan
Lee Yaan
  • What do you mean by 'indexing'? What do you want to achieve with this? – Ivan Pronin Apr 26 '17 at 19:19
  • I have a project to create a small search engine for 20000 tweets. Indexing is one of the core features provided by Lucene. I must read the txt file, and every tweet must be a document. Every document must then have the fields username, id, location etc. I have an idea of how it works, but I'm a beginner in Lucene and I can't find anything similar to this. – Lee Yaan Apr 26 '17 at 19:30
  • Have you looked at this question: http://stackoverflow.com/questions/4091441/how-do-i-index-and-search-text-files-in-lucene-3-0-2?rq=1 – Ivan Pronin Apr 26 '17 at 19:35
  • @Ivan Pronin Yes, I looked at that question, but it targets an old version of Lucene. There are many changes in the current version (Lucene 6.5.0). For example, when I write `IndexWriter writer = new IndexWriter(index, analyzer, true, new IndexWriter.MaxFieldLength(25000));` I get an error. In older versions this line is fine. – Lee Yaan Apr 26 '17 at 20:43
  • If you want quality answers, mention in your question that it is a compile-time error and which line it occurs on. From looking at the static code alone, it is not clear what kind of error you are facing. Your question becomes valid once you do that. – Sabir Khan May 01 '17 at 07:34

1 Answer

It is hard to tell from your question that you are facing a compile-time error rather than a run-time error.

I had to copy your code to see that it is a compile-time error on the Field.Index.ANALYZED argument of the Field constructor.

Refer to the documentation of the Field class: there are no such constructors in 6.5.0 anymore.

This is one of the reasons people use higher-level tools such as Solr: changes like this keep happening in the low-level Lucene API.

Anyway, the documentation mentioned above also says:

Expert: directly create a field for a document. Most users should use one of the sugar subclasses:

For your case, TextField and StringField are the relevant classes. There is a subtle difference between the two: a TextField value is run through the analyzer and tokenized, while a StringField value is indexed verbatim as a single token.

So I would use a constructor like new StringField(fieldName, fieldValue, Field.Store.YES) instead of constructing a Field directly.

You can also use Field itself, like new Field(fieldName, fieldValue, fieldType), where fieldType is a FieldType.

You can initialize a FieldType like FieldType txtFieldType = new FieldType(TextField.TYPE_STORED) or FieldType strFieldType = new FieldType(StringField.TYPE_STORED).

All in all, the way you create a Field in Lucene has changed in recent versions, so create your Field instances as described in the documentation of the Lucene version you are using.

Something like doc.add(new Field("username", username, new FieldType(TextField.TYPE_STORED))) for each field.
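Separately from the Field constructors, the read loop in the question looks misaligned: the line consumed by the while condition (the username) is never used, so every following readLine() is shifted by one field. Below is a minimal sketch of one way to read the five-line records, using only the JDK. The Tweet and TweetFileParser names are hypothetical helpers invented for illustration, and the sketch assumes records are separated by blank lines, as in the sample in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one five-line record (not a Lucene class).
final class Tweet {
    final String username;
    final String followers;
    final String id;
    final String location;
    final String text;

    Tweet(String username, String followers, String id, String location, String text) {
        this.username = username;
        this.followers = followers;
        this.id = id;
        this.location = location;
        this.text = text;
    }
}

// Hypothetical parser for the file layout described in the question.
final class TweetFileParser {

    // Reads records of five consecutive lines, skipping blank lines between records.
    static List<Tweet> parse(BufferedReader reader) throws IOException {
        List<Tweet> tweets = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // blank separator between records
            }
            // The line read by the loop condition IS the username; do not read again for it.
            String username = line;
            String followers = reader.readLine();
            String id = reader.readLine();
            String location = reader.readLine();
            String text = reader.readLine();
            if (text == null) {
                break; // incomplete record at end of file: stop instead of adding nulls
            }
            tweets.add(new Tweet(username, followers, id, location, text));
        }
        return tweets;
    }

    // Convenience overload for content that is already in memory.
    static List<Tweet> parse(String content) {
        try {
            return parse(new BufferedReader(new StringReader(content)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each parsed Tweet can then be turned into one Document, for example with a StringField for the id and a TextField for the text, and passed to writer.addDocument(doc). Remember to call writer.close() after the loop so the index is committed.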

Sabir Khan