
With the latest Lucene (8.7), is it possible to open a .cfs compound index file generated around 2009 by Lucene 2.2, in a legacy application that I cannot modify, using the Lucene utility "Luke"? Alternatively, is it possible to generate the .idx file for Luke from the .cfs? The .cfs was generated by Compass on top of Lucene 2.2, not by Lucene directly. Is it possible to use a Compass-generated index containing:
_b.cfs
segments.gen
segments_d

possibly with Solr?

Are there any examples anywhere of how to open a file-based .cfs index with Compass?

The conversion tool won't work because the index version is too old.

From lucene\build\demo:

java -cp ../core/lucene-core-8.7.0-SNAPSHOT.jar;../backward-codecs/lucene-backward-codecs-8.7.0-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader -verbose path_of_old_index

and the SearchFiles demo:

java -classpath ../core/lucene-core-8.7.0-SNAPSHOT.jar;../queryparser/lucene-queryparser-8.7.0-SNAPSHOT.jar;./lucene-demo-8.7.0-SNAPSHOT.jar org.apache.lucene.demo.SearchFiles -index path_of_old_index

Both fail with:

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported This version of Lucene only supports indexes created with release 6.0 and later.

Is it possible to use an old index with Lucene somehow? How can the old "codec" be used? Also from Lucene.Net, if possible?

Current Lucene 8.7 yields an index containing these files:

segments_1
write.lock
_0.cfe
_0.cfs
_0.si

Update: amazingly, Lucene.Net v. 3.0.3 from NuGet seems to open that very old index format!

This seems to work for extracting all terms from the index:

    using System;

    using Lucene.Net.Index;
    using Lucene.Net.Store;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main()
            {
                // Open the legacy index read-only.
                var reader = IndexReader.Open(FSDirectory.Open("C:\\Temp\\ftsemib_opzioni\\v210126135604\\index\\search_0"), true);
                Console.WriteLine("number of documents: " + reader.NumDocs() + "\n");
                Console.ReadLine();

                // Enumerate every term in the index together with its document frequency.
                TermEnum terms = reader.Terms();
                while (terms.Next())
                {
                    Term term = terms.Term;
                    int frequency = reader.DocFreq(term);
                    Console.WriteLine(term.Field + " " + term.Text + " " + frequency);
                }

                // List all field names present in the index.
                var fieldNames = reader.GetFieldNames(IndexReader.FieldOption.ALL);
                Console.WriteLine("number of fields: " + fieldNames.Count + "\n");
                foreach (string fieldName in fieldNames)
                {
                    Console.WriteLine("field: " + fieldName);
                }
                reader.Close();

                Console.ReadLine();
            }
        }
    }

Out of curiosity, is it possible to find out which index version it is? Are there any examples of (old) Compass with a file-system-based index?
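One low-tech way to get a hint about the index format, without any Lucene library on the classpath, is to look at the first four bytes of the `segments_N` file (here `segments_d`). The sketch below rests on two assumptions that should be checked against the Lucene sources for the version in question: that pre-4.0 segments files begin with a negative big-endian format number, and that files written by Lucene 4.0 and later begin with the codec header magic `0x3fd76c17`. The class and method names are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SegmentsHeaderPeek {
    // Assumed magic number at the start of codec-headered files (Lucene 4.0+).
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Classify the first 32-bit value of a segments_N file.
    static String describe(int firstInt) {
        if (firstInt == CODEC_MAGIC) {
            return "codec header (Lucene 4.0 or later)";
        }
        if (firstInt < 0) {
            // Assumption: older releases stored a negative format number here.
            return "legacy pre-4.0 format number " + firstInt;
        }
        return "unrecognized header value " + firstInt;
    }

    // Read the first four bytes of the file as a big-endian int.
    static int readFirstInt(Path segmentsFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(segmentsFile))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsHeaderPeek path/to/segments_N");
            return;
        }
        System.out.println(describe(readFirstInt(Paths.get(args[0]))));
    }
}
```

This only distinguishes "before codecs" from "codec era"; mapping a specific negative number to a specific 2.x or 3.x release would need the constants from that release's `SegmentInfos` source.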

user3181125

1 Answer


Unfortunately, you can't use an old codec to access index files from Lucene 2.2, because codecs were only introduced in Lucene 4.0. Prior to that, the code for reading and writing the index files was not grouped together into a codec but was simply an inherent part of the overall Lucene library.

So in versions of Lucene prior to 4.0 there is no codec, just file reading and writing code baked into the library. It would be very difficult to track down all that code and create a codec that could be plugged into a modern version of Lucene. It's not an impossible task, but it would require an expert Lucene developer and a large amount of effort (i.e. an extremely expensive endeavor).

In light of all that, the answer to this SO question may be of some use: How to upgrade lucene files from 2.2 to 4.3.1

Update

Your best bet would be to use an old 3.x copy of Java Lucene or Lucene.Net 3.0.3 to open the index, then add and commit one doc (which will create a second segment) and call Optimize, which will merge the two segments into one new segment. The new segment will be a version 3 segment. Then you can use Lucene.Net 4.8 Beta or a Java Lucene 4.x to do the same thing again (note that Optimize was renamed ForceMerge starting in version 4) to convert the index to a 4.x index.

Then you can use the current Java version of Lucene 8.x to do this once more to move the index all the way up to 8, since the current version of Java Lucene has codecs reaching all the way back to 5.0; see: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/codecs

However if you do receive the error again that you reported:

This version of Lucene only supports indexes created with release 6.0 and later.

then you will have to play this game one more cycle with a version 6.x Java Lucene to get from a 5.x index to a 6.x index. :-)

RonC
  • Hmm, that is the case in Apache Lucene; I haven't looked into its sources. However, empirically, C# Lucene.Net seems to open an ancient .cfs from 2009 automagically, unless I'm hallucinating. Perhaps someone knowledgeable about Lucene.Net could suggest a kludge within Java Lucene to open legacy indexes as well, for backwards compatibility. – user3181125 Feb 03 '21 at 18:40
  • I'm relatively familiar with Lucene.Net at this point; I even contributed code to the current 4.8 beta version of the project a few days ago. Also, I confirmed with the leader of the project that the core code is based on Java Lucene 4.8 (a few of the other aspects of Lucene.Net are based on Java Lucene 4.8.1). Lucene.Net 4.8 does contain a codec for Lucene 3.x so that it can support backward compatibility for the prior version, but no farther back than that. You can see the codecs included in the project here: https://github.com/apache/lucenenet/tree/master/src/Lucene.Net/Codecs – RonC Feb 03 '21 at 18:51
  • Is there any way to report the index version from Lucene.Net? – user3181125 Feb 04 '21 at 19:15
  • See [How to determine the lucene index version?](https://stackoverflow.com/questions/44155910/how-to-determine-the-lucene-index-version), although I'm not sure I agree with the approaches presented there. If you open a segment via a `SegmentReader`, you can access the codec name via `SegmentReader.SegmentInfo.Info.Codec.Name`, from which you can intuit the version, e.g. the name "Lucene46", which is the codec that Lucene.Net 4.8 uses. If you have additional questions about getting the version, please open a separate StackOverflow question for that. – RonC Feb 04 '21 at 19:32
  • In Lucene.Net I tried: `Directory dir = FSDirectory.Open(args[0]); IndexReader reader = IndexReader.Open(dir, true); Console.WriteLine("number of documents: "+reader.NumDocs() + "\n"); Console.ReadLine(); long version= IndexReader.GetCurrentVersion((Directory)dir);` It returns 1611665876477 (decimal). – user3181125 Feb 06 '21 at 01:06
  • @user3181125 It's best to start a new SO question for that; be sure to include the version of Lucene.Net that you are using. – RonC Feb 06 '21 at 01:08
  • It should be Lucene.Net version 3.0.3 from NuGet, obtained within Visual Studio 2019. – user3181125 Feb 06 '21 at 01:10
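A note on the value 1611665876477 reported in the comments: it is probably not a file format version at all. As far as I recall from the old Java Lucene documentation, the "current version" of an index is a change counter that is initialized with a timestamp and incremented on each commit. Assuming that holds here, and that the index was written by Java code using milliseconds since the Unix epoch, the number can be decoded with plain JDK classes. The class name below is made up for illustration.

```java
import java.time.Instant;

public class CommitVersionDecoder {
    // Interpret a Lucene "current version" value as milliseconds since the Unix epoch.
    // Assumption: the counter was seeded from the system clock and has seen few commits since.
    static Instant asInstant(long commitVersion) {
        return Instant.ofEpochMilli(commitVersion);
    }

    public static void main(String[] args) {
        long reported = 1611665876477L; // value quoted in the comment thread
        System.out.println(asInstant(reported)); // prints 2021-01-26T12:57:56.477Z
    }
}
```

The decoded date falls on the same day as the `v210126135604` directory name in the question's path, if that name encodes a date, which would be consistent with the counter having been seeded when the index was built rather than describing the on-disk format.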