ModeShape full-text search works only on binary files

Question

I am trying to perform a full-text search on my ModeShape 5.3.0.Final repository. The query is as simple as:

Query query = queryManager.createQuery("SELECT * FROM [nt:resource] AS data WHERE ISDESCENDANTNODE('/somenode') AND CONTAINS(data.*, '*" + text + "*')", Query.JCR_SQL2);

It seems to work well for binary file formats (e.g. pdf, doc, docx), but it does not match .txt files or any other plain-text format.
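
Because `text` is concatenated directly into the statement, a search term that contains a single quote would end the string literal early and break the query. A minimal sketch, assuming the statement keeps being built by concatenation: the hypothetical helper `FullTextQueries` below doubles single quotes (the usual way to escape them inside a JCR-SQL2 string literal) before embedding the path and the term. It does not escape full-text operators such as a leading `-` or a `"` inside the term. The JCR 2.0 grammar also allows a bind variable as the full-text expression (`CONTAINS(data.*, $text)` together with `query.bindValue(...)`), which would avoid manual escaping altogether.

```java
/**
 * Hypothetical helper (not part of ModeShape or the JCR API): builds the
 * JCR-SQL2 statement from the question with its string literals escaped.
 */
public final class FullTextQueries {

    private FullTextQueries() {}

    /** Doubles single quotes so the value cannot terminate a SQL2 string literal early. */
    static String escapeSql2Literal(String value) {
        return value.replace("'", "''");
    }

    /** Builds the statement to pass to queryManager.createQuery(statement, Query.JCR_SQL2). */
    static String buildQuery(String ancestorPath, String text) {
        return "SELECT * FROM [nt:resource] AS data"
                + " WHERE ISDESCENDANTNODE('" + escapeSql2Literal(ancestorPath) + "')"
                + " AND CONTAINS(data.*, '*" + escapeSql2Literal(text) + "*')";
    }

    public static void main(String[] args) {
        // Prints the statement with the embedded quote doubled: ... '*O''Brien*')
        System.out.println(buildQuery("/somenode", "O'Brien"));
    }
}
```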

This is my repository configuration:

{
  "name": "Persisted-Repository",
  "textExtraction": {
    "extractors": {
      "tikaExtractor": {
        "name": "General content-based extractor",
        "classname": "tika"
      }
    }
  },
  "workspaces": {
    "predefined": [
      "otherWorkspace"
    ],
    "default": "default",
    "allowCreation": true
  },
  "security": {
    "anonymous": {
      "roles": [
        "readonly",
        "readwrite",
        "admin"
      ],
      "useOnFailedLogin": false
    }
  },
  "storage": {
    "persistence": {
      "type": "file",
      "path": "/var/content/storage"
    },
    "binaryStorage": {
      "type": "file",
      "directory": "/var/content/binaries",
      "minimumBinarySizeInBytes": 999,
      "mimeTypeDetection": "content"
    }
  },
  "indexProviders": {
    "lucene": {
      "classname": "lucene",
      "directory": "/var/content/indexes"
    }
  },
  "indexes": {
    "textFromFiles": {
      "kind": "text",
      "provider": "lucene",
      "nodeType": "nt:resource",
      "columns": "jcr:data(BINARY)"
    }
  }
}

As a workaround, I currently run a second search for nodes with one of the configured text-file extensions, then manually extract their text with Tika (since these files are already plain text, Tika may not even be needed here) and search it for occurrences.
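
For reference, a minimal sketch of that workaround without Tika, assuming the matched files are plain text in a known charset (UTF-8 in the example; the `jcr:encoding` property, when set, would be the natural source). `PlainTextSearch` and the `TEXT_EXTENSIONS` list are hypothetical stand-ins for the configured extensions. With the JCR API, the input stream would come from something like `node.getProperty("jcr:data").getBinary().getStream()`. The code sticks to Java 8 APIs since ModeShape 5 runs on Java 8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: search plain-text files directly, without Tika. */
public final class PlainTextSearch {

    // Hypothetical stand-in for the "configured text file extensions".
    static final Set<String> TEXT_EXTENSIONS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("txt", "csv", "md", "log", "xml", "json")));

    private PlainTextSearch() {}

    /** True if the file name ends with a configured text extension (case-insensitive). */
    static boolean isTextFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        return TEXT_EXTENSIONS.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Decodes the whole stream and counts non-overlapping, case-insensitive matches of term. */
    static int countOccurrences(InputStream in, Charset charset, String term) throws IOException {
        if (term.isEmpty()) {
            return 0;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String text = new String(buffer.toByteArray(), charset).toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "Hello ModeShape, hello JCR".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(content)) {
            // Prints "true 2"
            System.out.println(isTextFile("notes.TXT") + " "
                    + countOccurrences(in, StandardCharsets.UTF_8, "hello"));
        }
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large text files a streaming scan would be preferable.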

Does anybody know whether this is expected behavior, or am I doing something wrong?

Cheers!
