Recoll does not index many text files by default. It seems to index only files whose MIME type is explicitly listed in the mimemap, and skips other "obvious" text file types.

Examples:

  • yaml files -- file -i shows text/plain; charset=us-ascii, but recollindex -e -i /path/to/foo.yaml shows Recoll detecting it as application/x-yaml via xdg-mime, which isn't an officially registered MIME type. Still, if Recoll uses xdg-mime, one would expect it to handle every value xdg-mime can return
  • awk scripts -- same thing, with application/x-awk (this one is in the default mimeconf)
  • perl scripts -- same thing, with application/x-perl (this one is in the default mimeconf)
  • shell scripts -- same thing, with application/x-shellscript (this one is in the default mimeconf)
  • kotlin and other source code files -- Recoll sees them as text/x-kotlin, again a non-standard type via xdg-mime, but one that begins with text/, so Recoll should know it is text. It still doesn't index them
  • readme files -- same thing, with text/x-readme

Now, this can be worked around on a case-by-case basis by adding something like this to ~/.recoll/mimeconf:

[index]
application/x-yaml = internal text/plain
text/x-kotlin = internal text/plain
text/x-readme = internal text/plain

but doing this one file type at a time seems silly. Is there a way to say:

  1. index everything with mime type text/* as text/plain, unless recoll already has a more specific parser for the type
  2. index obvious textual data (e.g. if file -i returns text/plain) as text/plain, again unless recoll already has a more specific parser for the type

If it matters, I'm using recoll packaged by Fedora.

Raman
  • When trying out some things for your question I had some success with `textunknownasplain = 1` combined with `usesystemfilecommand = 1` and `systemfilecommand = file -i` in `recoll.conf`. Not everything seemed to work though, I'm not really sure what is going on. – Marijn May 29 '23 at 21:10
  • @Marijn Thanks -- `textunknownasplain` handles the `text/*` types properly, thank you. `systemfilecommand` I'd rather not override as that switches from the recommended `xdg-mime` to `file`, which is going to change too much default behavior. The only thing I can think to do is to review all the mime types from `/usr/share/mime/types` and explicitly add all the relevant text types to the index conf. I would have thought that since recoll uses `xdg-mime` by default, recoll would already handle all the relevant types. Feel free to convert your comment to an answer and I will upvote/accept it. – Raman May 30 '23 at 12:38
  • I installed recoll on Ubuntu and there `mimeconf` already has a quite large list of mime types, including awk, perl, and shellscript, but without yaml, kotlin and readme. Since those are relatively newer filetypes/mimetypes, maybe the goal was indeed to add all textual types to `mimeconf` but this just didn't get updated recently. – Marijn May 30 '23 at 12:48
  • Yes, I was also just noticing the same thing! Looks like those entries in my custom `mimeconf` were redundant -- not sure why I put them there. Maybe an earlier version packaged for Fedora did not include them, but the current version does. – Raman May 30 '23 at 12:53
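
For anyone who wants to try the "review all the mime types from `/usr/share/mime/types`" idea from the comments without doing it by hand, here is a minimal sketch. It assumes that file lists one MIME type per line, and it reuses the `[index]` / `internal text/plain` syntax from the question. The function names and the `extra_textual` parameter are made up for illustration and are not part of Recoll; which `application/*` types count as textual is a manual choice you have to supply. It only prints candidate entries, so review them before pasting anything into `~/.recoll/mimeconf`.

```python
from pathlib import Path


def build_index_entries(mime_types, already_handled=(), extra_textual=()):
    """Return mimeconf lines mapping textual MIME types to the plain-text handler.

    mime_types      -- iterable of MIME type strings, one per item
    already_handled -- types that already have an entry (skipped)
    extra_textual   -- hand-picked non-text/* types to treat as text
                       (hypothetical parameter, e.g. {"application/x-yaml"})
    """
    handled = set(already_handled)
    extra = set(extra_textual)
    entries = set()
    for raw in mime_types:
        mime = raw.strip()
        if not mime or mime == "text/plain" or mime in handled:
            continue
        if mime.startswith("text/") or mime in extra:
            entries.add(f"{mime} = internal text/plain")
    return sorted(entries)


def render_mimeconf(entries):
    """Join the entries under an [index] section header."""
    return "\n".join(["[index]", *entries])


if __name__ == "__main__":
    # Assumption: the shared-mime-info type list lives here on this system.
    types_file = Path("/usr/share/mime/types")
    if types_file.exists():
        lines = types_file.read_text().splitlines()
        print(render_mimeconf(
            build_index_entries(lines, extra_textual={"application/x-yaml"})
        ))
```

Pass the types your installed `mimeconf` already covers as `already_handled` so the output contains only the missing ones. With `textunknownasplain = 1` set, the `text/*` entries are probably unnecessary, and the script is mainly useful for the hand-picked `application/*` types.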