I checked the Recoll manual, and it explains how to create separate indexes, but only with a single directory per index. Is it possible to create separate indexes that each cover several directories? Thanks!

1 Answer

Under 'Index configuration', in the 'Top Directories' list, do NOT enter '/'. Instead, enter the path to each of the directories you want in one database.

Example:

/DB001_F3/firstSearchDirectory
/DB001_F4/secondSearchDirectory
/DB001_F9/thirdSearchDirectory

Here (in my case) each /DB001_F? is a separate partition.

As long as '/' is not in topdirs, only the directories specified will be traversed for indexing.
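The same setup can also be written by hand instead of through the GUI. Below is a minimal sketch, assuming a dedicated configuration directory per index; the helper name make_recoll_conf and the throwaway directory are hypothetical, only the dbdir and topdirs settings and the -c option come from Recoll itself.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal recoll.conf for ONE index that
# covers SEVERAL directories.
# Usage: make_recoll_conf CONFDIR DBDIR TOPDIR...
make_recoll_conf() {
    confdir=$1
    dbdir=$2
    shift 2
    mkdir -p "$confdir" || return 1
    {
        printf 'dbdir = %s\n' "$dbdir"
        # topdirs is a whitespace-separated list; paths containing
        # spaces would need quoting, which this sketch does not handle.
        printf 'topdirs ='
        for d in "$@"; do
            printf ' %s' "$d"
        done
        printf '\n'
    } > "$confdir/recoll.conf"
}

# Demo in a throwaway directory (assumption: any writable location works).
demo_conf=$(mktemp -d)
make_recoll_conf "$demo_conf" "$demo_conf/xapiandb" \
    /DB001_F4/secondSearchDirectory \
    /DB001_F9/thirdSearchDirectory
cat "$demo_conf/recoll.conf"

# To build the index from this configuration (requires Recoll):
#   recollindex -c "$demo_conf"
```

Repeating this with a different configuration directory and a different dbdir gives a second, independent index. The GUI can then be pointed at one of them with recoll -c followed by the configuration directory.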

To give you an idea of what I have (inexpertly) come up with, I have created a directory /XAPIAN under which I have separate directories for topical databases as follows (offered as an example):

recoll_AerospaceAstronautics
recoll_AstronomyAstrophysicsSpace
recoll_BiologyChemistry
recoll_BizMgmtStrategy
recoll_Career
recoll_CraftsHomeSurvival
recoll_EngineeringCAX
recoll_FinanceInvest
recoll_FoodNutrition
recoll_GeographyGeologyGeophysics
recoll_HealthMedical
recoll_HighGraphicReferences
recoll_HistorySocietyCulture
recoll_InfoTech
recoll_LitMediaComm
recoll_MaritimeNaval
recoll_MathPhysics
recoll_ReligionSpiritualism
recoll_SciFi

In each of those, I have different topic-specific files. The one under recoll_SciFi contains the following:

# The system-wide configuration files for recoll are located in:
#   /usr/share/recoll/examples
# The default configuration files are commented, you should take a look
# at them for an explanation of what can be set (you could also take a look
# at the manual instead).
# Values set in this file will override the system-wide values for the file
# with the same name in the central directory. The syntax for setting
# values is identical.
#
dbdir = /XAPIAN/recoll_SciFi/XAP_01
#
topdirs = \
/DB001_F2/VAULT__Library/SciFi__BOOKS \
/DB001_F2/VAULT__Library/SciFi__ByAuthor \
/DB001_F2/VAULT__Library/SciFi__CleanNames \
/DB001_F2/VAULT__Library/SciFi__Masterworks \
/DB001_F2/VAULT__Library/SciFi__NotIndexed \
/DB001_F2/VAULT__Library/SciFi__ToSortInStacks
#
skippedPaths = *_files
#
skippedNames+ = \
*.DEB \
*.DLL \
*.EXE \
*.GZ \
*.ISO \
*.MP3 \
*.MP4 \
*.ZIP \
*.TAR \
*.TGZ \
*.TTF \
*.Z \
*.deb \
*.dll \
*.exe \
*.gz \
*.iso \
*.mp3 \
*.mp4 \
*.zip \
*.tar \
*.tgz \
*.ttf \
*.z
#
reslisthtmldumpfile = /XAPIAN/recoll_SciFi/XAP_01__recoll-reslist.html
textfilemaxmbs = 200
compressedfilemaxkbs = 300000
maxfsoccuppc = 80
#
loglevel = 2
idxflushmb = 50
idxlogfilename = /XAPIAN/recoll_SciFi/XAP_01__IndexingMessages.log
#

To normalize my recoll indexing, I created the following scripts. The first is what I call a Bourne Header; it holds some commonly used logic, which is then sourced by the other scripts.

The contents of Recoll__00_LibrarySelectionParms.bh are as follows:

#!/bin/sh

####################################################################################
####################################################################################
###
### Bourne Shell Header Library
###
####################################################################################
####################################################################################


TMPDIR="/site/DB003_F1/XAPIAN_WORK"
USER="ericthered"

### Prompt to choose indexing scope and set LOGDIR
DBDIR=""

### LIBRARY COLLECTIONS - INDEX TOPICS
###
### Previous single global library:
###         /XAPIANDB
###
### Segregated Libraries:
###         /XAPIAN/recoll_SciFi/XAP_01
###         /XAPIAN/recoll_MathPhysics/XAP_02

###         /XAPIAN/recoll_HistorySocietyCulture/XAP_03
###         /XAPIAN/recoll_BizMgmtStrategy/XAP_04

###         /XAPIAN/recoll_AerospaceAstronautics/XAP_01
###         /XAPIAN/recoll_AstronomyAstrophysicsSpace/XAP_01
###         /XAPIAN/recoll_BiologyChemistry/XAP_01
###         /XAPIAN/recoll_Career/XAP_01
###         /XAPIAN/recoll_CraftsHomeSurvival/XAP_01
###         /XAPIAN/recoll_EngineeringCAX/XAP_01
###         /XAPIAN/recoll_FinanceInvest/XAP_01
###         /XAPIAN/recoll_FoodNutrition/XAP_01
###         /XAPIAN/recoll_GeographyGeologyGeophysics/XAP_01
###         /XAPIAN/recoll_HealthMedical/XAP_01
###         /XAPIAN/recoll_InfoTech/XAP_01
###         /XAPIAN/recoll_LitMediaComm/XAP_01
###         /XAPIAN/recoll_MaritimeNaval/XAP_01
###         /XAPIAN/recoll_ReligionSpiritualism/XAP_01

###         /XAPIAN/recoll_HighGraphicReferences/XAP_01

for location in `ls -d /XAPIANDB /XAPIAN/recoll_*/XAP_?? | cut -f1-4 -d/ | sort -r --key=4.1 --field-separator=/ `
do
    #echo "\n Re-start indexing for '${location}' ? [y|N] => \c" ; read ans
    echo "\n Set indexing scope for '${location}' ? [y|N] => \c" ; read ans
    if [ -z "${ans}" ] ; then  ans="N" ; fi
    case ${ans} in
        y* | Y* ) DBDIR="${location}"
            break
            ;;
        n* | N* ) ;;
    esac
done
if [ -z "${DBDIR}" ] ; then  echo "\n No topic indexing selected.  'recollindex' has NOT been started.\n Bye!\n" ; exit 1 ; fi

### Set CONFIGDIR
case ${DBDIR} in
    /XAPIANDB ) LOGDIR=${DBDIR} ;
            CONFIGDIR="/home/${USER}/.recoll"
            LOG="${LOGDIR}/mine.log"
            ERRLOG="${LOGDIR}/mine.errlog" ;;
        * ) LOGDIR=`dirname ${DBDIR} ` ;
            CONFIGDIR="${LOGDIR}"
            LOG=${LOGDIR}/`basename "${DBDIR}" `.log
            ERRLOG=${LOGDIR}/`basename "${DBDIR}" `.errlog ;;
esac

### Set CONFIGFLG
CONFIGFLG="-c ${CONFIGDIR}"


### Scenario #1 - Single Global Database
#DBDIR = /XAPIANDB
#LOGDIR = /XAPIANDB
#CONFIGDIR = /home/${USER}/.recoll
#CONFIGFLG = -c /home/${USER}/.recoll
#LOG = /XAPIANDB/mine.log
#ERRLOG = /XAPIANDB/mine.errlog


### Scenario #2 - Segregated Topical Databases
#DBDIR = /XAPIAN/recoll_SciFi/XAP_01
#LOGDIR = /XAPIAN/recoll_SciFi
#CONFIGDIR = /XAPIAN/recoll_SciFi
#CONFIGFLG = -c /XAPIAN/recoll_SciFi
#LOG = /XAPIAN/recoll_SciFi/XAP_01.log
#ERRLOG = /XAPIAN/recoll_SciFi/XAP_01.errlog


reportParms()
{ 
    echo "\n =========================================================================================\n"
    echo "\t    TMPDIR = ${TMPDIR}"
    echo "\t      USER = ${USER}"

    echo "\t     DBDIR = ${DBDIR}"
    echo "\t    LOGDIR = ${LOGDIR}"
    echo "\t CONFIGDIR = ${CONFIGDIR}"
    echo "\t CONFIGFLG = ${CONFIGFLG}"
    echo "\t       LOG = ${LOG}"
    echo "\t    ERRLOG = ${ERRLOG}"
    echo "\n =========================================================================================\n"
}
#reportParms
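The path derivation in that header can be checked in isolation. Here is a minimal sketch of the same case logic, wrapped in a function so it can be tried without the prompt loop; the function name derive_paths and the second argument (the home directory) are my assumptions, the rest mirrors the two scenarios listed in the header comments.

```shell
#!/bin/sh
# derive_paths DBDIR HOMEDIR: print the settings derived from DBDIR,
# one NAME=value per line (hypothetical helper, not part of the scripts).
derive_paths() {
    dbdir=$1
    home=$2
    case "$dbdir" in
        /XAPIANDB)
            # Scenario #1: single global database, config in ~/.recoll
            logdir=$dbdir
            configdir="$home/.recoll"
            log="$logdir/mine.log"
            errlog="$logdir/mine.errlog"
            ;;
        *)
            # Scenario #2: topical database, config sits one level above
            logdir=$(dirname "$dbdir")
            configdir=$logdir
            base=$(basename "$dbdir")
            log="$logdir/$base.log"
            errlog="$logdir/$base.errlog"
            ;;
    esac
    printf 'LOGDIR=%s\nCONFIGDIR=%s\nLOG=%s\nERRLOG=%s\n' \
        "$logdir" "$configdir" "$log" "$errlog"
}

derive_paths /XAPIAN/recoll_SciFi/XAP_01 /home/ericthered
# prints:
#   LOGDIR=/XAPIAN/recoll_SciFi
#   CONFIGDIR=/XAPIAN/recoll_SciFi
#   LOG=/XAPIAN/recoll_SciFi/XAP_01.log
#   ERRLOG=/XAPIAN/recoll_SciFi/XAP_01.errlog
```

Quoting the variables inside dirname and basename keeps the derivation intact even if a database path ever contains a space.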

The script that is used to generate the recoll database for a specific topical scope is Recoll__00a_RebuildIndex.sh:

#!/bin/sh

. ./Recoll__00_LibrarySelectionParms.bh

header()
{
TMPDIR="/site/DB003_F1/XAPIAN_WORK"
USER="your_user_ID"

### Prompt to choose indexing scope and set LOGDIR
DBDIR=""

###     LIBRARY COLLECTIONS - INDEX TOPICS
###
###     Previous single global library:
###                     /XAPIANDB
###
###     Segregated Libraries:
###         /XAPIAN/recoll_SciFi/XAP_01
###         /XAPIAN/recoll_MathPhysics/XAP_02
###
###         /XAPIAN/recoll_LitMediaComm/XAP_01
###         /XAPIAN/recoll_HistorySocietyCulture/XAP_01
###         /XAPIAN/recoll_ReligionSpiritualism/XAP_01
###         /XAPIAN/recoll_BiologyChemistry/XAP_01
###         /XAPIAN/recoll_GeographyGeologyGeophysics/XAP_01
###         /XAPIAN/recoll_AstronomyAstrophysicsSpace/XAP_01
###         /XAPIAN/recoll_FinanceInvest/XAP_01
###         /XAPIAN/recoll_FoodNutrition/XAP_01
###         /XAPIAN/recoll_HealthMedical/XAP_01
###         /XAPIAN/recoll_InfoTech/XAP_01
###         /XAPIAN/recoll_EngineeringCAX/XAP_01
###         /XAPIAN/recoll_MaritimeNaval/XAP_01
###         /XAPIAN/recoll_AerospaceAstronautics/XAP_01
###         /XAPIAN/recoll_BizMgmtStrategy/XAP_01
###         /XAPIAN/recoll_CraftsHomeSurvival/XAP_01
###         /XAPIAN/recoll_Career/XAP_01

for location in `ls -d /XAPIANDB /XAPIAN/recoll_*/XAP_?? | cut -f1-4 -d/ | sort -r --key=4.1 --field-separator=/ `
do
    echo "\n Re-start indexing for '${location}' ? [y|N] => \c" ; read ans
    if [ -z "${ans}" ] ; then  ans="N" ; fi
    case ${ans} in
        y* | Y* ) DBDIR="${location}"
            break
            ;;
        n* | N* ) ;;
    esac
done
if [ -z "${DBDIR}" ] ; then  echo "\n No topic indexing selected.  'recollindex' has NOT been started.\n Bye!\n" ; exit 1 ; fi

### Set CONFIGDIR
case ${DBDIR} in
    /XAPIANDB ) LOGDIR=${DBDIR} ;
            CONFIGDIR="/home/${USER}/.recoll"
            LOG="${LOGDIR}/mine.log"
            ERRLOG="${LOGDIR}/mine.errlog" ;;
        * ) LOGDIR=`dirname ${DBDIR} ` ;
            CONFIGDIR="${LOGDIR}"
            LOG=${LOGDIR}/`basename "${DBDIR}" `.log
            ERRLOG=${LOGDIR}/`basename "${DBDIR}" `.errlog ;;
esac
echo "DBDIR = ${DBDIR}"
echo "LOGDIR = ${LOGDIR}"
echo "CONFIGDIR = ${CONFIGDIR}"

### Set CONFIGFLG
CONFIGFLG="-c ${CONFIGDIR}"
echo "CONFIGFLG = ${CONFIGFLG}"

echo "LOG = ${LOG}"
echo "ERRLOG = ${ERRLOG}"


### Scenario #1 - Single Global Database
#DBDIR = /XAPIANDB
#LOGDIR = /XAPIANDB
#CONFIGDIR = /home/${USER}/.recoll
#CONFIGFLG = -c /home/${USER}/.recoll
#LOG = /XAPIANDB/mine.log
#ERRLOG = /XAPIANDB/mine.errlog


### Scenario #2 - Segregated Topical Databases
#DBDIR = /XAPIAN/recoll_SciFi/XAP_01
#LOGDIR = /XAPIAN/recoll_SciFi
#CONFIGDIR = /XAPIAN/recoll_SciFi
#CONFIGFLG = -c /XAPIAN/recoll_SciFi
#LOG = /XAPIAN/recoll_SciFi/XAP_01.log
#ERRLOG = /XAPIAN/recoll_SciFi/XAP_01.errlog

}
#header

echo "\n =========================================================================================\n"
echo "\t    TMPDIR = ${TMPDIR}"
echo "\t      USER = ${USER}"

echo "\t     DBDIR = ${DBDIR}"
echo "\t    LOGDIR = ${LOGDIR}"
echo "\t CONFIGDIR = ${CONFIGDIR}"
echo "\t CONFIGFLG = ${CONFIGFLG}"
echo "\t       LOG = ${LOG}"
echo "\t    ERRLOG = ${ERRLOG}"
echo "\n =========================================================================================\n"


DEBUG=0
FORCE=0
SIGNAL=0

while [ $# -ne 0 ]
do
    case $1 in
        "--force" ) FORCE=1 ; shift ;;
        "--debug" ) DEBUG=1 ; shift ;;
    #   "--stopall" )   SIGNAL=1 ; shift ;;
        * ) echo "\n\t Invalid parameter used on command line.  Only valid:  [--force] [--debug] \n\n Bye!\n" ; exit 1 ;;
    esac
done

killRunning()
{
    #   SIGHUP   1
    #   SIGINT   2
    #   SIGQUIT  3
    #   SIGKILL  9
    #   SIGTERM 15
    #   SIGCONT 18
    #   SIGSTOP 19

    #HUP_SIGNAL="-1"
    #INT_SIGNAL="-2"
    #QUIT_SIGNAL="-3"
    #TERM_SIGNAL="-15"
    KILL_SIGNAL="-9"

    #TEST   dat="root        2272       1 10 19:37 pts/0    00:09:21 recollindex"
    dat=`ps -ef | grep 'recollindex' | grep -v 'grep' `
    test ${DEBUG} -eq 1 && echo "dat = ${dat}"

    if [ -z "${dat}" ]
    then
        lines=0
    else
        lines=`echo "${dat}" | wc -l | awk '{ print $1 }' `
    fi
    test ${DEBUG} -eq 1 && echo "lines = ${lines}"

    if [ "${lines}" -eq 0 ]
    then
        echo "\n\t Detected no running instances of 'recollindex' ...\n"
        test ${SIGNAL} -eq 1 && echo "\n\t Abandoning per execution mode.\n"
    else
        procActn=`echo ${dat} | awk '{ print $2 }' `
        test ${DEBUG} -eq 1 && echo "procActn = ${procActn}"

        procName=`echo ${dat} | awk '{ print $8 }' ` ; procName=`basename ${procName} `
        test ${DEBUG} -eq 1 && echo "procName = ${procName}"

        if [ "${lines}" -eq 1 ]
        then
            if [ "${procName}" = "recollindex" ]
            then
                if [ ${FORCE} -eq 1 ]
                then
                    kill ${KILL_SIGNAL} ${procActn}
                    echo "\n Delay of 10 seconds to allow all cleanup from SIGKILL ..."
                    sleep 10

                    #procActn=`ps -ef | grep 'recollindex' | grep -v 'grep' | awk '{ print $2 }' `
                    #if [ -n "${procActn}" ]
                    #then
                    #   ps -ef | grep recoll
                    #fi
                    test ${SIGNAL} -eq 1 && echo "\n\t INDEXING HALTED - Abandoning per execution mode.\n"
                else
                    echo "\n\t Process is already running:\n"
                    ps -ef | grep 'recoll' | grep -v 'grep' | grep -v 'pdf'
                    
                    echo "\n\t Abandoning!\n Bye!\n" ; exit 1
                fi
            fi
        else
            echo "\n\t Detected multiple running instances of 'recollindex'. Unable to determine proper action.\n\t Abandoning!\n Bye!\n" ; exit 1
        fi
    fi
}

#killRunning

test ${SIGNAL} -eq 1 && exit 1

RETRY_ITEMS_FAILED_INDEXING="-k"
PURGE_OLD_INDEX="-z"

if [ "${FORCE}" = 1 ]
then

    #WAIT_TIME="60"
    #DELAY_BEFORE_START="-w ${WAIT_TIME}" 
    #INDEX_ALL_PER_SPECIFICATIONS="-i"

    COM="nice --adjustment 17 recollindex ${CONFIGFLG} ${PURGE_OLD_INDEX} ${RETRY_ITEMS_FAILED_INDEXING}"
        # ${INDEX_ALL_PER_SPECIFICATIONS}"  
        # ${DELAY_BEFORE_START}"

    echo "\n\t Rebuilding XAPIAN Database for 'recoll' ...\n\t COMMAND:  '${COM}'\n"

    ${COM} >${LOG} 2>${ERRLOG} &
else
    COM="nice --adjustment 17 recollindex ${CONFIGFLG} ${RETRY_ITEMS_FAILED_INDEXING}"
    echo "\n\t Re-starting 'recollindex' to continue rebuilding XAPIAN Database ...\n\t COMMAND:  '${COM}'\n"
    ${COM} >>${LOG} 2>>${ERRLOG} &
fi

sleep 5

ls -ltr ${LOGDIR} ${DBDIR}
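One portability caveat: the prompts in both scripts rely on echo interpreting \n and \c, which depends on which shell provides /bin/sh (dash does, bash by default does not). A minimal sketch of the same yes/no prompt using printf instead; the function name ask_yes_no is hypothetical.

```shell
#!/bin/sh
# ask_yes_no PROMPT: return 0 for an answer starting with y or Y,
# 1 otherwise (empty input or end-of-file counts as "no").
ask_yes_no() {
    # printf handles \n the same way in every POSIX shell; the prompt
    # goes to stderr so it does not mix with captured output.
    printf '\n %s [y|N] => ' "$1" >&2
    ans=""
    read -r ans || true
    case "$ans" in
        [yY]*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Demo with a canned answer so it runs unattended.
if printf 'y\n' | ask_yes_no "Set indexing scope for '/XAPIAN/recoll_SciFi/XAP_01' ?"
then
    echo "selected"
else
    echo "skipped"
fi
```

In the selection loop, the body would then reduce to something like: if ask_yes_no "..." ; then DBDIR="${location}" ; break ; fi.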

Hope those help you get your head wrapped around the way to use recoll.

Eric Marceau
  • so the default .recoll index will contain those directories, no more no less. Is this correct? – Emmanuel Goldstein Jul 10 '22 at 10:22
  • YES. Recoll will only include the directories identified in the search list specification. If you include /, that nullifies any restricting of scope. – Eric Marceau Sep 05 '22 at 17:07
  • Because including /, even if you include specific dirs, will mean everything is included (the whole hard disk). Is this assumption correct? – Emmanuel Goldstein Sep 07 '22 at 13:20
  • Including / implies scope is full captive system "universe". Listing only subdirectories specifies scope is limited below those sub-directories. – Eric Marceau Sep 07 '22 at 18:45
  • Then including both subdirectories and / makes no sense as / overrides the subdirectory. Can you confirm? – Emmanuel Goldstein Sep 12 '22 at 12:42
  • Correct. Imagine the paths as elements used by the find command. Find would only report items included by /directory/subdirectory, but never include /dir2, /dir3, etc. In that manner, the scope is limited. Using "find /" is global and would always include /directory/subdirectory, as well as /dir2, /dir3 and all other content of the system. – Eric Marceau Sep 13 '22 at 00:03
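The scope rule discussed in these comments can be modelled with plain string matching. This is only a sketch of the idea, not how Recoll implements it; the function name is_covered is hypothetical and the paths are the ones from the find analogy.

```shell
#!/bin/sh
# is_covered PATH TOPDIR...: succeed if PATH lies at or below any TOPDIR.
# Top directories are expected without a trailing slash (except "/").
is_covered() {
    path=$1
    shift
    for top in "$@"; do
        case "$top" in
            /) return 0 ;;   # the root covers every absolute path
        esac
        case "$path" in
            "$top" | "$top"/*) return 0 ;;
        esac
    done
    return 1
}

# Only the listed subdirectory: /dir2 is out of scope.
is_covered /dir2/file.txt /directory/subdirectory \
    && echo "covered" || echo "not covered"
# prints "not covered"

# Adding / to the list makes the restriction meaningless.
is_covered /dir2/file.txt /directory/subdirectory / \
    && echo "covered" || echo "not covered"
# prints "covered"
```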