
I have a server with 4 cores and 32 GB RAM running Ubuntu 20.04.3 LTS. An OpenGrok instance runs on this machine as a Docker container.

Inside the Docker container it uses AdoptOpenJDK:

OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9   - b4cc246d9
OMR      - 162e6f729
JCL      - 7796c80419 based on jdk-11.0.11+9)

The code base that the OpenGrok indexer scans is 320 GB in size, and indexing takes 21 hours.

What I found out is that indexing takes less time if I disable the history option. Is there a way to reduce the indexing time while the history flag is set?

Here is my index command:

opengrok-indexer -J=-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -J=-Djava.util.logging.config.file=/usr/share/tomcat10/conf/logging.properties -J=-XX:-UseGCOverheadLimit -J=-Xmx30G -J=-Xms30G -J=-server -a /var/opengrok/dist/lib/opengrok.jar -- -R /var/opengrok/etc/read-only.xml -m 256 -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data --remote on -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress -v -O on -T 3 --assignTags --search --remote on -i *.so -i *.o -i *.a -i *.class -i *.jar -i *.apk -i *.tar -i *.bz2 -i *.gz -i *.obj -i *.zip

Thank you for your help in advance.

Kind Regards

Siegfried

sikienzl

1 Answer

You should try to increase the number of threads using the following options:

  --historyThreads number
    The number of threads to use for history cache generation on repository level. By default the number of threads will be set to the number of available CPUs.
    Assumes -H/--history.
    
  --historyFileThreads number
    The number of threads to use for history cache generation when dealing with individual files.
    By default the number of threads will be set to the number of available CPUs.
    Assumes -H/--history.

   -T, --threads number
    The number of threads to use for index generation, repository scan
    and repository invalidation.
    By default the number of threads will be set to the number of available
    CPUs. This influences the number of spawned ctags processes as well.

Take a look at the "renamedHistory" option too. "off" is supposed to be the default, but turning it on has a huge impact on indexing time, so it's worth checking that it really is off:

  --renamedHistory on|off
    Enable or disable generating history for renamed files.
    If set to on, makes history indexing slower for repositories
    with lots of renamed files. Default is off.
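
To make it easier to test different thread settings, the options above can be generated by a small helper. This is only a sketch: the function name `indexer_thread_opts` is made up for illustration, and the values in the example call are placeholders to vary between runs, not recommended settings. Only the flags quoted above are used, and the helper prints the options instead of running the indexer.

```shell
#!/bin/sh
# Hypothetical helper: print the thread-related indexer options.
# Usage: indexer_thread_opts THREADS HISTORY_THREADS HISTORY_FILE_THREADS
indexer_thread_opts() {
    threads=$1
    history_threads=$2
    history_file_threads=$3
    # --renamedHistory is set to "off" explicitly so it does not depend on the default.
    printf '%s\n' "-T $threads --historyThreads $history_threads --historyFileThreads $history_file_threads --renamedHistory off"
}

# Example call with placeholder values.
indexer_thread_opts 4 2 2
# prints: -T 4 --historyThreads 2 --historyFileThreads 2 --renamedHistory off
```

The printed options would go after the `--` separator in the existing `opengrok-indexer` command, replacing the current `-T 3`.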
  • Thanks for your answer. I have used the option -T with 3 threads (4 cores - 1). What I don't understand is whether the --historyThreads and --historyFileThreads options add threads on top of -T or share the threads set by -T. – sikienzl Jun 12 '22 at 07:22
  • I'm not sure but I think they're additional to -T. Do some tests changing the values. – Marcelo Ávila de Oliveira Jun 12 '22 at 21:06
  • OK. I will try your tip with -T 4, --historyThreads 2 and --historyFileThreads 2. We will see. – sikienzl Jun 20 '22 at 06:47
  • I ran the command with -T 4, --historyThreads 2 and --historyFileThreads 2, and it took more than 22 hours. – sikienzl Jun 23 '22 at 06:14
  • Short update: after 18 hours it aborted with the following error: Indexer command failed (return code -9) – sikienzl Jun 27 '22 at 06:33
  • I have found a solution for now. If I put the following into read-only.xml, I can reduce the indexing time to 5 hours: ```... 2 2 true 4096 ...``` – sikienzl Jun 29 '22 at 06:18
  • "historyCachePerPartesEnabled" is "true" by default, the "historyChunkCount" defaults to "128K" on Mercurial and "64K" on Git, so I don't understand how 4096 would improve the indexing time. I think something different did the trick. – Marcelo Ávila de Oliveira Jun 29 '22 at 13:04
  • You are right. What I actually did was upgrade Java from version 11 to 17, and I now use the following options: -J=-XX:+UseG1GC -J=-XX:-UseGCOverheadLimit -J=-XX:+AggressiveHeap -J=-XX:+UseStringDeduplication. The thread count is still 3. Thanks for your help. – sikienzl Oct 07 '22 at 11:03
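
For anyone who wants to reproduce the setup from the last comment, the JVM options can be collected as in the sketch below. It assumes a HotSpot-based Java 17, since the comment does not say which JVM is in use. The function name `jvm_opts` is made up for illustration, and the collector flag is written as `-XX:+UseG1GC`, the standard HotSpot spelling. Each option gets the `-J=` prefix used in the question's command.

```shell
#!/bin/sh
# Hypothetical helper: print the JVM options from the last comment,
# one per line, each with the -J= prefix that opengrok-indexer expects.
jvm_opts() {
    for opt in \
        -XX:+UseG1GC \
        -XX:-UseGCOverheadLimit \
        -XX:+AggressiveHeap \
        -XX:+UseStringDeduplication
    do
        printf '%s\n' "-J=$opt"
    done
}

# Show the options; they would replace the -J=-XX:... options
# in the original indexer command.
jvm_opts
```

Whether these options or the Java upgrade itself accounts for the shorter indexing time is not settled in the thread, so it is worth testing them separately.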