TL;DR
I have a Vagrant-hosted Solr index on a Windows 10 machine that fails and stops responding (connection reset by peer) without any exceptions in the Solr logs. How can I start to debug what is going wrong?
Use Case/Problem
I am attempting to index a constant stream of user account data that includes numerous deletes and updates per request. There is an update to the stream of data every 4 to 5 seconds.
Everything seems to run smoothly until the Solr index reaches ~5.5 million records. Then it fails without any error or exception in the Solr logs. The error the client receives is "connection reset by peer". Looking at the Solr VM, the Solr instance has stopped running.
Here is the output of ps aux | grep solr right after Solr stops running:
solr 3048 0.0 0.0 16256 3612 ? Ss 17:23 0:00 /lib/systemd/systemd --user
solr 3049 0.0 0.0 167420 3028 ? S 17:23 0:00 (sd-pam)
Then, after a minute or two, the processes above disappear and there are no Solr processes running at all.
Inspecting the Solr logs turns up no errors or exceptions.
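My first thought is that something outside the JVM is killing the process, since Solr's own logs are clean. Here is a sketch of the checks I understand to be standard, run inside the VM (the oom_solr.sh log filename pattern is an assumption on my part, based on the -XX:OnOutOfMemoryError hook visible in the java flags further down):

# Was the java process reaped by the kernel OOM killer?
sudo dmesg -T | grep -i -E 'killed process|out of memory'
grep -i oom /var/log/syslog

# Did Solr's own OOM hook (bin/oom_solr.sh) fire? I believe it kills the
# JVM and drops a solr_oom_killer-<port>-<timestamp>.log in the logs dir.
ls -l /vagrant/solr/logs/ | grep -i oom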
VM Details
Here is the relevant information about the Vagrant instance (from the Vagrantfile).
config.vm.box = "ubuntu/disco64"
...
config.vm.provider "virtualbox" do |v|
  v.memory = 4096  # 4 GB
  v.cpus = 4
end
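To rule out the guest silently getting fewer resources than configured, a quick sanity check from the host (a minimal sketch using standard Vagrant/Linux tooling):

vagrant ssh -c "free -h"   # confirm the guest really sees ~4 GB of RAM
vagrant ssh -c "nproc"     # confirm the guest really sees 4 CPUs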
The latest openjdk-8-jdk is installed.
Solr 8.2.0 is installed.
The Solr service is installed in /vagrant/solr, so in theory there should be plenty of disk space. The Vagrant instance lives on an SSD drive that has 216 GB of space left.
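Because /vagrant is a VirtualBox shared folder rather than a native guest disk, I also want to confirm what the guest itself reports for the mount and its free space (a sketch):

vagrant ssh -c "df -h /vagrant"          # free space as the guest sees it
vagrant ssh -c "mount | grep /vagrant"   # should show a vboxsf mount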
Solr Config
I have tried to follow the advice in Understanding Transaction Logs, Soft Commit and Commit In SolrCloud for configuring my Solr index. I am trying to follow the "Heavy (bulk) indexing" and "Index-heavy, Query-light" strategies.
The only real value I've changed in the default solrconfig.xml is setting openSearcher to true for autoCommit. I made this change so I could watch the index as it grows and query some data while the user account data stream is harvested.
<!-- AutoCommit

     Perform a hard commit automatically under certain conditions.
     Instead of enabling autoCommit, consider using "commitWithin"
     when adding documents.

     http://wiki.apache.org/solr/UpdateXmlMessages

     maxDocs - Maximum number of documents to add since the last
               commit before automatically triggering a new commit.

     maxTime - Maximum amount of time in ms that is allowed to pass
               since a document was added before automatically
               triggering a new commit.

     openSearcher - if false, the commit causes recent index changes
                    to be flushed to stable storage, but does not cause
                    a new searcher to be opened to make those changes
                    visible.

     If the updateLog is enabled, then it's highly recommended to
     have some sort of hard autoCommit to limit the log size.
-->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
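The commented-out guidance above suggests commitWithin on the update request instead of opening a searcher on every hard commit. For reference, this is roughly what that would look like against my setup (a sketch; the core name "users" and the example document fields are hypothetical):

# Documents become searchable within 15 s without openSearcher=true.
curl 'http://localhost:8983/solr/users/update?commitWithin=15000' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "user-1", "account_status_s": "active"}]'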
I have increased the Solr JVM heap to 2 GB. Here is the output of ps aux | grep java while Solr is running:
java -server
-Xms2056m
-Xmx2056m
-XX:+UseG1GC
-XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled
-XX:MaxGCPauseMillis=250
-XX:+UseLargePages
-XX:+AlwaysPreTouch
-verbose:gc
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/vagrant/solr//logs/solr_gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=20M
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Dsolr.log.dir=/vagrant/solr//logs
-Djetty.port=8983
-DSTOP.PORT=7983
-DSTOP.KEY=solrrocks
-Duser.timezone=UTC
-Djetty.home=/opt/solr/server
-Dsolr.solr.home=/vagrant/solr//data
-Dsolr.data.home=
-Dsolr.install.dir=/opt/solr
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf -Dlog4j.configurationFile=file:/vagrant/solr//log4j2.xml
-Xss256k
-Dsolr.jetty.https.port=8983
-Dsolr.log.muteconsole
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /vagrant/solr//logs
-jar start.jar
--module=http
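Given the GC logging flags above, watching heap behaviour while the stream is being indexed seems like the obvious next step. A sketch (<pid> stands for the Solr java process id):

# Follow the GC log Solr is already writing (path taken from the flags above).
tail -f /vagrant/solr/logs/solr_gc.log

# Sample heap occupancy and GC activity every 5 seconds.
# jstat ships with openjdk-8-jdk.
jstat -gcutil <pid> 5000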
Other Background Information
I have worked with Solr before, but never this in-depth or with this much aggressive data churn. My only real professional experience is adding a couple hundred thousand records into Solr, performing some simple queries, deleting the index, and then re-harvesting the records back into the index...
Plea
Any friendly advice or comments on how to debug this problem would be greatly appreciated. I have searched and searched, but I cannot find anything that remotely looks like an answer to this problem.