0

I encountered a production error I hope someone can educate me on. I'm running a MongoDB AMI instance (1000 IOPS) in AWS in a cluster (2 dbs + arbiter). My app server connecting to the database resides on separate instances.

After weeks of running fine I got the exception below, which effectively shut down my db access until I rebooted the app server nodes. The system is pretty low-traffic, i.e. the servers in no way appear slammed (single-digit CPU utilization).

Below are my mongo config option settings:

auto_connect_retry = false
connections_per_host = 100
threads_multiplier = 50
max_wait_time = 120000
connect_timeout = 10000
socket_timeout = 60000

I need to understand what happened and if I can do anything to prevent it next time.

18:14:15.482 AWECluster-akka.actor.default-dispatcher-3 ERROR akka.actor.OneForOneStrategy - Write operation to server /10.0.8.10:27017 failed on database awe_prod_preview
com.mongodb.MongoException$Network: Write operation to server /10.0.8.10:27017 failed on database awe_prod_preview
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:153) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:115) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:327) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBCollection.update(DBCollection.java:178) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBCollection.save(DBCollection.java:818) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.casbah.MongoCollectionBase$class.save(MongoCollection.scala:573) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.casbah.MongoCollection.save(MongoCollection.scala:866) ~[deps.jar:0.1-SNAPSHOT]
    at com.novus.salat.dao.SalatDAO.save(SalatDAO.scala:404) ~[deps.jar:0.1-SNAPSHOT]
    at com.novus.salat.dao.ModelCompanion$class.save(ModelCompanion.scala:272) ~[deps.jar:0.1-SNAPSHOT]
    at awe.etl._ETL$$anonfun$5$$anon$2.save(ETL.scala:75) ~[shock-etl.jar:0.3.22]
    at com.novus.salat.dao.BaseDAOMethods$class.save(DAO.scala:127) ~[deps.jar:0.1-SNAPSHOT]
    at awe.etl._ETL$$anonfun$5$$anon$2.save(ETL.scala:75) ~[shock-etl.jar:0.3.22]
    at awe.etl.step.Publisher.publishNow(Publish.scala:24) ~[shock-etl.jar:0.3.22]
    at awe.etl.ETLActor$$anonfun$receive$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(ETLActor.scala:116) ~[shock-etl.jar:0.3.22]
    at awe.etl.ETLActor$$anonfun$receive$1$$anonfun$applyOrElse$4$$anonfun$apply$4.apply(ETLActor.scala:114) ~[shock-etl.jar:0.3.22]
    at scala.Option.fold(Option.scala:157) ~[deps.jar:0.1-SNAPSHOT]
    at awe.etl.ETLActor$$anonfun$receive$1$$anonfun$applyOrElse$4.apply(ETLActor.scala:114) ~[shock-etl.jar:0.3.22]
    at awe.etl.ETLActor$$anonfun$receive$1$$anonfun$applyOrElse$4.apply(ETLActor.scala:113) ~[shock-etl.jar:0.3.22]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[deps.jar:0.1-SNAPSHOT]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[deps.jar:0.1-SNAPSHOT]
    at awe.etl.ETLActor$$anonfun$receive$1.applyOrElse(ETLActor.scala:113) ~[shock-etl.jar:0.3.22]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) [deps.jar:0.1-SNAPSHOT]
    at akka.actor.ActorCell.invoke(ActorCell.scala:456) [deps.jar:0.1-SNAPSHOT]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) [deps.jar:0.1-SNAPSHOT]
    at akka.dispatch.Mailbox.run(Mailbox.scala:219) [deps.jar:0.1-SNAPSHOT]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) [deps.jar:0.1-SNAPSHOT]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [deps.jar:0.1-SNAPSHOT]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [deps.jar:0.1-SNAPSHOT]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [deps.jar:0.1-SNAPSHOT]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [deps.jar:0.1-SNAPSHOT]
Caused by: java.net.SocketException: Connection timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.6.0_24]
    at java.net.SocketInputStream.read(SocketInputStream.java:146) ~[na:1.6.0_24]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) ~[na:1.6.0_24]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) ~[na:1.6.0_24]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.6.0_24]
    at org.bson.io.Bits.readFully(Bits.java:46) ~[deps.jar:0.1-SNAPSHOT]
    at org.bson.io.Bits.readFully(Bits.java:33) ~[deps.jar:0.1-SNAPSHOT]
    at org.bson.io.Bits.readFully(Bits.java:28) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.Response.<init>(Response.java:40) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBPort.go(DBPort.java:142) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBPort.go(DBPort.java:106) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBPort.findOne(DBPort.java:162) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBPort.runCommand(DBPort.java:170) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:100) ~[deps.jar:0.1-SNAPSHOT]
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142) ~[deps.jar:0.1-SNAPSHOT]
    ... 29 common frames omitted
Chris Seymour
  • 83,387
  • 30
  • 160
  • 202
Greg
  • 10,696
  • 22
  • 68
  • 98
  • Do you have any logs from the mongo server? Is there a reason why this `auto_connect_retry` is false? – datasage Oct 14 '13 at 17:56
  • No errors in the db server logs--clean on both nodes. Should auto_connect_retry be true in a production environment? – Greg Oct 14 '13 at 18:07
  • I'm not familar with all mongo drivers, it appears that in this case the connection to mongo is persistent. For whatever reason the connection was terminated, but was not able to be re-established. – datasage Oct 14 '13 at 18:16
  • Auto connect retry should be false - that's the default, and best only to change this if you need to – Trisha Oct 21 '13 at 13:42

1 Answers1

0

I had the same issue and fixed it by changing mongod configuration.

The default /etc/mongod.conf configuration file supplied by the 3.0 series packages has bind_ip set to 127.0.0.1 by default. Modify this setting as needed for your environment or comment this entire line to have access to your db from anywhere.

Hope it will help you.