I seem to be getting different results from ES and CouchDB: ES has only 2 older documents, which CouchDB no longer has, and CouchDB has many newer documents that ES doesn't see at all. What causes this, and how do I find out what the state of the CouchDB river is?
Here are my requests:
#ES has Document-1...
$ curl http://localhost:9200/portal_production/portal_production/_search?pretty=true\&q=_id:Document-1
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.0,
"hits": [
{
"_index": "portal_production",
"_type": "portal_production",
"_id": "Document-1",
"_score": 1.0,
"_source": {
"_rev": "2-2a986416ddb8a95446b0e143739094d2",
"text": " FILE TYPE : INTERROGATION\n FILE TITLE : TMJ06001.A91\n FILE CREATED : 01 JANUARY 2006 AT 00:00\n\n! This file contains all detections for 2006 from the juvenile bypass outfall.\n! The tags were detected using an FS-2001F portable transceiver and flat-plate\n! antenna. These data were compiled from the original files by Dave Marvin,\n! PTAGIS. The original data files are listed in the data stream below, \n! followed by their contents.\n\n! TMJ06032.A1\n| 01 02/16/06 18:34:51 3D9.1BF11B4053 XX 91\n| 01 02/16/06 19:08:15 3D9.1BF1E7919A XX 91\n| 01 02/16/06 19:18:36 3D9.1BF1A998FA XX 91\n| 01 02/17/06 18:21:03 3D9.1BF20E8FE2 XX 91\n| 01 02/20/06 18:27:01 3D9.1BF11BFFF5 XX 91\n| 01 02/22/06 01:56:38 3D9.1BF23F62D4 XX 91\n| 01 02/22/06 03:56:10 3D9.1BF234346C XX 91\n| 01 02/22/06 17:59:11 3D9.1BF2342E83 XX 91\n| 01 02/22/06 19:03:37 3D9.1BF23435A4 XX 91",
"_id": "Document-1"
}
}
]
}
}
#...but CouchDB has no Document-1
$ curl http://localhost:5984/portal_production/Document-1
{
"error": "not_found",
"reason": "missing"
}
#CouchDB has Document-1000...
$ curl http://localhost:5984/portal_production/Document-1000
{
"_id": "Document-1000",
"_rev": "8-d7f049228abc6311a920f9f7786ab9a4",
"text": null,
"metadata": [],
"data": [
{
"1": "07/22/08 18:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 18:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 18:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 19:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 19:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 19:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 19:49:45",
"2": "3D9.1C2C42D260"
},
{
"1": "07/22/08 20:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 20:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 20:14:38",
"2": "3D9.1C2C54F95E"
},
{
"1": "07/22/08 20:22:24",
"2": "3D9.1BF1FDA622"
},
{
"1": "07/22/08 20:49:28",
"2": "3D9.1C2C42D260"
},
{
"1": "07/22/08 21:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 21:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 21:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 22:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 22:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 22:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 22:49:27",
"2": "3D9.1C2C42D260"
},
{
"1": "07/22/08 23:09:22",
"2": "3E7.0000001DFF"
},
{
"1": "07/22/08 23:09:22",
"2": "3E7.0000001DFF"
}
],
"foreign_keys": [],
"primary_keys": [
"1",
"2"
]
}
#...but ES has no Document-1000
$ curl http://localhost:9200/portal_production/portal_production/_search?pretty=true\&q=_id:Document-1000
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
#Everything ES has:
$ curl http://localhost:9200/portal_production/portal_production/_search?pretty=true
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.0,
"hits": [
{
"_index": "portal_production",
"_type": "portal_production",
"_id": "Document-1",
"_score": 1.0,
"_source": {
"_rev": "2-2a986416ddb8a95446b0e143739094d2",
"text": " FILE TYPE : INTERROGATION\n FILE TITLE : TMJ06001.A91\n FILE CREATED : 01 JANUARY 2006 AT 00:00\n\n! This file contains all detections for 2006 from the juvenile bypass outfall.\n! The tags were detected using an FS-2001F portable transceiver and flat-plate\n! antenna. These data were compiled from the original files by Dave Marvin,\n! PTAGIS. The original data files are listed in the data stream below, \n! followed by their contents.\n\n! TMJ06032.A1\n| 01 02/16/06 18:34:51 3D9.1BF11B4053 XX 91\n| 01 02/16/06 19:08:15 3D9.1BF1E7919A XX 91\n| 01 02/16/06 19:18:36 3D9.1BF1A998FA XX 91\n| 01 02/17/06 18:21:03 3D9.1BF20E8FE2 XX 91\n| 01 02/20/06 18:27:01 3D9.1BF11BFFF5 XX 91\n| 01 02/22/06 01:56:38 3D9.1BF23F62D4 XX 91\n| 01 02/22/06 03:56:10 3D9.1BF234346C XX 91\n| 01 02/22/06 17:59:11 3D9.1BF2342E83 XX 91\n| 01 02/22/06 19:03:37 3D9.1BF23435A4 XX 91",
"_id": "Document-1"
}
},
{
"_index": "portal_production",
"_type": "portal_production",
"_id": "Ifilter-1",
"_score": 1.0,
"_source": {
"headers": [
{
"val": "[ ]*(FILE[ ]+TYPE)[ ]*:[ ]*([A-Z]+)",
"id": "0"
},
{
"val": "[ ]*(FILE[ ]+TITLE)[ ]*:[ ]*([A-Z0-9.]+)",
"id": "1"
},
{
"val": "[ ]*(FILE[ ]+CREATED)[ ]*:[ ]*([A-Z0-9: ]+)",
"id": "2"
}
],
"_rev": "4-d9c8e771bc345d1182fbe7c2d63f5d00",
"_id": "Ifilter-1",
"filter_headers": {
"2": "[ ]*(FILE[ ]+CREATED)[ ]*:[ ]*([A-Z0-9: ]+)",
"1": "[ ]*(FILE[ ]+TITLE)[ ]*:[ ]*([A-Z0-9.]+)",
"0": "[ ]*(FILE[ ]+TYPE)[ ]*:[ ]*([A-Z]+)"
}
}
}
]
}
}
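To put a number on the gap, the document counts on both sides can be compared. This is only a rough sketch: the hosts and names are the ones from my requests above, `doc_count` is the field CouchDB reports in its database info, `_count` is the ES count endpoint, and the helpers `json_int` and `count_gap` are names I made up for illustration.

```shell
#!/bin/sh
# Hosts, database and index names are taken from the requests above.
COUCH_URL="http://localhost:5984/portal_production"
ES_URL="http://localhost:9200/portal_production/portal_production"

# Hypothetical helper: read a JSON body on stdin and print the integer value
# of the given key. A crude sed extraction; jq would be more robust.
json_int() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Number of documents CouchDB reports for the database.
couch_doc_count() { curl -s "$COUCH_URL" | json_int doc_count; }

# Number of documents ES reports for the index/type.
es_doc_count() { curl -s "$ES_URL/_count" | json_int count; }

# Hypothetical helper: difference between the two counts (CouchDB minus ES).
count_gap() { echo $(( $1 - $2 )); }
```

Running `count_gap "$(couch_doc_count)" "$(es_doc_count)"` prints how many more documents CouchDB has than ES. CouchDB's `doc_count` also includes design documents, so a small difference on its own may not mean anything.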
Found in logs
Sorry, I have been getting mauled by a bigger monster. Anyway, found an issue:
[2013-08-19 17:55:08,379][WARN ][river.couchdb ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
java.io.IOException: Bogus chunk size
at sun.net.www.http.ChunkedInputStream.processRaw(ChunkedInputStream.java:319)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:572)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3052)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.elasticsearch.river.couchdb.CouchdbRiver$Slurper.run(CouchdbRiver.java:477)
at java.lang.Thread.run(Thread.java:724)
[2013-08-19 17:55:13,392][WARN ][river.couchdb ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
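For checking the river's state directly, here is a sketch of the requests that seem relevant. It rests on a few assumptions: that the river is named `portal_production` (taken from the `[couchdb][portal_production]` tag in the log), that this version of the river plugin keeps its `_status` and `_seq` documents in the `_river` index, and that the function names are just ones I picked.

```shell
#!/bin/sh
ES="http://localhost:9200"
COUCH="http://localhost:5984"
RIVER="portal_production"   # assumed river name, from the log tag
DB="portal_production"

# URL of a river metadata document (_meta, _status or _seq) in the _river index.
river_meta_url() { echo "$ES/_river/$RIVER/$1"; }

# URL of a small, non-continuous slice of the CouchDB changes feed.
changes_url() { echo "$COUCH/$DB/_changes?since=$1&limit=5"; }

# Which node runs the river, if it started at all.
river_status() { curl -s "$(river_meta_url _status)?pretty=true"; }

# Last CouchDB sequence the river has recorded (last_seq).
river_seq() { curl -s "$(river_meta_url _seq)?pretty=true"; }

# CouchDB database info; update_seq is the current sequence.
couch_info() { curl -s "$COUCH/$DB"; }

# Read a few changes after a given sequence, bypassing the river.
changes_after() { curl -s "$(changes_url "$1")"; }
```

If `last_seq` from `river_seq` is far behind `update_seq` from `couch_info`, the river has probably stopped consuming the feed. Calling `changes_after` with that `last_seq` value shows whether CouchDB itself serves the changes cleanly from that point.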