I am trying to stream data from a file to Elasticsearch using Akka Streams and elastic4s.
I have a Movie object that can be inserted into Elasticsearch, and I am able to index objects of this type using the HTTP client:
val httpClient = HttpClient(ElasticsearchClientUri("localhost", 9200))
val response = Await.result(httpClient.execute {
  indexInto("movies" / "movie").source(movie)
}, 10 seconds)
println(s"result: $response")
httpClient.close()
Now I am trying to use Akka Streams to index Movie objects.
I have a function to create the sink:
def toElasticSearch(client: HttpClient)(implicit actorSystem: ActorSystem): Sink[Movie, NotUsed] = {
  var count = 0
  implicit val movieImporter = new RequestBuilder[Movie] {
    import com.sksamuel.elastic4s.http.ElasticDsl._
    def request(movie: Movie): BulkCompatibleDefinition = {
      count = count + 1
      println(s"inserting ${movie.id} -> ${movie.title} - $count")
      index("movies", "movie").source[Movie](movie)
    }
  }
  val subscriber = client.subscriber[Movie](
    batchSize = 10,
    concurrentRequests = 2,
    completionFn = () => println(s"completion: all done"),
    errorFn = (t: Throwable) => println(s"error: $t")
  )
  Sink.fromSubscriber(subscriber)
}
and a test:
describe("a DataSinkService elasticsearch sink") {
  it ("should write data to elasticsearch using an http client") {
    var count = 0
    val httpClient = HttpClient(ElasticsearchClientUri("localhost", 9200))
    val graph = GraphDSL.create(sinkService.toElasticSearch(httpClient)) { implicit builder: GraphDSL.Builder[NotUsed] => s =>
      val flow: Flow[JsValue, Movie, NotUsed] = Flow[JsValue].map[Movie](j => {
        val m = Movie.fromMovieDbJson(j)
        count = count + 1
        println(s"parsed id:${m.id} - $count")
        m
      })
      sourceService.fromFile(3, 50) ~> flow ~> s
      ClosedShape
    }
    RunnableGraph.fromGraph(graph).run
    Thread.sleep(20.seconds.toMillis)
    println(s"\n*******************\ndone waiting...\n")
    httpClient.close()
    println(s"closed")
  }
}
I send 47 elements via sourceService.fromFile(3, 50).
The output shows:
- 20 elements processed (parsed in the flow and indexed in the sink)
- done waiting
- closed
- completion: all done (the completionFn)
If I change the subscriber's batchSize and concurrentRequests parameters to 12 and 3 respectively, I see 36 elements parsed and indexed.
So it appears that the sink stops accepting elements once batchSize * concurrentRequests elements have been processed.
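To make the pattern concrete, here is a small sketch of the arithmetic behind that observation. Note that observedCap and ObservedCap are just names I made up for illustration; they are not part of elastic4s or Akka Streams, and this only restates what I see in the two runs rather than explaining why it happens.

```scala
// Hypothetical helper (not an elastic4s API): the number of elements the
// sink appeared to accept before it stopped, based on the two runs above.
object ObservedCap {
  def observedCap(batchSize: Int, concurrentRequests: Int): Int =
    batchSize * concurrentRequests

  def main(args: Array[String]): Unit = {
    val sent = 47 // elements emitted by sourceService.fromFile(3, 50)

    // Run 1: batchSize = 10, concurrentRequests = 2
    val run1 = observedCap(10, 2)
    println(s"run 1: indexed $run1 of $sent, missing ${sent - run1}")

    // Run 2: batchSize = 12, concurrentRequests = 3
    val run2 = observedCap(12, 3)
    println(s"run 2: indexed $run2 of $sent, missing ${sent - run2}")
  }
}
```

In both runs the number of indexed elements matches this product exactly, and the remaining elements (27 and 11 respectively) never reach the flow or the sink.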
My questions are:
- Does the elastic4s streaming solution work when using an httpClient?
- What am I missing?