
I have a batch test using JMeter that sends several HTTP GET requests to a NiFi HandleHttpRequest processor, which then sends the data to a Kafka topic.

The problem is that StandardHttpContextMap returns a SERVICE_UNAVAILABLE error. It seems this happens when the rate of the dataflow exceeds the provenance recording rate, but I'm not sure.
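
For reference, the test does roughly the following (the real load generator is a JMeter plan; the host, port and path below are just placeholders for my HandleHttpRequest listener):

```python
import time
import requests

# Placeholder endpoint for the HandleHttpRequest listener (not my real host/port).
NIFI_URL = "http://nifi-host:8011/ingest"

TARGET_RPS = 200        # requests per second the batch test tries to sustain
DURATION_SECONDS = 60   # length of the test run

def run_load():
    interval = 1.0 / TARGET_RPS
    deadline = time.time() + DURATION_SECONDS
    unavailable = 0
    while time.time() < deadline:
        resp = requests.get(NIFI_URL, params={"id": "test"}, timeout=5)
        if resp.status_code == 503:  # SERVICE_UNAVAILABLE coming back from NiFi
            unavailable += 1
        time.sleep(interval)
    print("503 responses:", unavailable)

if __name__ == "__main__":
    run_load()
```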

Does anyone have any idea? Here is a partial log:

2016-05-05 15:12:14,064 WARN [Timer-Driven Process Thread-7] o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate. Currently, there are 96 journal files (533328812 bytes) and threshold for blocking is 80 (1181116006 bytes)

2016-05-05 15:12:20,310 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (46096 records) into single Provenance Log File ./provenance_repository/8913710.prov in 43254 milliseconds

2016-05-05 15:12:20,314 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 65422 records

2016-05-05 15:12:20,398 INFO [Timer-Driven Process Thread-7] o.a.n.p.PersistentProvenanceRepository Provenance Repository has now caught up with rolling over journal files. Current number of journal files to be rolled over is 80

2016-05-05 15:12:20,399 INFO [Timer-Driven Process Thread-7] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 9190418

2016-05-05 15:12:21,422 INFO [qtp1693512967-121] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=3858f0ad-b165-427b-a460-67fbf7cff0d8] Sending back a SERVICE_UNAVAILABLE response to 172.26.60.27; request was GET 172.26.60.27

1 Answer


You are correct in your analysis that the HTTP response you are seeing comes from the HttpContextMap[1], specifically the 'Request Expiration' property: when a request has been in the map for longer than the configured amount of time, it automatically replies with SERVICE_UNAVAILABLE.
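
To illustrate, here's what that looks like from the caller's side: once the flow can't answer within 'Request Expiration', the client simply receives a 503 and has to back off and retry. A minimal sketch, assuming a placeholder URL for your HandleHttpRequest endpoint (not part of your actual flow):

```python
import time
import requests

# Placeholder URL for the HandleHttpRequest listener.
NIFI_URL = "http://nifi-host:8011/ingest"

def get_with_backoff(params, max_attempts=5):
    """Retry on 503, which is what the context map sends once a request
    has sat unanswered longer than its Request Expiration."""
    delay = 0.5
    for _ in range(max_attempts):
        resp = requests.get(NIFI_URL, params=params, timeout=10)
        if resp.status_code != 503:
            return resp
        time.sleep(delay)  # back off so an overloaded flow can catch up
        delay *= 2
    raise RuntimeError("still getting SERVICE_UNAVAILABLE after retries")
```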

My guess is that NiFi is taking too long to process all the requests you are submitting, which causes the Provenance Repository to force a rollover, a "stop the world" event. So NiFi stopped processing any data for about 6 seconds, causing the pending requests to expire.

Assuming you don't want to just accept random 6 second "stop the world" events, and without knowing anything about your flow or configuration, you essentially need to either scale up or adjust your flow. A couple of options:

  • Scale to bigger nodes or more nodes
  • Process larger FlowFiles instead of many small FlowFiles (this helps a lot to speed up Provenance; see the sketch after this list)
  • Put the provenance repo on its own disk, or spread it across multiple disks
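
On the second option, if your clients are able to send data in a request body rather than one record per GET, batching many records into a single request gives NiFi one larger FlowFile to track instead of hundreds of tiny ones, which greatly reduces the number of provenance events. A rough client-side sketch, assuming a hypothetical endpoint that accepts newline-delimited batches via POST (alternatively, you can merge small FlowFiles inside the flow with MergeContent before publishing to Kafka):

```python
import requests

# Hypothetical batch endpoint; in the flow this is still HandleHttpRequest,
# just configured to allow POST requests.
NIFI_BATCH_URL = "http://nifi-host:8011/ingest"
BATCH_SIZE = 500

def send_batched(records):
    """Send records in newline-delimited batches so each HTTP request
    becomes one larger FlowFile instead of BATCH_SIZE tiny ones."""
    for i in range(0, len(records), BATCH_SIZE):
        batch = records[i:i + BATCH_SIZE]
        payload = "\n".join(batch).encode("utf-8")
        resp = requests.post(NIFI_BATCH_URL, data=payload, timeout=30)
        resp.raise_for_status()
```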

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.http.StandardHttpContextMap/index.html

  • Well, I tried simplifying the dataflow to just `HandleHttpRequest` - `HandleHttpResponse`, but the problem is still there. Is it possible it's something network related? For instance, the `keep-alive` header from browsers may keep the connection open too long, and for this reason NiFi may throw the "SERVICE_UNAVAILABLE"? – galix85 May 24 '16 at 14:18
  • Hmm, well that's odd. The `keep-alive` header shouldn't have any effect on the provenance repo or the Context Map responding 503. A couple of questions to get a frame of reference: I'm assuming this dataflow is the only thing running on your NiFi instance? Also, is there anything unusual about your setup? e.g. running on a heavily taxed system, a really, really old system, etc. How many requests are you feeding the instance? Lastly, when was the last time you restarted? (I've seen weirder things come up when people leave their computer up for months/years) – JDP10101 May 24 '16 at 14:44
  • NiFi processes about 1000 req/s when running the JMeter tests, and that's when it throws SERVICE_UNAVAILABLE, but it's unstable: sometimes it still throws the exception at 200 req/s or less. NiFi is running in a virtual machine (4 cores, 8 GB RAM); I have to confirm this info. Do you have any idea what may be happening? – galix85 May 24 '16 at 16:17
  • Do you have back pressure configured for the connection coming off of the HandleHttpRequest? There are only three cases where I see the 503 getting returned: the flow takes too long to respond and the context map returns 503; the context map is full and HandleHttpRequest returns 503; back pressure is being applied to the connection and HandleHttpRequest responds with 503. – JDP10101 May 24 '16 at 20:35
  • I have `Back pressure` at the default value (0). I ran benchmark tests with different setups: `50 threads with 2 req/s` with a positive result, and `100 threads with 1 req/s` with a negative result (SERVICE_UNAVAILABLE). Our real case is about thousands of users sending approximately 1 req/min. Which setup do you recommend? Is NiFi appropriate for this use case? Thanks again – galix85 May 25 '16 at 11:19
  • Is there a reason you set it to have so many threads? You may be causing it to "suffocate" itself by having so many threads. NiFi is appropriate for this use-case but try just using one or two threads and if you see the requests timing out then increase it. – JDP10101 May 25 '16 at 21:28
  • I think there has been a misunderstanding: when I talk about threads I mean "benchmark" threads. I have index.thread = 1 and "concurrent tasks" = 2 per processor in NiFi. What do you think about it? Thanks – galix85 May 27 '16 at 10:21