
I've written a program in Node and Express that uses Request to connect to an API and download a bunch of data (around 3,000 API requests, all within the API's usage limits, mind you).

When running this in a Docker container, I'm getting a lot of getaddrinfo ENOTFOUND errors, and I'm wondering if this is a resourcing issue. My requests are like so:

var request = require('request');

request.get(url, function(err, resp, body){
  // do stuff with the body here,
  // like create an object and hand it off to a worker function
});

For the first few hundred requests this always works fine, but then I get lots and lots of either ENOTFOUND or timeout errors, and I think the issue might be the way my code is dealing with all these requests.

I've batched them in a queue with timeouts so the requests happen relatively slowly; this helps a little but doesn't solve the problem completely.
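Roughly, the batching looks like this (a simplified sketch of my queue; urls stands in for my actual list of ~3,000 URLs and handleBody for the worker function):

var request = require('request');

var queue = urls.slice(); // copy of the full URL list

function runBatch() {
  // take the next 25 URLs and fire them off together
  queue.splice(0, 25).forEach(function (url) {
    request.get(url, function (err, resp, body) {
      if (err) { console.error(url, err.code); return; }
      handleBody(body); // hand off to the worker function
    });
  });
  if (queue.length > 0) {
    setTimeout(runBatch, 5000); // wait 5 seconds before the next batch
  }
}

runBatch();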

Do I need to destroy the body/response objects to free up memory or something?

JVG
  • To clarify: Do you send all the requests immediately? Did you consider restricting the number of concurrent requests (e.g. with async) – h0ru5 Jan 11 '16 at 22:47
  • you're most likely running into provisioning issues with the API you're calling – Mike Dinescu Jan 11 '16 at 22:48
  • @h0ru5 I've done something similar, in that I've created a queue array and a script that loops through it, doing 25 calls every 5 seconds. – JVG Jan 11 '16 at 22:48
  • Are you absolutely sure you're not hitting api limits? This usually happens either due to the target network limiting you or being overloaded, or your server/local network being overloaded. at 25/5 sec it's probably not a network issue, more likely to be limiting. – Kevin B Jan 11 '16 at 22:48
  • @MikeDinescu Possibly, but see comment above. – JVG Jan 11 '16 at 22:48
  • @KevinB Yes, if hitting the limit (4000 requests / hr) I get a response with a body from the API, these trigger an error response though, either timeout or address not found. I will need to update the question as I'm actually only sending about 3,000 requests. – JVG Jan 11 '16 at 22:50
  • @Jascination They might limit the number of concurrent connections from one IP (simple DOS protection), in which case they won't respond to additional requests. Instead of using a timer to pull from the queue, try sending out a certain number of requests at once to start the procedure (say 20 to start) and then pulling a new one from the queue every time one request completes (so that there are no more than 20 concurrent requests at a time). If you still have the issue decrease 20 to 5 and try again (see the sketch below this list). – Paul Jan 11 '16 at 22:55
  • Can you profile memory usage to check if you are hitting boundaries there? I still don't think that is really the problem, but rather some DoS prevention. – h0ru5 Jan 11 '16 at 22:55
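Paul's approach above, keeping a fixed number of requests in flight and pulling the next URL only when one completes, might look roughly like this (a sketch that assumes the same urls array and handleBody worker as in the question):

var request = require('request');

var queue = urls.slice();
var CONCURRENCY = 20; // drop to 5 if the errors persist

function next() {
  if (queue.length === 0) { return; }
  var url = queue.shift();
  request.get(url, function (err, resp, body) {
    if (!err) { handleBody(body); }
    next(); // pull the next URL only when this one has finished
  });
}

// start the initial batch of concurrent requests
for (var i = 0; i < CONCURRENCY; i++) { next(); }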

1 Answer


I've encountered similar issues with an API I was using, and it ended up being what some here have suggested: rate limits. Some APIs don't return readable errors when you hit a rate limit; they allocate a certain amount of resources per client, and once you've used it all up they can't even send you a proper error response.

This happened even though I stayed within the published per-day rate limit; it turned out they had an unwritten per-minute limit (or, more likely, they simply couldn't process that many requests at once).

I tracked it down by mocking that API with my own code, placed on the same network to keep conditions as similar as possible; since my mocked code didn't actually do anything, I never got any errors from the Node.js server, which confirmed the problem was on the API's side.
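For what it's worth, the mock doesn't need to be fancy; a bare Express app that answers every request with a canned body is enough (a sketch, not my actual mock; the route and payload are made up):

var express = require('express');
var app = express();

// stand-in for the real API: always answers immediately with a canned body
app.get('*', function (req, res) {
  res.json({ ok: true, path: req.path });
});

app.listen(3000);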

Then I added delays and timeouts where they were needed.

I suggest you do the same. Remember that having a per-hour limit doesn't mean they don't also have a separate per-second or per-minute limit.
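As a concrete example, if you suspect an unwritten per-minute limit, you can give the queue a per-minute budget and only refill it once a minute (a sketch; PER_MINUTE is a guess you'd tune down until the errors stop, and urls/handleBody are placeholders as before):

var request = require('request');

var queue = urls.slice();
var PER_MINUTE = 60; // guessed unwritten limit; tune this until the errors stop
var sentThisMinute = 0;

function pump() {
  while (sentThisMinute < PER_MINUTE && queue.length > 0) {
    sentThisMinute++;
    request.get(queue.shift(), function (err, resp, body) {
      if (err) { console.error(err.code); return; }
      handleBody(body);
    });
  }
}

// reset the budget and continue once a minute, stopping when the queue is empty
var timer = setInterval(function () {
  if (queue.length === 0) { return clearInterval(timer); }
  sentThisMinute = 0;
  pump();
}, 60 * 1000);

pump();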

AlexD