17

I get this error when making a lot of calls from my Azure Function. What does this error mean, and how do I troubleshoot it? My guess is that I'm running out of TCP sockets, but I don't really know how to check that in the Function App menu. However, I checked the logs for the Azure Maps API and there is no record of errors or dropped calls, so I think it's definitely an axios/function issue.

I have seen some suggestions to add a Connection: Keep-Alive header, or even to create an axios instance and reuse it. However, I am not sure if my problem is even related to that.

Error: connect ETIMEDOUT 13.107.42.21:443
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect',
  address: '13.107.42.21',
  port: 443,
  config: {
    url: 'https://atlas.microsoft.com/search/fuzzy/json?api-version=1.0&subscription-key= myKey &query=myAddress USA',
    method: 'get',
    headers: {
      Accept: 'application/json, text/plain, */*',
      'x-ms-client-id': 'my client ID',
      'User-Agent': 'axios/0.19.2'
    },
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    adapter: [Function: httpAdapter],
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    validateStatus: [Function: validateStatus],
    host: 'atlas.microsoft.com',
    data: undefined
  },
  request: Writable {
    _writableState: WritableState {
      objectMode: false,
      highWaterMark: 16384,
      finalCalled: false,
      needDrain: false,
      ending: false,
      ended: false,
      finished: false,
      destroyed: false,
      decodeStrings: true,
      defaultEncoding: 'utf8',
      length: 0,
      writing: false,
      corked: 0,
      sync: true,
      bufferProcessing: false,
      onwrite: [Function: bound onwrite],
      writecb: null,
      writelen: 0,
      afterWriteTickInfo: null,
      bufferedRequest: null,
      lastBufferedRequest: null,
      pendingcb: 0,
      prefinished: false,
      errorEmitted: false,
      emitClose: true,
      autoDestroy: false,
      bufferedRequestCount: 0,
      corkedRequestsFree: [Object]
    },
    writable: true,
    _events: [Object: null prototype] {
      response: [Function: handleResponse],
      error: [Function: handleRequestError]
    },
    _eventsCount: 2,
    _maxListeners: undefined,
    _options: {
      protocol: 'https:',
      maxRedirects: 21,
      maxBodyLength: 10485760,
      path: '/search/fuzzy/json?api-version=1.0&subscription-key= myKey &query=myAddress%20USA',
      method: 'GET',
      headers: [Object],
      agent: undefined,
      agents: [Object],
      auth: undefined,
      hostname: 'atlas.microsoft.com',
      port: null,
      nativeProtocols: [Object],
      pathname: '/search/fuzzy/json',
      search: '?api-version=1.0&subscription-key= myKey &query=myAddress%20USA'
    },
    _redirectCount: 0,
    _redirects: [],
    _requestBodyLength: 0,
    _requestBodyBuffers: [],
    _onNativeResponse: [Function],
    _currentRequest: ClientRequest {
      _events: [Object: null prototype],
      _eventsCount: 6,
      _maxListeners: undefined,
      outputData: [],
      outputSize: 0,
      writable: true,
      _last: true,
      chunkedEncoding: false,
      shouldKeepAlive: false,
      useChunkedEncodingByDefault: false,
      sendDate: false,
      _removedConnection: false,
      _removedContLen: false,
      _removedTE: false,
      _contentLength: 0,
      _hasBody: true,
      _trailer: '',
      finished: true,
      _headerSent: true,
      socket: [TLSSocket],
      connection: [TLSSocket],
      _header: 'GET /search/fuzzy/json?api-version=1.0&subscription-key= myKey &query=myAddress%20USA HTTP/1.1\r\n' +
        'Accept: application/json, text/plain, */*\r\n' +
        'x-ms-client-id: my client ID\r\n' +
        'User-Agent: axios/0.19.2\r\n' +
        'Host: atlas.microsoft.com\r\n' +
        'Connection: close\r\n' +
        '\r\n',
      _onPendingData: [Function: noopPendingOutput],
      agent: [Agent],
      socketPath: undefined,
      method: 'GET',
      insecureHTTPParser: undefined,
      path: '/search/fuzzy/json?api-version=1.0&subscription-key= myKey &query=myAddress%20USA',
      _ended: false,
      res: null,
      aborted: false,
      timeoutCb: null,
      upgradeOrConnect: false,
      parser: null,
      maxHeadersCount: null,
      reusedSocket: false,
      _redirectable: [Circular],
      [Symbol(kCapture)]: false,
      [Symbol(kNeedDrain)]: false,
      [Symbol(corked)]: 0,
      [Symbol(kOutHeaders)]: [Object: null prototype]
    },
    _currentUrl: 'https://atlas.microsoft.com/search/fuzzy/json?api-version=1.0&subscription-key= myKey &query=myAddress%20USA',
    [Symbol(kCapture)]: false
  },
  response: undefined,
  isAxiosError: true,
  toJSON: [Function]
}
search-learn

5 Answers

34

After a whole month of talking to support, they told me I was having SNAT problems: basically, I was running out of outbound ports. That didn't make sense to me, since I was using a single axios instance and sharing it across the board. However, after reading the documentation they provided, I concluded that some additional changes were needed.

I had a file in my main folder (the wwwroot folder) called getAxios.js, which creates an axios instance bound to a base URL.

Directory layout:

-wwwroot

--getAxios.js

--myFunction

getAxios.js code:

const axios = require('axios')
const https = require('https')
const domain = 'https://atlas.microsoft.com'

// Cached at module level so every caller shares one instance (and one agent)
let instance

module.exports = function (context) {
    if (!instance) {
        // Create the axios instance once, on first use
        instance = axios.create({
            baseURL: domain,
            timeout: 60000, // optional: give up after 60 seconds
            httpsAgent: new https.Agent({ keepAlive: true }), // reuse connections
            headers: { 'Content-Type': 'application/xml' }
        })
    }

    return instance
}

However, I previously had no specific timeout set and was not specifying keepAlive: true. Adding those two parameters fixed my issue.

To explain a little more about port exhaustion: when a request is made with keepAlive set to false, the connection is closed once the request completes, but the port it used stays reserved for a short time before it can be reused. When you process a high volume of requests, many ports sit in this state and can't be used, and you end up with this issue. This is what MS support explained to me; I may well be misstating some of the networking details.

search-learn
  • @RyanRodemoyer Here is my answer – search-learn Aug 25 '20 at 19:47
  • Thanks so much!! My function runs successfully for the duration of all calls. – Ryan Rodemoyer Aug 27 '20 at 21:30
  • Very good! I had this problem in my script for a week and I couldn't find the solution! You saved my week! Thanks! – Carlos Henrique Feb 14 '21 at 15:47
  • Thank you so much! We have been banging our heads against the wall for a while on this issue. Using the same HTTPS agent for multiple requests fixed the SNAT port exhaustion issue for us. We also used this resource to confirm the problem root cause: https://www.danielstechblog.io/detecting-snat-port-exhaustion-on-azure-kubernetes-service/ – Paul Mougel May 23 '23 at 08:25
2

As per @search-learn, I solved my problem by explicitly specifying the timeout and httpsAgent properties on the Axios instance.

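import axios from "axios";
import * as https from "https";

// envconfig is my own configuration object (not shown here); PAT holds the access token.
const auth: string = Buffer.from(`:${envconfig.PAT}`).toString("base64");
axios.defaults.headers.common["Authorization"] = `Basic ${auth}`;
// Fail requests that take longer than 30 seconds instead of waiting indefinitely.
axios.defaults.timeout = 30000;
// Reuse TCP connections across requests instead of opening a new one each time.
axios.defaults.httpsAgent = new https.Agent({ keepAlive: true });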
Ryan Rodemoyer
0

Anybody going through this issue should definitely read the link below. The article is written by Microsoft Azure developers themselves and tackles this exact issue. I was making more than 1 lakh (100,000) calls in my project, and more than 95% of them were failing with a timeout error. I applied the solution from the link with a small tweak, and it is working.

https://azureossd.github.io/2022/03/10/NodeJS-with-Keep-Alives-and-Connection-Reuse/index.html

I have summarized the cause of the issue and solution below:

  1. A function app has only about 128 to 160 outbound connection slots available, by its network design.
  2. When no slots are left, Axios does not wait by default, and pending requests start failing with timeout errors.
  3. The solution is to reuse the existing slots. (Roughly: fire calls while slots are available, wait for slots to free up, then fire again; meanwhile, keep the pending requests alive.)
  4. This can be done by using the 'agentkeepalive' package and passing keep-alive agent options to axios.

NOTE 1: Below code is mostly taken from the above link, apart from a minor tweak. Full credits and kudos to them.

const axios = require('axios');
const Agent = require('agentkeepalive');
const HttpsAgent = require('agentkeepalive').HttpsAgent;

// Agent for plain HTTP requests: caps open sockets and keeps idle ones alive for reuse.
const keepAliveAgent = new Agent({
    maxSockets: 160,
    maxFreeSockets: 160,
    timeout: 60000,
    freeSocketTimeout: 30000,
    keepAliveMsecs: 60000
});

// Same settings for HTTPS requests.
const httpsKeepAliveAgent = new HttpsAgent({
    maxSockets: 160,
    maxFreeSockets: 160,
    timeout: 60000,
    freeSocketTimeout: 30000,
    keepAliveMsecs: 60000
});

// config.domain is my own base URL setting (not shown here).
const axiosInstance = axios.create({
    baseURL: config.domain,
    httpAgent: keepAliveAgent,
    httpsAgent: httpsKeepAliveAgent
});

Use the axiosInstance to make calls wherever you see fit.

NOTE 2: This solution is similar to the one provided by 'search-learn' for this question. For some reason the exact solution did not work for me. But thanks a lot for pointing in the right direction.

sprash95
0

While load testing our application, I faced ETIMEDOUT errors due to exhaustion of TCP ports. I use NestJS for our backend. Here is the solution that worked for me.

  1. Create a singleton axios instance to use throughout the application.

import axios, { AxiosInstance } from 'axios';
import http from 'node:http';
import https from 'node:https';

export default class AxiosCustomInstance {
  private static instance: AxiosInstance;

  public static getInstance(): AxiosInstance {
    if (!AxiosCustomInstance.instance) {
      const httpAgent = new http.Agent({
        keepAlive: true,
        timeout: 60000,
        scheduling: 'fifo',
      });
      const httpsAgent = new https.Agent({
        keepAlive: true,
        timeout: 60000,
        scheduling: 'fifo',
      });
      AxiosCustomInstance.instance = axios.create({
        httpAgent,
        httpsAgent,
      });
    }

    return AxiosCustomInstance.instance;
  }
}

Link for the http/https agent configs: NodeJS Docs

  2. Use this class to fetch the axios instance anywhere in your application.

import AxiosCustomInstance from 'your path';

async yourFunc() {

  await AxiosCustomInstance.getInstance().request({
        method: '',
        url: '',
      });
}
0

If you are using Google Cloud with some kind of VPC + Cloud NAT, try increasing the number of reserved ports on the NAT page. By default it has a minimum of 64 ports; I increased that number and the problem started to disappear.

[Screenshot: Cloud NAT config]

Consider combining this with enabling the keepAlive option on Axios for a more proper solution.

Cheers!

Alejandro Barone
  • I would say that is not a good approach, because increasing the number of ports only creates a larger time gap before the issue catches up with you. You are essentially delaying the issue, not solving it. However, setting the minimum ports higher in combination with reusing them definitely sounds like a proper approach – search-learn Jun 21 '23 at 21:06
  • That's right! A combination with the axios `keepAlive` option sounds like a proper approach. In my case, even with keepAlive enabled I was getting some errors under high load, so this combination helps a lot – Alejandro Barone Jun 21 '23 at 21:30