
I am trying to improve write performance from a C client program to a single InfluxDB node.

My current record is 2.526K writes per second, as seen in the screenshot below:

My C program is basically an infinite loop that sends HTTP POST requests using libcurl.

Here is the code responsible for the POST requests:

int configure_curl_easy_operation(CURL *curl_easy_handler)
{
  // using this doc page https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
  // behavior options
  curl_easy_setopt(curl_easy_handler, CURLOPT_VERBOSE, 1L);

  // callback options

  // error options

  // network options
  //curl_easy_setopt(curl_easy_handler, CURLOPT_URL, "http://localhost:8086/ping"); an old test
  curl_easy_setopt(curl_easy_handler, CURLOPT_URL, "http://localhost:8086/write?db=XXX_metrics");
  curl_easy_setopt(curl_easy_handler, CURLOPT_HTTP_CONTENT_DECODING, 0L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_TRANSFER_ENCODING, 0L);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPHEADER, )// work here
  curl_easy_setopt(curl_easy_handler, CURLOPT_PROTOCOLS, CURLPROTO_HTTP);
  curl_easy_setopt(curl_easy_handler, CURLOPT_POST, 1L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_REDIR_PROTOCOLS, 0L);
  curl_easy_setopt(curl_easy_handler, CURLOPT_DEFAULT_PROTOCOL, "http");
  curl_easy_setopt(curl_easy_handler, CURLOPT_FOLLOWLOCATION, 0L);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPHEADER, NULL);

  // NAMES and PASSWORDS OPTIONS

  // HTTP OPTIONS
  // curl_easy_setopt(curl_easy_handler, CURLOPT_HTTPGET, 0L);

  // SMTP OPTIONS

  // TFTP OPTIONS

  // FTP OPTIONS

  // RTSP OPTIONS

  // PROTOCOL OPTIONS

  if (curl_easy_setopt(curl_easy_handler, CURLOPT_POSTFIELDS,
                       "metrics value0=0,value1=872323,value2=928323,value3=238233,value4=3982332,value5=209233,value6=8732632,value7=4342421,value8=091092744,value9=230944\n"
                       "metrics value10=0,value11=872323,value12=928323,value13=238233,value14=3982332,value15=209233,value16=8732632,value17=4342421,value18=091092744,value19=230944") != CURLE_OK)
    return (1);
  //curl_easy_setopt(curl_easy_handler, CURLOPT_MIMEPOST, mime);

  // CONNECTION OPTIONS

  // SSL and SECURITY OPTIONS

  // SSH OPTIONS

  // OTHER OPTIONS

  // TELNET OPTIONS
  return (0);
}

int do_things(t_contexts_handlers *ctxts_handlers)
{
  while (g_running)
    {
      if ((configure_curl_easy_operation(ctxts_handlers->curl.curl_easy_handler)) != 0)
        {
          fprintf(stderr, "Stop running after an error occurred before making a curl operation\n");
          g_running = 0;
          continue;
        }
      if (curl_easy_perform(ctxts_handlers->curl.curl_easy_handler) != CURLE_OK)
        fprintf(stderr, "an error occurred\n");
    }
  return (0);
}
  1. I don't use threads (so far).
  2. I use the easy API (so far).
  3. I've changed some configuration settings, but they didn't improve performance:
       access-log-path : "/dev/null"
       pprof-enabled : false
       unix-socket-enabled : false
       [ifql] enabled : false
       [subscriber] enabled : false

Do you have any ideas for improving performance?

EDIT: As you can see, the first screenshot does not correspond to the C code above. Here is the correct one:

Ronan Boiteau
Oscar
  • Looks like you don't batch data points at all: one point, one HTTP POST. Try posting data in batches of 1000-10000 points per POST. – Yuri Lachin Apr 25 '18 at 15:04
  • Can you provide a simple example? I thought that, according to the line protocol, the \n character would batch two points and multiply my performance, but perhaps I missed something... – Oscar Apr 25 '18 at 15:08
  • I'm sorry, I didn't notice the \n. Still, the batch size has to be large enough for the effect to become noticeable; you'll have to experiment to find the optimum. It also looks strange that you have fields value0-value9 in the first record and value10-value19 in the second. Typically field names are the same for all data points. Is that intended? And it is better to put explicit and different timestamps on each line, otherwise there is a chance InfluxDB will treat all lines as having the same timestamp. – Yuri Lachin Apr 25 '18 at 15:41
  • Thank you for your answer. I will try with more data points (just copy/pasting the first one will do the job, right?). Yes, that was an error in my understanding of "batching"; I understand it better now than when I wrote this code. I don't understand why I have to set the timestamp of every point explicitly if I'd like to use the server's timestamp. I am benchmarking InfluxDB with some anonymized data; the production data will look similar but not identical, and I don't know yet whether it will be timestamped, so I assumed the timestamp assigned by the server would be fine. – Oscar Apr 25 '18 at 23:36
  • Regarding timestamps: you are not using any tags in your data, so InfluxDB will treat all data points as one time series, and several points with an identical timestamp will actually be considered ONE data point and overwrite each other: only one point will be kept in the database. Also, don't forget that InfluxDB expects nanosecond timestamp precision by default. Review the InfluxDB main concepts for a better understanding. – Yuri Lachin Apr 26 '18 at 07:43
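Following the comments above (the same field names for every point, a tag to identify the series, and an explicit nanosecond timestamp), here is a minimal sketch of how one line-protocol point could be formatted in C. The measurement name `metrics` comes from the question; the `host` tag and the `format_point` helper are hypothetical and only for illustration. Values are written as floats, as in the question's data.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one InfluxDB line-protocol point, newline-terminated:
 *   metrics,host=<host> value0=<v0>,...,value<n-1>=<v(n-1)> <ts_ns>
 * "host" is a hypothetical example tag. Returns the number of characters
 * written (excluding the NUL), or -1 if buf is too small. */
int format_point(char *buf, size_t cap, const char *host,
                 const double *values, int nvalues, int64_t ts_ns)
{
    size_t used;
    int n = snprintf(buf, cap, "metrics,host=%s ", host);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (int i = 0; i < nvalues; i++) {
        /* Same field names (value0, value1, ...) for every point of the series;
         * %.17g keeps integral values like 872323 exact and without decimals. */
        n = snprintf(buf + used, cap - used, "%svalue%d=%.17g",
                     i ? "," : "", i, values[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    /* Explicit timestamp in nanoseconds, InfluxDB's default write precision. */
    n = snprintf(buf + used, cap - used, " %" PRId64 "\n", ts_ns);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

Calling it once per point and concatenating the results gives a multi-line body in which every line belongs to the same series but carries its own timestamp.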

1 Answer


Try posting data in batches of 1000-10000 points per POST. The batch size has to be large enough for the effect to become noticeable. You'll have to experiment to find the optimum.
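The batching advice can be sketched as a small accumulator: points are appended to one buffer and sent in a single POST once `batch_size` points have been collected. The `batch_t` type, `batch_add`, `batch_flush`, and the `flush_fn` callback are hypothetical names. In a real client the callback would presumably pass the buffer to libcurl with `CURLOPT_POSTFIELDS` and `CURLOPT_POSTFIELDSIZE` (the buffer is not NUL-terminated) and then call `curl_easy_perform` on the same easy handle; `count_flush` is only a stub that counts calls, so the logic can be tried without a server.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flush callback: receives the accumulated body and its length. */
typedef int (*flush_fn)(const char *body, size_t len, void *ctx);

typedef struct {
    char    *buf;        /* accumulated line-protocol body (not NUL-terminated) */
    size_t   cap;        /* capacity of buf in bytes */
    size_t   len;        /* bytes currently used */
    int      points;     /* points currently buffered */
    int      batch_size; /* flush once this many points are buffered */
    flush_fn flush;
    void    *ctx;
} batch_t;

/* Send whatever is buffered (if anything) and reset the batch. */
int batch_flush(batch_t *b)
{
    if (b->points == 0)
        return 0;
    int rc = b->flush(b->buf, b->len, b->ctx);
    b->len = 0;
    b->points = 0;
    return rc;
}

/* Append one newline-terminated point and flush when the batch is full.
 * Returns 0 on success, nonzero if a flush failed or the point cannot fit. */
int batch_add(batch_t *b, const char *line)
{
    size_t n = strlen(line);
    if (b->len + n > b->cap) {       /* buffer full before batch_size: flush early */
        if (batch_flush(b) != 0)
            return -1;
        if (n > b->cap)
            return -1;
    }
    memcpy(b->buf + b->len, line, n);
    b->len += n;
    if (++b->points >= b->batch_size)
        return batch_flush(b);
    return 0;
}

/* Stub flush for trying the logic offline: counts calls and bytes sent. */
typedef struct { int calls; size_t bytes; } flush_stats_t;

int count_flush(const char *body, size_t len, void *ctx)
{
    (void)body;
    flush_stats_t *s = ctx;
    s->calls++;
    s->bytes += len;
    return 0;
}
```

For example, with `batch_size` set to 5000 (inside the 1000-10000 range, to be tuned by experiment), each POST would carry 5000 lines instead of the 2 in the question's code. Calling `batch_flush` once more before exiting keeps the last partial batch from being lost.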

It is also better to give each line an explicit and distinct timestamp; otherwise InfluxDB will treat all lines as having the same timestamp. In your case, multiple points with an identical timestamp will actually be considered ONE data point and overwrite each other: only one point will be kept in the database.
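To give each line a distinct timestamp, one option (an assumption here, not something the answer prescribes) is to read the wall clock in nanoseconds, InfluxDB's default precision, and bump by 1 ns whenever the clock has not advanced since the previous point. `now_ns` and `next_unique_ns` are hypothetical helpers; `clock_gettime` with `CLOCK_REALTIME` is standard POSIX. The trade-off is that stored timestamps may be shifted by a few nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current wall-clock time in nanoseconds since the Unix epoch. */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Return a timestamp strictly greater than *last, based on the wall clock.
 * If the clock has not advanced (or went backwards), bump by 1 ns so that
 * no two points of the same series share a timestamp. */
int64_t next_unique_ns(int64_t *last)
{
    int64_t t = now_ns();
    if (t <= *last)
        t = *last + 1;
    *last = t;
    return t;
}
```

Using the same `last` variable for every point of one series keeps those points from overwriting each other, even when thousands of them are generated within the same microsecond.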

Yuri Lachin