I am writing an application to read real-time data in an embedded environment and chuck it into a PostgreSQL 9.6 database.
The data consists of a header with meta data, including a timestamp, followed by 200 unsigned chars containing the values. Each of the 200 values has a timestamp inferred from the header timestamp and its position in the array.
There are various measurements distinguished by their interval: some are output every 10 ms and others every 100 ms. There is no pause, the data is coming out of an FPGA, and there is no margin.
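For concreteness, a packet looks roughly like the sketch below (field names are made up for the example, not the real ones):

#include <stdint.h>

/* Rough shape of one incoming packet (illustrative names). */
typedef struct {
    int64_t       first_sample_time;  /* header timestamp of values[0] */
    uint16_t      module_id;          /* other header meta data */
    uint16_t      channel_id;
    uint32_t      frame_count;
    unsigned char values[200];        /* values[i] is implicitly at
                                         first_sample_time + i * interval */
} packet_t;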
I decided to build a test platform to examine the best way of storing time series data. My criteria are the best solution for write performance and ease of querying. So my test platform is intended to write data to a set of tables which include a header with meta data and one of the following:-

- a bytea of 200 bytes
- a JSON field holding an array (JSONB was preferred, however libpq got upset, see later)
- an array of shorts

... and

- a header & hstore, which IMHO will have a lot of duplicated data
- discrete values
- discrete values or hstore and inherited tables
At the moment I have the first two running and thought I'd run some metrics.
The code already uses prepared parameters and prepared statements.
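For reference, the insert path looks roughly like the sketch below. It is cut down: the timestamp column and the PQprepare error handling are trimmed, and the table/column names are illustrative rather than my real ones.

#include <stdio.h>
#include <libpq-fe.h>
#include <libpqtypes.h>

/* Once, after connecting: enable libpqtypes and prepare the insert. */
static void setup_insert(PGconn *conn)
{
    PQinitTypes(conn);
    PGresult *res = PQprepare(conn, "ins_bytea",
        "INSERT INTO packet_test_bytea "
        "(module_id, channel_id, frame_count, packet_data) "
        "VALUES ($1, $2, $3, $4)", 4, NULL);
    PQclear(res);
}

/* Per packet: bind the parameters and run the prepared statement. */
static int insert_packet(PGconn *conn, int module_id, int channel_id,
                         int frame_count, unsigned char *values)
{
    PGbytea raw = { 200, (char *) values };   /* the 200 sample bytes as bytea */
    PGparam *param = PQparamCreate(conn);

    PQputf(param, "%int4 %int4 %int4 %bytea",
           module_id, channel_id, frame_count, &raw);

    PGresult *res = PQparamExecPrepared(conn, param, "ins_bytea", 0);
    PQparamClear(param);

    if (res == NULL) {
        fprintf(stderr, "insert failed: %s\n", PQgeterror());
        return -1;
    }
    PQclear(res);
    return 0;
}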
And the metrics were a shock. I was getting ~88 inserts/sec against PostgreSQL 9.6 on a machine with 3 x NVMe M.2 SSDs plus an 8-drive RAID 0 array of SSDs, all set up as tablespaces, 32 GB RAM and a 4.2 GHz i7.
I recompiled on the server and achieved a whopping 10% improvement.
So it was a case of dropping tools and looking at performance, as it plainly wasn't anywhere near the right ballpark.
Research led me to try wrapping the inserts in a transaction, which gave me a tenfold improvement. That helps!
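The wrapping itself is nothing clever, just an explicit BEGIN/COMMIT around a batch of the prepared-statement inserts, along the lines of the sketch below (batch handling simplified, reusing the packet_t and insert_packet() sketches above):

/* Batch a set of packets inside one explicit transaction. */
static void insert_batch(PGconn *conn, const packet_t *pkts, int n)
{
    PQclear(PQexec(conn, "BEGIN"));
    for (int i = 0; i < n; i++)
        insert_packet(conn, pkts[i].module_id, pkts[i].channel_id,
                      pkts[i].frame_count, (unsigned char *) pkts[i].values);
    PQclear(PQexec(conn, "COMMIT"));
}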
I'd like to try using multi-line inserts, but I can't see how to do this with libpq/libpqtypes. Does anyone know how to do this, or even whether it is possible? Something like:-
insert into table values (1,1),
(1,2),
(1,3),
(2,1); -- ... 100 times
WRT JSONB (JSON works), I get the following error:-
code@snippet:~/src/pg/copy_in$ time ./bulk_load -t -n 1 -h 127.0.0.1 -d fpga_db -u fpga_user
bulk_load.c[186]: PQparamExecPrepared insert didn't go well, PGresult: ERROR: unsupported jsonb version number 91
INSERT INTO packet_test_jsonb (packet_date, module_id, channel_id, frame_count, packet_data) VALUES($1, $2, $3, $4, $5::jsonb)
Aborted (core dumped)
real 0m1.131s
user 0m0.040s
sys 0m0.020s
I couldn't see JSON or JSONB in libpqtypes, so the prepared statement uses VARCHAR for $5 with a cast in the insert, which works for straight JSON. As I have used a cast in the insert, I have let the server decide how to convert it.
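In case it matters, this is roughly how the JSON string is built and bound, cut down to just the $5 handling; with ::json in the prepared statement it works, with ::jsonb it gives the error above:

#include <stdio.h>
#include <libpqtypes.h>

/* Build "[v0,v1,...,v199]" as plain text and bind it to $5 as a varchar;
   the prepared INSERT then casts it with ::json (or ::jsonb). */
static int put_json_values(PGparam *param, const unsigned char *values)
{
    char json[1024];                  /* 200 values of up to 3 digits fits easily */
    size_t off = 0;

    json[off++] = '[';
    for (int i = 0; i < 200; i++)
        off += snprintf(json + off, sizeof(json) - off,
                        i ? ",%u" : "%u", (unsigned) values[i]);
    json[off++] = ']';
    json[off] = '\0';

    return PQputf(param, "%varchar", json);
}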
I also couldn't see how to re-use my PGparam. Once one has done a PQparamReset(param), it isn't obvious (to me) how to just pop the values into the param instead of using PQputf. I thought re-use might give me a small speed-up, but I am still using PQparamCreate/PQparamClear for each use.
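To be clear, the only reuse pattern I can see is to reset and then PQputf every column again, along the lines of the fragment below (same illustrative names as above, conn/pkts/n from context), but I was hoping there was a way to overwrite just the one value that changes:

/* Reuse one PGparam across rows: reset it, then re-put every column each time. */
PGparam *param = PQparamCreate(conn);

for (int i = 0; i < n; i++) {
    PGbytea raw = { 200, (char *) pkts[i].values };
    PQparamReset(param);              /* discard the previous row's values */
    PQputf(param, "%int4 %int4 %int4 %bytea",
           pkts[i].module_id, pkts[i].channel_id,
           (int) pkts[i].frame_count, &raw);
    PQclear(PQparamExecPrepared(conn, param, "ins_bytea", 0));
}
PQparamClear(param);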
Failing all this, I could use COPY. I like the look of that, but am unsure how to write COPY-compliant data, binary or not, into 'some buffer or file' from C.
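For what it's worth, I gather from the libpq documentation that the mechanics of text-format COPY look roughly like the sketch below (tab-separated columns, newline-terminated rows). I've left out the 200-byte payload column because how to encode that, text-escaped or binary, is exactly the bit I'm unsure about; the table/column names are the same illustrative ones as above.

#include <stdio.h>
#include <libpq-fe.h>

/* Text-format COPY via libpq: COPY ... FROM STDIN, stream rows, then end. */
static int copy_headers(PGconn *conn)
{
    PGresult *res = PQexec(conn,
        "COPY packet_test_bytea (module_id, channel_id, frame_count) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s\n", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    for (int i = 0; i < 100; i++) {
        char row[64];
        int len = snprintf(row, sizeof(row), "%d\t%d\t%d\n", 1, 2, i);
        if (PQputCopyData(conn, row, len) != 1)
            break;
    }

    if (PQputCopyEnd(conn, NULL) != 1)    /* NULL => finish normally, no error */
        return -1;

    res = PQgetResult(conn);              /* final COMMAND_OK (or error) */
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    PQclear(res);
    return ok ? 0 : -1;
}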
So to summarise:-
1. (Please) Can I/how do I do multi-line inserts with prepared parameters/statements in libpq/libpqtypes?
2. Any ideas how to get stuff from an array of somethings into a JSONB field in PostgreSQL? I currently create a VARCHAR JSON string [0,1,2... 200] and cast it in the insert.
3. How do I re-use a PGparam? Even better if your reply includes multi-line re-use ;)
4. Do you have any good ideas on the best storage format for time series data with inferred timestamps?
I am really after example code please.