
I have been a Delphi programmer for 25 years, but managed to avoid SQL until now. I was a dBase expert back in the day. I am using Firebird 3.0 SuperServer as a service on a Windows Server 2012 box. I run a UDP listener service written in Delphi 2007 to receive status info from a software product we publish.

The FB database is fairly simple. I use the user's IP address as the primary key and record reports as they come in. I am currently getting about 150,000 reports a day and they are logged in a text file.

Rather than insert every report into a table, I would like to increment an integer value in a single record with a "running total" of reports received from each IP address. It would save a LOT of data.

The table has fields for IP address (Primary Key), LastSeen (timestamp), and Hits (integer). There are a few other fields but they aren't important.
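For reference, the table is roughly this shape (a simplified sketch; names, types and sizes are approximate):

```sql
create table user_stats (
  ipaddress varchar(45) not null,   -- large enough for IPv6 in text form
  lastseen  timestamp,
  hits      integer default 0 not null,
  constraint pk_user_stats primary key (ipaddress)
);
```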

I use UPDATE OR INSERT INTO when the report is received. If the IP address does not exist, a new row is inserted. If it does exist, then the record is updated.

I would like it to increment the "Hits" field by +1 every time I receive a report. In other words, if "Hits" already = 1, then I want to inc(Hits) on UPDATE to 2. And so on. Basically, the "Hits" field would be a running total of the number of times an IP address sends a report.
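The statement I use is roughly like this (a simplified sketch; in the real code the values come from query parameters):

```sql
update or insert into user_stats (ipaddress, lastseen, hits)
  values ('127.0.0.1', current_timestamp, 1)
  matching (ipaddress);
```

The problem is that the literal 1 is also written on the update path, so Hits never goes above 1.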

Adding 3 million rows a month just so I can get a COUNT for a specific IP address does not seem efficient at all!

Is there a way to do this?

Mark Rotteveel
  • Inputting raw realtime data into OLTP systems is not the best idea. You'd better keep inputting them into some simple format (why not DBF? I remember patching http://tDBF.sf.net a few years ago, or there was vkDBF on Torry) in multi-file FIFO queue fashion, like having each file as a fixed-size (10K records, for example) buffer. Then have a separate thread that flushes complete files into Firebird, non-realtime, tolerant to garbage collection and all other random SQL slow-downs. – Arioch 'The May 31 '21 at 09:10
  • Alternatively you may store your raw data into binary files (`file of packed record`) and use Firebird's `EXTERNAL TABLE` feature to dump it into regular tables with a single `insert from select`. This however would require you to issue DDL commands (`dropping` - unlinking - spent tables and `creating` - linking - new ones). If you are speed-bound this might have even better performance than issuing a separate (even prepared) `insert` command for every row. Albeit raw binary formats are always more fragile to work with. See the language reference at https://firebirdsql.org/en/firebird-rdbms/ – Arioch 'The May 31 '21 at 09:15
  • `I use the user's IP address as the primary key` and `I would like to increment an integer value in a single record with a "running total" of reports received` - makes me think WHY would you need SQL at all then? all the OLTP complexities while you seem to need any trendy noSQL key-value database with atomic increments. May it be that you're overengineering? Model-wise even `TStringList.SaveToFile` would suit your data (speed and size would fail you, but mere data model seems the same) – Arioch 'The May 31 '21 at 09:29
  • `Adding 3 million rows a month just so I can get a COUNT for a specific IP address does not seem efficient` - actually it is very efficient from another angle of view - think about debugging, verifying suspected problems, recovering from hardware/network/software faults, etc. Your totals in a key-value storage are actually a cache. When everything works as it should - caches are a great thing. But on any suspicion a cache can be "flushed" to emptiness and populated anew. So, think twice before deleting recent data just to save some space that maybe is not scarce anyway. – Arioch 'The May 31 '21 at 09:36

1 Answer


The UPDATE OR INSERT statement is not suitable for this, as you need to specify the values to update or insert, so you will end up with the same value for both the insert and the update. You could address this by creating a BEFORE INSERT trigger that always assigns 1 to the field that holds the count (ignoring the value provided by the statement for the insert), but it is probably better to use MERGE, as it gives you more control over the resulting action.
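A sketch of that trigger variant (table and column names assumed from the question; a companion BEFORE UPDATE trigger is added here so the update path actually increments, which the text above implies but does not spell out):

```sql
set term ^ ;

-- On insert, force the counter to start at 1, whatever value the statement supplied
create trigger user_stats_bi for user_stats
  active before insert position 0
as
begin
  new.hits = 1;
end^

-- On update, ignore the supplied value and increment the stored counter instead
create trigger user_stats_bu for user_stats
  active before update position 0
as
begin
  new.hits = old.hits + 1;
end^

set term ; ^
```

With these in place, a plain UPDATE OR INSERT can pass any dummy value for the hits column.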

For example, using MERGE:

merge into user_stats
  using (
    select '127.0.0.1' as ipaddress, timestamp '2021-05-30 17:38' as lastseen 
    from rdb$database
  ) as src
  on user_stats.ipaddress = src.ipaddress
when matched then 
  update set 
    user_stats.hits = user_stats.hits + 1, 
    user_stats.lastseen = maxvalue(user_stats.lastseen, src.lastseen)
when not matched then
  insert (ipaddress, hits, lastseen) values (src.ipaddress, 1, src.lastseen)

However, if you get a lot of updates for the same IP address, and those updates are processed concurrently, this can be rather error-prone due to update conflicts. You can address this by inserting individual hits, and then have a background process to summarize those records into a single record (e.g. daily).
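That insert-then-summarize approach could look roughly like this (a sketch with a hypothetical hits_log detail table; the second and third statements would run together from a scheduled job):

```sql
-- Each incoming report becomes one cheap insert: no update conflicts possible
insert into hits_log (ipaddress, seen_at)
  values ('127.0.0.1', current_timestamp);

-- Periodically fold the detail rows into the running totals, e.g. daily
merge into user_stats
  using (
    select ipaddress, count(*) as cnt, max(seen_at) as lastseen
    from hits_log
    group by ipaddress
  ) as src
  on user_stats.ipaddress = src.ipaddress
when matched then
  update set
    user_stats.hits = user_stats.hits + src.cnt,
    user_stats.lastseen = maxvalue(user_stats.lastseen, src.lastseen)
when not matched then
  insert (ipaddress, hits, lastseen) values (src.ipaddress, src.cnt, src.lastseen);

-- ...then delete (or archive) the summarized detail rows in the same transaction
delete from hits_log;
```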

Also keep in mind that having only a single record removes the possibility of performing further analysis later (e.g. distribution of hits, number of hits on day X or at time HH:mm, etc.).

Mark Rotteveel
  • Individual hits might be inserted into a GTT or ET, then a single `merge` would do the whole table. Basically that might map well onto a multi-threading, data-stream centered approach like the OmniThreadLibrary for Delphi. Each worker would be an FSM: 1) accumulating raw data into a FIFO on-disk buffer like DBF or even raw binary, then 2) making a connection to FB and copying its buffer into a per-connection GTT, then 3) running the `merge` and `commit`ting, then 4) cleaning up (disconnects, deletes the old DBF storage and requests a new one, etc.), then switching back to state #1. An FB connections pool is optional. – Arioch 'The May 31 '21 at 09:21
  • That said, OTL is heavily using language features Delphi 2007 lacks, so no direct use of it, but similar ideas might do (or using low-level parts of OTL): basically he needs a datastreams-regulated pool of worker threads, which given time and effort can be re-implemented on top of any threading model. https://stackoverflow.com/questions/33014800/otl-cant-be-compiled-under-d2007 – Arioch 'The May 31 '21 at 09:25
  • Thanks for this. This helped and forced me to learn a lot. I don't like to copy examples unless I know how they work. I am making progress and am getting closer to understanding SQL. What I ended up doing though was using a trigger. It did exactly what I wanted. I use BEFORE INSERT and add a timestamp in the appropriate code. I dropped inserting the timestamp from my code completely. It makes for cleaner code because I can still use UPDATE OR INSERT INTO. I did take your advice and log all record lines instead of keeping a single record. I imported almost a million lines and it is still extremely fast. – Brian Milburn Jun 02 '21 at 00:34
  • @BrianMilburn this however would produce different data if the server and client have different times set. Ideally all the computers in the world would have the same real time, but reality differs. You would probably have to at least mention this in the troubleshooting documentation, or at least think it through. – Arioch 'The Jun 02 '21 at 13:12