
I have a PostgreSQL database with a few tables that store several million rows of data from different sensors. The data is stored in one column of each row, like this:

| ID | Data | Comment |
|----|------|---------|
|  1 |   19 | Sunny   |
|  2 |  315 | Sunny   |
|  3 |  127 | Sunny   |
|  4 |   26 | Sunny   |
|  5 |   82 | Rainy   |

I want to apply a FIR filter to the data and store the result in another table so I can work with it, but because of the amount of data I'm not sure of the best way to do it. So far I have the coefficients in Octave and have worked with some extracts of the data: basically I export the Data column to a CSV, run csvimport in Octave to get it into an array, and filter it there. The problem is that this method doesn't let me work with more than a few thousand values at a time.

Things I've looked at so far:

  1. PostgreSQL: I've been looking for some way to do it directly in the database, but so far I haven't found one.
  2. Java: Another possible way is a small program that extracts chunks of data, recalculates the values using the coefficients, and stores the result back in another table of the database (a rough sketch of what I have in mind is below this list).
  3. C/C++: I've seen some questions and answers on Stack Overflow about how to implement the filter (here, here or here), but they seem to be aimed at processing data in real time rather than taking advantage of already having all the data.
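
For option 2, this is roughly the kind of Java program I have in mind. It's just a sketch: the table and column names (`sensor_data`, `sensor_filtered`), the connection details and the coefficients are placeholders, not my real schema, and the real coefficients would come from the Octave design. The idea is to stream the rows in chunks through JDBC, keep the last few samples as filter state so chunk boundaries don't reset the filter, and write the results back in batches:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FirFilterJob {

    // Example coefficients only; in practice these come from the Octave design.
    private static final double[] COEFFS = {0.25, 0.5, 0.25};

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/sensors"; // placeholder connection info
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            conn.setAutoCommit(false); // required so the driver streams rows using the fetch size

            // Delay line: keeps the last COEFFS.length samples so the filter state
            // carries across chunk boundaries instead of restarting every fetch.
            double[] history = new double[COEFFS.length];
            long processed = 0;

            try (Statement select = conn.createStatement();
                 PreparedStatement insert = conn.prepareStatement(
                         "INSERT INTO sensor_filtered (id, data) VALUES (?, ?)")) {

                select.setFetchSize(10_000); // stream in chunks instead of loading everything
                try (ResultSet rs = select.executeQuery(
                        "SELECT id, data FROM sensor_data ORDER BY id")) {
                    int batch = 0;
                    while (rs.next()) {
                        long id = rs.getLong("id");
                        double sample = rs.getDouble("data");

                        // Shift the delay line and push the new sample in front.
                        System.arraycopy(history, 0, history, 1, history.length - 1);
                        history[0] = sample;

                        // y[n] = sum_k b[k] * x[n-k]
                        double y = 0.0;
                        for (int k = 0; k < COEFFS.length; k++) {
                            y += COEFFS[k] * history[k];
                        }

                        insert.setLong(1, id);
                        insert.setDouble(2, y);
                        insert.addBatch();
                        if (++batch == 10_000) {
                            insert.executeBatch();
                            batch = 0;
                        }
                        processed++;
                    }
                    if (batch > 0) {
                        insert.executeBatch();
                    }
                }
            }
            conn.commit();
            System.out.println("Filtered " + processed + " rows.");
        }
    }
}
```

The fetch size is what keeps memory bounded, so the whole table never has to be loaded at once; if mixing the streaming read and the batched writes on one connection turns out to be a problem, the INSERT side could go over a second connection.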

I think the best way would be to do it directly in PostgreSQL, and that Java or C/C++ would be too slow, but I don't have much experience working with this much data, so I may well be wrong. I just need to know why, and where to point myself.

What's the best way to apply a FIR filter to data stored in a database, and why?

  • This kind of work would typically be done in a high-level language like Java or C++. You can potentially do it in the DB too and it might be somewhat faster, but probably not dramatically. As long as you pull records in chunks it should be decent in terms of performance. – Kurt Stutsman Feb 17 '16 at 23:13
  • So the best way would be to extract the data in chunks from the database (say 100K at a time) and process it in Java/C/C++? And is there any way to process it in Java that makes use of having the 100K values at once, instead of going one by one in a while loop? – Antonio Nó Rodríguez Feb 17 '16 at 23:19
  • You should start with the somewhat naive approach and work on optimizing it once you get it working IFF the performance actually warrants it. Compilers are getting smarter and Java also has JIT and may be able to unroll your loops effectively without you making the code harder to understand or write. – Kurt Stutsman Feb 17 '16 at 23:20
  • An FFT with overlap-add (also [FFT convolution](http://www.dspguide.com/ch18.htm)) could speed up the computation if you have an FIR with many coefficients (say >~64) when processing data in blocks (if you need the performance, as Kurt said). – SleuthEye Feb 18 '16 at 06:36

0 Answers