
Scenario:

  1. I'm trying to build a real-time monitoring webpage for ship operations
  2. I have 1,000 - 10,000 ships operating
  3. All ships send real-time data to the DB around the clock, for 30 days
  4. Each insert is 1 row × 100 columns
  5. When loading the webpage, all historic data of a chosen ship will be fetched and visualized
  6. The last row of the ship's real-time data table is queried and fetched to update the live view on the webpage
  7. Each ship has its own non-real-time data, such as ship dimensions, cargo, attendants, etc...

So far I've been thinking about creating a new schema for each ship. Something like this:

public_schema
ship1_schema
ship2_schema
ship3_schema
  |--- realtime_table
  |--- cargo_table
  |--- dimensions_table
  |--- attendants_table
ship4_schema
ship5_schema

Is this a good way to store each ship's real-time data and fetch it from a web server? What other approaches would you recommend?

As for the time-series side, I'm already using the PostgreSQL extension TimescaleDB. My question is rather about how to store the time-series data when I have many ships. Is it a good idea to separate each ship's real-time data by creating a new schema?

++ I'm pretty new to PostgreSQL, and some of the advice I got from other people was too advanced for me... I would greatly appreciate it if you could suggest a method and briefly explain what it is.

Eric Kim
  • I would recommend *not* using a schema per ship, but storing them all in one table. – Laurenz Albe Sep 25 '18 at 05:50
  • @LaurenzAlbe I was thinking about that too, but when I fetch a ship's historic data, wouldn't it take much longer when the history runs to millions of rows, since I'm adding one additional condition to the query? – Eric Kim Sep 25 '18 at 05:53
  • If you access the data by index, the size of the table has no influence on the speed. – Laurenz Albe Sep 25 '18 at 06:29

2 Answers


Personally, this seems like the wrong way to go. I would keep all ships' data together and add a ship_id column (a foreign key to a ships table) to each of the following (a sketch follows the list):

  • realtime_table
  • cargo_table
  • dimensions_table
  • attendants_table
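
A minimal sketch of that layout, assuming PostgreSQL 10+; the table and column names (ships, realtime_data, speed_knots, ...) are illustrative, not from the question:

```sql
-- One row per ship; everything else references ship_id.
CREATE TABLE ships (
    ship_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name    text NOT NULL
);

-- One shared time-series table instead of one schema per ship.
CREATE TABLE realtime_data (
    ship_id     bigint      NOT NULL REFERENCES ships (ship_id),
    time        timestamptz NOT NULL,
    speed_knots double precision,
    heading_deg double precision
    -- ... the rest of the ~100 sensor columns ...
);

-- The non-real-time tables get the same ship_id foreign key.
CREATE TABLE cargo (
    cargo_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ship_id     bigint NOT NULL REFERENCES ships (ship_id),
    description text
);
```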

From there on, if you expect the data volume to grow large, you have the following options.

  1. Create indexes on the columns that matter for your queries; the Postgres query planner will use them automatically (see the sketch after this list).

  2. Recent PostgreSQL versions (10 and later) offer declarative table partitioning based on criteria you provide, without having to use table inheritance (also sketched below).
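
For example, a sketch assuming the realtime_data table above (the index, partition names, and date ranges are illustrative):

```sql
-- (1) Composite index: fetching one ship's history ordered by time
--     becomes an index scan, however large the table gets.
CREATE INDEX idx_realtime_ship_time
    ON realtime_data (ship_id, time DESC);

-- (2) Declarative range partitioning (PostgreSQL 10+), e.g. monthly
--     partitions. Skip this if you use TimescaleDB, which partitions
--     automatically (see the next answer).
CREATE TABLE realtime_data_part (
    ship_id bigint      NOT NULL,
    time    timestamptz NOT NULL
    -- ... sensor columns ...
) PARTITION BY RANGE (time);

CREATE TABLE realtime_data_2018_09
    PARTITION OF realtime_data_part
    FOR VALUES FROM ('2018-09-01') TO ('2018-10-01');
```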

Since you need live data on the web page, you can use PostgreSQL's LISTEN/NOTIFY commands to learn when new data arrives from a ship (unless you already push this data to the web server some other way, such as WebSockets).
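
A minimal sketch of that pattern; the channel name ship_updates and the trigger/function names are assumptions:

```sql
-- Trigger function: notify listeners whenever a ship inserts a row.
CREATE OR REPLACE FUNCTION notify_new_reading() RETURNS trigger AS $$
BEGIN
    -- Keep the payload small (the limit is 8000 bytes): send only the
    -- ship_id and let the web server query the latest row itself.
    PERFORM pg_notify('ship_updates', NEW.ship_id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER realtime_data_notify
    AFTER INSERT ON realtime_data
    FOR EACH ROW EXECUTE PROCEDURE notify_new_reading();

-- The web server's database session subscribes once:
LISTEN ship_updates;
```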

Bill Togkas

Adding a bit of color here: if you are already using the TimescaleDB extension, you won't need to use table partitioning, since TimescaleDB will handle that for you automatically.
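
For instance, a minimal setup sketch, assuming the shared realtime_data table from the first answer and its timestamptz column named time:

```sql
-- One-time setup: turn the plain table into a hypertable.
-- TimescaleDB then creates and manages time-based chunks itself,
-- so no manual PARTITION BY / child tables are needed.
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('realtime_data', 'time');
```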

Storing all ships' data in a single time-series table, with a metadata table kept outside the time-series table, is common practice. As long as you build the correct indexes, as others have suggested, you should be fine. One important thing to note: if you (for example) build an index on time, make sure to include time in your queries so they benefit from constraint exclusion; two example queries follow.
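
As a sketch (the table and column names, and the ship id 42, are illustrative carry-overs from the earlier examples):

```sql
-- History for one ship: the time bound lets the planner exclude
-- chunks outside the range; ship_id then uses the composite index.
SELECT *
FROM   realtime_data
WHERE  ship_id = 42                          -- illustrative id
AND    time   >= now() - interval '30 days'
ORDER  BY time;

-- Latest reading for the live dashboard screen:
SELECT *
FROM   realtime_data
WHERE  ship_id = 42
ORDER  BY time DESC
LIMIT  1;
```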