3

I have an existing table in my database named price (has 264 rows) and I converted it into a hypertable price_hypertable doing:

CREATE TABLE price_hypertable (LIKE price INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
SELECT create_hypertable('price_hypertable', 'start');

and the output it gave me is as follows:

       create_hypertable       
-------------------------------
 (4,public,price_hypertable,t)
(1 row)

The next thing I did was to populate the price_hypertable as follows:

insert into price_hypertable select * from price;

And I got the following output:

INSERT 0 264

Now, I wanted to check the chunks created, for which I did:

select public.show_chunks('price_hypertable');

and the output I got:

              show_chunks               
----------------------------------------
 _timescaledb_internal._hyper_4_3_chunk
 _timescaledb_internal._hyper_4_4_chunk
(2 rows)

When I do:

select * from _timescaledb_internal._hyper_4_3_chunk;
select * from _timescaledb_internal._hyper_4_4_chunk ;

I see that the 264 entries are split as follows:

_timescaledb_internal._hyper_4_3_chunk has 98 rows _timescaledb_internal._hyper_4_4_chunk has 166 rows

I have a few questions about these steps and their outputs:

  1. Can someone please explain to me what do the values 4 and t represent, when I did SELECT create_hypertable('price_hypertable', 'start');?
  2. After populating the price_hypertable, the data was automatically split into chunks, but of different size. Why does this happen? Why wasn't the data just split in half (132 rows in each chunk instead of 98 and 166)?

Any help is appreciated. Thanks

some_programmer
  • 3,268
  • 4
  • 24
  • 59

2 Answers2

2

For the first question, it is easier to see what they represent by executing create_hypertable as

SELECT * FROM create_hypertable('price_hypertable', 'start');

This gives something like:

 hypertable_id | schema_name | table_name         | created 
---------------+-------------+--------------------+---------
             4 | public      | price_hypertable   | t

For the second question, TmTron already answered. This is because the rows are sorted into buckets based on the time, and they are not necessarily evenly spaced. There is no automation that pick the correct interval for each bucket.

You can find information about the return values in the API documentation on create_hypertable which also discuss the parameter chunk_time_interval that can be used to set the chunk size.

Mats Kindahl
  • 1,863
  • 14
  • 25
  • This is a useful suggestion. Thanks a lot for it. – some_programmer Apr 30 '20 at 13:51
  • Could you please explain what you mean when you say 'rows are sorted into buckets based on the *time*'? – some_programmer Apr 30 '20 at 13:58
  • @Junkrat Maybe "group" is a better word, but I used "sorted" the less strict sense. Hypertables "shard" the rows on the time, which means that a bucket can be assigned a time range, say 12:00:00 - 13:00:00. This bucket will contain all the rows with a time in this range. – Mats Kindahl Apr 30 '20 at 18:27
1

related to your 2nd question:

When you don't specify the chunk_time_interval explicitly, the default is 7 days: see create-hypertable, Best Practices.
So the number of rows in each chunks depends on the distribution of your data (according to your start date-time column).

TmTron
  • 17,012
  • 10
  • 94
  • 142