1

For Reporting/data warehouse purposes we plan to populate data in star schema.

All our current data resides in Redshift, Since star schema follows traditional relationships between tables. should I opt for an RDBMS or building star schema on Redshift itself ?

Can somebody explain the pros and cons of choosing RDBMS vs Redshift for Star Schema?

jmng
  • 2,479
  • 1
  • 25
  • 38
Vinod Jayachandran
  • 3,726
  • 8
  • 51
  • 88
  • 1
    This might be too opinion based a question, but columnar compression on Redshift means star schemas could be quite good, and the team blogged about the best ways of doing it here: https://aws.amazon.com/blogs/big-data/optimizing-for-star-schemas-and-interleaved-sorting-on-amazon-redshift/ – Rich Feb 16 '18 at 07:53
  • 1
    you need to provide more information. Redshift is designed to work fine for star schema and works well with the right use cases. how big are your largest tables, what type of queries will you be throwing at it? – Jon Scott Feb 16 '18 at 08:59
  • Redshift _is_ an RDBMS – Nick.Mc Feb 20 '18 at 09:53

2 Answers2

5

Going for a star schema design involves more considerations than just what database will be storing the data.

One immediate reason to elect a star schema design is its simplicity and expressiveness: as a rule of thumb, business transactions are modeled using fact tables and business entities (or business "context") are modeled as dimension tables. One practical consequence of this design is that your data will be easier to understand and use by business users, regardless of whether you are using SQL or a BI/visualization tool like Tableau or PowerBI.

Virtually any RDBMS will fully support a star schema and, in some cases like Oracle, it can use special (bitmap) indexes to optimize the typical query patterns over a star schema. This has been the case for (at least) the past 15 years, where data warehouses of all sizes have been implemented in virtually any major RBDMS.

You can do the same in Redshift: define your data model (the star schema), build the data structures (the tables), load data into it accordingly and query it as usual. The star schema will simply be more expressive (thus, easier to use) than having unrelated or complex table structures; in this case, it will actually be mostly a sort of metadata layer that abstracts the data's complexity. Here's a post detailing a benchmark that measures Redshift's performance and support for star schema queries, using the industry standard TPC-DS benchmark: https://blog.fivetran.com/warehouse-benchmark-dce9f4c529c1

You don't even need a "proper" database to enable a star schema design, e.g. you could actually dump files on HDFS and still design a star schema over these using Hive, simply for the sake of making it easier to use, because Hive will give you a SQL interface over HDFS and the star schema will make the SQL simpler and easier to understand.

jmng
  • 2,479
  • 1
  • 25
  • 38
1

Both will support a star schema. Since your data is already in Redshift, you eliminate extra work and the risk of data loss/corruption of moving your data into a SQL database by staying with that platform.

How your current data is organized in Redshift, the amount of data and the type of queries you will run may impact query performance. As you did not start with the intent of a star schema, your Redshift instance may not be optimized for a star schema. Amazon has a blog post on this here. What tips you can implement depends on your current environment, but at a minimum you should be able to use the data compression and workload management suggestions. This guide to Redshift architecture and capabilities may also help you implement the Amazon optimization suggestions.

Oron Zimmer
  • 336
  • 1
  • 3