1

I need to test my data warehouse using TPC-DS. How can I generate queries for my data warehouse using TPC-DS? I tried to generate but it generate for a specific data warehouse.

Thanks.

vladimir
  • 13,428
  • 2
  • 44
  • 70
pas
  • 11
  • 3

1 Answers1

2

I'm not sure what you mean by "testing your data warehouse" using TPC-DS.

TPC-DS is a benchmark for database engines, focused on typical decision support access patterns; a data warehouse is an information systems concept that is usually built using a variety of database management systems (and other tools).

This being clarified, you can use TPC-DS to benchmark the database engine that you plan to use as a data store for your data warehouse. If that's your goal, you need to:

  1. either generate the data using the official TPC-DS tool, or download the dataset if you can find it online (alternatively, maybe your database vendor provides it already).
  2. load the data into the benchmark's model on the database you're testing.
  3. run the benchmark (the queries) over the data model you created. You can find an example of the queries here (for Impala, in this case), but you may have to translate them into the SQL idiom used by whatever DBMS you're using.

The TPC-DS specification doc not only provides this information but it can also help you understand some essential concepts on this topic: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.11.0.pdf

jmng
  • 2,479
  • 1
  • 25
  • 38
  • Hello,Thank you for your answer. I understood all you refer but my doubt is: TPC-DS uses a specific DW and queries to benchmark, right? In my case I want to generate TPC-DS equivalent queries for my DW. If you know I appreciate. Thanks. – pas May 29 '19 at 18:35
  • TPC-DS does not "use a specific DW to benchmark". It uses a specific data model (i.e. linked database tables) and set of queries, not a specific database; its goal is precisely to benchmark different database engines with the same workload. You can build and load the data model in any database and, *if necessary*, translate the queries to match the SQL idiom that your database uses. Benchmarking with TPC-DS is simply executing the same queries in similar conditions on different database engines. – jmng May 30 '19 at 08:08
  • The tool that I linked in my response (dsdgen) can actually generate the TPC-DS queries for different DBMS, so you may not have to translate the SQL at all, depending on what database you use. – jmng May 30 '19 at 08:15
  • 1
    Hi. Thank you for your answers. What I would mean is that I have a DW schema and I needed to translate the TPC-DS specific queries to their model into my model. I wanna know if I could configure dsdgen tools to generate similar queries to my DW schema. – pas May 31 '19 at 10:08
  • Ah, now your goal is clearer to me :) TPC-DS is not designed for that scenario; it uses a specific data model that can be implemented in virtually any DBMS and benchmarked by executing the TPC-DS queries, which may need to be adapted to match the syntax of the DBMS being benchmarked. The data model and the queries are specific to TPC-DS because the goal is to benchmark, in a comparable way, how different DMBS perform for the same workload (i.e. the TPC-DS data model and queries). The goal is to benchmark a DBMS using a "traditional analytics workload", not a specific data model. – jmng May 31 '19 at 12:22
  • In your case, I don't see the point of translating the TPC-DS queries, which is likely impossible as they are designed for the TPC-DS data model. It seems more appropriate to load your data model and then run multiple executions of real world queries (similar or identical to the queries that will be run by end users) in a controlled benchmark environment (e.g. avoiding cached results). – jmng May 31 '19 at 12:22