1

In Teradata, which is more efficient - Sample or Top?

Since sample is random, does that cause Teradata to do less work and result in faster returns?

Routinely, I just want to see a few rows.

Thread related: differences between top and sample in teradata sql

Community
  • 1
  • 1
Neal Parker
  • 76
  • 1
  • 6

4 Answers4

5

I just ran three queries on a large Teradata dataset:

  1. SELECT * FROM table Sample 10;
  2. SELECT * Top 10 FROM table; (with no order by)
  3. SELECT * Top 10 FROM table ORDER BY column;

The DBQL metrics show that by far, the Top 10 with no order is the least resource-intensive. I had about a 99% drop in I/O & CPU just changing from SAMPLE to TOP.

So if your goal is purely efficiency, then TOP without the Order by is the clear winner going by TD's DBQL metrics.

DBQL Metrics

  • Can you add the DBQL metrics? Are you sure about that *99% drop in I/O & CPU just changing from SAMPLE to TOP*? How did it compare to TOP with ORDER? – dnoeth Aug 29 '17 at 15:38
  • Hey Dieter, Sure thing. I attached a screenshot of the metrics in my original Post above. These were run on TD 15.10 on a table with about 1.2 billion rows. I ran the Sample query twice to confirm since I was surprised by the results. – Filip Uscilowicz Aug 29 '17 at 17:30
  • Well, the `TotalIOCount` are logical IOs, can you also add physical IO: `ReqPhysIO`? – dnoeth Aug 29 '17 at 18:20
0

TOP usually is the faster one, taking the shortcut out of processing when enough data has been returned.

Sample has a lot more options and works to be statistically correct. To achieve that, it may have to read more data blocks.

ULick
  • 969
  • 6
  • 12
  • So, top without the order by would be considered more efficient? This is because Sample attempts to be statistically correct + the options, which may result in reading more data blocks. – Neal Parker Apr 30 '17 at 18:46
  • Yes. @Biswabid's link gives a good explanation. Or as in the link you mentioned 'The optimizer is free to select the cheapest plan it can find and stop processing as soon as it has found enough rows to return' – ULick Apr 30 '17 at 19:31
0

From an efficiency standpoint:

Using Top WITHOUT an Order By will be about the same as Sample.

Using Top WITH an Order By will cause more overhead.

Jason
  • 945
  • 1
  • 9
  • 17
  • So does that mean they are essentially equal as long as you leave out the order by on top? – Neal Parker Apr 30 '17 at 18:42
  • @NealParker I tested against the AdventureWorks2016 database on my personal pc and compared the execution plans. For both Top and Sample I tried 25,100,1000, 10,000, and 50,000. They were pretty much the same as long as the Order By wasn't in there. – Jason Apr 30 '17 at 18:45
  • Clustered index on Teradata? – David דודו Markovitz Apr 30 '17 at 20:41
  • @Jason: Where did you get a Teradata version of AdventureWorks2016? Or did you check the plans on SQL Server? – dnoeth May 01 '17 at 20:47
  • @dnoeth, I checked the execution plans on SQL Server. Honestly, I didn't realize Teradata is a data warehousing product when I wrote the answer. – Jason May 01 '17 at 21:02
  • @Jason: You're welcome, Microsoft's marketing is actually working great, they try to bias people: *SQL always means SQL Server* :-) But your answer is not wrong, of course adding `ORDER BY` might cause much more overhead on Teradata, too. – dnoeth May 01 '17 at 21:12
-2

I think its better to use TOP without order by than sample. Both of these evaluate the whole expression before displaying the result.

Reference : http://www.teradatatips.com/2010/10/top-vs-sample.html
Biswabid
  • 1,378
  • 11
  • 26