Why is a full TADIR select with ORDER BY PRIMARY KEY much slower than INTO sorted table?

Question

Having the following statements:

SELECT * FROM tadir ORDER BY PRIMARY KEY INTO TABLE @DATA(lt_tadir).

DATA lt_tadir TYPE SORTED TABLE OF tadir WITH UNIQUE KEY pgmid object obj_name.`
SELECT * FROM tadir INTO TABLE @lt_tadir.

Why is the first one around 4 times slower (verified on multiple systems? Relevant statements from the documentation:

For performance reasons, a sort should only take place in the database if supported by an index. This guaranteed only when ORDER BY PRIMARY KEY is specified

If a sorted resulting set is assigned to a sorted internal table, the internal table is sorted again according to the sorting instructions.

First I thought maybe column storage is an issue, but I tried another column storage table where both statements are around similar (even though the second one seems to be a bit faster each time). Same for the row storage I tested.

Also I somehow would have expected that the sort in ABAP (second Docu snipped) would have an performance impact. But it outperforms the primary key index select.

Tables aren't fully buffered, and even if that was the case, ORDER BY PRIMARY KEY does use the buffer.

Any ideas?

What's "4 times slower" in absolute numbers and how much time of that is spent on the database? Also I'm curious: Why do you need to copy _the whole TADIR_ into the work area? — Jonas Wilms, Dec 01 '21 at 22:32
@JonasWilms It was like 129 seconds vs 28 seconds. Regarding whole tadir: certainly not. I was just playing around since the "the internal table is sorted again" made me curious how much that double sort would slow down the code. I thought it would be way better then to get the data with primary key sort already from the buffer / db and avoid sorting it again when it already is. — Regyn, Dec 02 '21 at 11:29
Yeah but this is a broken example. You're asking the database to sort a table which is unordered (there's no natural order for TADIR entries). It would make much more sense to run this test on something which _has a natural order_, cause then the database might choose to optimize that access — Jonas Wilms, Dec 02 '21 at 11:35
@JonasWilms Why wouldnt the order, at least of some internal index, match the primary key order of `pgmid object obj_name`. I mean it's a must for the unique constraint anyway, so I'd expect there's some sort of fast book-keeping for it — Regyn, Dec 02 '21 at 11:44
Nope, [hash tables](https://en.wikipedia.org/wiki/Hash_table) are unordered (at least if your hash function is good :)). Checking whether an entry exists is still fast (hash_table[ hash(key) ]) — Jonas Wilms, Dec 02 '21 at 11:49
@JonasWilms That is of course true :) Indeed obviously not all tables use a B-Tree index for the primary key. — Regyn, Dec 02 '21 at 13:18

score 3 · Accepted Answer · answered Dec 02 '21 at 01:12

I ran the test (several times) and got at least similar execution times for both statements. The Application server sort version about 25% faster than HANA sort. So my mileage was different.

That HANA sort isnt "faster" is only mildly surprising until you look at the table definition. Sorting the entire inverted hash index not what it was designed for. :)

Some "rules" are meant to be broken.
Sorting 5 Million keys with inverted hashes might a good example. And now you have 5 Million records in memory, reading rows quickly by key will favor the internally sorted table. anyway ;)

DATA lt_tadir TYPE SORTED TABLE OF tadir WITH UNIQUE KEY pgmid object obj_name

is access friendly than the simple Standard Table anyway. @Data(lt_tab)

There are known disadvantages with inverted hash indexes. With typical access not normally noticed. https://www.stechies.com/hana-inverted-individual-indexes-interview-questionsand-ans/

Interesting that your difference isn't that big. I tried it on multiple systems and had similar results, usually like ~130 seconds vs 28 seconds or 160/49. But yeah, obviously database + server have different specs. What I wonder though is: is ABAP smart enough to detect that an `INTO ` can be optimized when the buffer/db delivers it with primary key order? But you are right, sorted table is nicer. I was just wondering whether that double sort could be avoided (and then use old `BINARY SEARCH` addition to find entries). — Regyn, Dec 02 '21 at 11:36
You can of course declare the table as sorted and still use the Order by primary key. . Timing wise, Í was getting around 27 secs for the order by primary key and around 23 seconds for select in to sorted table. — phil soady, Dec 03 '21 at 08:54

Jonas Wilms · Answer 2 · 2021-12-02T12:56:45.623

I'd like to argue that this performance test is senseless and any conclusions drawn from it are incorrect as:

Sorting in ABAP and sorting on the database only yields the same results if the whole table is selected. If the number of results is limited (e.g. get the last 100 invoices) then sorting on the database yields the correct result while sorting in ABAP (after the limit) does not. As one rarely selects a whole table the test case is completely unrealistic.
Sorting by the primary key (pgmid, object, obj_name) does not make any sense. In which scenario would that be useful? You might want to search for a certain object, then sorting by obj_name might be useful, or you might want to see recent transported objects and sort by the correction (korrnum)

Just to demonstrate run the following example:

REPORT Z_SORT_PERF.

DATA:
  start TYPE timestampl,
  end   TYPE timestampl.

DATA(limit) = 10000.

* ------------- HANA UNCOMMON SORT ----------------------------------------

GET TIME STAMP FIELD start.
SELECT * FROM tadir ORDER BY PRIMARY KEY INTO TABLE @DATA(hana_uncommon) UP TO @limit ROWS.
GET TIME STAMP FIELD end.

WRITE |HANA uncommon sort: { cl_abap_tstmp=>subtract( tstmp1 = end tstmp2 = start ) }s|.
NEW-LINE.


* ------------- HANA COMMON SORT ----------------------------------------

GET TIME STAMP FIELD start.
SELECT * FROM tadir ORDER BY KORRNUM INTO TABLE @DATA(hana_common) UP TO @limit ROWS.
GET TIME STAMP FIELD end.

WRITE |HANA common sort: { cl_abap_tstmp=>subtract( tstmp1 = end tstmp2 = start ) }s|.
NEW-LINE.

On my test system this runs in 1.034s (the uncommon one) vs 0.08s (the common one). That's 10 times faster (though that comparison also makes little sense). Why is that? Because there is a secondary index defined for the KORRNUM field for exactly that purpose (sorting) while the primary key index is supposed for enforcing uniqueness constraints and retrieving single records.

In general a database is not meant to optimize for one single query, but is optimized for overall system performance (under memory constraints). This is usually achieved by sacrificing the performance of uncommon queries for the optimization of common ones.

For performance reasons, a sort should only take place in the database if supported by an index. This guaranteed only when ORDER BY PRIMARY KEY is specified

That's a slightly misleading formulation. There are indices that are not optimal for sorting, and although there is no guarantee that a database creates a secondary index and uses it while sorting, it is very likely that it does so in common cases.

"As one rarely selects a whole table the test case is completely unrealistic.". That's a bold statement. We've plenty of use-cases where this is happening and also required. As to why sort by primary key: because we need a lookup for R3TR TRAN SU01, not R3TR PROG SU01. The key is shared in multiple tables. Yes, we don't do it with tadir, but ztables are bad for SO examples :). We do have multiple tables sharing keys which need a delta update in case the key was deleted in the primary table. SELECT SINGLE for every single entry is a performance nightmare — Regyn, Dec 02 '21 at 13:14
Yeah but your ztables are probably slightly smaller than TADIR, and you can create indices for your _actual workload_ — Jonas Wilms, Dec 02 '21 at 13:17
Correct. Though I still would have expected that, unless the primary key is some sort of hash, the double sort can be skipped in these cases. Since our lookup is always the primary key anyway. Especially since SE11 index creation does not have a field hashed or sorted key. So even for a custom unique key on primary key fields there's no guarantee it is sorted when queried. — Regyn, Dec 02 '21 at 13:22
Sorting an already sorted collection is usually a fast operation, so whether the ABAP kernel skips this if it can guarantee that the database result is correctly sorted is an implementation detail. I guess the point in the ABAP docu is that the sort order is guaranteed, even if the database does strange things — Jonas Wilms, Dec 02 '21 at 13:25

Why is a full TADIR select with ORDER BY PRIMARY KEY much slower than INTO sorted table?

2 Answers2