Slow SELECT FOR ALL ENTRIES

Question

The below SELECT runs with the internal table GIT_KUNNR_TAB containing 2.291.000 lines with unique clients (kunnr) and takes 16 minutes to complete.

  select kunnr umsks umskz gjahr belnr buzei bschl shkzg dmbtr bldat
           zfbdt zbd1t zbd2t zbd3t rebzg rebzj rebzz rebzt
        into corresponding fields of table git_oi_tab
      from bsid
        for all entries in git_kunnr_tab
      where bukrs  =  p_bukrs
        and kunnr  =  git_kunnr_tab-kunnr 
        and umsks  = ' '
        and augdt  =   clear_augdt
        and budat  le  p_key 
        and blart  in  s_blart
        and xref3  in  s_xref3.

BSID contains in total 20.000.000 records and for the 2.291.000 unique clients it gets 445.000 records from BSID.

Most of the time there are even more lines in GIT_KUNNR_TAB.

Is there any quicker selection?

For any performance question, you should better attach the `execution plan` (ST05) which tells what the database does to optimize the reading. It's specific to each system. Note that [`FOR ALL ENTRIES`](https://help.sap.com/doc/abapdocu_753_index_htm/7.53/en-US/index.htm?file=abenwhere_logexp_itab.htm) is an ABAP construct which splits the SELECT into many SELECT, by splitting `GIT_KUNNR_TAB` into groups of 5 lines (value of 5 modifiable in profile parameters `rsdb/max_[in_]blocking_factor` or hint `max_[in_]blocking_factor`) — Sandra Rossi, Apr 22 '19 at 14:44
@SandraRossi 5 is only true for Oracle as a default, it is 50 on DB2, and 100 on HANA. — András, Apr 22 '19 at 17:17
@András the value may even vary based on database, SAP version, and WHERE ([currently on HANA](https://launchpad.support.sap.com/#/notes/1987132), it's 50 or 100 depending on the WHERE) — Sandra Rossi, Apr 29 '19 at 08:29
@SandraRossi, I know, but I did not want to list all cases and exceptions in a comment — András, Apr 29 '19 at 08:47

score 3 · Accepted Answer · edited Jul 15 '21 at 08:41

3

Drop the FOR ALL ENTRIES part

Most likely the rest of the WHERE condition is selective enough. You get back more records than necessary, but much quicker.

As git_kunnr_tab is unique, you can turn it into a HASHED table, and filter git_oi_tab with that on the application server.

SELECT kunnr umsks umskz gjahr belnr buzei bschl shkzg dmbtr bldat
       zfbdt zbd1t zbd2t zbd3t rebzg rebzj rebzz rebzt
    INTO corresponding fields of table git_oi_tab
    FROM bsid
    WHERE bukrs = p_bukrs 
      AND umsks = ' ' 
      AND augdt = clear_augdt 
      AND budat le p_key 
      AND blart in s_blart 
      AND xref3 in s_xref3.

DATA: lt_kunnr_tab TYPE HASHED TABLE of <type of git_kunnr_tab>
          WITH UNIQE KEY kunnr.
lt_kunnr_tab = git_kunnr_tab.
LOOP AT git_oi_tab ASSIGNING FIELD-SYMBOL(<fs_oi>).
  READ TABLE lt_kunnr_tab TRANSPORTING NO FIELDS
      WITH KEY kunnr = <fs_oi>-kunnr.
    IF sy-subrc <> 0
      DELETE git_oi_tab.
    ENDIF.
  ENDIF.
ENDLOOP.
FREE lt_kunnr_tab.

This is not a general solution

If the FAE driver table contains more than 20% of the rows of the target table, dropping it completely is mostly beneficial for speed.
If it has less rows, FAE is the better solution.

Be careful however, dropping FAE can significantly increase the memory consumption of the resulting internal table!

FOR ALL ENTRIES vs Range table

You can see many places in the internet that Range tables are faster than FAE. This is true in some very specific cases:

Only one field is used from the FAE driver table¹
There are more rows in the driver table than FAE sends down in one batch
By default the batch size is 5 in Oracle, 50 in DB2, 100 in HANA
There are not so many rows in the Range thable that it causes a dump
The maximum length is 1 048 576 bytes (note 1002491)

Range tables can be faster than FAE because it sends down all the filtering conditions in one query. This is of course dangerous, as the size of a query is limited. If it exceeds the set limit, you get a dump.

However, using the hint MAX_IN_BLOCKING_FACTOR and MAX_BLOCKING_FACTOR you can give FAE all the benefits of range tables, without its downsides by increasing the batch size.

So only use Range tables with actual ranges, like between A and C, or not between G and J.

this one is not about speed, but functional correctness. Range tables treat fields independently, while FAE works with rows

edited Jul 15 '21 at 08:41

Suncatcher

10,355
10
52
90

answered Apr 22 '19 at 13:38

András

1,326
4
16
26

1

Agreed for the current case, but it might be counter-productive if the number of lines in `git_kunnr_tab` is low. – Sandra Rossi Apr 22 '19 at 14:46
Andras thanks for your answer. I will implement it tomorrow and I will let you know the results. I also trying to find a way to implement this selection in an abap parallel processing. Yours, I think is fine. – ekekakos Apr 22 '19 at 18:12
Sorry Andras but I had to resolve some production issues and I did not do sth. Maybe tomorrow. I will let you know all. Thanks. – ekekakos Apr 23 '19 at 19:30
What is the range seltab row limit when it invokes a dump? – Suncatcher Apr 24 '19 at 18:45
1

@Suncatcher The maximum length of a statement is 1MB (1 048 576 bytes) – András Apr 25 '19 at 08:39
So this is depends on length of range arguments, not on number of rows? – Suncatcher Apr 25 '19 at 13:03
1

@Suncatcher not directly. You multiply number of rows with width, and it must fit in 1MB – András Apr 25 '19 at 19:15
Hello to all of you. Sorry for the delay but we had Easter Holidays. Today I implement your solution Andras and is faster or to be more precise it is doing only 10 minutes for some selection criteria. I also add a selection to BSAD by appending the itab after the selection of BSID. But I noticed a strange thing and I want your opinion. In the morning the selection with FAE did 20 min, at 13:00 it did 2 min and at 16:00 again 20 min. All these happen in the development system with the same criteria. Can someone tell me why is this happened? Thanks – ekekakos May 06 '19 at 13:51
Most likely at 13:00 all the necessary pages were still in the cache. Then if was dropped some time before 16:00, so that one had to read from disk. You can test by running it three times back to back. The first time can be slow (the pages are in the cache or not), but the last two should be equally fast. If this is not the case, someone must have been running an expensive query on the same table in the morning and at 16:00 – András May 06 '19 at 18:27
Thanks Andras, I will check it and tell you. – ekekakos May 08 '19 at 07:14
@ekekakos, how did it go? – András Oct 09 '20 at 15:04

score -2 · Answer 2 · answered Apr 22 '19 at 19:19

-2

Normally for only a one field using a range is much faster. So if you selecting the data by some key from the internal table comparing only one field from the table, turn it into the range instead of FAE:

TYPES:
  tr_kunnr TYPE RANGE OF kunnr.

* or just do loop/append if you on the old system (>7.4)
DATA(lr_kunnr) = VALUE tr_kunnr(
  FOR <fs_oi> IN git_oi_tab
  (
    sign    = 'I'
    option  = 'EQ'
    low     = fs_oi-kunnr
  )
). 

 select kunnr umsks umskz gjahr belnr buzei bschl shkzg dmbtr bldat
           zfbdt zbd1t zbd2t zbd3t rebzg rebzj rebzz rebzt
        into corresponding fields of table @git_oi_tab
      from bsid
      where bukrs  =  @p_bukrs
        and kunnr  in @lr_kunnr...

I can't find the article, but an investigation was made, and the range is much faster in case of one field comparison than an FAE.

answered Apr 22 '19 at 19:19

Ilya Kaznacheev

168
1
4

1

That's a myth because `FOR ALL ENTRIES IN itab WHERE column = itab-column` is converted at run time into several `SELECT`, each one doing `WHERE column IN ('A','B'...)` (cf note [48230 Parameterization of SELECT ... FOR ALL ENTRIES statement](https://launchpad.support.sap.com/#/notes/48230)). So there's no difference with doing it explicitly. – Sandra Rossi Apr 23 '19 at 07:29
The difference is in the size of packets which are transmitted from DB to AS in each size. Maybe that was changed in later versions, but that's the main reason in the rest. – Ilya Kaznacheev Apr 24 '19 at 08:13
Normally (without hints) it converts the SELECT into a bunch of subrequests. I could give a link to the great article with a deep investigation of the case, but it is in Russian. Hint usage is a bad practice in the common scenario because it overrides basis settings and the system can't be easily scaled. – Ilya Kaznacheev Apr 24 '19 at 08:20
The note above served as support of how the ABAP SELECT is converted implicitly into several SELECT for the database, I didn't want to talk about hints. stackoverflow requires that the arguments are exposed here, and the link serves as a reference. It's not a problem to provide a link in another language because translation tools are sufficient nowadays. – Sandra Rossi Apr 24 '19 at 11:47
@IlyaKaznacheev the whole purpose of hints is to override basis settings. Those are generic, but hints can be tailored to specific queries. – András Apr 25 '19 at 19:20
@IlyaKaznacheev, the blog post is nice, but even its author says in the comments RANGEs are bad. – András Apr 27 '19 at 11:16
2

@IlyaKaznacheev thank you. But this [blog post](habr.com/ru/post/262649) is not about `IN`, it's just about `FOR ALL ENTRIES IN`. Nowhere there is an argument that `IN` is faster. – Sandra Rossi Apr 29 '19 at 08:39

Slow SELECT FOR ALL ENTRIES

2 Answers2

Drop the FOR ALL ENTRIES part

This is not a general solution

FOR ALL ENTRIES vs Range table

Linked