
I have declared an internal table like:

DATA: wa_collectoraction TYPE zcollectoraction,
      it_collectoraction TYPE STANDARD TABLE OF zcollectoraction.

Then I fill the table with:

SELECT bukrs kunnr yearmonth MAX( dat ) AS dat
FROM zcollectoraction
  INTO CORRESPONDING FIELDS OF TABLE it_collectoraction
WHERE bukrs IN so_bukrs AND
      kunnr IN so_kunnr AND
      dat   IN so_date
GROUP BY bukrs kunnr yearmonth.

and finally I have the following loop

LOOP AT it_collectoraction INTO wa_collectoraction.
  PERFORM progress_bar USING 'Retrieving data...'(035)
                             sy-tabix
                             i_tab_lines.
  " Get the MAX TIME, to cover the case where there is more than 1 line for the max date.
  SELECT SINGLE * FROM zcollectoraction
    INTO CORRESPONDING FIELDS OF wa_collectoraction
    WHERE bukrs = wa_collectoraction-bukrs AND
          kunnr = wa_collectoraction-kunnr AND
          dat   = wa_collectoraction-dat   AND
          time  = ( SELECT MAX( time ) AS time
                      FROM zcollectoraction
                      WHERE bukrs = wa_collectoraction-bukrs AND
                            kunnr = wa_collectoraction-kunnr AND
                            dat   = wa_collectoraction-dat ).

  MODIFY it_collectoraction FROM wa_collectoraction.
ENDLOOP.

This loop takes 5 minutes for 3,000 records. Can someone tell me what to do to make it faster?

Thanks in advance

ekekakos
  • Which NetWeaver version are you running on? –  Oct 03 '17 at 11:24
  • What type of database are you using? – András Oct 03 '17 at 11:59
  • Component version: SAP ECC 6.0. Database system: DB6. SAP versions: 700, 710, 701, 702, 711, 720, 730 – ekekakos Oct 03 '17 at 12:13
  • The loop itself is fine. But what kind of grudge do you hold against your database that you're trying to kill it so cruelly? ;-) – vwegert Oct 03 '17 at 17:45
  • vwegert, let me explain what I am trying to do. In the select I take the specific fields for the max date of the month. Then I select all the fields of the record with the max time, because there is a big possibility of having more than 1 record on the max date. You can see an example below. Thanks – ekekakos Oct 03 '17 at 19:08

4 Answers


The best tool for analyzing a standalone report's performance is ST12, so if you have the chance, trace it.
Without a trace we have to guess: the biggest problem is either the SELECT with the subSELECT, or the MODIFY.

1) SELECTs in a LOOP are always slow

Here you actually run two of them for every line of it_collectoraction.

Try reducing the number of SELECTs

Depending on the number of lines with the same dat, it might be much faster to replace the SELECT in the LOOP with one SELECT with FOR ALL ENTRIES from zcollectoraction outside the LOOP, and find MAX( time ) on the ABAP side.

Index coverage

Seems to be fine.

2) MODIFY is slow on STANDARD tables

You have to scan the whole table just to find the relevant line. If you define it_collectoraction as SORTED, this will be much faster. If you use a field symbol in the LOOP, the MODIFY can be avoided altogether.
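
For illustration, a minimal sketch of the SORTED declaration (the non-unique key on bukrs/kunnr/dat is an assumption based on the fields the loop actually uses):

" Sketch: a sorted key on bukrs/kunnr/dat (assumed from the loop's access
" pattern) lets MODIFY and key READs locate the line via binary search
" instead of scanning the whole standard table.
DATA it_collectoraction TYPE SORTED TABLE OF zcollectoraction
                        WITH NON-UNIQUE KEY bukrs kunnr dat.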

Coding

Replace your LOOP with this:

TYPES: BEGIN OF tty_coll_act,
         bukrs TYPE zcollectoraction-bukrs,
         kunnr TYPE zcollectoraction-kunnr,
         dat   TYPE zcollectoraction-dat,
         time  TYPE zcollectoraction-time,
       END OF tty_coll_act.

DATA: lt_coll_act TYPE TABLE OF tty_coll_act,
      ls_coll_act LIKE LINE OF lt_coll_act.

FIELD-SYMBOLS: <fs_collectoraction> LIKE LINE OF it_collectoraction.

" Guard: FOR ALL ENTRIES with an empty driver table would select every row.
IF it_collectoraction IS NOT INITIAL.
  SELECT bukrs kunnr dat time
      INTO TABLE lt_coll_act
      FROM zcollectoraction
      FOR ALL ENTRIES IN it_collectoraction
      WHERE bukrs = it_collectoraction-bukrs AND
            kunnr = it_collectoraction-kunnr AND
            dat   = it_collectoraction-dat.
ENDIF.

" Ascending on bukrs/kunnr/dat for the BINARY SEARCH, descending on time.
SORT lt_coll_act BY bukrs kunnr dat time DESCENDING.

LOOP AT it_collectoraction ASSIGNING <fs_collectoraction>.
  " The READ TABLE finds the first matching row, which holds
  " MAX( time ) because time is sorted descending.
  READ TABLE lt_coll_act INTO ls_coll_act
      WITH KEY bukrs = <fs_collectoraction>-bukrs
               kunnr = <fs_collectoraction>-kunnr
               dat   = <fs_collectoraction>-dat BINARY SEARCH.
  IF sy-subrc = 0.
    MOVE-CORRESPONDING ls_coll_act TO <fs_collectoraction>.
  ENDIF.
ENDLOOP.
András
  • These are the key fields: MANDT-BUKRS-KUNNR-YEARMONTH-DAT-TIME. Here is an example: 200-1000-620-201708-20170819-20170819121212.356 / 200-1000-620-201708-20170819-20170819121211.356 / 200-1000-620-201708-20170810-20170810121200.356 / 200-1000-620-201708-20170810-20170810121100.356. So I want to display the MAX date of the month, and if more than 1 record exists on the max date, I want to get the MAX time. – ekekakos Oct 03 '17 at 13:31
  • Even this code is too slow. SELECT * FROM zcollectoraction AS a INTO CORRESPONDING FIELDS OF TABLE it_collectoraction WHERE bukrs IN so_bukrs AND kunnr IN so_kunnr AND dat IN so_date AND time IN ( SELECT MAX( time ) AS time FROM zcollectoraction AS b WHERE b~bukrs EQ a~bukrs AND b~kunnr EQ a~kunnr AND b~dat IN so_date AND b~yearmonth EQ a~yearmonth ). – ekekakos Oct 03 '17 at 13:40
  • András, as I am not an expert, how would I implement this? – ekekakos Oct 03 '17 at 14:33
  • András, unfortunately the 1st select is taking 1 minute for 950 records. It is much faster than my code (7,000 recs vs 9,500 recs in 10 minutes). What do you think from your experience, is it a good time? I believe not. Thanks. – ekekakos Oct 04 '17 at 07:37
  • @ekekakos if you mean the select with max(dat), 1 minute is acceptable if zcollectoraction has many million lines. Group Bys are not fast on DB6 – András Oct 04 '17 at 09:33
  • Typing `lt_coll_act` as `HASHED` with a unique key may help speed this up as well. – Suncatcher Oct 04 '17 at 09:38
  • And use [parallel cursors' technique](https://wiki.scn.sap.com/wiki/display/Snippets/ABAP+Code+for+Parallel+Cursor+-+Loop+Processing) to loop through nested itabs. – Suncatcher Oct 04 '17 at 09:39
  • @Suncatcher HASHED would not work. The combination of only _bukrs kunnr dat_ is not unique, and we do not know _time_ – András Oct 04 '17 at 10:02
  • @András, why have you drawn that conclusion? We do not know `zcollectoraction`'s structure for sure, so we can only allege that it's not unique %) – Suncatcher Oct 04 '17 at 10:58
  • @Suncatcher 1) duplicate keys cause a dump, so you should not use it if you do not know for sure. 2) OP uses MAX on `time`, which only makes sense if there can be more than one. – András Oct 04 '17 at 11:33

Instead of running a selection query inside the loop, fetch all the data into an internal table beforehand and handle it with READ statements inside the loop.

A SELECT inside a loop will always slow the application down, because a database query has to be executed for every iteration. Loading all the required information into an internal table once and then handling the data within the application is much faster.
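
For illustration, a minimal sketch of that pattern, reusing the names from the question (lt_all is a hypothetical helper table; picking MAX( time ) is left out here, see András's answer for the full handling):

" One database round trip up front instead of one per loop pass.
DATA lt_all TYPE SORTED TABLE OF zcollectoraction
            WITH NON-UNIQUE KEY bukrs kunnr dat.

SELECT * FROM zcollectoraction
  INTO TABLE lt_all
  WHERE bukrs IN so_bukrs AND
        kunnr IN so_kunnr AND
        dat   IN so_date.

LOOP AT it_collectoraction INTO wa_collectoraction.
  " No database access here: the lookup happens in memory.
  READ TABLE lt_all INTO wa_collectoraction
       WITH TABLE KEY bukrs = wa_collectoraction-bukrs
                      kunnr = wa_collectoraction-kunnr
                      dat   = wa_collectoraction-dat.
  IF sy-subrc = 0.
    MODIFY it_collectoraction FROM wa_collectoraction.
  ENDIF.
ENDLOOP.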

Let me know if you require any further details on this.

anslem arnolda
  • I also thought of this and I created a table, filled it with the selection data (30,500 recs) and replaced the real table, but I forgot that I can't use a select statement on an itab. And also I don't know how to get the MAX time with the READ statement. I would be much obliged if you can help me. Thanks. – ekekakos Oct 04 '17 at 04:02
  • Move the two select statements out of the loop. First write a selection to get the "MAX TIME" into another internal table (itab1) with the help of your initial internal table, using FOR ALL ENTRIES (it_collectoraction). Then inside your loop, write a read statement to get the time (MAX TIME) from itab1 into a work area (wa1) which holds the time. Then write another read statement to get values from it_collectoraction into another work area (wa2) by passing the time value from wa1, which is the max time value. This is the same logic as you wrote initially, just much more optimized. – anslem arnolda Oct 04 '17 at 04:29
  • Refer to the code sample given by András; it is correct – anslem arnolda Oct 04 '17 at 04:46

First of all I want to thank all of you for your help. I changed the logic of the select by using an internal table holding all the records from the dbtab that match the user's selection criteria. So the code became as follows:

DATA: wa_collectoraction TYPE zcollectoraction,
  it_collectoraction TYPE TABLE OF zcollectoraction,
  itsort_collectoraction TYPE HASHED TABLE OF zcollectoraction
      WITH UNIQUE KEY mandt bukrs kunnr yearmonth dat time.

FIELD-SYMBOLS: <fs_collectoraction> LIKE LINE OF it_collectoraction.

SELECT bukrs kunnr yearmonth MAX( dat ) AS dat
  FROM zcollectoraction
    INTO CORRESPONDING FIELDS OF TABLE it_collectoraction
  WHERE bukrs IN so_bukrs AND
        kunnr IN so_kunnr AND
        dat   IN so_date
  GROUP BY bukrs kunnr yearmonth.

" Keep the total records which will be inserted.
i_tab_lines = sy-dbcnt.

SELECT * INTO TABLE itsort_collectoraction
  FROM zcollectoraction
  WHERE bukrs IN so_bukrs AND
        kunnr IN so_kunnr AND
        dat   IN so_date.

SORT itsort_collectoraction
            BY mandt bukrs kunnr yearmonth dat time DESCENDING.

LOOP AT it_collectoraction ASSIGNING <fs_collectoraction>.
  PERFORM progress_bar USING 'Retrieving data...'(035)
                             sy-tabix
                             i_tab_lines.

  READ TABLE itsort_collectoraction INTO wa_collectoraction
      WITH KEY bukrs = <fs_collectoraction>-bukrs
                kunnr = <fs_collectoraction>-kunnr
                yearmonth = <fs_collectoraction>-yearmonth
                dat   = <fs_collectoraction>-dat.
  <fs_collectoraction> = wa_collectoraction.
ENDLOOP.

This code runs 43,000 records in 1 minute. The only problem is that after the first 10,000 to 15,000 records the process slows down. I don't know if there is any command to clear something, or what to clear.

Again, thanks a lot to all of you. Regards, Elias

PS: In the 1st 10 sec it processes 14,000 records; in 1 minute, 38,500; and in 1 minute 50 seconds it finished all 54,500 records. It gives me the impression that something fills up and slows down the process. Any idea?
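
As the comments below point out, the READ TABLE above uses only part of the hashed key, so each lookup degrades into a linear scan. For illustration, a sketch of the variant they suggest, assuming the same fields: a STANDARD table sorted with time DESCENDING, read with BINARY SEARCH:

" Replace the HASHED declaration with a plain standard table:
DATA itsort_collectoraction TYPE STANDARD TABLE OF zcollectoraction.

SELECT * INTO TABLE itsort_collectoraction
  FROM zcollectoraction
  WHERE bukrs IN so_bukrs AND
        kunnr IN so_kunnr AND
        dat   IN so_date.

" Ascending on the lookup fields, descending on time, so the first
" match found by the BINARY SEARCH is the max-time row.
SORT itsort_collectoraction BY bukrs kunnr yearmonth dat
                               time DESCENDING.

" Inside the LOOP, the READ becomes:
READ TABLE itsort_collectoraction INTO wa_collectoraction
    WITH KEY bukrs     = <fs_collectoraction>-bukrs
             kunnr     = <fs_collectoraction>-kunnr
             yearmonth = <fs_collectoraction>-yearmonth
             dat       = <fs_collectoraction>-dat
    BINARY SEARCH.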

ekekakos
  • Since you are working with the same table "zcollectoraction" twice, you should be able to move the logic to a single SELECT statement. Study subqueries and GROUP BY if needed. What is the expected output? Just the MAX( time ) value for each row? – Mikael G Oct 04 '17 at 12:50
  • Using FOR ALL ENTRIES instead of repeating the IN statements would likely reduce the second SELECT significantly. – András Oct 04 '17 at 14:04
  • Mikael, if you look at the reason for this question you will see that I did a lot of selects on dbtab "zcollectoraction", which caused big delays. So I decided to use a HASHED itab with all records and fields of the dbtab matching the selection criteria. This table contains 12 fields and each customer has many lines per month. They want a report which displays the last day of the month per customer. If the last day of the month has more than 1 line, they want the last day with max time. With the max select I get the last day of the month, and then in the loop I get the max-time record. – ekekakos Oct 04 '17 at 14:10
  • OK. Then, I would say you can solve your requirement with a single SQL query without any ABAP processing of internal tables. GROUP BY and HAVING will solve it. – Mikael G Oct 05 '17 at 14:23
  • Make sure that you clear the work area before the read statement. – anslem arnolda Oct 06 '17 at 06:08
  • @ekekakos HASHED tables are only fast if you use all the key fields in the READ TABLE. You do not, so it performs as badly as a standard table. – András Sep 08 '18 at 22:30
  • You are missing a BINARY SEARCH in the READ TABLE – András Mar 26 '19 at 14:32

I am a little late to the party, but what I see in your first post is that you just want to read the latest (max(date) and max(time)) entries from one table per bukrs and kunnr?

Use one select to get the table's content. Select only by key fields or indexed fields; I assume that dat is not a key field, but bukrs and kunnr are:

SELECT bukrs kunnr yearmonth dat time
  FROM zcollectoraction
  INTO CORRESPONDING FIELDS OF TABLE it_collectoraction
  WHERE bukrs IN so_bukrs AND
        kunnr IN so_kunnr.

Delete the unwanted rows (filtering on the non-key field dat) from the itab:

DELETE it_collectoraction WHERE dat NOT IN so_date.

Sort the itab by date and time descending, so that the latest entry is the first one per combination of bukrs and kunnr:

SORT it_collectoraction BY bukrs kunnr dat DESCENDING time DESCENDING.

Delete all adjacent entries (= all entries with the same compare key after the first) per bukrs and kunnr:

DELETE ADJACENT DUPLICATES FROM it_collectoraction COMPARING bukrs kunnr.
futu
  • Why would you move the condition on `dat` out of the select? The db does it faster, _and_ you have to move fewer lines. I see this frequently, but it is very bad for performance. – András Sep 08 '18 at 22:26
  • It is actually not that bad for performance, because for sql only the indexed fields are relevant. Deleting afterwards on the application server "in memory" is also fast. People use it frequently, because the db optimizer sometimes utilizes the wrong index, or combines multiple indexes that are not fully qualified, which is resulting in really really bad performance. If it is crucial that one particular index is used this is a common way to ensure it. – futu Apr 03 '19 at 08:04
  • It _is_ common, that is why I try to fight it whenever possible. If you do not have the right index, create one. If you have the right index but the optimizer does not pick it, use a hint. This is a very bad practice, and it should stop. – András Apr 03 '19 at 09:33
  • In general, indexes are "expensive" for the whole system, therefore it is mostly forbidden to create new indexes for every single query. Using hints would be an alternative, but from a maintenance point of view they can also be considered really bad practice, because they undermine the openSQL abstraction layer over the database. Additionally, they are not traceable in the system as far as I know. In the end, the developer has to weigh the disadvantages. And of course you are right, in the OP's case, adding the date to the WHERE condition would be the better approach :) – futu Apr 03 '19 at 10:46