0

I have an update statement that works fine but takes a very long time to complete.

I'm updating roughly 150 rows in one table with some tens of thousands of rows exposed through a view. It's been suggested that I use the Partition By clause to speed up the process.

I'm not too familiar with Partition By statement but I've been looking around and I think maybe I need to use a field that has a numeric value that can be compared against.

Is this correct? Or can I partition the larger table with something else?

if that is the case I'm struggling with what in the larger table can be used. The table is composed as follows.

ID has a type of NUMBER and creates the unique id for a particular item. Start_Date has a date type and indicates the start when the ID is valid. End date has a date type and indicates the end time when the ID cease to be valid. ID_Type is NVARCHAR2(30) and indicates what type of Identifier we are using. ID_Type2 is NVARCHAR2(30) and indicates what sub_type of Identifier we are using. Identifier is NVARCHAR2(30) and any one ID can be mapped to one or more Identifiers.

So for example - View_ID

ID | Start_Date | End_Date   | ID_Type1| ID_Type2 | Identifier
1  | 2012-01-01 | NULL       | Primary | Tertiary | xyz1
1  | 2012-01-01 | NULL       | Second  | Alpha    | abc2
2  | 2012-01-01 | 2012-01-31 | Primary | Tertiary | ghv2
2  | 2012-02-01 | NULL       | Second  | Alpha    | mno4

Would it be possible to Partition By the ID field of this view as long as there is a clause that the id is valid by date?

The update statement is quite basic although it selects against one of several possible identifiers and and ID_Type1's.

UPDATE Temp_Table t set ID = 
(SELECT DISTINCT ID FROM View_ID v
 WHERE inDate BETWEEN Start_Date and End_Date
 AND v.Identifier = (NVL(t.ID1, NVL(t.ID2, t.ID3)))
 AND v.ID_Type1 in ('Primary','Secondary'));

Thanks in advance for any advice on any aspect of my question.

Additional Info ***

After investigating and following Gordon's advice I changed the update to three updates. This reduced the overall update process 75% going from just over a minute to just over 20 seconds. Thats a big improvement but I'd like to reduce the process even more if possible.

Does anyone think that Partition By clause would help even further? If so what would be the correct method for putting this clause into an update statement. I'm honestly not sure if I understand how this clause operates.

If the UPDATE using a SELECT statement only allows for 1 value to be selected does this exclude something like the following from working?

UPDATE Temp_Table t SET t.ID = 
(SELECT DISTINCT ID,
        Row_Number () (OVER PARTITION BY ID_Type1) AS PT1
 FROM View_ID v
 WHERE inDate BETWEEN v.Start_Date and v.End_Date
 AND v.Identifier = t.ID1
 AND PT1.Row_Number = 1 )

*Solution************

I combined advice from both Responders below to dramatically improve performance. From Gordon I removed the NVL from my UPDATE and changed it to three separate updates. (I'd prefer to combine them into a case but my trials were still slow.)

From Eggi, I looked working with some kind of Materialized view that I can actually index myself and settled on a WITH Clause.

UPDATE Temp_Table t set ID = 
(WITH IDs AS (SELECT /*+ materialize */ DISTINCT ID, Identifier FROM View_ID v
 WHERE inDate BETWEEN Start_Date and End_Date
 AND v.Identifier = ID1)
 SELECT g.ID FROM IDs g
 WHERE g.Identifier = t.ID1;

Thanks again.

Chris Gerken
  • 16,221
  • 6
  • 44
  • 59
dee
  • 609
  • 7
  • 16
  • 24

2 Answers2

1

The best option for partitioning seems to be the start date, because it seems to always have a value and you also get it as input parameter in your query.

If you have not already done that I would add a bitmap index on ID_Type1.

Eggi
  • 1,684
  • 4
  • 20
  • 31
  • He is not updating anything that uses the bitmap index! He is updating the ID column so the index is not altered in ANY way and is just used for selection. Please read to the end of a post before you come to conclusions. – Eggi Aug 08 '12 at 17:14
  • Thanks Eggi. I do need to research bitmap indexes but I don't think I can use an index on a view in Oracle (I could be wrong). I also don't have the poweer to index the underlying table(s) from where the view exposes the data or even alter the view I'm using. – dee Aug 08 '12 at 17:53
  • 1
    Do you have the possibility to create a materialized view? Maybe you should consider this. It gives you the possibility to create your own indices and it probably increases performance even without adding an index. More information on MVs: http://docs.oracle.com/cd/B10500_01/server.920/a96567/repmview.htm – Eggi Aug 08 '12 at 20:06
  • You half read my mind, what I wound up doing in the end was using the WITH clause and the /*+ Materialize */ hint and it is now updating very quickly in testing. SO thanks to both you and Gordon for providing helpful responses. Now I just need to deal with creating an exception for single row subquery error and I'm all set. Thanks again. – dee Aug 08 '12 at 20:44
1

It is very hard to imagine how windows/analytic functions would help with this update. I do highly recommend that you learn them, but not for this purpose.

Perhaps the suggestion was for partitioning the table space, used for the table. Note that this is very different from the "partition by" statement, which usually refers to window/analytic functions. Tablespace partitioning might help performance. However, here is something else you can try.

I think your problem is the join between the temp table and the view. Presumably, you are creating the temporary table. You should add in a new column, say UsedID, with the definition:

coalesce(t.ID1, t.ID2, t.ID3) as UsedId

The "WHERE" clause in the update would then be:

WHERE inDate BETWEEN Start_Date and End_Date AND
      v.Identifier = t.UsedId AND
      v.ID_Type1 in ('Primary', 'Secondary')

I suspect that the performance problem is the use of NVL in the join, which interferes with optimization strategies.

In response to your comment . . . your original query would have the same problem as this version. Perhaps the logic you want is:

WHERE inDate BETWEEN Start_Date and End_Date AND
      v.Identifier in (t.ID1, t.ID2, t.ID3) AND
      v.ID_Type1 in ('Primary', 'Secondary')
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you both very much for your fast response. @Gordon. Originally I had something similar to this but there is a possible issue. Say UsedID gets the value from t.ID1 but v.identifier only matches to t.ID2. Won't I then miss this ID? – dee Aug 08 '12 at 15:03
  • Thanks again Gordon, The IN statement here will likely wind up matching on more than identifer but I see what you mean with the problem being the NVL clause. I wonder if I can set up some kind of case logic here. I'll take a look and then update later. – dee Aug 08 '12 at 15:57
  • I was thinking you might want precedence. Perhaps doing this as three separate update statements solves the problem. – Gordon Linoff Aug 08 '12 at 16:02
  • I tried a couple of variations on updating the view separately and I Included the best results in my original question. If you think Partition By would be a dead end please let me know. Thanks. – dee Aug 08 '12 at 17:56
  • Your update statement does not suggest a partition by. I imagine that there is more to this problem than included in the question (what does the view look like? what is temp_table and why update all rows with the same value?) Perhaps partition by could be used to solve higher level problems with the processing. – Gordon Linoff Aug 08 '12 at 18:00