
I am working on an Oracle Exadata database server with 52 CPUs. When I query a table with 9 million records using a full table scan with a parallel hint of 128, the query takes more than 20 minutes to fetch the data, with no utilization at the DB level. What can be improved in this query?

The parallel_max_servers parameter details are as follows:

Current_utilization: 432
Max Utilization: 1040
Initial Allocation: 1040
Limit Value: 32767

The query used is as follows:

Alter session force parallel;

Select /* +PARALLEL(128) */ * FROM TABLE;

I tried what is mentioned in the problem statement.

James Z
Nivas S
    What do you expect? It just takes some time to fetch 9 million records, sending them over network and (most likely) printing them on screen. Try `create table ... as select * from ...` to see how fast it runs on the database actually. – Wernfried Domscheit Aug 05 '23 at 15:08
  • Yep. All of your wait time is likely just streaming all of the output across the display - not on the database at all. Try directing to an output file instead and avoid output to terminal screen or client app display. – pmdba Aug 05 '23 at 15:29
  • 1
    Chances are the problem is specific to your query. So please provide the relevant table schemas, the full query and its actual execution plan. And also describe how you execute the query and fetch the result. – Codo Aug 05 '23 at 15:42
  • @WernfriedDomscheit, I am presently writing the query output to the file and I can't create a table as I have only view privileges, I need to fetch the data and write to the file with limited time possible – Nivas S Aug 05 '23 at 16:27
  • @pmdba, I am already writing the output to a file. – Nivas S Aug 05 '23 at 16:28
  • @NivasS then your issue may also be disk I/O on your client. – pmdba Aug 05 '23 at 21:23
  • 1
    If I am not mistaken, `/* +PARALLEL(128) */` is only a comment, not a hint. If you want a hint, the plus sign must follow the asterisk immediately. Also, administering 128 parallel processes is a lot of work. With 52 CPUs I'd probably just try `/*+PARALLEL(26)*/`. But that idea may be wrong. Just try. – Thorsten Kettner Aug 06 '23 at 00:02
  • @Thorsten Kettner is correct, you aren't using a valid hint. It can't have a space before the `+`. But you don't want parallelism for this query so that's not your issue. – Paul W Aug 06 '23 at 03:34
  • If you have EM installed, obtain an actual execution plan from it or via `dbms_sql_monitor.report_sql_monitor` with `type => 'ACTIVE'`. It will show you an effective DOP, waits with timings and highlight the most time consuming operation – astentx Aug 06 '23 at 10:28
  • OP stated that there is "no utilization in the DB level". Assuming this is true and has been verified by wait data (and not just by watching the CPU% on the host), it cannot be a query plan issue, a lack of parallelism, or anything else that query tuning can address. The question isn't a bad one; it just needs to be explained how the network and client contribute to throughput. – Paul W Aug 06 '23 at 12:14

1 Answer


"with no utilization in the DB level" - this is the key piece of information in your question. It means that nearly all of your time is being spent either on the network or on the client machine processing the incoming data (e.g. writing it to a file, or whatever else it does). That makes this not a database issue. No query tuning can help, because the query itself is delivering data as fast as you can consume it. The problem is that you can't consume it very fast, but that's not the database's fault.

The fact that this is Exadata with a bunch of CPUs doesn't help you in this situation - there's literally nothing for all that power to do. A serial direct path full table scan on non-Exadata hardware with just one or two CPUs would be just as fast. For this reason, don't ask for parallelism to solve this; it won't help. Parallelism and Exadata features kick in usefully when you have some query complexity - a filter predicate, a join, a sort, a limited column list - almost anything beyond `select * from table`.

Here are some things to check.

  1. Ensure your fetch size is at least in the thousands. A fetch size of 1, 10 or 100 can cause too many network round trips, and if your latency is in the tens of milliseconds or higher, this can make pulling millions of rows very inefficient. Increase the fetch size to something respectable, and check your network latency (use ping) to ensure it's reasonable. You can determine your current effective fetch size by finding your SQL in the shared pool (v$sql) and dividing rows_processed by fetches.

  2. Your rows may be extraordinarily wide. If each row has thousands of bytes, 9 million rows take a lot more work to fetch over the network and write to disk than 9 million narrow rows of only a few dozen bytes each. Consider omitting overly large columns that you don't need.

  3. Your application may be processing the rows slowly. Is it sorting them? Looping over the rows and doing something with each one? Tune the application.

  4. Your application may be writing rows to a file slowly. The filesystem to which you are writing may be slow (NFS to some distant network, for example). Have your system admin or storage team take a look at this. If the writing process is I/O bound (not CPU bound), consider inline compression so fewer bytes are written to disk.
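For point 1, here is a sketch of checking the effective fetch size from the shared pool. The `LIKE` pattern and the `ARRAYSIZE` value are illustrative; match on your actual statement text and tune the setting to your client:

```sql
-- How many rows per fetch is this statement actually getting?
SELECT sql_id,
       rows_processed,
       fetches,
       ROUND(rows_processed / NULLIF(fetches, 0)) AS rows_per_fetch
FROM   v$sql
WHERE  sql_text LIKE 'Select /* +PARALLEL(128) */%';

-- If rows_per_fetch is small, raise the client's fetch size.
-- In SQL*Plus, for example (value is illustrative; max is 5000):
-- SET ARRAYSIZE 5000
```

Other clients have an equivalent knob (e.g. a fetch size or prefetch setting in JDBC/ODBC); the principle is the same: fewer, larger round trips.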

Lastly, if you've looked at all these options and still need the pull to go faster, you may need to establish multiple concurrent connections that each pull a portion of the table. For example, if you had 10 processes each running `select * from table where MOD(pkcol,10) = :thread - 1` and binding their thread number, they would each pull 1/10th of the table. While that does mean 10x more work on the database side, it will overcome limitations in the network and in application processing of rows, and to some extent help with writing to files by using multiple CPUs to do so. If this helps but your DBA complains about the load (it can cause excessive CPU on the Exadata storage cells, so it should not be overused), consider partitioning the table and having individual concurrent processes each pull individual partitions. That will reduce the load on the storage cells to an acceptable level.
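As a sketch of that multi-connection split (the table name `my_table` and key column `pkcol` are placeholders; 10 workers assumed, and the split is only even if `pkcol` is reasonably uniformly distributed):

```sql
-- Run one copy of this per worker process, binding :thread to 1..10.
-- Each worker fetches a disjoint 1/10th slice of the table.
SELECT *
FROM   my_table
WHERE  MOD(pkcol, 10) = :thread - 1;
```

Note that every worker still full-scans the table to apply its filter, which is where the 10x database-side work comes from; partitioning and pulling one partition per worker avoids that.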

Paul W