2

I was trying to fetch the count(*) from the table, which has almost 7 million records and it taking more than an hour for returning the result.

Also the table has 153 columns out of which index has been created for column 123, so tried to run the following query in parallel, but it didn't help.

select /*+ parallel (5) */ count(123) from <table_name>

Please suggest if there is alternative way.

When I ran desc on the table in Toad, the index tab holds the value of no. of rows. Any idea how that value is getting updated there?

Mat
  • 202,337
  • 40
  • 393
  • 406
Brabin
  • 23
  • 1
  • 1
  • 6
  • 1
    i assume by `count(123)` you mean `count(column_123)`, right? – Bohemian Jun 24 '13 at 06:50
  • `select /*+ parallel (5) */ count(123) from` I think you accidently the query. – ta.speot.is Jun 24 '13 at 06:51
  • 1
    Have you seen this : http://stackoverflow.com/questions/1840538/faster-alternative-in-oracle-to-select-count-from-sometable ? – LMeyer Jun 24 '13 at 07:02
  • yes Bohemain...its column no. # 123 – Brabin Jun 24 '13 at 07:03
  • Thanks Shinosha, I believe this would do the job for me..SELECT NUM_ROWS FROM ALL_TABLES WHERE TABLE_NAME = table_name; ...any idea how frequently the count is getting updated in table all_tables. – Brabin Jun 24 '13 at 07:15
  • It depends on your environment @Brabin see http://stackoverflow.com/questions/14125731/oracle-row-count-of-table-by-count-vs-num-rows-from-dba-tables/ – Ben Jun 24 '13 at 07:34
  • what is the practical task of performing regular COUNTs upon a 7M rows table? – Kirill Leontev Jun 24 '13 at 07:37
  • 1
    More than an hour for 7 million records is definitely too long. Could you show us the execution plan for the select statement (execute a `explain plan for select ..... from `, then a `select * from table(dbms_xplan.display)`). My assumption is that your 123-columns allows null values (so the index is not usable for counting) and that the table itself occupies way more space than would be needed because a huge amount of data was deleted) – René Nyffenegger Jun 24 '13 at 08:34
  • It does sound like too long but to know whether it IS too long (i.e. that something is wrong) one needs to know if this is a dedicated machine or a cloud server with multiple tenants, if there are other processes running concurrently that are consuming disk resources, etc. – Tim Jun 26 '13 at 17:01
  • Do you really have a column called "123" or have you changed the name for the sake of SO? 153 columns does sound like a lot. – port5432 Jun 27 '13 at 12:26

3 Answers3

4

A few issues to mention:

  1. For "select count(*) from table" to use an index, the indexed column must be non-nullable, or the index must be a bitmap type.
  2. If there are known to be no nulls in the column but there is no not null constraint on it, then use "select count(*) from table where column_name is not null".
  3. It does of course have to be more efficient to scan the index than the table, but with so many table columns you're probably fine there.
  4. If you really want a parallel index scan, use the parallel_index hint, not parallel. But with only 7 million rows you might not find any need for parallelism.
  5. You need to check the execution plan to see if an index and/or parallel query is in use.
  6. If you can use an estimated number of rows then consider using the sample clause: for example "select 1000*count(*) from table sample(0.1)"
David Aldridge
  • 51,479
  • 8
  • 68
  • 96
2

Counting the number of rows of large table takes long time. It's natural. Some DBMS stores the number of records, however, this kinds of DBMS limits concurrency. It should lock the entire table before DML operation on the table. (The entire table lock is necessary to update the count properly.)

The value in ALL_TABLES.NUM_ROWS (or USER_TABLES.NUM_ROWS) is just a statistical information generated by analyze table ... or dbms_stats.gather_table_stats procedure. It's not accurate, not real-time information.

If you don't need the exact number of rows, you can use the statistical information. However you shouldn't depend on it. It's used by Oracle optimizer, but shouldn't in application program.

I'm not sure why you have to count the number of rows of the table. If you need it in the batch program which is run infrequently, you can partition the table to increase the parallelism. If you need the count in online program, you should find a way not to use the count.

ntalbs
  • 28,700
  • 8
  • 66
  • 83
  • 1
    +1 But in general parallelism and partitioning are independent of each other. That was less true in older versions, and there are still some places where parallelism and partitioning must work together, like with partition-wise joins. But for just counting all the rows, dividing up the extents among 5 different processes probably wouldn't benefit from partitioning. – Jon Heller Jun 26 '13 at 18:29
  • 1
    Here's some more information about [Granules of Parallelism](http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm#BEIIGIFH) if you're curious. – Jon Heller Jun 26 '13 at 23:22
  • @jonearles Yes, you are right. Thank you for your kindness and useful information. – ntalbs Jun 26 '13 at 23:46
  • 1
    @jonearles important to note though that parallel query are not a prerequisite for benefiting from partition-wise joins. Even a serially executed hash join will improve. – David Aldridge Jun 27 '13 at 10:38
0
select /*+ parallel (5) */ 

Seems like odd number for degree of parallelism. Well, obvious 5 is an odd number, and that is strange. The DoPs ought to be a power multiple of two (see below for more).

Anyway, do you have a reason for using parallel query? Do you have at least five spare processors? If not, there is a good chance the overhead of managing the PQ slaves is at least contributing to the poor performance.


Why should DOP = n*2? There is an established heuristic based on Queuing Theory that running more than two batch jobs simultaneously leads to degraded performance. Find out more. (I think queuing theory actually recommends a figure of 1.8, but as database jobs are often bound by I/O or disk we can usually get away with 2.)

I originally said "power of 2" but that's mainly because multi-core servers tend to have a number of CPUs which is a power of 2, but multiple of 2 is more accurate, because some boxes have 12 CPUs or some other number.

Now, if we have a 64 core box, a DOP of 5 or 37 is fine, because we have enough CPUs to run that many threads simultaneously. But if we have got a small quadcore box, only 2, 4 or 8 makes sense, because those are the only values which will ensure an even distribution of work across all four processors. Running five threads on a quadcore box means one CPU will be doing a lot more work than the other three; there is a possibility that it will take longer to finish, leaving the other three slaves waiting. So DOP=5 can actually lead to a greater elapsed time than DOP=4.

DOP=n*2 is only a rule of thumb, and not set in stone. However, it is based on sound reasoning, and we should know why we're doing something different. Obviously, we should have conducted some experiments to confirm that we have chosen the right DOP (whatever value we settle on).

APC
  • 144,005
  • 19
  • 170
  • 281
  • Why should the DOP be a power of two? Are you thinking about the number of hash partitions instead? – Jon Heller Jun 26 '13 at 18:36
  • @jonearles - nope, I'm definitely think about parallel query. The rule of thumb for DOP is 2*CPUs (assuming there are no other processes on the system contending for resource). Although the reason is the same as for hash partitions - even distribution, in this case of workload. – APC Jun 26 '13 at 19:56
  • The power of 2 rule is useful for hash partitioning because of the way ORA_HASH works. But I can't find any source that implies ORA_HASH, or a similar function, plays a role in distributing blocks to parallel servers. I assume that however the list of blocks and extents are stored, for this query it's a simple matter of taking that list and splitting it into N groups. Dividing by 5 should work as well as 4 or 8. And I've seen lots of Oracle examples and some auto-DOPs that are not powers of 2. The number is almost always a multiple of 2, but I wonder if even that really matters. – Jon Heller Jun 27 '13 at 05:51
  • @jonearles - we're straying off topic here. But anyway, see my edit. – APC Jun 27 '13 at 10:20