0

Our server has 8 cores and is running a web application(DHIS2) which uses postgres as database.

There is big select query which takes a few hours to execute. (The query is executed from the terminal) When that query is run the cpu utilization of that query's process jumps to a constant 100%.

This hangs the application and the application's page does not even load in the browser. This must be because other postgres processes are waiting for that query's process to comeplete.

BUT, when we have muliple cores in the machine then why utilization of a single core to 100% should stop the rest of the processes from executing?

I am unable to understand the concept of multiple cpus and cores in this context. Does postgres not recognize them? What can be the dependency of a select query on another query?

Could somebody please explain this behaviour and suggest ways in which to manage execution of big queries though some kind of postgres configuration may be?

Postgres Version - 9.6
OS - Ubuntu 16
Database Size - 200GB on disk
DHIS2 Version - 2.30

Query (Calucates outliers) -

select  datasets,
    max(ou1.name) Country, ou1.organisationunitid as Country__Id ,
    max(ou2.name) state, ou2.organisationunitid as State__Id ,
    max(ou3.name) Division, ou3.organisationunitid as Division__Id ,
    max(ou4.name) District, max(ou4.code) as District__Code, ou4.organisationunitid as District__Id, 
    max(ou5.name) Block, max(ou5.code) as Block__Code, ou5.organisationunitid as Block__Id,
    max(ou6.name) Facility, max(ou6.code) as Facility__Code,ou6.organisationunitid as Facility__Id,
    max(ou.name) as outlierfacility,
    max(de.name) as dataelement,
    max(coc.name) as category,
    concat(max(p.startdate),':',max(p.enddate)) as period,
    max(pt.name) as frequency,  
    _dv.value,
    u upperbound,
    l lowerbound,
    mean,
    std
from 
(
    with stats as (
        select  dv.sourceid,
            dv.dataelementid,
            dv.categoryoptioncomboid,
            dv.attributeoptioncomboid,
            array_agg(distinct dv.periodid) as periods,
            array_agg(distinct ds.name) as datasets,            
            avg(dv.value::float) as mean, 
            stddev(dv.value::float) as std
        from datavalue dv
        inner join datasetmembers dsm on dsm.dataelementid = dv.dataelementid
        inner join dataelement de on de.dataelementid = dsm.dataelementid
        inner join dataset ds on ds.datasetid = dsm.datasetid   
        inner join period pe on pe.periodid = dv.periodid
        inner join periodtype pt on pt.periodtypeid = pe.periodtypeid
        inner join categoryoptioncombo coc on dv.categoryoptioncomboid = coc.categoryoptioncomboid
        inner join _orgunitstructure ous on ous.organisationunitid = dv.sourceid
        where pe.startdate between date('2019-04-29') - interval '6 months' and date('2019-04-29') and pt.name='Monthly' 
        and de.valueType in ('NUMBER','INTEGER')
        and ds.uid in ('123qwe123','123ewq123')
        group by dv.sourceid,dv.dataelementid,dv.categoryoptioncomboid,dv.attributeoptioncomboid
    )
    select dv.*,datasets,mean,std,mean+3*std u,mean-3*std l
    from datavalue dv
    inner join period pe on pe.periodid = dv.periodid
    inner join periodtype pt on pt.periodtypeid = pe.periodtypeid   
    inner join stats on 
        stats.dataelementid = dv.dataelementid and 
        stats.sourceid= dv.sourceid and 
        stats.categoryoptioncomboid = dv.categoryoptioncomboid and 
        stats.attributeoptioncomboid = dv.attributeoptioncomboid
    where dv.periodid = any(periods)
    and (dv.value::float > mean+3*std or dv.value::float < mean-3*std)
) _dv
inner join dataelement de on _dv.dataelementid = de.dataelementid 
inner join categoryoptioncombo coc on _dv.categoryoptioncomboid = coc.categoryoptioncomboid
inner join _orgunitstructure ous on _dv.sourceid = ous.organisationunitid
inner join organisationunit ou on ou.organisationunitid = ous.organisationunitid
left join organisationunit ou1 on ou1.organisationunitid = ous.idlevel1
left join organisationunit ou2 on ou2.organisationunitid = ous.idlevel2
left join organisationunit ou3 on ou3.organisationunitid = ous.idlevel3
left join organisationunit ou4 on ou4.organisationunitid = ous.idlevel4
left join organisationunit ou5 on ou5.organisationunitid = ous.idlevel5
left join organisationunit ou6 on ou6.organisationunitid = ous.idlevel6
inner join period p on _dv.periodid = p.periodid
inner join periodtype pt on p.periodtypeid = pt.periodtypeid
group by ou1.organisationunitid,
ou2.organisationunitid,
ou3.organisationunitid,
ou4.organisationunitid,
ou5.organisationunitid,
ou6.organisationunitid,
_dv.dataelementid,_dv.sourceid,_dv.categoryoptioncomboid,_dv.attributeoptioncomboid,_dv.periodid,_dv.value,u,l,mean,std,datasets
order by country,state,division,district,block,facility,dataelement,category
harsh atal
  • 411
  • 6
  • 16
  • Does your application use a single connection to the database? –  Apr 30 '19 at 08:56
  • max_connections=100; The application connects through a hibernate configuration file : hibernate.dialect = org.hibernate.dialect.PostgreSQLDialect hibernate.connection.driver_class = org.postgresql.Driver hibernate.connection.url = jdbc:postgresql:dbname – harsh atal Apr 30 '19 at 08:59
  • You need to debug if it "hangs" inside the Java code, or if it "hangs" when running a SQL statement. If that is a Java application you can use VisualVM to get a thread dump –  Apr 30 '19 at 09:01
  • Even when we run the query though command line (not within the application), the application hangs and the top command shows a postgres process consuming 100% – harsh atal Apr 30 '19 at 09:03
  • Yes, queries do affect each other. It may happen that you acquire locks for several hours (is it transactional?). In this case everything hangs. It is unlikely we can help you without knowing the query. Another reason may be that you kill IOPS as well (do you use ssd?). With high priority query everything else hangs. – freakish Apr 30 '19 at 09:37
  • @freakish: a select statement does not acquire any locks –  Apr 30 '19 at 09:41
  • @a_horse_with_no_name Of course it does, it acquires ACCESS SHARE lock: https://www.postgresql.org/docs/9.4/explicit-locking.html Although in this case it should be ok, unless the query runs in serializable transaction (which I assume it doesn't). – freakish Apr 30 '19 at 09:46
  • @freakish: but that doesn't block other read queries. It only prevents DDL on the table –  Apr 30 '19 at 09:52
  • @a_horse_with_no_name If ACCESS EXCLUSIVE lock appears in the query queue (not even acquired yet) then everything scheduled after it is blocked until is is released. If it cannot be acquired because of the long running query then everything stops. So for example in this scenario vacuum full will block entire table for the duration of the long running query + vacuum itself. – freakish Apr 30 '19 at 09:55
  • Only DDL statements, TRUNCATE or an explicit `LOCK TABLE` need an `ACCESS EXCLUSIVE`. Regular SELECT statements do not request that nor do DML statement request that. –  Apr 30 '19 at 09:57
  • Disk is SSD and default_transaction_isolation is commented in the configuration file. The query is run as it is , no transactional block – harsh atal Apr 30 '19 at 09:57
  • @harsh Does your database hang on particular tables or on everything? Can you access other, unrelated to the big query, tables? Does `SELECT 1;` run? – freakish Apr 30 '19 at 10:02
  • @a_horse_with_no_name You may want to read this: https://www.joinhandshake.com/engineering/2017/02/27/database-migrations-and-locks.html It shows how naive `create table` can lock your database. The same applies to `vacuum full` and other ACCESS EXCLUSIVE queries. And explains in details what I've tried to say. I'm not saying it is OP's case but it is worth investigating. – freakish Apr 30 '19 at 10:13
  • Maybe this will help : https://dba.stackexchange.com/questions/186693/improving-ram-and-cpu-utilization-by-postgresql-9-6. And you can check some other configuration points that are maybe not well set for your needs : https://www.postgresql.org/docs/9.6/runtime-config-resource.html – Jaisus Apr 30 '19 at 16:05

0 Answers0