Postgresql - how to make use of loop (or apply in r) to decrease the r-script size

Question

I have two tables in my database. The columns in both tables and their datatypes are shown below. let say both tables store the data of 3 machines. Each machine has two s_id's and by using them i will select the required data of a particular machine.

the s_id's for each machine are

   m1   59,07
   m2   60,92
   m3   95,109



                                    Table "public.table_a"
    Column    |            Type             | Modifiers | Storage | Stats target | Description
--------------+-----------------------------+-----------+---------+--------------+-------------
 ettime       | timestamp without time zone |           | plain   |              |
 sn           | numeric                     |           | main    |              |
 s_id1        | numeric                     |           | main    |              |
 e_id1        | numeric                     |           | main    |              |
Indexes:
    "table_a_sn_key" UNIQUE CONSTRAINT, btree (sn)
Has OIDs: no


                               Table "public.table_b"
    Column    |            Type             | Modifiers | Storage  | Stats target | Description
--------------+-----------------------------+-----------+----------+--------------+-------------
 sn           | numeric                     |           | main     |              |
 ettime       | timestamp without time zone |           | plain    |              |
 value        | text                        |           | extended |              |
 comment      | text                        |           | extended |              |
 l_id         | numeric                     |           | main     |              |
 n_id         | numeric                     |           | main     |              |
 ettime.y     | timestamp without time zone |           | plain    |              |
 s_id2        | numeric                     |           | main     |              |
 e_id2        | numeric                     |           | main     |              |
Indexes:
    "table_b_sn_key" UNIQUE CONSTRAINT, btree (sn)
Has OIDs: no

by using the below script i will get the desired result.

library(RPostgreSQL)

M1 <- dbGetQuery(con, "select 
    a.r_date::date date, 
    downgraded,
    total, 
    round(downgraded::numeric/total* 100, 2) percentage
from (
    select date_trunc('day', eventtime) r_date, count(*) downgraded
    from table_b
    where s_id2 in (59,07) 
    group by 1
    ) b
join (
    select date_trunc('day', eventtime) r_date, count(*) total
    from table_a
    where s_id1 in (59,07)
    group by 1
    ) a
using (r_date)
order by 1")

since i am not from the programming background i am using the above complete query statement for each machine

M2 <-  dbGetQuery(con, "select 
        a.r_date::date date, 
        downgraded,
        total, ........
       ........

M3 <- dbGetQuery(con, "select 
        a.r_date::date date, 
        downgraded,
        total,.....
        .........

Instead of using query for each machine, is it possible to use a loop in my case. so that in one query i will get all machines data.

can someone tell me how to do these with my example. In reality i need to run the 6 seperate scripts and in each script i need data of three different machines.

construct 6 different connections and put them in a list. So you can use `lapply(cons, dbGetQuery, ...)` For `...` read the documentation of `dbGetQuery()` which parameter is the query. The result of `lapply()` is a list (each element is the result of one query). The parameter is `statement=`. So you have to do for `...`: `statement="select ..."` — jogo, Dec 05 '15 at 15:53

Parfait · Accepted Answer · 2015-12-08T13:47:26.713

Simply amend your SQL query to account for other machines. No need for an R workaround. In fact, you keep all data restructuring and processing on the SQL engine. Specifically, either add s_id to the where clauses (and group by) in derived tables or use a union query. In both, machine and s_id columns are added to identify them in imported data frame:

WHERE CLAUSE CHANGE

strsql <- "select a.machine, 
                  a.r_date::date date, 
                  downgraded,
                  total, 
                  round(downgraded::numeric/total* 100, 2) percentage
           from (
                  select CASE WHEN s_id2 IN (59,07) THEN 'M1'
                              WHEN s_id2 IN (60,92) THEN 'M2'
                              WHEN s_id2 IN (95,109) THEN 'M3'
                         END As machine, date_trunc('day', eventtime) r_date, 
                         count(*) downgraded
                  from table_b
                  where s_id2 in (59,07,60,92,95,109) 
                  group by CASE WHEN s_id2 IN (59,07) THEN 'M1'
                                WHEN s_id2 IN (60,92) THEN 'M2'
                                WHEN s_id2 IN (95,109) THEN 'M3'
                           END, date_trunc('day', eventtime)
                  ) b
            inner join (
                  select CASE WHEN s_id1 IN (59,07) THEN 'M1'
                              WHEN s_id1 IN (60,92) THEN 'M2'
                              WHEN s_id1 IN (95,109) THEN 'M3'
                         END As machine, date_trunc('day', eventtime) r_date,
                         count(*) total
                  from table_a
                  where s_id1 in (59,07,60,92,95,109) 
                  group by CASE WHEN s_id1 IN (59,07) THEN 'M1'
                                WHEN s_id1 IN (60,92) THEN 'M2'
                                WHEN s_id1 IN (95,109) THEN 'M3'
                           END, date_trunc('day', eventtime)
                  ) a
            on a.machine = b.machine and a.r_date = b.r_date
            order by a.r_date;"

machinesdf <- dbGetQuery(con, strsql)

UNION QUERY

strsql <- "select a.machine, 
                   a.r_date::date date, 
                   downgraded,
                   total, 
                   round(downgraded::numeric/total* 100, 2) percentage
            from (
                   select 'M1' as machine, date_trunc('day', eventtime) r_date, 
                          count(*) downgraded
                   from table_b
                   where s_id2 in (59,07) 
                   group by date_trunc('day', eventtime)
                 ) b
            inner join (
                   select 'M1' as machine, date_trunc('day', eventtime) r_date, 
                          count(*) total
                   from table_a
                   where s_id1 in (59,07) 
                   group by date_trunc('day', eventtime)
                 ) a
            on a.machine = b.machine and a.r_date = b.r_date
            order by a.r_date

            union

            select a.machine, 
                    a.r_date::date date, 
                    downgraded,
                    total, 
                    round(downgraded::numeric/total* 100, 2) percentage
             from (
                    select 'M2' as machine, date_trunc('day', eventtime) r_date, 
                           count(*) downgraded
                    from table_b
                    where s_id2 in (60,92) 
                    group by date_trunc('day', eventtime)
                  ) b
             inner join (
                    select 'M2' as machine, date_trunc('day', eventtime) r_date, 
                           count(*) total
                    from table_a
                    where s_id1 in (60,92) 
                    group by date_trunc('day', eventtime)
                  ) a
             on a.machine = b.machine and a.r_date = b.r_date

             union

             select a.machine, 
                     a.r_date::date date, 
                     downgraded,
                     total, 
                     round(downgraded::numeric/total* 100, 2) percentage
              from (
                     select 'M3' as machine, date_trunc('day', eventtime) r_date, 
                            count(*) downgraded
                     from table_b
                     where s_id2 in (95,109) 
                     group by date_trunc('day', eventtime)
                   ) b
              inner join (
                     select 'M3' as machine, date_trunc('day', eventtime) r_date, 
                            count(*) total
                     from table_a
                     where s_id1 in (95,109) 
                     group by date_trunc('day', eventtime)
                   ) a
              on a.machine = b.machine and a.r_date = b.r_date;"

machinesdf <- dbGetQuery(con, strsql)

Thank you very much for your time. I will check both the options. — Chanti, Dec 05 '15 at 19:19
I am using **Where Clause Change** obtained result machine s_id date downgraded total percentage M1 59 2015-08-20 3 40 7.50 M1 07 2015-08-20 2 39 5.13 — Chanti, Dec 07 '15 at 13:10
@Parfit: for each date, I am getting two rows (i.e for each s_id). for every machine under each date there should be only one row (which is a sum of both s_id's - not seperately for each s_id) — Chanti, Dec 07 '15 at 13:24
@Parfit: also please see my question once i changed the s_id column name in both tables — Chanti, Dec 07 '15 at 13:31
@Parfit: can you please update **where clause change** code without using s_id under group by.. i tried but it did't work – — Chanti, Dec 07 '15 at 13:47
@Chanti - See updated queries. Derived tables are now grouped by machine according to defined s_ids. And since s_id changed names, the `inner join...on` is used to link a and b tables. — Parfait, Dec 07 '15 at 16:59
@Parfit: Thank you very much for the updated queries. I tried where clause change and i got the following error. **ERROR:** column reference "machine" is ambiguous — Chanti, Dec 08 '15 at 11:06
@Chanti Simple change in very first line of select statement to choose any of the tables, a or b, for machine. Same done to union query. See edit. — Parfait, Dec 08 '15 at 13:50
@Parfit: Thank you very much. Perfect solution to my requirement. — Chanti, Dec 08 '15 at 15:54

Postgresql - how to make use of loop (or apply in r) to decrease the r-script size

1 Answers1