
I am building an analytics page backed by Python (Flask) with Redshift as the data source. The data is organized in a star schema, so all I want to do is basic aggregation and filtering over a specified time frame (hardly rocket science). Yet I can't find an elegant way to do this.

Let's say I have a SQL query that nicely produces daily stats for the current month.

with current_month as (
    select date
    from date_d
    where month_name = 'November' AND year_actual = '2018'
),
filtered as (
    select date_d.date, fact.id, fact.created_id
    from fact
    join date_d ON date_d.id = fact.created_id
    where date_d.month_name = 'November' AND date_d.year_actual = '2018' AND fact.foo = 'bar'
),
total as (
    SELECT COUNT(id) as count, DATE(date) as date
    from filtered
    GROUP BY 2
)
SELECT current_month.date, COALESCE(total.count, 0) as total
from current_month
LEFT JOIN total ON current_month.date = total.date
order by current_month.date

Is there any way I can execute this query and load the results into some predefined data structure? I looked at SQLAlchemy, because I didn't feel like executing raw SQL queries, but the ORM looks useless here. It seems the only option is to execute raw SQL and load the results into some sort of schema (using marshmallow). I looked at this article, which is close but not very elaborate.
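For what it's worth, one way to do this without the ORM is SQLAlchemy Core: execute the raw query with `text()` and map each row onto a predefined dataclass. The sketch below uses an in-memory SQLite database as a stand-in for Redshift, and the `fact` table, its columns, and the `DailyStat` class are illustrative assumptions, not your actual schema (requires SQLAlchemy 1.4+ for `row._mapping`):

```python
from dataclasses import dataclass

from sqlalchemy import create_engine, text


@dataclass
class DailyStat:
    day: str
    total: int


# Stand-in for a Redshift connection URL such as
# "postgresql+psycopg2://user:pass@host:5439/dbname"
engine = create_engine("sqlite://")

# Seed some illustrative data (not part of the original schema).
with engine.begin() as conn:
    conn.execute(text("create table fact (day text, foo text)"))
    conn.execute(text(
        "insert into fact values "
        "('2018-11-01', 'bar'), ('2018-11-01', 'bar'), ('2018-11-02', 'baz')"
    ))

# Execute the raw aggregation query and map rows onto the dataclass.
with engine.connect() as conn:
    rows = conn.execute(text(
        "select day, count(*) as total from fact where foo = 'bar' group by day"
    ))
    stats = [DailyStat(**row._mapping) for row in rows]

print(stats)  # [DailyStat(day='2018-11-01', total=2)]
```

The dataclass here plays the role a marshmallow schema would, just without the serialization layer; if you need JSON output for the page, a marshmallow `Schema.dump()` over the same rows works the same way.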

Maybe I am missing something? People must do this sort of thing fairly often. Or maybe my approach is wrong overall?

P.S. Simple filtering works perfectly on my schema with Flask-Restless.

Semant1ka

1 Answer


Try pandas and pyodbc:

import pandas as pd
import pyodbc

# Note: this DRIVER string targets SQL Server; swap in the driver
# appropriate for your database.
server = 'mysrvr'
db = 'mydb'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')

sql = "select col1, col2, col3 from mytable"

# read_sql runs the query and returns the result set as a DataFrame.
my_dataframe = pd.read_sql(sql, conn)

my_dataframe will be your data frame/data structure.
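Since the question is about Redshift rather than SQL Server, you would typically connect with a Postgres-compatible driver (e.g. psycopg2) instead of the SQL Server ODBC driver, as Redshift speaks the Postgres wire protocol. The same `pd.read_sql` pattern then applies unchanged. A runnable sketch using an in-memory SQLite database as a stand-in (the table and column names are the illustrative ones from the answer, not a real schema):

```python
import sqlite3

import pandas as pd

# In-memory SQLite as a stand-in; for Redshift, conn would come from
# psycopg2.connect(...) or a SQLAlchemy engine instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mytable (col1 text, col2 int, col3 int);
    insert into mytable values ('a', 1, 10), ('b', 2, 20);
""")

sql = "select col1, col2, col3 from mytable"
my_dataframe = pd.read_sql(sql, conn)

print(my_dataframe.shape)  # (2, 3)
```

From there, aggregation like the daily-stats query can be done either in SQL before loading, or on the DataFrame itself with `groupby`.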

CSY