0

We have a sql server query in which we need to generate ntiles for increasingly large numbers of variables, such that the variables are combined with each other in their various permutations. Here's an excerpt exemplifying what I mean:

statement 1:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID 
                    order by Objects_Created) AS Ntile_Mon_Objects_Created,

statement 2:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *Country*
          order by Objects_Created) AS Ntile_Country_Objects_Created

statement 3:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *User*_Type
                 order by Objects_Created) AS Ntile_UT_Objects_Created

You can see that the statements are the same except that in the second and third one the italicized columns "country" and "user type" have been created. So we take ntiles for the same variable "Objects_Created" at different levels of specificity, and we also have to take ntiles for the various possible permutations of these variables, e.g.:

statement 4:

ntile(10) over (partition by  MAUorALL, User_Type, fsi.Month_ID, *Country, User_Type*
            order by Objects_Created) AS Ntile_Country_UT_Objects_Created

We can manually code these permutations to a point, but if we could use sqlalchemy to execute all the permutations of these variables it might make things easier. Does anyone have an example I could re-purpose?

Thanks for your help!

Bogdan Sahlean
  • 19,233
  • 3
  • 42
  • 57
ouonomos
  • 700
  • 1
  • 9
  • 25
  • What's going on? There seems to be a new process since the last time I used SA, whereby contributors are editing my code for--readability I guess. Thank you for that. Are you also providing answers that I'm missing? – ouonomos Mar 05 '14 at 21:30

1 Answers1

0

I have no idea how fsi is related to other columns, but assuming all data is in one model (which is easy to extend with sqlalchemy query) like below:

class User(Base):
    __tablename__ = 't_users'
    id = Column(Integer, primary_key=True)
    MAUorALL = Column(String)
    User_Type = Column(String)
    Country = Column(String)
    Month_ID = Column(Integer)
    Objects_Created = Column(Integer)

the task is accomplished by simple usage of itertools.permutations (or itertools.combinations, depending what you want to achieve) for creating query. Below code would generate a query for a User table with various ntiles for it. I assume reading the code suffice for understading what is happening:

# configuration: {label: Column}
column_labels = {
        'Country': User.Country,
        'UT': User.User_Type,
        }

def get_ntile(additional_columns=None):
    """ @return: sqlalchemy expression for selecting a given ntile() using
    predefined as well as *additional* columns.
    """
    partition_by = [
        User.MAUorALL,
        User.User_Type,
        User.Month_ID,
        ]
    label = "Ntile_Objects_Created"
    if additional_columns:
        lbls = []
        for col_name in additional_columns:
            col = column_labels[col_name]
            partition_by.append(col)
            lbls.append(col_name)
        label = "Ntile_{}_Objects_Created".format("_".join(lbls))
    xprs = over(
            func.ntile(10),
            partition_by = partition_by,
            order_by = User.Objects_Created,
            ).label(label)
    return xprs

def get_query(additional_columns=['UT', 'Country']):
    """ @return: a query object which selects a User with additional ntiles
    for predefined columns (fixed) and all possible permutations of
    *additional_columns*
    """
    from itertools import permutations#, combinations
    tiles = [get_ntile(comb)
            for r in range(len(additional_columns) + 1)
            for comb in permutations(additional_columns, r)
            ]
    q = session.query(User, *tiles)
    return q

q = get_query()
print [_c["name"] for _c in q.column_descriptions]
# >>> ['User', 'Ntile_Objects_Created', 'Ntile_UT_Objects_Created', 'Ntile_Country_Objects_Created', 'Ntile_UT_Country_Objects_Created', 'Ntile_Country_UT_Objects_Created']

for tile in q.all():
    print tile
van
  • 74,297
  • 13
  • 168
  • 171
  • Thanks so much Van. I'll try it out -- I expect this will get the job done. I'll comment on how it goes later. – ouonomos Mar 06 '14 at 17:51