Is there a way to retain the SqlAlchemy attribute names when you query the data into a pandas dataframe?
Here's a simple mapping of my database. For the school table, I've renamed 'SchoolDistrict', the DB name, to a shorter 'district'. I'm several layers removed from the DBA, so changing them in the source isn't feasible.
class School(Base):
__tablename__ = 'DimSchool'
id = Column('SchoolKey', Integer, primary_key=True)
name = Column('SchoolName', String)
district = Column('SchoolDistrict', String)
class StudentScore(Base):
__tablename__ = 'FactStudentScore'
SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
PointsPossible = Column('PointsPossible', Integer)
PointsReceived = Column('PointsReceived', Integer)
school = relationship("School", backref='studentscore')
So when I query something like:
query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind)
I end up with the underlying 'SchoolDistrict' name for the column, not my attribute name, in the returned DataFrame df.
EDIT: An even more annoying case is when duplicate column names exist across tables. For example:
class Teacher(Base):
__tablename__ = 'DimTeacher'
id = Column('TeacherKey', Integer, primary_key=True)
fname = Column('FirstName', String)
lname = Column('FirstName', String)
class Student(Base):
__tablename__ = 'DimStudent'
id = Column('StudentKey', Integer, primary_key=True)
fname = Column('FirstName', String)
lname = Column('FirstName', String)
So a query across both tables (like the one below) produces a dataframe with duplicate FirstName and LastName columns.
query = session.query(StudentScore, Student, Teacher).join(Student).join(Teacher)
Would it be possible to rename these columns at the moment of the query? Right now I'm having trouble keeping my head straight with these two systems of column names.