1

I am trying to insert data to my postgres table after reading a xlsx file. I need to validate the xlsx sheet data before I insert it into my table.

I am working on pandas dataframe df = pd.read_excel('/Users/ankitg3-mac/Downloads/medical_plans/%s' % filename)

I am using sqlalchemy as my ORM tool.

My model class:

    class MedicalPlan(Base):
        __tablename__ = "medical_plans"

        id = Column(Integer, nullable=False , primary_key=True)
        issuer_id = Column(Integer, ForeignKey('issuers.id'), nullable=False)
        service_area_id = Column(Integer)
        name = Column(String)
        on_exchange = Column(Boolean)
        off_exchange = Column(Boolean)
        starting_percentage_fpl = Column(REAL, nullable=False , default=0)
        ending_percentage_fpl = Column(REAL, nullable=False, default=0)
        metal_level_name = Column(String)
        network_type = Column(String)
        type = Column(String)
        is_age_29_plan = Column(Boolean)
        original_medicare = Column(Boolean)
        default_bhp = Column(Boolean, default=False)
        sort_rank_override = Column(Integer)
        recommended = Column(Boolean, default=False)
        comparable_individual_plan_id_trash = Column(Integer)
        group_or_individual_plan_type = Column(String)
        hios_plan_identifier = Column(String)

I am doing a bulk insert using a list of dictionaries.

conn.execute(MedicalPlan.__table__.insert(), medicalPlan_dict)

My medicalPlan_dict looks like below:

[{u'default_bhp': False, u'price_period': u'Monthly', u'plan_description': '', u'sbc_download_url': '', u'price_note': '', u'starting_percentage_fpl': 0, u'is_uhc_plan': False, 'issuer_id': 440, u'part_b_deductible': '', u'promotional_label': '', u'metal_level_name': u'Silver', u'network_url': '', u'group_or_individual_plan_type': u'Group', u'treatment_cost_calculator_url': '', u'hios_plan_identifier': u'99844RI1800001', u'original_medicare': False, u'part_d_prescription_coverage': '', u'recommended': False, u'off_exchange': True, u'is_age_29_plan': False, u'type': u'MetalPlan', u'ending_percentage_fpl': 0, u'plan_detail_footer': '', u'formulary_url': '', u'plan_detail_items': '', u'highlight_6': '', u'highlight_4': '', u'highlight_5': '', u'hsa_eligible': False, u'highlight_3': u'PCP 20% coinsurance', u'highlight_1': u'Silver', u'name': u'WI 80 INDEMNITY 18 OPTION 1 SILVER RX $10/45/90/25%', u'network_description': '', u'plan_detail_header': '', 'service_area_id': 1, u'data_sourced_from': u'uhc', u'plan_year': 2018, u'highlight_2': u'Indemnity', u'on_exchange': False, u'network_type': u'Indemnity'}, {u'default_bhp': False, u'price_period': u'Monthly', u'plan_description': '', u'sbc_download_url': '', u'price_note': '', u'starting_percentage_fpl': 0, u'is_uhc_plan': False, 'issuer_id': 484, u'part_b_deductible': '', u'promotional_label': '', u'metal_level_name': u'Silver', u'network_url': '', u'group_or_individual_plan_type': u'Group', u'treatment_cost_calculator_url': '', u'hios_plan_identifier': u'99806CAAUSJ-TMP1', u'original_medicare': False, u'part_d_prescription_coverage': '', u'recommended': False, u'off_exchange': True, u'is_age_29_plan': False, u'type': u'MetalPlan', u'ending_percentage_fpl': 0, u'plan_detail_footer': '', u'formulary_url': '', u'plan_detail_items': '', u'highlight_6': '', u'highlight_4': '', u'highlight_5': '', u'hsa_eligible': False, u'highlight_3': u'PCP 20% coinsurance', u'highlight_1': u'Silver', u'name': u'WI 80 INDEMNITY 18 OPTION 1 SILVER RX $10/45/90/25%', u'network_description': '', u'plan_detail_header': '', 'service_area_id': 1, u'data_sourced_from': u'uhc', u'plan_year': 2018, u'highlight_2': u'Indemnity', u'on_exchange': False, u'network_type': u'Indemnity'}, {u'default_bhp': False, u'price_period': u'Monthly', u'plan_description': '', u'sbc_download_url': '', u'price_note': '', u'starting_percentage_fpl': 0, u'is_uhc_plan': False, 'issuer_id': 440, u'part_b_deductible': '', u'promotional_label': '', u'metal_level_name': u'Silver', u'network_url': '', u'group_or_individual_plan_type': u'Group', u'treatment_cost_calculator_url': '', u'hios_plan_identifier': u'99844RI1800002', u'original_medicare': False, u'part_d_prescription_coverage': '', u'recommended': False, u'off_exchange': True, u'is_age_29_plan': False, u'type': u'MetalPlan', u'ending_percentage_fpl': 0, u'plan_detail_footer': '', u'formulary_url': '', u'plan_detail_items': '', u'highlight_6': '', u'highlight_4': '', u'highlight_5': '', u'hsa_eligible': False, u'highlight_3': u'PCP 20% coinsurance', u'highlight_1': u'Silver', u'name': u'WI 80 INDEMNITY 18 OPTION 1 SILVER RX $10/45/90/25%', u'network_description': '', u'plan_detail_header': '', 'service_area_id': 1, u'data_sourced_from': u'uhc', u'plan_year': 2018, u'highlight_2': u'Indemnity', u'on_exchange': False, u'network_type': u'Indemnity'}]

I need to validate the data before I insert it into my table. I read about sqlalchemy validations, and tried something like below, assuming that it will take care of the validations at the model level.

@validates('hios_plan_identifier')
    def validate_hios_plan_identifier(self, key, hios_plan_identifier):
        assert '/\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\z/,' in hios_plan_identifier
        return hios_plan_identifier

I need to put few validations on each variable. Only the rows which passes should be inserted.

I am not sure how to work on validations at model level. How can I achieve this.

user1896796
  • 731
  • 3
  • 9
  • 25
  • Though it is somewhat obvious, you should provide a bit more information as to how is it not working. Questions without a clear problem statement are not useful in general. – Ilja Everilä Nov 13 '18 at 09:58
  • updated the whole question . Hope you get where I am stuck – user1896796 Nov 13 '18 at 10:33
  • This line: `'/\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\z/,' in hios_plan_identifier`. `'/\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\z/,'` is not a regex, it's just a normal string. That assert will not evaluate to True unless that whole string is in the value of `hios_plan_identifier`. – SuperShoot Nov 13 '18 at 10:37
  • is [`re.search()`](https://docs.python.org/3/library/re.html#re.search) what you are looking for? – SuperShoot Nov 13 '18 at 10:44
  • 1
    I wonder if the ORM level validations even apply to bulk inserts. – Ilja Everilä Nov 13 '18 at 10:51
  • then how exactly can I achieve this . Can I put validations on pandas dataframe itself ? – user1896796 Nov 13 '18 at 10:59
  • @IljaEverilä I wonder that too however from a previous edit of the question `even valid strings are rejected now`, made it seem like the validator was being applied. – SuperShoot Nov 13 '18 at 11:07
  • @user1896796 if you change that line to `assert re.search('/\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\z/,', hios_plan_identifier)` does it run? – SuperShoot Nov 13 '18 at 11:09
  • @ SuperShoot No. It is not working – user1896796 Nov 13 '18 at 11:15
  • getting any errors? – SuperShoot Nov 13 '18 at 11:16
  • No errors . `99806CAAUSJ-TMP1` this should fail but it is not. – user1896796 Nov 13 '18 at 11:18
  • I just tried to compile that regex and it won't compile (throws an exception). If you aren't getting errors and it's not failing when it should, I'd say that @IljaEverilä is right and it isn't executing the validator on the bulk insert. – SuperShoot Nov 13 '18 at 11:22
  • Any way I can achieve this, if not at ORM level ? – user1896796 Nov 13 '18 at 11:24
  • You could iterate over all of the dicts and omit the ones that you don't want to keep before trying to write them to the db. Or perhaps a [trigger](https://www.postgresql.org/docs/current/trigger-definition.html) on the db level could be used but I don't have any experience with that. – SuperShoot Nov 13 '18 at 11:35
  • Or of course, don't use bulk inserts, instantiate an instance from each dict and write to db through the orm. – SuperShoot Nov 13 '18 at 11:42
  • Can you help me with a code example pls ? – user1896796 Nov 13 '18 at 11:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/183558/discussion-between-supershoot-and-user1896796). – SuperShoot Nov 13 '18 at 11:47

1 Answers1

0

Two options:

  • Add a CheckConstraint on the column and add your regex there, see https://www.postgresql.org/docs/9.3/functions-matching.html . That works in whatever way you throw the data into the database.
  • Use event based validation as demonstrated here , create a list of objects you want to insert and use session.add_all() for batch insert.
Michael
  • 7,316
  • 1
  • 37
  • 63