0

I'm very new to python and I'm trying to build this database using SQLalchemy and SQLite3.

There is a zip file of csv files that I am trying to process and put into a SQLite database. I have converted the csv files into tables, and one of the tables contains a column which needs to be substringed.

Basically, the column may have data such as: "xxxx Treatment Phase: dcdfdfdsfdsf Clinical criteria: dsfsdfsdfdsf Population criteria: sdfdsfdfdf Treatment criteria: xxsdasdasdsa" within the one cell and I am trying to separate the string prior to the phrase "Treatment Phase:", and also between "Treatment Phase" and "Clinical criteria", between "Clinical criteria:" and "Population criteria" and between "Population criteria" and "Treatment Criteria", however a cell may not have one or any of these key words.
Prior to me trying to substring this column, the database was surprisingly working. Now nothing is happening. It is as if it is caught in a loop or something, but not sure how to check this. ( i have removed all the other code for the other tables)

The txt files can be found here: http://www.pbs.gov.au/downloads/2019/10/2019-10-01-extracts-down-converted.zip

from numpy import genfromtxt
from time import time
from datetime import datetime
from sqlalchemy import Column, Integer, Float, Date, String, ForeignKey, 
Numeric, Table, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, relationship
import sqlalchemy

def Load_Data_tab(file_name):
    data = genfromtxt(file_name, dtype='unicode', delimiter = "\t", 
skip_header=False, converters={0: lambda s: str(s)})
    return data.tolist()

Base = declarative_base()

fN_Restriction = "RestrictionExtract_191001.txt"

class Restrictions(Base):
    __tablename__ = 'restrict'
    IndicatId1 = Column(Text, primary_key=True)
    RestrictIndicatText = Column(Text)
    MiscRestrictFlag = Column(Integer)
    DateRequireFlag = Column(String(1))
    TextRequireFlag = Column(String(1))
    TreatmentPhase = Column(Text)
    TreatmentCriteria = Column(Text)
    ClinicalCriteria = Column(Text)
    PopulationCriteria = Column(Text)

if __name__ == "__main__":

    #Create the database
    engine = create_engine('sqlite:///csv_test.db')
    Base.metadata.create_all(engine)

    #Create the session
    session = sessionmaker()
    session.configure(bind=engine)
    s = session()

 # Build Restriction Table
    try:
        data = Load_Data_tab(fN_Restriction)  
        for i in data:
            ChopString = i[1]
            TP = ChopString.find('Treatment Phase:') 
            CC = ChopString.find('Clinical criteria:')
            PC = ChopString.find('Population criteria:')
            TC = ChopString.find('Treatment criteria:')
            if TP = 0:
                if CC = 0:
                    if PC = 0:
                        if TC = 0:
                            RestrictIndicatText1 = i[1]
                        else:
                            RestrictIndicatText1 = ""
            if TP > 0:
                RestrictIndicatText1 = ChopString[1:(TP-1)]
                if CC > 0:
                    TreatmentPhase1 = ChopString[TP:(CC-1)]
                elif PC > 0:
                    TreatmentPhase1 = ChopString[TP:(PC-1)]
                elif TC > 0:
                    TreatmentPhase1 = ChopString[TP:(TC-1)] 
                else:
                    TreatmentPhase1 = ChopString[TP:]
            if CC > 0:
                if TP = 0:
                RestrictIndicatText1 = ChopString[1:(CC-1)]
                if PC > 0:
                    ClinicalCriteria1 = ChopString[CC:(PC-1)]
                elif TC > 0:
                    ClinicalCriteria1 = ChopString[CC:(TC-1)]
                else:
                    ClinicalCriteria1 = ChopString[CC:]       
            if PC > 0:
                if TP = 0:
                    if CC = 0:
                        RestrictIndicatText1 = ChopString[1:(PC-1)]
                if  TC > 0:
                    PopulationCriteria1 = ChopString[PC:(TC-1)]
                else:
                    PopulationCriteria1 = ChopString[PC:]    
            if TC > 0:  
                if TP = 0:
                    if CC = 0: 
                        if PC = 0:
                            RestrictIndicatText1 = ChopString[1:(TC-1)] 
            TreatmentCriteria1 = ChopString[TC:]
        record = Restrictions(IndicatId1 = i[0], RestrictIndicatText = RestrictIndicatText1, MiscRestrictFlag = i[2], DateRequireFlag = i[3], TextRequireFlag = i[4], TreatmentPhase = TreatmentPhase1, TreatmentCriteria = TreatmentCriteria1, ClinicalCriteria = ClinicalCriteria1, PopulationCriteria = PopulationCriteria11)
        record.IndicatId1 = str(record.IndicatId1)[2:-1]
        s.add(record) #Add all the records
    s.commit() #Attempt to commit all the records
finally:
    s.close() #Close the connection
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
  • I'm going to give this a shot, what I can say for now is that numpy may not be the best choice here. – AMC Oct 23 '19 at 02:11
  • i think I worked out what it is.....Im using = after the if statement not ==...., However, now you bring up a great point....what is best to use here if not numpy? – Stats_Dunce Oct 23 '19 at 09:52
  • Maybe pandas, that’s what i started messing around with. Wouldn’t using `=` over `==` throw an error? – AMC Oct 23 '19 at 11:37
  • Thats what i was expecting, but all it did was stall in a loop. Anyways many thanks, and I will try and use pandas for this – Stats_Dunce Oct 24 '19 at 21:49
  • did you ever manage to solve this? – AMC Nov 06 '19 at 01:33
  • Yeah it was the issue with the double equal signs....once that was identified it worked. Thank you for your help. – Stats_Dunce Nov 16 '19 at 11:21

0 Answers0