I am trying to fetch data from Teradata, write it to a Parquet file, and read it back using Prefect tasks. My code works when the SQL fetch runs as a plain function, as below:

def fetch_data(host,db_name,user,password,query):
    'logic'

@task(name="Write dask dataframe into GPFS parquet file")
def write_data(dask_dataframe,file_name):
    'logic'

@task(name="Read data from GPFS parquet file into dask dataframe")
def read_data(file_name):
    'logic'

with Flow("Teradata Example") as flow:
    result = fetch_data(host,db_name,user,password,query)
    write_data(dask_dataframe=result,file_name=file_name)
    read_data(file_name=file_name)

flow.run()

But the code fails when the same fetcher code runs as a task:

@task(name="Fetch sql query data from teradta data source into dask dataframe")
def fetch_data(host,db_name,user,password,query):
    'logic'

@task(name="Write dask dataframe into GPFS parquet file")
def write_data(dask_dataframe,file_name):
    'logic'

@task(name="Read data from GPFS parquet file into dask dataframe")
def read_data(file_name):
    'logic'

with Flow("Teradata Example") as flow:
    result = fetch_data(host,db_name,user,password,query)
    write_data(dask_dataframe=result,file_name=file_name)
    read_data(file_name=file_name)

flow.run()

I have added the code from the Teradata connector file:

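    import logging
    from typing import Any

    import pandas as pd
    from dask import delayed
    from dask.dataframe import from_delayed
    from prefect import Task
    from prefect.utilities.tasks import defaults_from_attrs

def get_partitions(num_partitions):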
    list_range =[]
    initial_start=0
    for i in range(num_partitions):
        amp_range = 3240//num_partitions
        start = (i*amp_range+1)*initial_start
        end   = (i+1)*amp_range
        list_range.append((start,end))
        initial_start = 1
    return list_range

@delayed
def load(query,start,end,connString):
    return pd.read_sql(query.format(start, end),connString)

class TeradataFetch(Task):
    def __init__(
        args)

    @defaults_from_attrs("fetch", "fetch_count", "query", "commit", "charset")
    def run(
        self,
        query: str,
    ) -> Any:
        try:
            results = from_delayed([load(query,start, end,connString) for start,end in get_partitions(self.num_partitions)])
            logging.debug("Fetch Results: %s", results)
            return results

        except Exception as e:
            raise e
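
The ranges that `get_partitions` produces can be checked in isolation. The sketch below copies the function from the question unchanged and calls it with four partitions; the partition count is an illustrative assumption, and 3240 is the constant already used in the question.

```python
def get_partitions(num_partitions):
    list_range = []
    initial_start = 0
    for i in range(num_partitions):
        amp_range = 3240 // num_partitions
        # The first range starts at 0; later ranges start one past the previous end.
        start = (i * amp_range + 1) * initial_start
        end = (i + 1) * amp_range
        list_range.append((start, end))
        initial_start = 1
    return list_range


partitions = get_partitions(4)
print(partitions)
# [(0, 810), (811, 1620), (1621, 2430), (2431, 3240)]
```

Each tuple is later substituted into the query through `query.format(start, end)`, so the query string needs two positional placeholders.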


Can someone please suggest what is going wrong here?

Reetesh Nigam
  • How is it "failing"? Which step is failing? An error message would be very helpful. – Andrew Nov 13 '20 at 14:39
  • I am getting the error below:
    [2020-11-13 16:20:05] INFO - prefect.FlowRunner | Beginning Flow run for 'Teradata Example'
    [2020-11-13 16:20:05] INFO - prefect.TaskRunner | Task 'Fetch sql query data from teradta data source into dask dataframe': Starting task run...
    [2020-11-13 16:20:05] ERROR - prefect.TaskRunner | Unexpected error: ValueError('Could not infer an active Flow context.',)
    – Reetesh Nigam Nov 13 '20 at 16:20
  • File "teradata_dask_flow.py", line 33, in fetch_data
        fetch_results = teradata_fetch(query=query)
      File "path\envs\prefect\lib\site-packages\prefect-0.13.13-py3.6.egg\prefect\core\task.py", line 514, in __call__
        *args, mapped=mapped, upstream_tasks=upstream_tasks, flow=flow, **kwargs
      File "path\envs\prefect\lib\site-packages\prefect-0.13.13-py3.6.egg\prefect\core\task.py", line 566, in bind
        raise ValueError("Could not infer an active Flow context.")
    ValueError: Could not infer an active Flow context.
    – Reetesh Nigam Nov 13 '20 at 16:29
  • It appears you aren't showing all your code - that error suggests you are trying to use a task called `teradata_fetch` *within* the `fetch_data` task – chriswhite Nov 13 '20 at 17:36
  • @chriswhite It's a connector to Teradata, which is why I had not included it, but I have now added the entire logic of my file above. – Reetesh Nigam Nov 13 '20 at 19:47
  • Flows are responsible for orchestrating Tasks, so you can't create and bind a new task from within another task. You should either use your `teradata_fetch` directly within `fetch_data` without creating a Prefect Task, or bind it as a Task within the Flow context. – chriswhite Nov 13 '20 at 21:11

0 Answers