I'm attempting to fetch data from Exasol in parallel using PyExasol. I'm following the example here: https://github.com/badoo/pyexasol/blob/master/examples/14_parallel_export.py

My code looks like this:

import multiprocessing
import pyexasol
import pyexasol.callback as cb

class ExportProc(multiprocessing.Process):
    def __init__(self, node):
        self.node = node
        self.read_pipe, self.write_pipe = multiprocessing.Pipe(False)

        super().__init__()

    def start(self):
        super().start()
        self.write_pipe.close()

    def get_proxy(self):
        return self.read_pipe.recv()

    def run(self):
        self.read_pipe.close()

        http = pyexasol.http_transport(self.node['host'], self.node['port'], pyexasol.HTTP_EXPORT)
        self.write_pipe.send(http.get_proxy())
        self.write_pipe.close()

        # pd1 holds only this node's share of the exported rows
        pd1 = http.export_to_callback(cb.export_to_pandas, None)
        print(f"{self.node['idx']}:{len(pd1)}")

EXASOL_HOST = "<IP-ADDRESS>:8563"
EXASOL_USERID = "username"
EXASOL_PASSWORD = "password"

c = pyexasol.connect(dsn=EXASOL_HOST, user=EXASOL_USERID, password=EXASOL_PASSWORD, compression=True)

nodes = c.get_nodes(10)

pool = list()
proxy_list = list()

for n in nodes:
    proc = ExportProc(n)
    proc.start()
    proxy_list.append(proc.get_proxy())
    pool.append(proc)

c.export_parallel(proxy_list, "SELECT * FROM SOME_SCHEMA.SOME_TABLE", export_params={'with_column_names': True})

stmt = c.last_statement()

r = stmt.fetchall()

At the last statement, I get the following error and am unable to fetch any results:

---------------------------------------------------------------------------
ExaRuntimeError                           Traceback (most recent call last)
<command-911615> in <module>
----> 1 r = stmt.fetchall()

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in fetchall(self)
     85 
     86     def fetchall(self):
---> 87         return [row for row in self]
     88 
     89     def fetchcol(self):

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in <listcomp>(.0)
     85 
     86     def fetchall(self):
---> 87         return [row for row in self]
     88 
     89     def fetchcol(self):

/local_disk0/pythonVirtualEnvDirs/virtualEnv-01515a25-967f-4b98-aa10-6ac03c978ce2/lib/python3.7/site-packages/pyexasol/statement.py in __next__(self)
     53         if self.pos_total >= self.num_rows_total:
     54             if self.result_type != 'resultSet':
---> 55                 raise ExaRuntimeError(self.connection, 'Attempt to fetch from statement without result set')
     56 
     57             raise StopIteration

ExaRuntimeError: 
(
    message  =>  Attempt to fetch from statement without result set
    dsn      =>  <IP-ADDRESS>:8563
    user     =>  username
    schema   =>  
)

It seems that the result type of the returned statement is not 'resultSet' but 'rowCount'. What am I doing wrong, and why is the result type 'rowCount'?

SpaceMonkey

1 Answer

PyEXASOL creator here. Please note that with parallel HTTP transport you have to process the data chunks inside the child processes. Your data set is available in the pd1 DataFrame of each child process, one chunk per node.

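You should not call .fetchall() in the main process when using parallel processing. The statement executed in the main process is an EXPORT, which returns a row count ('rowCount') rather than a result set, so there is nothing to fetch.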
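A minimal sketch of that pattern, using only the standard library so it runs without an Exasol cluster. The function fake_export_chunk is a hypothetical stand-in for http.export_to_callback(cb.export_to_pandas, None); each child works on its own chunk and sends back only a small summary (its row count) through a multiprocessing.Queue, which is an assumption about what you want in the parent, not part of PyExasol.

```python
import multiprocessing


def fake_export_chunk(idx):
    # Hypothetical stand-in for http.export_to_callback(cb.export_to_pandas, None).
    # In real code each node receives its own share of the exported rows.
    return [(idx, i) for i in range(idx + 1)]


class ExportProc(multiprocessing.Process):
    def __init__(self, idx, result_queue):
        super().__init__()
        self.idx = idx
        self.result_queue = result_queue

    def run(self):
        # The chunk exists only inside this child process.
        rows = fake_export_chunk(self.idx)
        # Process it here and send back only a small summary, not the data itself.
        self.result_queue.put((self.idx, len(rows)))


def run_parallel(pool_size):
    result_queue = multiprocessing.Queue()
    pool = [ExportProc(i, result_queue) for i in range(pool_size)]
    for proc in pool:
        proc.start()
    # Drain the queue before join(): a child that has put items on a queue
    # waits for them to be flushed before it can terminate.
    summaries = dict(result_queue.get() for _ in pool)
    for proc in pool:
        proc.join()
    return summaries


# The __main__ guard is required on platforms that start children with "spawn".
if __name__ == '__main__':
    print(run_parallel(3))
```

run_parallel() returns a dict mapping each child's index to the number of rows it processed. In the real code the parent would still call c.export_parallel(...) after collecting the proxies and before collecting results and joining the children, and would read only the total from the statement (e.g. c.last_statement().rowcount()) instead of calling .fetchall().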

I suggest checking the complete examples, especially example 14 (parallel export).

Hope it helps!

wildraid