
I'm reading data from a table in Sybase using a Table Input step. The query is really simple:

SELECT person_ref, displayname FROM person

That table has about 2 million rows. I'm connecting to Sybase ASE 12. My user has read-only rights. PDI is using the jconnect driver with the following options:

IMPLICIT_CURSOR_FETCH_SIZE=5000
SELECT_OPENS_CURSOR=True
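For reference, jConnect connection properties like these are typically appended to the JDBC URL as a query string. A sketch (host, port, and database name are placeholders):

```
jdbc:sybase:Tds:myhost:5000/mydb?IMPLICIT_CURSOR_FETCH_SIZE=5000&SELECT_OPENS_CURSOR=true
```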

I've also tried using the noholdlock option on that query to change the isolation level.
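For context, in Sybase ASE the noholdlock hint follows the table name in the FROM clause. A sketch of the query as tried:

```sql
SELECT person_ref, displayname
FROM person NOHOLDLOCK
```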

The problem is that the query seems to remain idle for a long time, nearly a minute. PDI indicates that the step is in idle state for that time and then changes to Running. This makes it hard to measure the time the process takes, because PDI won't start measuring time until the steps change state from idle.

I can't seem to find anything in the manuals, or any option that will speed up the read by decreasing or eliminating this idle time. Is there any option I'm missing? Does the idle status mean that PDI is just waiting for a response from Sybase?

  • Have you tried enlisting the help of your Sybase ASE DBA to see if there are issues in Sybase? If the ASE version is 12.5.0.3+ then the DBA has access to the MDA/monitoring tables, which may also help determine the issue. The DBA should be able to tell you what's going on with your connection into Sybase, e.g., waiting for a table scan from disk, intermittent blocking, in 'send sleep' (meaning ASE is waiting for an 'ack' from the client that it's received the latest batch of records), etc. – markp-fuso Feb 05 '18 at 14:17
  • I'm trying but it's a bit of a harrowing process as my company doesn't own the DB, we're just given access to it. But I will send a request to see if I can get access to these monitoring tables – Carlos Hurtado Feb 05 '18 at 17:18

3 Answers


Maybe your query simply takes a long time to retrieve the data.

The latency comes from the JDBC architecture. The driver sends the query to the database, which stores the result rows in a buffer. Only when this buffer is full is the data transferred back to PDI. Until it receives some data, the Table Input step stays in idle mode.
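The buffering behaviour described above can be sketched in Python using sqlite3's `fetchmany` as a stand-in for the jConnect cursor fetch size (the table, row count, and batch size here are made up for illustration):

```python
import sqlite3

# In-memory stand-in for the Sybase person table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_ref INTEGER, displayname TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(12000)])

FETCH_SIZE = 5000  # analogous to IMPLICIT_CURSOR_FETCH_SIZE
cur = conn.execute("SELECT person_ref, displayname FROM person")

total = 0
while True:
    # The consumer sees nothing until each batch is filled and handed over;
    # with a large fetch size, that wait shows up as apparent "idle" time.
    batch = cur.fetchmany(FETCH_SIZE)
    if not batch:
        break
    total += len(batch)

print(total)  # 12000
```

With a smaller fetch size the first rows arrive sooner, at the cost of more round trips.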

If you want to measure the time including the idle time, add a step that fires without any latency, for example a Generate Rows step (1 row is enough). You do not need to connect this step to anything, as PDI starts all the steps in parallel as soon as possible.

You won't see the total on the Table Input row of the Step Metrics bottom tab, but you will have the result in the Metrics tab.

You can also use a Block this step until steps finish step. There is an example in the samples directory shipped with your distribution: open yourKettleInstallDir/sample/transformation/Block this step until steps finish.ktr, replace the top row with your flow, then watch the statistics of the blocking step.

AlainD
  • Alain, as you said, I believe it's just the time to retrieve the data and fill the buffer. I'll try to talk to the DBA to see if anything can be done to speed things up. It's weird, because this is a replica of a production database that shouldn't be in use by anyone other than PDI and the replication process. If I can find a solution to address this I'll be sure to post it! – Carlos Hurtado Feb 07 '18 at 12:37
  • You can split the table: `SELECT person_ref, displayname FROM person WHERE displayname LIKE 'A%'`, `.... LIKE 'B%'`, ... Then direct all these flows into the same next step. By Kettle's architecture, all the steps run in parallel, so the latency will be shorter. Of course, you do not need 26 `Table Input` steps: you can generate the 'A%', ... patterns in a previous flow and use them as parameters in the Table Input. And maybe you have a better way to divide the load process. – AlainD Feb 07 '18 at 17:17
  • I'm going to try that tonight! – Carlos Hurtado Feb 07 '18 at 17:44
  • Check the indexes on the table (via sp_helpindex) and look for an indexed column if you're looking to do this in pieces; otherwise it will still table-scan to get the subset of data and remain slow. Any user can run sp_helpindex, so you should have sufficient access to check that. – Rich Campbell Feb 15 '18 at 09:15
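A minimal sketch of generating the 26 LIKE patterns suggested in the comments, which could then be fed to the Table Input step as parameters (pure Python; the names are illustrative, not PDI API):

```python
import string

# One LIKE pattern per initial letter; each one would drive a parallel query,
# e.g. SELECT person_ref, displayname FROM person WHERE displayname LIKE ?
patterns = [letter + "%" for letter in string.ascii_uppercase]

print(patterns[:3])   # ['A%', 'B%', 'C%']
print(len(patterns))  # 26
```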

In my opinion, another step in your transformation is locking the person table. There is an overwhelming probability that you have a Table Output step trying to truncate the person table.

AlainD
  • Actually, the output is to a text file that would later get bulk-loaded into another database. So the entire flow is: read from table input and dump to text file. I don't know if the reason for the idle time could be due to other processes accessing the same data as my query. – Carlos Hurtado Feb 05 '18 at 17:21

I don't know if this is what I would call an answer, but I definitely found a way to get the Sybase connection to respond quickly. There's a querying tool, Sybase Anywhere, that you can use to query the DB directly. What I did was look at an installation on a separate machine that had a good connection.

That machine had an ODBC connection defined for the Sybase DB, and the client tool install came with its own version of the Sybase drivers along with some DLL files. I took the jars and DLLs and put them on the machine that had PDI installed. I made sure they were all on the classpath, and created a generic JDBC connection pointing to the system ODBC one. It now runs at the speed you would expect.