1

I need to find the fastest way of reading a large OpenEdge table (100 million rows plus), preferably programmatically (in c#) and outside of ETL tools such as SSIS or staging formats such as text file extracts.

I'm currently using ODBC (driver: Progress OpenEdge 11.5) to query the OpenEdge 11.5 tables in batches using the OFFSET and FETCH modifiers

SELECT COL_1, COL_2
FROM PUB.TABLE_1
ORDER BY ROWID ASC
  OFFSET {currentBatchStart} ROWS
  FETCH NEXT {batchSize} ROWS ONLY

I'm querying via a system DSN with FetchArraySize: 25 and QueryTimeout: -1. And I'm connecting to an OpenEdge server group set up for SQL only access with message buffer size: 1024.

I'm finding the performance is poor (about 1 million records every 15 minutes) and I suspect it will only slow down as I advance through the table when using the OFFSET FETCH modifiers.

My question is are there any methods I can adopt or settings I can play with to tune the query performance?

  1. For example are there better ways of constructing my SQL query? e.g. should I order by columns in an index rather than ROWID?
  2. Should I increase the message buffer size on the sql server group

Or should I be looking at alternative methods to read the data out of the table?


Note: Each batch is subsequently sqlbulkcopy'ed into a SQL Server table

Drammy
  • 940
  • 12
  • 30

2 Answers2

0

I'm not much on ODBC - from what I can make of your code this will have increasing performance issues as you get further down the table as you surmise.

My suggestion would to be to identify a unique index on that table and use that index's keys to determine what values to get next. Then your query becomes something like this:

WHERE table.KeyField > LastFieldValueRead 
ORDER BY table.KeyField
FETCH NEXT {batchSize} ROWS ONLY

Then the db engine can use your field values to find the offset and get the next values - this'll be much more performant than what you have now.

If this will be an ongoing concern 11.7 has Change Data Capture for logging data changes for replication elsewhere, and Progress sells the Pro2 tool to provide ongoing replication of data.

Tim Kuehn
  • 3,201
  • 1
  • 17
  • 23
  • Thanks Tim, unfortunately I'd like this solution to be db agnostic so it will work on any OE database. – Drammy Oct 20 '17 at 14:27
  • The only reasonably performant way to do that would be to write some OE code. I'm a consultant in this area and can help with that if need be. – Tim Kuehn Oct 20 '17 at 18:41
0

You should write OE code and connect to the SQL Server via .net functionality (if I remember correctly its in System.Data.SQL).

I've written a conversion tool this way which reads from SQL Server, Oracle DB, xBase and others and store them into a Progress RDBMS using almost everything from the original database (table, field and index name, format and the only thing that has to be converted where the datatypes). And I'm pretty sure it works the other way around also.