I need to find the fastest way of reading a large OpenEdge table (100 million rows plus), preferably programmatically (in c#) and outside of ETL tools such as SSIS or staging formats such as text file extracts.
I'm currently using ODBC (driver: Progress OpenEdge 11.5) to query the OpenEdge 11.5 tables in batches using the OFFSET and FETCH modifiers
SELECT COL_1, COL_2
FROM PUB.TABLE_1
ORDER BY ROWID ASC
OFFSET {currentBatchStart} ROWS
FETCH NEXT {batchSize} ROWS ONLY
I'm querying via a system DSN with FetchArraySize: 25
and QueryTimeout: -1
. And I'm connecting to an OpenEdge server group set up for SQL only access with message buffer size: 1024
.
I'm finding the performance is poor (about 1 million records every 15 minutes) and I suspect it will only slow down as I advance through the table when using the OFFSET FETCH modifiers.
My question is are there any methods I can adopt or settings I can play with to tune the query performance?
- For example are there better ways of constructing my SQL query? e.g. should I order by columns in an index rather than ROWID?
- Should I increase the message buffer size on the sql server group
Or should I be looking at alternative methods to read the data out of the table?
Note: Each batch is subsequently sqlbulkcopy
'ed into a SQL Server table