
I was using bcp (Sybase's bulk copy utility) to insert millions of records, but my company is migrating to Oracle.

I am not sure whether I should use array binding or SQL*Loader. I have a lot of data in memory. I can either (1) create a text file with the data and use SQL*Loader to insert it, or (2) use the array-binding library to insert the data. I'm not sure which is more practical for my application. What are the differences between the two? Is one better for certain applications?

Which should I use to replace bcp?

anthonybell
    I don't think you can compare these. sql-loader is a standalone Oracle tool for parsing and batch loading of text files, array binding is a programming approach to reduce network roundtrips (or context switches, if you use it in pl/sql). – Kirill Leontev Jul 01 '14 at 16:11
  • so both should have comparable performance if I need to insert 4 million records into a table with 800 million records? – anthonybell Jul 01 '14 at 16:19
  • Basically I have a lot of data in memory. I can either 1. Create a text file with the data and use sql-loader to insert it or 2. use the array binding library to insert data. I'm not sure which is more practical for my application. – anthonybell Jul 01 '14 at 16:22
  • I would expect sql-loader to be faster by default, since it uses (or at least tries to use) direct-path writes, while your average insert goes the conventional way through the buffer cache. However, direct-path is not always possible – Kirill Leontev Jul 01 '14 at 16:28

1 Answer


SQL*Loader is the most direct replacement for bcp. If you have an existing process that uses bcp, moving to SQL*Loader is probably the path of least resistance.
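As a rough illustration of what that migration looks like, here is a sketch of a SQL*Loader control file for a comma-delimited export. The table name, column names, and file names are hypothetical; the control-file syntax and the `sqlldr` invocation follow Oracle's documented conventions, with `DIRECT=TRUE` requesting a direct-path load:

```
-- load.ctl (hypothetical table and columns)
OPTIONS (DIRECT=TRUE, ROWS=50000)
LOAD DATA
INFILE 'records.dat'
APPEND
INTO TABLE my_big_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  id,
  created_dt  DATE "YYYY-MM-DD",
  payload     CHAR(4000)
)
```

It would then be invoked from the command line, much like a bcp run, with something along the lines of `sqlldr userid=app@db control=load.ctl log=load.log`.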

You say that you already have the data in memory. I assume that means the data is in memory on a client machine, not on the database server. Given that starting point, I'd generally prefer a direct-path load, assuming that whatever database access API you are using provides a direct-path API. Writing a bunch of data to a file only to have SQL*Loader read that data back off disk, just so it can (assuming you set it up to do so) use the same direct-path API to load the data, should make the SQL*Loader approach less efficient. Of course, as a purpose-built tool, a SQL*Loader solution can likely be cobbled together with acceptable performance more quickly than you can write your own code, particularly if you're just learning the API.

If you don't have access to a direct-path API and you're debating between an application doing a conventional-path load using array binds and a SQL*Loader solution doing a direct-path load, the question is much closer. You'd probably need to benchmark the two. Direct-path loading is more efficient than a conventional-path load, but writing all the data to disk and reading it all back incurs an additional cost. Whether that cost outweighs the benefit of a direct-path load will depend on a variety of factors that are specific to your application (data volumes, disk speed, network I/O, etc.).
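For the array-bind side of that comparison, here is a minimal sketch of a conventional-path load using the python-oracledb driver's `executemany`, which binds a whole batch of rows in a single round trip. The table name, columns, connection details, and batch size are all hypothetical placeholders; only the `executemany` pattern itself is the point:

```python
def batched(rows, size):
    """Yield successive slices of `rows` of at most `size` elements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def load_rows(conn, rows, batch_size=10000):
    """Insert (id, payload) tuples, one network round trip per batch."""
    with conn.cursor() as cur:
        for batch in batched(rows, batch_size):
            # executemany binds the entire batch in one call instead of
            # issuing one insert per row.
            cur.executemany(
                "insert into my_big_table (id, payload) values (:1, :2)",
                batch,
            )
    conn.commit()

if __name__ == "__main__":
    import oracledb  # assumes the python-oracledb driver is installed
    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb")
    load_rows(conn, [(1, "a"), (2, "b")])
```

The batch size is the main tuning knob: larger batches mean fewer round trips but more client memory per call.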

One additional option to consider may be to write the file to disk, copy the file to the database server, and then use an external table to expose the data to the database. This is generally more efficient than using SQL*Loader from a client machine. Whether it performs better than a direct-path load from an application and whether the additional complexity of writing files, moving them around, and generally moving control from an application to operating system utilities and back to the application outweighs the complexity of writing a bit more code in the application is something that you'd need to answer for yourself.
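The external-table route above might look something like the following sketch. The directory object, table, and file names are hypothetical; the DDL uses Oracle's standard `ORGANIZATION EXTERNAL` syntax, and the final insert uses the `APPEND` hint to request a direct-path write:

```sql
-- Assumes DATA_DIR is an Oracle directory object pointing at the
-- folder on the database server that holds records.dat.
CREATE TABLE my_big_table_ext (
  id       NUMBER,
  payload  VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('records.dat')
);

-- Load with a direct-path insert running entirely inside the database:
INSERT /*+ APPEND */ INTO my_big_table (id, payload)
SELECT id, payload FROM my_big_table_ext;
COMMIT;
```

Because the read and the insert both happen on the server, no row data crosses the network during the load itself.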

Justin Cave