
I need to selectively (both rows and columns) export around 20 million rows from one table to another. This is what I tried:

--Run this in batch:
INSERT INTO Table2
SELECT A, B FROM Table1
WHERE A > a AND B < b

-- Table1 has columns A, B, ..., Z and around 50 million records.

It takes around 12 hours to finish. I don't think Sybase allows bcp out with selective columns and rows from Table1 and bcp in to Table2. Is there a faster alternative approach? I would be happy if it could be done in under 4 hours.

Thanks for reading it.

lost

1 Answer


I think you mean:

WHERE PK > start_value AND PK < end_value

There is no good reason to duplicate data in two tables on the same server, so hopefully the tables are on separate servers. If you are "archiving", be advised that is the wrong thing to do; speed up the table instead. Refer to this post.

  1. That INSERT-SELECT will kill the transaction log: it will run progressively slower for you, and prevent other users from using the db. If you break it into batches of (say) 1000 rows, it will be faster and more sociable.
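One way to batch it is to walk the key in fixed-width bands, so each INSERT is a short transaction of its own. A minimal sketch, assuming A is a numeric key; the band width, upper bound, and the `B < 100` predicate are all illustrative:

```sql
-- Copy in bands of the key A; each INSERT is its own short transaction
-- (in unchained mode), so the log can be dumped/truncated between bands.
declare @lo int, @step int
select @lo = 0, @step = 100000      -- illustrative band width
while @lo < 50000000                -- illustrative upper bound on A
begin
    insert into Table2 (A, B)
    select A, B from Table1
    where A > @lo and A <= @lo + @step
      and B < 100                   -- illustrative selective predicate
    select @lo = @lo + @step
end
```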

I would be happy if it can be done < 4 hours

  1. Should be no problem; it depends on your hardware and disk layout. I can load 16 million rows in 13.2 secs on my little demo box running ASE 15.5.

  2. bcp runs in two modes, automatically, depending on the conditions as follows:

    • FAST. This requires the SELECT_INTO/BULK_COPY sp_dboption to be set, which allows bcp to NOT log INSERTs, only allocations. It also requires the indexes on the table to be dropped (they can be recreated after bcp finishes).

    • SLOW. Used when either of the above conditions is not met. All INSERTs are logged. Ensure you have a threshold on the log that dumps it (it WILL fill).
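Enabling fast mode looks roughly like this; "mydb" is a placeholder and the exact option name varies by ASE version (on recent releases it is "select into/bulkcopy/pllsort"):

```sql
-- Run as sa or dbo.
use master
go
sp_dboption mydb, 'select into/bulkcopy/pllsort', true
go
use mydb
go
checkpoint
go
-- Also drop the target table's indexes before the load; recreate them afterwards.
```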

  3. No problem at all for either the out_data_file or Table_2 to be a subset of the columns of Table_1. Create a view (shaped like Table_2) over Table_1, on the Table_1 server, and bcp out the view. You can also place a WHERE clause in the view, do transforms, etc.
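A sketch of that; server names, logins, and the WHERE bounds are placeholders (a view cannot take variables, so the bounds must be literals):

```shell
# 1) On the Table_1 server, create the subset view:
isql -Usa -Psecret -SSOURCE <<'EOF'
create view Table1_extract as
    select A, B from Table1 where A > 100 and B < 200
go
EOF
# 2) bcp the view out, then bcp the file in to Table_2 on the other server:
bcp mydb..Table1_extract out extract.dat -c -Usa -Psecret -SSOURCE
bcp mydb..Table2         in extract.dat -c -Usa -Psecret -STARGET -b 1000
```

The `-c` flag uses character format; `-b 1000` commits the load in 1000-row batches.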

  4. You can exec bcp in parallel (up to the number of CPUs/cores you have on your host system). Split the extract into that number of parallel streams (eg. on an 8-core machine, exec 8 extract jobs in parallel). Use the -F and -L parameters to give each job one eighth of Table_1. Use "&" if your o/s supports backgrounding, and 8 x BAT files if it doesn't.
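For example, eight streams over 50 million rows is 6.25M rows each; logins and server names below are placeholders:

```shell
# Streams 3-8 follow the same pattern with the next -F/-L bands.
bcp mydb..Table1_extract out x1.dat -c -Usa -Psecret -SSRV -F 1       -L 6250000  &
bcp mydb..Table1_extract out x2.dat -c -Usa -Psecret -SSRV -F 6250001 -L 12500000 &
wait    # block until all background extract jobs have finished
```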

  5. You can also run (eg.) 8 x INSERT-SELECT jobs in parallel. Split by PK value, not by row number.
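That is, each of the 8 sessions runs the same statement with its own PK band. The bounds below are illustrative and assume A is the PK with a roughly uniform spread over 0..50M:

```sql
-- Session 1:
insert into Table2 (A, B)
select A, B from Table1
where A > 0 and A <= 6250000
  and B < 100                 -- illustrative selective predicate
-- Session 2 uses A > 6250000 and A <= 12500000, and so on.
```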

PerformanceDBA