I'm looking to extract some data from an Oracle database and transfer it to a remote HDFS file system. There appear to be a couple of possible ways of achieving this:
- Use Sqoop. This tool will extract the data, copy it across the network and store it directly into HDFS (first sketch after this list).
- Use SQL to read the data and store it on the local file system. When this has completed, copy (ftp?) the data to the Hadoop system (second sketch below).
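For concreteness, here's roughly what I have in mind for each option. This is only a minimal sketch: the host names, credentials, table name, paths, and mapper count are placeholders, the CSV export assumes SQL*Plus 12.2+, and the scp step just stands in for whatever transfer mechanism (FTP or otherwise) ends up being available.

```bash
# Option 1: Sqoop pulls straight from Oracle into HDFS over JDBC.
# JDBC URL, credentials, table, target directory, and parallelism are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table MY_BIG_TABLE \
  --target-dir /data/my_big_table \
  --num-mappers 4

# Option 2: dump the table to a local CSV with SQL*Plus, ship the file to
# the Hadoop cluster, then load it into HDFS from an edge node.
sqlplus -s scott@ORCL <<'EOF' > my_big_table.csv
SET MARKUP CSV ON
SET FEEDBACK OFF
SELECT * FROM my_big_table;
EOF
scp my_big_table.csv hadoop-edge:/tmp/
ssh hadoop-edge 'hdfs dfs -put /tmp/my_big_table.csv /data/my_big_table/'
```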
My question: will the first method (which is less work for me) cause Oracle to lock tables for longer than required?
My worry is that Sqoop might take out a lock on the database when it starts to query the data, and that this lock won't be released until all of the data has been copied across to HDFS. Since I'll be extracting a large amount of data and copying it to a remote location (so there will be significant network latency), the lock would be held for longer than would otherwise be necessary.