
I am trying to use the Sqoop transfer feature in CDH 5 to import a large PostgreSQL table into HDFS. The whole table is about 15 GB.

  1. First, I tried to import the table with just the basic job information (schema and table name), but it didn't work: I always get "GC overhead limit exceeded". I changed the JVM heap size in the Cloudera Manager configuration for both YARN and Sqoop to the maximum (4 GB), but that didn't help.

  2. Then I tried to use a Sqoop transfer SQL statement to import only part of the table. I entered the following in the SQL statement field (the partition column is id): select * from mytable where id>1000000 and id<2000000 ${CONDITIONS}. The statement failed; in fact, any statement with my own "where" condition failed with the error "GENERIC_JDBC_CONNECTOR_0002: Unable to execute the SQL statement" (see the first code block after this list).

  3. I also tried the boundary query. "select min(id), 1000000 from mytable" worked, but "select 1000000, 2000000 from mytable", which I used to select data further ahead, crashed the Sqoop server and brought it down (both queries are shown in the second code block after this list).
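
For reference, here is the exact statement from step 2 (with the real table name simplified to mytable), plus the variant I was planning to try next. I am only guessing that ${CONDITIONS} has to be joined to my own predicate with "and", on the assumption that the connector substitutes its own range predicate for the placeholder, so please correct me if that assumption is wrong:

    -- what I entered in the SQL statement field
    -- (fails with GENERIC_JDBC_CONNECTOR_0002)
    select * from mytable where id > 1000000 and id < 2000000 ${CONDITIONS}

    -- variant I have not verified: join my own range to the
    -- placeholder with "and"
    select * from mytable where id > 1000000 and id < 2000000 and ${CONDITIONS}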
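
And these are the two boundary queries from step 3, again with the table name simplified to mytable; the first one ran, the second one brought the Sqoop server down:

    -- worked: lower bound taken from the table, upper bound hard-coded
    select min(id), 1000000 from mytable

    -- crashed the Sqoop server: both bounds hard-coded constants
    select 1000000, 2000000 from mytable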

Could someone help? How do I add a where condition, and how should the boundary query be used? I have searched in many places and have not found any good documentation on how to write SQL statements with Sqoop 2. Also, is it possible to use direct mode with Sqoop 2?

Thanks
