When we run a Sqoop import on GCP Dataproc clusters with --as-avrodatafile or --as-parquetfile, it fails with the errors below. However, importing with --as-textfile works. It looks like some additional JARs are required.
The required Sqoop jars are loaded from GCS.
Command used:
gcloud dataproc jobs submit hadoop \
--cluster={cluster_name} \
--region=us-central1 \
--class=org.apache.sqoop.Sqoop --jars={sqoop_jars_gcs}/sqoop-1.4.7.jar,{sqoop_jars_gcs}/avro-1.8.2.jar,{sqoop_jars_gcs}/terajdbc4.jar,{sqoop_jars_gcs}/log4j-1.2.17.jar,{sqoop_jars_gcs}/sqoop-connector-teradata-1.2c5.jar,{sqoop_jars_gcs}/tdgssconfig.jar,{sqoop_jars_gcs}/avro-1.8.2.jar \
-- import \
-Dmapreduce.job.user.classpath.first=true \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect={db_connection}DATABASE={source_db} \
--username={userid} \
--password-file {passfile} \
--driver com.teradata.jdbc.TeraDriver \
-e "sql query AND \$CONDITIONS" \
--target-dir=<dir> \
--delete-target-dir \
--as-<avrodatafile/parquetfile> \
--split-by <column>
Error when running --as-avrodatafile (avro-1.8.2.jar is on the classpath, but the import still fails):
INFO - Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
INFO - at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
INFO - at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
INFO - at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
INFO - at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
INFO - at java.security.AccessController.doPrivileged(Native Method)
INFO - at javax.security.auth.Subject.doAs(Subject.java:422)
INFO - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
INFO - at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
INFO - Caused by: java.lang.reflect.InvocationTargetException
INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
INFO - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
INFO - at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
INFO - at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
INFO - ... 7 more
INFO - Caused by: java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroWrapper
INFO - at org.apache.sqoop.mapreduce.AvroImportMapper.<init>(AvroImportMapper.java:43)
INFO - ... 12 more
INFO - Caused by: java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroWrapper
INFO - at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
INFO - at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
INFO - ... 13 more
Error when running --as-parquetfile:
INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
INFO - at java.lang.reflect.Method.invoke(Method.java:498)
INFO - at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
INFO - Caused by: java.lang.NoClassDefFoundError: org/kitesdk/data/mapreduce/DatasetKeyOutputFormat
INFO - at org.apache.sqoop.mapreduce.DataDrivenImportJob.getOutputFormatClass(DataDrivenImportJob.java:213)
INFO - at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
INFO - at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:263)
INFO - at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
INFO - at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
INFO - at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
INFO - at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
INFO - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
INFO - at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
INFO - at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
INFO - ... 5 more
INFO - Caused by: java.lang.ClassNotFoundException: org.kitesdk.data.mapreduce.DatasetKeyOutputFormat
INFO - at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
INFO - at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
INFO - at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
INFO - ... 17 more
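Both traces end in a ClassNotFoundException, so one way to narrow this down is to check whether any of the JARs passed via --jars actually contains the missing class. As far as I can tell, the org.apache.avro.mapred package normally ships in a separate avro-mapred artifact rather than in the core avro JAR, and the org.kitesdk classes come from the Kite SDK, but that is worth verifying against the actual files. The sketch below is only a diagnostic helper, not a fix. It assumes the JARs have first been copied from GCS to a local directory and that unzip is installed; the function names and the ./sqoop_jars directory are made up for illustration.

```shell
# Convert a fully qualified class name into its entry path inside a JAR,
# e.g. a.b.C becomes a/b/C.class.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every local JAR (arguments 2..n) that contains the class named in
# argument 1. Prints nothing if no JAR contains it.
jars_containing() {
  entry=$(class_to_entry "$1")
  shift
  for jar in "$@"; do
    # unzip -l lists the archive entries; grep -F matches the path literally.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Example usage (hypothetical local directory holding copies of the JARs):
#   jars_containing org.apache.avro.mapred.AvroWrapper ./sqoop_jars/*.jar
#   jars_containing org.kitesdk.data.mapreduce.DatasetKeyOutputFormat ./sqoop_jars/*.jar
```

If neither call prints a JAR name, the class is not in any of the supplied JARs, which would be consistent with the errors above.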