I am new to the Cloudera environment. I am trying to import data from an RDBMS using Sqoop, and I need to apply some transformations to the data during the import. Specifically, I need to encrypt some fields before storing them in HDFS. To accomplish this, I am trying to use the codegen command, which generates an ORM Java class that I can modify.
Let's say I have a table 'products' in a MySQL database and I want to import it into HDFS using Sqoop, encrypting the 'brand' field. First I ran this command:
sqoop codegen \
--connect jdbc:mysql://localhost/test \
--username username --password password \
--table products
This generates the files products.java, products.jar and products.class in the folder /tmp/sqoop-training/compile/fc8868dda33ef703ad126583cf77477f.
Then I modified the readFields method in products.java like so:
// WARNING: This class is AUTO-GENERATED. Modify at your own risk.
//
// Debug information:
// Generated date: Thu Nov 16 06:55:13 PST 2017
// For connector: org.apache.sqoop.manager.MySQLManager
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import com.cloudera.sqoop.lib.JdbcWritableBridge;
import com.cloudera.sqoop.lib.DelimiterSet;
import com.cloudera.sqoop.lib.FieldFormatter;
import com.cloudera.sqoop.lib.RecordParser;
import com.cloudera.sqoop.lib.BooleanParser;
import com.cloudera.sqoop.lib.BlobRef;
import com.cloudera.sqoop.lib.ClobRef;
import com.cloudera.sqoop.lib.LargeObjectLoader;
import com.cloudera.sqoop.lib.SqoopRecord;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
public class products extends SqoopRecord implements DBWritable, Writable {
  // [...]
  public void readFields(ResultSet __dbResults) throws SQLException {
    this.__cur_result_set = __dbResults;
    this.prod_id = JdbcWritableBridge.readInteger(1, __dbResults);
    this.brand = encrypt(JdbcWritableBridge.readString(2, __dbResults));
    this.name = JdbcWritableBridge.readString(3, __dbResults);
    this.price = JdbcWritableBridge.readInteger(4, __dbResults);
    this.cost = JdbcWritableBridge.readInteger(5, __dbResults);
    this.shipping_wt = JdbcWritableBridge.readInteger(6, __dbResults);
  }
  // [...]
}
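The encrypt method called in readFields is not part of the generated class and is not shown above. A minimal sketch of what such a helper might look like follows, assuming Java 8 or later and AES-GCM from the standard javax.crypto package. The class name FieldCrypto, the hard-coded demo key, and the Base64(IV + ciphertext) output format are all illustrative assumptions, not something Sqoop provides:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; none of these names come from Sqoop or Hadoop.
public class FieldCrypto {
    // ASSUMPTION: a fixed 128-bit demo key. A real job would load the key
    // from a key store or another protected location instead.
    private static final byte[] DEMO_KEY =
        "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns Base64(IV || ciphertext), or null for a SQL NULL value.
    public static String encrypt(String plaintext) {
        if (plaintext == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(DEMO_KEY, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length)
                .put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            // Unchecked, so it fits readFields, which only declares SQLException.
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Inverse of encrypt, included only to check the round trip.
    public static String decrypt(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(DEMO_KEY, "AES"),
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

With a helper like this, the modified line in readFields would call FieldCrypto.encrypt(...) (or the method could be copied into the generated class as a private static method). The Base64 output contains no commas or newlines, so it should not clash with the default text field and record delimiters.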
I have two questions:
1) How can I recompile products.java to obtain updated versions of products.class and products.jar? I tried
javac products.java
but the compiler reports 82 errors; it seems it cannot find the packages in the Hadoop and Cloudera namespaces:
error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
^
products.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
^
products.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
^
products.java:10: error: package org.apache.hadoop.mapred.lib.db does not exist
import org.apache.hadoop.mapred.lib.db.DBWritable;
^
products.java:11: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.JdbcWritableBridge;
^
products.java:12: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.DelimiterSet;
^
products.java:13: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.FieldFormatter;
^
products.java:14: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.RecordParser;
^
products.java:15: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BooleanParser;
^
products.java:16: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BlobRef;
^
products.java:17: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.ClobRef;
^
products.java:18: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.LargeObjectLoader;
^
products.java:19: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.SqoopRecord;
2) Once I have successfully compiled products.java, how can I use Sqoop to import the data into HDFS using my custom ORM class?
Thanks in advance!