
I am new to the Cloudera environment. I am trying to import data from an RDBMS using Sqoop, and I need to apply some transformations to the data during the import. Specifically, I need to encrypt some fields before storing them in HDFS. To accomplish this I am trying to use the codegen command, which generates an ORM Java class that I can modify.

Let's say I have a table 'products' in a MySQL database and I want to import it into HDFS using Sqoop, encrypting the 'brand' field. First I ran this command:

sqoop codegen \
--connect jdbc:mysql://localhost/test \
--username username --password password \
--table products

This generates the files products.java, products.jar and products.class in the folder /tmp/sqoop-training/compile/fc8868dda33ef703ad126583cf77477f.

I then modified the readFields method in products.java as follows:

// WARNING: This class is AUTO-GENERATED. Modify at your own risk.
//
// Debug information:
// Generated date: Thu Nov 16 06:55:13 PST 2017
// For connector: org.apache.sqoop.manager.MySQLManager
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import com.cloudera.sqoop.lib.JdbcWritableBridge;
import com.cloudera.sqoop.lib.DelimiterSet;
import com.cloudera.sqoop.lib.FieldFormatter;
import com.cloudera.sqoop.lib.RecordParser;
import com.cloudera.sqoop.lib.BooleanParser;
import com.cloudera.sqoop.lib.BlobRef;
import com.cloudera.sqoop.lib.ClobRef;
import com.cloudera.sqoop.lib.LargeObjectLoader;
import com.cloudera.sqoop.lib.SqoopRecord;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class products extends SqoopRecord  implements DBWritable, Writable {

    // [...]

    public void readFields(ResultSet __dbResults) throws SQLException {
        this.__cur_result_set = __dbResults;
        this.prod_id = JdbcWritableBridge.readInteger(1, __dbResults);
        this.brand = encrypt(JdbcWritableBridge.readString(2, __dbResults));
        this.name = JdbcWritableBridge.readString(3, __dbResults);
        this.price = JdbcWritableBridge.readInteger(4, __dbResults);
        this.cost = JdbcWritableBridge.readInteger(5, __dbResults);
        this.shipping_wt = JdbcWritableBridge.readInteger(6, __dbResults);
    }

    // [...]

}
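
The encrypt helper called in readFields is not shown above. A minimal sketch of what such a helper could look like, assuming Java 8+, AES-GCM from the standard javax.crypto API, and a hard-coded demo key. The class name FieldEncryptor, the key, and the decrypt counterpart are all hypothetical; none of them is part of Sqoop or of the generated code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; not part of Sqoop or of the generated products class.
final class FieldEncryptor {
    private static final int IV_BYTES = 12;   // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length in bits
    private static final SecureRandom RANDOM = new SecureRandom();

    // ASSUMPTION: demo key only (16 bytes = AES-128).
    // A real job would load the key from a keystore or key management service.
    private static final SecretKeySpec KEY = new SecretKeySpec(
            "0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES");

    private FieldEncryptor() { }

    /** Returns Base64(IV || ciphertext+tag), or null when the column value is NULL. */
    static String encrypt(String plain) {
        if (plain == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);  // fresh random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, KEY, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            // Standard Base64 output contains no commas, tabs, or newlines.
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(): splits off the IV, then decrypts and verifies the tag. */
    static String decrypt(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, KEY,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

With a helper like this, the modified line in readFields would read this.brand = FieldEncryptor.encrypt(JdbcWritableBridge.readString(2, __dbResults)); and the helper class would have to be compiled and packaged into the same jar as products.class. Because the IV is random, the same brand encrypts to a different string on every run, so the encrypted column cannot be used for joins or equality filters.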

I have two questions:
1) How can I recompile products.java to obtain updated versions of products.class and products.jar? I've tried

javac products.java

but the compiler reports 82 errors; it seems it cannot find the packages in the Hadoop and Cloudera namespaces:

error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
                           ^
products.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
                           ^
products.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
                           ^
products.java:10: error: package org.apache.hadoop.mapred.lib.db does not exist
import org.apache.hadoop.mapred.lib.db.DBWritable;
                                      ^
products.java:11: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.JdbcWritableBridge;
                             ^
products.java:12: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.DelimiterSet;
                             ^
products.java:13: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.FieldFormatter;
                             ^
products.java:14: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.RecordParser;
                             ^
products.java:15: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BooleanParser;
                             ^
products.java:16: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BlobRef;
                             ^
products.java:17: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.ClobRef;
                             ^
products.java:18: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.LargeObjectLoader;
                             ^
products.java:19: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.SqoopRecord;


2) Once I have successfully compiled products.java, how can I use Sqoop to import the data into HDFS using my custom ORM class?



Thanks in advance!

revy

1 Answer


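On the first question: javac needs the Hadoop and Sqoop JARs on its classpath. Set

export CLASSPATH="$(hadoop classpath):/opt/cloudera/parcels/CDH/lib/sqoop/*:/opt/cloudera/parcels/CDH/lib/sqoop/lib/*"

and then try again. The /* wildcards matter: a bare directory entry on the classpath only picks up loose .class files, not the JARs inside it. The paths assume a parcel-based CDH install with Sqoop 1.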

P.S. A minor architectural comment on "Specifically I need to encrypt some fields before storing them on the Hadoop DFS": why not use HDFS Transparent Encryption? https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hdfs_encryption.html You can achieve the same result without any coding.

Tagar
  • Thanks! Unfortunately the folder /opt/cloudera/parcels/CDH/lib/sqoop does not exist; I can see only the /opt/cloudera/parcels/CDH/lib/hue folder. Running 'hadoop version' in a terminal shows that the CDH version is cdh5.8.0. – revy Nov 20 '17 at 09:31
  • Do you have sqoop 1 or sqoop 2 installed? Was it installed as a parcel or a package? Did you add a sqoop 1 service through CM? – Tagar Nov 20 '17 at 17:25