Following is the gist of my problem.
Env: Hadoop 2 (CDH5.1) database: oracle 11g
Scenarios: I'm sqooping fact and dimension tables from the database into hdfs. Initially, I had challenges in handling nulls (which was handled using --null-string and --non-null-string) which was set to \N as per the recommendation. Everything was fine when the hive table that was built had string fields even for date and numerics.
Solution so far Based on a recommendation, I move to importing using the Avro format. I've built the hive table on the avro data and I'm able to query the tables. Now I need to create Hive joins and convert all the fields to their required type like dates to be dates/timestamps, numerics to be int/bigint etc. After the sqooping the avro schema created had converted all date fields to long and the hive table show bigint for those columns.
I'm confused around how sqoop is handling nulls and how those are to be handled in hive/hdfs MR etc.
Could you anybody suggest any practice that has been adopted that could be leveraged?
Thanks Venkatesh