0

I have a script on a Linux server (Cloudera) that uses unixODBC/isql to pull data from a Netezza server. It generates a text file on the local filesystem and then pushes it to HDFS.

export ODBCHOME=;
export ODBCINI=;
export LIBPATH=;
export LD_LIBRARY_PATH=;

NZ_ODBC_INI_PATH=/home/myuser/sql/;
export NZ_ODBC_INI_PATH;
export ODBCINI=/home/myuser/sql/odbc.ini;

export NZ_HOME=/usr/local/nz;
export PATH=$PATH:$NZ_HOME/bin64;
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NZ_HOME/lib64;
export NZ_HOST=myntzserver

EXTRACT_PATH=/home/myuser/trgt_path/;
export EXTRACT_PATH;

cd $EXTRACT_PATH;

hive_char=`printf "\035\n"`

isql nettezaServer1 tsg1dev \$\$infadmin11 -v<<EOF
drop table test_1;
CREATE EXTERNAL TABLE test_1 ( GRP BIGINT, ID BIGINT, DATA CHARACTER V VARYING(10) ) USING (DATAOBJECT ("$EXTRACT_PATH/test_1.txt") REMOTESOURCE 'ODBC' delimiter 2);
insert into test_1 SELECT GRP,ID,DATA FROM SCHEMA1..test_table;
EOF

hdfs dfs -put /home/myuser/trgt_path/test_1.txt /user/projects/my_dir/

I am wondering if there is a way to either connect directly to HDFS, or pipe the data to HDFS as it comes in. In other words, I want to skip creating the text file on the local filesystem due to space constraints (the source table is hundreds of GB).

The reason I am not using sqoop is because the time it takes is too long for the project requirements. The script above is able to create the text file in 2-3 hours whereas sqoop on average is taking 7-8 to load the data. The sqoop expert on my team tried to fine-tune the command to get the time down, but had no luck.

To summarize, is it possible to skip the local filesystem and load this data directly to HDFS?

Danzo
  • 553
  • 3
  • 13
  • 26

1 Answers1

0

Netezza includes Fluid Query, which allows Netezza to query other data sources, and of more interest to you, the ability to move data directly from Netezza to HDFS in various formats.

While the software entitlement is Fluid Query is included with Netezza, your administrator will have to install and configure it prior to use.

Here's a quick overview of Fluid Query.

ScottMcG
  • 3,867
  • 2
  • 12
  • 21