For self-learning purposes, I am trying to create an end-to-end data flow in Google Cloud:
1. Create a MySQL table using Cloud SQL.
2. Use Dataproc to create a temporary cluster that runs the Sqoop job from a workflow template.
3. Load the extracted data from the storage bucket into BigQuery.
I am stuck at accessing the MySQL table through Sqoop:
ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Access denied for user 'test'@'localhost' (using password: YES)
java.sql.SQLException: Access denied for user 'test'@'localhost' (using password: YES).
I tried to resolve this issue by:
1. Replacing localhost with the public IP.
2. Executing:
GRANT ALL ON everlytics.* TO test@'<ip>' IDENTIFIED BY '1234';
flush privileges;
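MySQL identifies an account by user name and host together, so a grant to `test@'<ip>'` does not cover `'test'@'localhost'`, which is the account named in the error. Below is a minimal sketch of how the existing accounts and grants for the user could be listed. The helper name `grant_check_sql` is hypothetical, and it only prints SQL so that the statements can be piped into whichever client connection is available.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that lists every host entry for a MySQL user
# and shows the grants for one specific user@host pair.
grant_check_sql() {
  user=$1
  host=$2
  printf "SELECT user, host FROM mysql.user WHERE user = '%s';\n" "$user"
  printf "SHOW GRANTS FOR '%s'@'%s';\n" "$user" "$host"
}

# Print the statements for the account named in the error message.
grant_check_sql test localhost
```

The output could then be piped into a client, for example `grant_check_sql test localhost | mysql -u root -p`, assuming an administrative login to the server in question is available.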
My code snippet is below:
gcloud dataproc workflow-templates set-managed-cluster $template_name --zone "asia-south1-a" \
--cluster-name=$cluster_name \
--region "asia-south1" \
--scopes=default,sql-admin \
--initialization-actions=gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
--properties=hive:hive.metastore.warehouse.dir=$bucket/hive-warehouse \
--metadata=enable-cloud-sql-hive-metastore=false \
--metadata=additional-cloud-sql-instances=$instance_name=tcp:3306 \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 20 \
--num-workers 2 \
--worker-machine-type n1-standard-2 \
--worker-boot-disk-size 20 \
--image-version 1.2 &&
gcloud dataproc workflow-templates add-job hadoop \
--step-id=customers_564456778 \
--region="asia-south1" \
--workflow-template=$template_name \
--class=org.apache.sqoop.Sqoop \
--jars=$bucket/sqoop-1.4.7-hadoop260.jar,$bucket/avro-tools-1.8.2.jar,$bucket/mysql-connector-java-5.1.48.jar \
-- import -Dmapreduce.job.user.classpath.first=true \
--driver com.mysql.jdbc.Driver \
--username=test \
--password=1234 \
--query "select * from everlytics.customers where customerNumber>0 and \$CONDITIONS" \
--target-dir $bucket/$table_name \
--split-by customerNumber -m 2
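The Sqoop arguments as posted contain no `--connect` option, which Sqoop's import tool needs in order to know which JDBC URL to open. Below is a minimal sketch of assembling that URL. It assumes the Cloud SQL proxy listens on localhost at the port given in `additional-cloud-sql-instances` (3306 above) and that the database is `everlytics`. The helper name `jdbc_url` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a MySQL JDBC URL from host, port, and database.
jdbc_url() {
  printf 'jdbc:mysql://%s:%s/%s\n' "$1" "$2" "$3"
}

# Assumed values: proxy on localhost, port taken from the tcp:3306 metadata entry,
# database name taken from the GRANT statement and the Sqoop query.
connect_url=$(jdbc_url localhost 3306 everlytics)
echo "--connect=$connect_url"
```

The resulting `--connect=...` argument would go among the Sqoop options after `import`. If another MySQL server is already listening on that port on the cluster node, the URL would reach that server rather than the proxy, so the port is worth double-checking.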
MySQL version: 5.6
Please advise whether I am doing this correctly.