
I don't know where I am going wrong, but my Sqoop export command from HDFS to MySQL fails every time.

sqoop export --connect "jdbc:mysql://quickstart.cloudera:3306/streaming" \
--username root --password cloudera --table pd_count --update-key id \
--update-mode allowinsert --export-dir /user/cloudera/input/* -m 1 --batch

There is only one folder in the export directory, and it contains three files, namely:

  1. part-m-00000
  2. part-m-00001
  3. part-m-00002

I have updated the last file so as to learn about the --update argument, but the job fails no matter how many permutations I try.

  1. I export data into MySQL without an incremental update, and the data gets exported successfully.
  2. I import the data into HDFS with "incremental append"; it's a success (a rough sketch of this import follows the list).
  3. But when I try to export data into MySQL with "update-key" and "update-mode", it doesn't transfer and fails.
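
For reference, the incremental import in step 2 is along these lines (a rough sketch only; the --check-column and --last-value values are illustrative placeholders, not my exact settings):

# Rough sketch of the step-2 incremental import; --check-column id and
# --last-value 0 are placeholders for the actual incremental settings.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/streaming" \
--username root --password cloudera --table pd_count \
--target-dir /user/cloudera/input \
--incremental append --check-column id --last-value 0 -m 1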

The command above is the last one I used.

The most recent error log is at this link. Please do help me.

Thanks in advance.

  • What error are you getting? – शेखर Dec 14 '17 at 10:43
  • It fails with a different reason every time. One time the MapReduce job ran for more than 30 minutes and I was forced to shut it down, because in normal cases the job takes around 1 minute to complete. Another time it displayed the error "Export Job Failed". –  Dec 15 '17 at 04:43
  • I have tried changing the legal value of --update-mode from "allowinsert" to "updateonly". Even that didn't help me in any way. –  Dec 15 '17 at 04:47
  • Please share the recent error logs here. – Subash Dec 15 '17 at 08:57
  • Did you try checking the permissions: grant all privileges on streaming.* to ''@quickstart.cloudera; ? Also, the tables should match each other (a sketch of this check follows these comments). – Subash Dec 15 '17 at 09:18
  • @SubashKonar Sorry for the late reply. I have edited the post with the link to the error log. I have tried as much as I could to paste the error log on this site, but the site doesn't allow me to, as the error log contains code. Please do help me. –  Dec 16 '17 at 05:36
  • Not able to access the link. Could you paste it here? – Subash Dec 18 '17 at 05:27
  • @SubashKonar Sorry that you couldn't access the link. I encrypted the file with a password out of habit. Sorry. This is another [link](https://ghostbin.com/paste/wmy9p); it doesn't have any password protection. –  Dec 18 '17 at 05:33
  • Could you clear the folder and run the Sqoop export again? – Subash Dec 18 '17 at 05:54
  • Do you mean the table in MySQL? No data gets transferred when it fails. If you meant the folder in HDFS, I try with a new copy of the same data every time. For me it happens this way: 1. I export data into MySQL without an incremental update, and the data gets exported successfully. 2. I import the data into HDFS with "incremental append"; it's a success. 3. But when I try to export data into MySQL with "update-key" and "update-mode", it doesn't transfer and fails. –  Dec 18 '17 at 05:59
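
As a minimal sketch of the permission and schema checks suggested in the comments above (the host, database, and credentials are taken from the question; the anonymous-user grant is the one quoted in the comment and may not apply to your setup):

# Sketch only: verify that the export user can write to the target database
# and that the MySQL table's schema matches the exported data.
mysql -h quickstart.cloudera -u root -pcloudera -e "
GRANT ALL PRIVILEGES ON streaming.* TO ''@'quickstart.cloudera';
FLUSH PRIVILEGES;
SHOW GRANTS FOR ''@'quickstart.cloudera';
DESCRIBE streaming.pd_count;"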

1 Answer


OK, I was assuming something different. Could you try the options below:

  1. Use --verbose in the export once again for extended logs.
  2. Look at the application logs from the failed application. To fetch them, run the following command as the user who ran the Sqoop command: yarn logs -applicationId application_1513399439223_0001 > app_logs.txt
  3. It seems you didn't add --input-fields-terminated-by (see the sketch below).
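
For example, a re-run with these options might look like the sketch below (a sketch only; the comma delimiter is an assumption, so use whatever delimiter your part files actually contain, and --export-dir points at the directory itself here rather than a glob):

# Hypothetical re-run with verbose logging and an explicit field delimiter.
sqoop export --verbose \
--connect "jdbc:mysql://quickstart.cloudera:3306/streaming" \
--username root --password cloudera --table pd_count \
--update-key id --update-mode allowinsert \
--input-fields-terminated-by ',' \
--export-dir /user/cloudera/input -m 1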

Updating the Answer as per your latest comment

I see you have killed the job. It might be related to performance. Please try tuning the options below and run the Sqoop job again:

  • Set the number of mappers to 4: -m 4
  • Insert the data in batches: --batch
  • Use the property sqoop.export.records.per.statement to specify the number of records used in each INSERT statement: sqoop export -Dsqoop.export.records.per.statement=100 --connect ...
  • Finally, specify how many rows will be inserted per transaction with the sqoop.export.statements.per.transaction property: sqoop export -Dsqoop.export.statements.per.transaction=100 --connect ... (a combined sketch follows this list).
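
Putting those together, the tuned command might look roughly like this (a sketch only; the connection details and table come from your question, and the two values of 100 are just starting points to tune from):

# Tuned export sketch: 4 mappers, JDBC batching, 100 records per INSERT
# statement, 100 statements per transaction.
sqoop export \
-Dsqoop.export.records.per.statement=100 \
-Dsqoop.export.statements.per.transaction=100 \
--connect "jdbc:mysql://quickstart.cloudera:3306/streaming" \
--username root --password cloudera --table pd_count \
--update-key id --update-mode allowinsert \
--export-dir /user/cloudera/input -m 4 --batch

Note that the generic -D arguments must come right after "sqoop export", before the tool-specific arguments, hence their position in the sketch.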

Please provide the YARN logs. Also, what is the volume of the data?

Subash
  • I have used "verbose" to generate extended logs. Here is the [log](https://ghostbin.com/paste/f6wra). `--input-fields-terminated-by` did not help me. –  Dec 19 '17 at 06:17
  • After using all the options you suggested, there was an increase in performance, and the execution time was reduced to 7m 19s; earlier it used to take more than 30 minutes. But the sad part is that the job failed again. The volume of the data is 64 MB. My Sqoop job is using MapReduce as the execution engine, not YARN, so I guess I can't provide YARN logs. –  Dec 20 '17 at 05:53
  • OK. Are the logs still the same? Can you check the logs here: /var/log/hadoop-*, /var/log/mapred-*, or localhost:50030/jobtracker.jsp? Otherwise it will be hard to know the specific reason for the failure. Please give it a try by running a smaller file (10 MB or 5 MB). – Subash Dec 20 '17 at 10:38
  • I guess the problem is with the data size and the "--update-key updateonly" command, because I have created another table named "test" containing the first 10 rows of "pd_count". I then imported it into HDFS and exported it into another table, "test_exp". Next, using the "--incremental append" command, I imported another 6 rows into HDFS. And now, when I export into MySQL using the "--update-key updateonly" command, the console displays the message "Exported 16 records", but it did not. –  Dec 21 '17 at 05:21
  • When I used the "--update-mode allowinsert" command, all 16 records were exported and added on top of the previous records, and I had 10 duplicate rows. And regarding the logs, there isn't any folder named "/var/log/mapred-*", and there are many folders "/var/log/hadoop-*"; which one should I check? Is it hdfs, yarn, httpfs, kms, mapreduce? –  Dec 21 '17 at 05:26
  • Please provide the logs from the Sqoop folder. Could you refer to this link for the export of smaller data: https://acadgild.com/blog/exporting-files-hdfs-mysql-using-sqoop/ – Subash Dec 21 '17 at 10:22
  • I referred to the link provided; it's exactly what I have done. Using the "--update-key updateonly" command, I get a statement that all records were exported, but in reality they are updated in MySQL. Using the "--incremental append" command not only exports the remaining part of the data, but also exports records that are already present in MySQL. And regarding the logs, there is no file in the Sqoop folder. –  Dec 22 '17 at 06:12
  • Great! That means the export is happening properly, but you are getting duplicates. Please check your table schema to see whether you have declared the column 'id' as a primary key in the MySQL table for the upsert to happen. And I would suggest going through the basics of Sqoop here: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html – Subash Dec 22 '17 at 07:40
  • It's not a primary key. I have intentionally not declared it as a primary key, because there will be multiple inputs for the same id; for example, different count values will be added for the same id. –  Dec 22 '17 at 09:46
  • And sorry, there was a typo in my previous comment. Using the "--update-key updateonly" command, I get a statement that all records were exported, but in reality they are not updated in MySQL. –  Dec 22 '17 at 09:47
  • Please go through https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_failed_exports and the Stack Overflow solution https://stackoverflow.com/questions/39137254/sqoop-export-inserting-duplicate-entries. You need to declare a primary key, as per your comment on Dec 21 at 05:26 (a minimal sketch follows below). – Subash Jan 02 '18 at 17:50
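
Based on the last few comments, a minimal sketch of what declaring a key for the upsert could look like (the table name comes from the thread; count_value is a hypothetical placeholder for whatever column actually holds the count, and the composite key is an assumption — pick a column combination that is genuinely unique in your data):

# Hypothetical: give --update-key something unique to match rows on.
# 'count_value' is a placeholder column name; (id, count_value) is assumed
# to uniquely identify a row in pd_count.
mysql -h quickstart.cloudera -u root -pcloudera -e \
'ALTER TABLE streaming.pd_count ADD PRIMARY KEY (id, count_value);'

With such a key in place, the export's --update-key could name the same columns (Sqoop accepts a comma-separated list, e.g. --update-key id,count_value), so that updateonly has something unique to match existing rows against.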