1

Trying to do sqoop export from HDFS to MYSQL. Getting mapper error because of different date format between input file vs MySQL. Input file have data in mm/dd/yyyy format where in SQL it is date. I guess MySQL is yyyy-mm-dd.

Because of same getting an error as:

caused by: java.lang.RuntimeException: Can't parse input data: '2/18/2019'

My limitation as the source is from different provider and we can not request them to change it. So in this situation what options do i have? Any suggestions

Ned
  • 1,378
  • 16
  • 28
Amit
  • 11
  • 2
  • Why not load the date into a character string column, then convert it to a date once it's safely in mysql? In my opinion any bulk insert is best kept as simple as possible. – MandyShaw Jul 28 '18 at 19:53
  • Yes, but then it will be double effort...file has billions of records and will loose the real power of Hadoop clusters... what do you think? – Amit Jul 29 '18 at 13:29
  • That bit is not my area of expertise, sorry. But I would normally reckon that, if you are loading data from a source over which you have no control. you need to assume you will have to do some sort of data cleansing/post-processing before you can do anything useful with it. Alternatively, it may even be that Hadoop, or your chosen analytics tool(s), can cope with processing these 'dates' in character format - that would be worth checking. I don't think I have much more to offer on this one so good luck. – MandyShaw Jul 29 '18 at 13:36

1 Answers1

-1

edit

Unfortunately this answer may not be for you. If you are using a program that you don't have control over the source for, this won't help you.

I'll leave it up only because it is a common question that I see with people new to rdbms programming.

Original answer

Why are you treating dates and times as strings? For that matter why are you building SQL for each row? On the MySql side there is a better way to handle that.

Most RDBMS support the concept of a Prepared Statement, although the implementation differs by vendor. Java had support through jdbc for all of the major vendors flavor of prepared statement, so you don't need to worry about the implementation details.

Every time you execute SQL the database engine goes through several phases before the data is applied or returned. The first and most time consuming phase, called the "prepare" phase, is to analyze the SQL string and computer the ideal access path to complete it with. 50 to 80 percent of the SQL "execution" time is spent in this "Prepare" phase.

A simple optimization is to recognize that the ideal access path in a mature database rarely varies, which allows the programmer to prepare the statement once, return a handle to the access path, then pass only the handle and it's parameters across the wire from the application to the database. This minimizes overheads of access path computation, data type conversions, and network communication while automatically protecting from SQL injection attacks and taking care of such administrivia as date formatting.

In Java, this is represented with the PreparedStatement class.

Always use prepared statements. If used properly, they will eliminate 50 to 80% of the overheads of each database call. They also allow you to choose more simply by using native java types and simply passing the value into the execution with the PS.

Using PreparedStatement also eliminates much of the need to sanitize inputs. By it's nature, you don't need to worry about special characters, apart from those the target will reject (example: dropping a character with a codeine greater than 127 into a database that was built for ASCII only on a platform that enforces character set).

If you need to take input as String, and convert to Date, use java's DateFormat class.

pojo-guy
  • 966
  • 1
  • 12
  • 39
  • Thanks and appreciate your reply. This is my command sqoop export --connect "jdbc:mysql://ip-XX-XX-XX-XXX:YYYY/dbName" --username userName --table tableName --export-dir /user/Files/file.csv -P --input-null-string '\\N' --input-null-non-string '\\N' So not sure what do you mean by PreparedStatement...Can you please advice? – Amit Jul 28 '18 at 16:58