1

I'm currently using LOAD DATA LOCAL INFILE to import bulk financial data for processing.

The values within the file are stored to an accuracy of 30 decimal places, e.g.

125.154821679413246187945612314846

When this imported however, the data is always truncated to 10 decimal places, with trailing zeros, e.g.

125.154821679200000000000000000000

The column is set up as follows :-

USDPayable DECIMAL (33,30)

EDIT:

Table creation script:

CREATE TABLE IF NOT EXISTS dump
                        (
                            SaleID INT NOT NULL AUTO_INCREMENT,
                            Country VARCHAR(8),
                            Label VARCHAR(20),
                            Product VARCHAR(5),
                            URI VARCHAR(20),
                            UPC VARCHAR(20),
                            EAN VARCHAR(20),
                            ISRC VARCHAR(20),
                            TrackName VARCHAR(28),
                            ArtistName VARCHAR(64),
                            ComposerName VARCHAR(64),
                            AlbumName VARCHAR(54),
                            Quantity INT(10),
                            USDPayable decimal(33,30),
                            PRIMARY KEY (SaleID)
                        );

Load data script:

LOAD DATA 
  LOCAL INFILE '<my file>'
    INTO TABLE dump
        IGNORE 3 LINES
              (Country, Label, Product, URI, UPC, EAN, ISRC, 
               TrackName, ArtistName, ComposerName, AlbumName,
               Quantity, USDPayable)

Input data sample :

BE Label1 product code 00cflHmwefweidJA barcode ISRC ......... 1 0.003872402660862401479116078884
US Label2 product code 00cflHmtyfweidJA barcode ISRC ..........1 0.002220695558213356018688393633
BE Label2 product code 00cflHmwefweidJA barcode ISRC ..........2 0.002137613958913373918420510406
NO Label3 product code 00cflHmjkfweidJA barcode ISRC ..........3 0.02264616748080050066133527663
DE Label4 product code 00cflHmwefweidJA barcode ISRC ..........1 0.003018216435957714580945696704
CO Label5 product code 00cflHmzxfweidJA barcode ISRC ..........1 0.0004178407583000146349569881848
CA Label6 product code 00cflHmwefpoidJA barcode ISRC ..........2 0.01385864190292964399955986534
CA Label7 product code 00cflHmwefmnidJA barcode ISRC ..........1 0.003270121556795672746439239972
IS Label7 product code 00cflHmwefweidJA barcode ISRC ..........8 0.05702767311942350853930831032
TR Label7 product code 00cf09poefweidJA barcode ISRC ..........4 0.009839895102632677068730014884

UPDATE

After a while, I decided to bite the bullet - and stream the file line-by-line using PHP ready to process the values before inserting into the table. using fgets(), this value is truncated here also... It's almost as if mysql and php think that the truncated values are the literal values in the file as it's being read. Very confusing

O. Jones
  • 103,626
  • 17
  • 118
  • 172
locksem
  • 315
  • 3
  • 16
  • try number_format() – Bilal Ahmed May 18 '17 at 10:33
  • @BilalAhmed Could you expand on this? If you mean the php function, I was hoping to avoid any pre-processing (which explains the use of LOAD DATA LOCAL INFILE), as the file contains many hundreds of thousands of rows – locksem May 18 '17 at 10:49
  • Please consider an [edit] to your question to show us a few lines of your input file, and the output of `SHOW CREATE TABLE whatever` for the table you're trying to load. It looks like MySQL is reading numbers from your `INFILE`, converting them to its internal native number format -- double-precision floating point -- then converting them to decimal, thereby losing precision. You might try wrapping those numbers in quotation marks so `LOAD DATA INFILE` will assume they're text strings, not numbers, on the way in. – O. Jones May 18 '17 at 11:05
  • Thanks @O.Jones I've edited the question to include the table creation script. In regards to enclosing the decimal in quotes, the file is enormous, so the dump straight to db was an attempt here to avoid any pre-processing that required loading the entire file into memory. – locksem May 18 '17 at 11:44
  • I still think it would be helpful if you showed a dozen lines or so of your input file, or some fake data in the format of your file. – O. Jones May 18 '17 at 11:50
  • My apologies @O.Jones , I had forgot to add this sooner. The data is heavily redacted as you can tell, but I hope that this is enough to go on. It's a TAB delimited file (mysql's default delimiter value IIRC) – locksem May 18 '17 at 12:11

1 Answers1

1

php is a weakly typed language. If it sees something it assumes is a decimal number, it handles it as a float -- a ieee 754 double precision approximate number -- by default. The same goes for internal numeric processing in MySQL.

ieee 754 double precision isn't precise enough for your royalty computations (pity the poor musicians; you can't buy anything with a millionth of a penny).

So you're right to declare a high-precision decimal type for your table. But you have to trick MySQL into handling your numbers as if they were strings, and not take its favorite ieee 754 shortcut (or hack, we might say).

You could try something like this:

  LOAD DATA 
LOCAL INFILE 'C:\\yadda\\yadda\\sample.tsv'
  INTO TABLE dump
             (Country, Label, Product, URI, UPC, ISRC, Quantity, @USDPayable)
         SET USDPayable = CAST(@USDPayable AS DECIMAL(33,30) );

This will arrange to process your tiny fractions of dollars as a string on input, then convert it to the decimal format you need using a SET clause.

Notice how the list of columns in parentheses assigns that value from each column to @USDPayable.

That worked for me. If it doesn't work for you you probably should consider moving to a later version of MySQL.

Notice you'll have to change the list of columns for your real data. You didn't furnish some columns in your sample tsv file.

Be careful with weak data typing with this kind of money data. Please double check that your sums are coming out right. You may need to switch to a strongly typed language.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • 1
    I'm an easy guy to ask for help by people paying royalties to musicians. Lots of musicians among my family and friends. – O. Jones May 18 '17 at 23:25
  • Thanks ever so much @O.Jones this seems to have done the trick nicely - and lots of extra helpful info. It has made me rather dubious about continuing to rely on PHP for this sort of thing! – locksem May 19 '17 at 09:44