1

Loading flat file to postgres table.I need to do few transformations while reading the file and load it.

Like
-->Check for characters, if it is present, default some value. Reg_Exp can be used in oracle. How the functions can be called in below syntax?
-->TO_DATE function from text format
-->Check for Null and defaulting some value
-->Trim functions
-->Only few columns from source file should be loaded
-->Defaulting values, say for instance, source file has only 3 columns. But we need to load 4 columns. One column should be defaulted with some value

LOAD CSV
    FROM 'filename'
    INTO postgresql://role@host:port/database_name?tablename
        TARGET COLUMNS  
            (  
                alphanm,alphnumnn,nmrc,dte
            ) 

    WITH truncate,  
         skip header = 0,  
         fields optionally enclosed by '"',  
         fields escaped by double-quote,  
         fields terminated by '|',
         batch rows = 100,
         batch size = 1MB,     
         batch concurrency = 64

    SET work_mem to '32 MB', maintenance_work_mem to '64 MB';

Kindly help me, how this can be accomplished used pgloader?

Thanks

Raja
  • 507
  • 1
  • 6
  • 24
  • I might have some solution but I need to add into post a few lines from import file. Header + some 1 or 2 following lines. In pgloader you can use either hints for columns or data transformations. Plus add which version of pgloader you are using. – JosMac Dec 23 '16 at 09:46
  • Thanks Jos. Thought to use HAVING FIELDS, but my file is without header. – Raja Dec 23 '16 at 15:04
  • Like HAVING FIELDS (1, 2, 3, 4 [date format 'YYYYMMDD']). Tried (a, b, c, d [date format 'YYYYMMDD']). But it doesn't work. Can you please provide some syntax sample. I'm using version 9.5 – Raja Dec 23 '16 at 15:08
  • I think 9.5 is postgresql version :-) please try `pgloader --version` and please post at least 1 line from import file even with fake data - so I can see structure – JosMac Dec 23 '16 at 15:27
  • Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind of POC, to implement in real time work. Sample data from file: raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth column; Table has three column, third column should be defaulted with some value. Table structure: A B C-numeric D-date E-(Need to add default value) – Raja Dec 23 '16 at 17:32
  • Not formatted well. So im pasting the post again. pgloader version "3.3.2"-compiled with SBCL 1.2.8-1.el7 // Doing kind of POC, to implement in real time work // In real work, columns supposed to be in 100.Dummy data from file: raj|B|0.5|20170101|ABCD // Need to load only first,second,third and fourth column; Table has five column, fifth column should be defaulted with some value.// Table structure: A,B,C-numeric,D-date,E-(Need to add default value) – Raja Dec 23 '16 at 17:40
  • Thanks, I had vacations - I will look at it ASAP – JosMac Jan 02 '17 at 09:43
  • Sorry for delay - too much work after vacations. I did not have time to look at pgloader so far but I would like to give you a hint about another solution - foreign data wrapper `file_fdw` - see here http://postgresql.freeideas.cz/file_fdw-extension-small-hints/ or in docu - https://www.postgresql.org/docs/current/static/file-fdw.html This is a very useful and allows you to move all the conversion logic into PostgreSQL itself. And there is no problem when file does not have a header. – JosMac Jan 04 '17 at 14:57
  • Sorry. No problem Jos. I tried pg_bulkload, it worked. Thanks – Raja Jan 18 '17 at 09:15

1 Answers1

4

Here's a self-contained test case for pgloader that reproduces your use-case, as best as I could understand it:

/*  
  Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
  of POC, to implement in real time work. Sample data from file:
  raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
  column; Table has three column, third column should be defaulted with some
  value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/

LOAD CSV
     FROM inline
        (
            alphanm,
            alphnumnn,
            nmrc,
            dte [date format 'YYYYMMDD'],
            other
        )
     INTO postgresql:///pgloader?so.raja
        (
            alphanm,
            alphnumnn,
            nmrc,
            dte,
            col text using "constant value"
        )

     WITH truncate,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by '|'

      SET work_mem to '12MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
   $$ drop table if exists so.raja; $$,
   $$ create table so.raja (
       alphanm   text,
       alphnumnn text,
       nmrc      numeric,
       dte       date,
       col       text
      );
   $$;

raj|B|0.5|20170101|ABCD

Now here's the extract from running the pgloader command:

$ pgloader 41287414.load 
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
             table name       read   imported     errors      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0          0          0.007s 
            before load          2          2          0          0.016s 
-----------------------  ---------  ---------  ---------  --------------
                so.raja          1          1          0          0.019s 
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          1          1          0          0.021s 
COPY Threads Completion          2          2          0          0.038s 
-----------------------  ---------  ---------  ---------  --------------
      Total import time          1          1          0          0.426s 

And here's the content of the target table when the command is done:

$ psql -q -d pgloader -c 'table so.raja'
 alphanm │ alphnumnn │ nmrc │    dte     │      col       
═════════╪═══════════╪══════╪════════════╪════════════════
 raj     │ B         │  0.5 │ 2017-01-01 │ constant value
(1 row)