0

I have some data stored in txt file in the format like this:

aaaa----bbbb
cccc----dddd
...

However, the file may contains some redundant data in invalid format like this:

eeee----ffff----
hhhh
...

And now I want to import the data to a PostgreSQL table which has two text columns, and ignore the invalid lines at the same time. So I write a function below imitating the answer here.

 1  create or replace function import_data(in filepath text, out my_count integer) returns integer as
 2  $BODY$
 3  declare
 4          content text;
 5          oid integer;
 6          it text;
 7  begin
 8          oid := lo_import(filepath);
 9          content := encode(lo_get(oid, 0, 300000000), 'escape');
10  
11          content := replace(content, ',', '\,');
12          content := replace(content, '----', ',');
13          content := trim(content, E'\n');
14  
15          --begin
16          --      insert into public.mytable select (E'(\'' || regexp_split_to_table(content, E'\n') || E'\')')::public.mytable;
17          --exception when others then
18          --end;
19  
20          for it in select('(' || regexp_split_to_table(content, E'\n') || ')')::public.mytable loop
21                  begin
22                          insert into public.mytable select (it);
23                  exception when others then
24                          raise notice 'error occurred when importing file: %s', filepath;
25                  end;
26          end loop;
27  
28          perform lo_unlink(oid);
29  
30          get diagnostics my_count = ROW_COUNT;
31  end;
32  $BODY$
33          LANGUAGE plpgsql VOLATILE;

if I use the code commented from line 15 to line 18, it seems that PostgreSQL can catch the too_many_columns exception correctly. However, if I use the code from line 20 to line 26, PostgreSQL cannot catch the exception when too_many_columns occurs:

=> select import_data('/run/media/test.rec');
ERROR:  malformed record literal: "(eeee,ffff,)"
DETAIL:  Too many columns.
CONTEXT:  PL/pgSQL function import_data(text) line 20 at FOR over SELECT rows

The reason why I use the code from line 20 to line 26 is that I want to keep the records inserted before remained in the database but not rolled back as explained here:

When an error is caught by an EXCEPTION clause, the local variables of the PL/pgSQL function remain as they were when the error occurred, but all changes to persistent database state within the block are rolled back.

The first question is how can I ignore the exceptions correctly?

And the second question is why there is a too_many_columns error in PostgreSQL but there isn't a too_few_columns error?

Community
  • 1
  • 1
zsrkmyn
  • 547
  • 1
  • 5
  • 20
  • Perhaps you can, instead, use `COPY table_name FROM PROGRAM 'command'`, where `command` is a program or script (perl, python, or whatever you're skill in and that is suitable for easily processing and formatting a text file, and outputting CSV or TSV to STDOUT). – Ezequiel Tolnay Jun 09 '16 at 08:31
  • That's because the exception occurs in line 20 (in `select` statement, while converting result of `regexp_split_to_table` to `mytable` datatype), but you catch exceptions in line 22. One of possible solutions is to write a set-returning function to filter out malformed lines. – Egor Rogov Jun 09 '16 at 08:54
  • @EgorRogov Oh! I see! I have ever thought the `CONTEXT: PL/pgSQL function import_data(text) line 20 at FOR over SELECT rows` meant that the error occurs in FOR loop. Thanks a lot! – zsrkmyn Jun 09 '16 at 10:24
  • 1
    @ZiggyCrueltyfreeZeitgeister That's awesome! It is what I want! Thanks! – zsrkmyn Jun 09 '16 at 10:29

0 Answers0