I have some data stored in txt file in the format like this:
aaaa----bbbb
cccc----dddd
...
However, the file may contains some redundant data in invalid format like this:
eeee----ffff----
hhhh
...
And now I want to import the data to a PostgreSQL table which has two text columns, and ignore the invalid lines at the same time. So I write a function below imitating the answer here.
1 create or replace function import_data(in filepath text, out my_count integer) returns integer as
2 $BODY$
3 declare
4 content text;
5 oid integer;
6 it text;
7 begin
8 oid := lo_import(filepath);
9 content := encode(lo_get(oid, 0, 300000000), 'escape');
10
11 content := replace(content, ',', '\,');
12 content := replace(content, '----', ',');
13 content := trim(content, E'\n');
14
15 --begin
16 -- insert into public.mytable select (E'(\'' || regexp_split_to_table(content, E'\n') || E'\')')::public.mytable;
17 --exception when others then
18 --end;
19
20 for it in select('(' || regexp_split_to_table(content, E'\n') || ')')::public.mytable loop
21 begin
22 insert into public.mytable select (it);
23 exception when others then
24 raise notice 'error occurred when importing file: %s', filepath;
25 end;
26 end loop;
27
28 perform lo_unlink(oid);
29
30 get diagnostics my_count = ROW_COUNT;
31 end;
32 $BODY$
33 LANGUAGE plpgsql VOLATILE;
if I use the code commented from line 15 to line 18, it seems that PostgreSQL can catch the too_many_columns
exception correctly. However, if I use the code from line 20 to line 26, PostgreSQL cannot catch the exception when too_many_columns
occurs:
=> select import_data('/run/media/test.rec');
ERROR: malformed record literal: "(eeee,ffff,)"
DETAIL: Too many columns.
CONTEXT: PL/pgSQL function import_data(text) line 20 at FOR over SELECT rows
The reason why I use the code from line 20 to line 26 is that I want to keep the records inserted before remained in the database but not rolled back as explained here:
When an error is caught by an EXCEPTION clause, the local variables of the PL/pgSQL function remain as they were when the error occurred, but all changes to persistent database state within the block are rolled back.
The first question is how can I ignore the exceptions correctly?
And the second question is why there is a too_many_columns
error in PostgreSQL but there isn't a too_few_columns
error?