
So I have a table with a lot of columns and rows, and I need to filter it so that columns containing null values get dropped.

In Python I can do it this way (small example):


import pandas as pd

df = pd.DataFrame({'col1': [1, None, 3], 'col2': [3, 4, 5], 'col3': [3, 4, None]})

df.dropna(axis=1, how='any')
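
For reference, that keeps only col2, since col1 and col3 each contain a null:

   col2
0     3
1     4
2     5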

But I just have no clue how to do it in Postgres.

I could not find an answer to this problem. I did find some answers, but they rely on explicit column names, which I can't use because I have too many columns.

Thanks!

ShayHa
  • Does this help? https://stackoverflow.com/questions/31034847/how-to-find-all-rows-with-a-null-value-in-any-column-using-postgresql/31035052 – richyen Dec 23 '19 at 17:18
  • Thank you @richyen, but it did not help in my case. – ShayHa Dec 23 '19 at 17:31
  • It was worth a try :D -- could you explain why it didn't work for you? I guess you want a query to actually drop the columns? If that's the case, I don't think there's any DDL that will do it--you would have to script it – richyen Dec 23 '19 at 17:35
  • Umm, it didn't work because it returned the whole table; it gave me the same results as select * from my_table @richyen – ShayHa Dec 23 '19 at 17:47
  • Ah I see, that link gives all ROWS with nulls--you want all COLUMNS with nulls. I believe Postgres doesn't have any facility like that AFAIK, unless you write a `plpgsql` function to do it – richyen Dec 23 '19 at 17:54
  • Also, when you say "...so columns with null values get dropped" -- do you mean "omitted"? Or are you trying to clean up your schema to actually `DROP COLUMN` on your table(s)? – richyen Dec 23 '19 at 19:53
  • Trying to clean my table to find only the mandatory columns that have values in all rows – ShayHa Dec 24 '19 at 06:24

3 Answers


I can't claim it will break any speed records, but it will do what you ask. No dynamic SQL or user-defined functions necessary.

SELECT t.*
FROM your_table AS t
-- If nulls are present, these will not be equal
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t));

If performance becomes a real concern, such as having to run this query many times, you could create an expression index for it. However, if that's the case, I would recommend normalizing your database's data model instead; you may just be papering over structural defects.

CREATE INDEX nulls_detected
  ON your_table ((to_jsonb(your_table) = jsonb_strip_nulls(to_jsonb(your_table))));

Further optimizations could probably be found by using a Bloom filter for your index.
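
For what it's worth, a sketch of how that might look (this part is my assumption: the stock bloom extension supports equality tests on plain int4 and text columns out of the box, not boolean expressions, so it would serve ad-hoc per-column filters rather than the jsonb comparison above):

CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical: a single signature-based index covering several columns,
-- cheaper than one btree per column for ad-hoc equality filters.
CREATE INDEX your_table_bloom ON your_table USING bloom (col1, col2);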


Here's an example of this in action:

CREATE TABLE null_example (
  id serial PRIMARY KEY,
  col1 int,
  col2 text,
  col3 boolean
);

INSERT INTO null_example (col1, col2, col3) VALUES
(1, 'test1', true),
(NULL, 'test2', false),
(3, NULL, true),
(4, 'test4', NULL),
(5, 'test5', false);

Now if you run the following…

SELECT t.*
FROM null_example AS t
WHERE to_jsonb(t) = jsonb_strip_nulls(to_jsonb(t));

…you get the following output. Any rows that contain NULL column values have been omitted.

 id | col1 | col2  | col3
----+------+-------+------
  1 |    1 | test1 | t
  5 |    5 | test5 | f

If you are trying to target columns for removal, such as with an ALTER TABLE … DROP COLUMN statement, the following query can help you along the way as well.

SELECT results.key, count(*), array_agg(t.id) AS affected_ids
FROM null_example AS t
  CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
WHERE results.value = 'null'::jsonb
GROUP BY results.key;

This returns:

 key  | count | affected_ids
------+-------+--------------
 col2 |     1 | {3}
 col3 |     1 | {4}
 col1 |     1 | {2}
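
If you do go down the ALTER TABLE … DROP COLUMN route, a variation of that query can generate the statements for you to review first. This is just a sketch; it deliberately prints the DDL as text rather than executing anything:

SELECT format('ALTER TABLE null_example DROP COLUMN %I;', results.key) AS ddl
FROM null_example AS t
  CROSS JOIN LATERAL jsonb_each(to_jsonb(t)) AS results(key, value)
WHERE results.value = 'null'::jsonb
GROUP BY results.key;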
Miles Elam
  • Looks interesting--can you demonstrate output? Tried it on my end, but it's not giving me the results that I would've expected – richyen Dec 23 '19 at 19:41
  • @richyen Added example – Miles Elam Dec 23 '19 at 19:50
  • Ah, I see now that I misunderstood the question. – Miles Elam Dec 23 '19 at 19:53
  • Well, let's check with OP--I added a comment on the question – richyen Dec 23 '19 at 19:54
  • Added a query to find any columns that contain NULLs. This could be passed to a plpgsql loop for dynamic SQL alter table statements, though as richyen pointed out, including the DROP statement has safety concerns. – Miles Elam Dec 23 '19 at 20:13
  • That's really interesting @MilesElam! Could you please tell me what led you to think of that WHERE clause? I'd never have thought of such a thing. – ShayHa Dec 23 '19 at 20:20
  • I implemented a temporal table system (aka system-versioned tables) for my company in PostgreSQL. Because each table is different and because expecting folks to write custom temporal logic and functions for each table would be a maintenance nightmare, I figured out how to convert row data to and from JSON. This allowed for a more generic solution to the problem. A specific optimization I made was to convert OLD and NEW in a trigger to jsonb to detect if changes were made; the most efficient update is the one you don't have to make. It was superficially similar to this. – Miles Elam Dec 23 '19 at 20:27
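
For illustration, a minimal sketch of that trigger optimization (your_table is a placeholder; PostgreSQL also ships a built-in suppress_redundant_updates_trigger() for the same purpose):

CREATE OR REPLACE FUNCTION skip_noop_updates() RETURNS trigger AS $$
BEGIN
  -- Serialize both row versions to jsonb; if nothing actually changed,
  -- return NULL to cancel this row's UPDATE.
  IF to_jsonb(OLD) = to_jsonb(NEW) THEN
    RETURN NULL;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_noop
  BEFORE UPDATE ON your_table
  FOR EACH ROW EXECUTE FUNCTION skip_noop_updates();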

There isn't a facility to do this in Postgres, so you'll have to build your own function. One way to approach this is to retrieve all the column names with:

select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0;

Then, loop through the attnames and perform:

select count(*)
  from foo
 where <attname> is null;

A possible function may look like:

postgres=# create table foo (col1 int, col2 int, col3 int);
CREATE TABLE
postgres=# insert into foo values (1, null, null);
INSERT 0 1
postgres=# insert into foo values (1, 1, null);   
INSERT 0 1
postgres=# insert into foo values (1, null, 1);
INSERT 0 1
postgres=# CREATE OR REPLACE function find_null_cols() RETURNS setof record
AS
$$
declare
rr record;
r record;
c int;
begin
for r in (select attname
  from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'foo')
   and attnum > 0)
loop
  execute format ('select count(*) from foo where %s is null', r.attname) into c;
  if c > 0 then
    select r.attname::text, c into rr;
    return next rr;
  end if;

end loop;
return;
end
$$
LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select * from find_null_cols() as x(colname text, nullcount_rows int);
 colname | nullcount_rows 
---------+----------------
 col2    |              2
 col3    |              2
(2 rows)

I won't add the DROP COLUMN commands for your safety ;)

richyen

You can do this with dynamic SQL in PostgreSQL, using EXECUTE inside a plpgsql function. Get the names of the columns with:

SELECT *
  FROM information_schema.columns
 WHERE table_schema = 'your_schema'
   AND table_name = 'your_table';

and, for each one of them, build a 'delete from' statement against your view if the column has any null value:

DECLARE
    colname text;
BEGIN
    FOR colname IN
        SELECT column_name
          FROM information_schema.columns
         WHERE table_schema = 'your_schema'
           AND table_name = 'your_table'
    LOOP

        ...
        EXECUTE 'delete from ... where (SELECT ...' || colname || '...)';
        ...

    END LOOP;
END;
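
A fuller, self-contained sketch along those lines ('your_schema' and 'your_table' stand in for real names). Rather than deleting anything, this variant counts nulls per column with dynamic SQL and prints a SELECT list containing only the columns that are populated in every row:

DO $$
DECLARE
    col  text;
    keep text[] := '{}';
    n    bigint;
BEGIN
    FOR col IN
        SELECT column_name
          FROM information_schema.columns
         WHERE table_schema = 'your_schema'
           AND table_name = 'your_table'
    LOOP
        -- Count nulls in this column; format('%I', ...) quotes the
        -- identifier safely for dynamic SQL.
        EXECUTE format('SELECT count(*) FROM your_schema.your_table WHERE %I IS NULL', col)
           INTO n;
        IF n = 0 THEN
            keep := keep || col;   -- column has no nulls, keep it
        END IF;
    END LOOP;
    RAISE NOTICE 'SELECT % FROM your_schema.your_table;', array_to_string(keep, ', ');
END
$$;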
Arcadio
  • How does this indicate which column has null? This shows me table_catalog, table_schema and more columns; I'm not familiar with that – ShayHa Dec 23 '19 at 17:46
  • You can read, for example, https://hub.packtpub.com/how-to-implement-dynamic-sql-in-postgresql-10/ You have to know how to program functions in postgresql – Arcadio Dec 23 '19 at 17:49
  • Thanks, will read. So in order to get my desired outcome I must create a function? – ShayHa Dec 23 '19 at 17:53
  • Dynamic SQL is very powerful but also bypasses a lot of safety. Use with extreme caution, especially if it is driven by user-provided parameters. – Miles Elam Dec 23 '19 at 19:32