4

I use the below code in Power Query to remove empty columns from a table with a lot of columns. It runs very slow and I am looking for a way to speed it up. Basically if all the entries in a given column is null, the column should be removed

//Remove Empty Columns
ColumnstoKeep = List.Select(
        Table.ColumnNames(#"Expanded"),each List.NonNullCount(Table.Column(#"Expanded",_)) <>0 ),

RemoveEmptyColumns = Table.SelectColumns(#"Expanded",ColumnstoKeep),
Rafadan
  • 97
  • 6
  • Do you have to check whether the entire column is null or can you just check the first row and remove null columns that way. Your existing code will be checking all the rows in your dataset... – Davide Bacci Feb 22 '23 at 09:34
  • I need to check all rows unfortunately. – Rafadan Feb 22 '23 at 09:40
  • I doubt there is any way to speed this up then. The code is doing the bare minimum to meet your requirements which is check a full column of values to see if any are not null. You could try buffering the table before the null check. That would be my next option. – Davide Bacci Feb 22 '23 at 09:43
  • I am just learning the M language so I was wondering, if somehow converting to a different data type or searching using a different function could speed it up. Can you suggest the code sniplet so that I can try buffering? Thanks – Rafadan Feb 22 '23 at 09:47
  • Also, maybe aborting the search in a given column, if a non-null value is encountered could cut back on the time. But I am not sure how easy that will be to implement. – Rafadan Feb 22 '23 at 09:48
  • For my specific dataset the following algorithm would be quite efficient: 1. Scan Columns from left to right 2. If item is non-null, add column name to a list, go to the next column 3.if all rows in a column is empty, stop search 4. select only columns in the list – Rafadan Feb 22 '23 at 09:53
  • I'll add the buffer as an example. Your second option may be viable but I suspect M has the optimisation done internally already. – Davide Bacci Feb 22 '23 at 09:53
  • How many rows are in your table? How many columns? What is the data source? – Davide Bacci Feb 22 '23 at 09:56
  • The table is dynamic, currently I have ~500 columns and 30 rows and it takes ~10 minutes to run :( I will have many more rows in the future. – Rafadan Feb 22 '23 at 09:58
  • In the above comment, I mean that in this specific dataset, if a column is empty, all the columns to the right will also be empty. – Rafadan Feb 22 '23 at 09:59
  • I see - in that case, there will be a way to optimise definitely. If you only need the position of first null column and can then exclude all others. Try the buffer and let me know how you get on with that first. – Davide Bacci Feb 22 '23 at 10:04
  • Buffer is definetely an improvement, runs ~10x faster! – Rafadan Feb 22 '23 at 10:06
  • Lol - buffer to the rescue. See how you get on with bigger datasets otherwise we can craft a list.generate to get the column names. – Davide Bacci Feb 22 '23 at 10:07

3 Answers3

4
//Remove Empty Columns
bufferedTable = Table.Buffer( #"Expanded"),
ColumnstoKeep = List.Select(
        Table.ColumnNames(#"bufferedTable"),each List.NonNullCount(Table.Column(#"bufferedTable",_)) <>0 ),

RemoveEmptyColumns = Table.SelectColumns(#"bufferedTable",ColumnstoKeep),
Davide Bacci
  • 16,647
  • 3
  • 10
  • 36
3

Here is another answer which will stop once the first non null column is reached.

enter image description here

enter image description here

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyAhIoKFYHKoNNAipujE0CUzwWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t, Column7 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", Int64.Type}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}}),
    sampleTable = Table.ReplaceValue(#"Changed Type","",null,Replacer.ReplaceValue,{"Column4", "Column5", "Column6", "Column7"}),
    columnNames = Table.ColumnNames( sampleTable),
    Custom1 = List.Generate(
        ()=> [name = columnNames{index}, index = 0] ,
        each  [index] < List.Count(columnNames) and List.NonNullCount(Table.Column(sampleTable,[name])) <>0,
        each [name = columnNames{index}, index = [index]+ 1] ,
        each [name]
        )
in
    Custom1
Davide Bacci
  • 16,647
  • 3
  • 10
  • 36
0

Those are some great suggestions already. A method that I've used before is:

let
  tbl = Source, 
  Headers = Table.ColumnNames( tbl ), 
  Result = 
    Table.SelectColumns(
      tbl, 
      List.Select( 
          Headers, 
          each List.MatchesAny( Table.Column( tbl, _ ), each _ <> null ) )
    )
in
  Result

I haven't tested the performance, but worth a try. It uses List.MatchesAny to find any column that have at least 1 value that is not null (see https://powerquery.how/list-matchesany/)

Cheers, Rick