0

My problem is that I have a dataframe which has null values, but these null values are filled with another column of the same data frame, then I would like to know how to take that column and put the information of the other column to fill the missing data. I'm using deepnote link: https://deepnote.com

For example:

Column A Column B
Cell 1 Cell 2
NULL Cell 4

My desired output:

Column A
Cell 1
Cell 4

I think it should be with sub queries and using some WHERE, any ideas?

LMOTRU
  • 11
  • 3
    Looks like you need a function - similar to `coalesce(colA,colB)` which returns the first non-null value from the parameters, not quite sure how that applies in deepnnote however. Seems that if you are using a deepnote "sql block" then you should be able to use `coalesce()` see: https://deepnote.com/docs/sql-cells – Paul Maxwell Feb 14 '23 at 03:36
  • Hey, do you want a solution in clean Python or in a combination of Python and some SQL on top of that (DuckDB)? There is many solutions and you mentioning "where" seems like you're trying to work with SQL? Also, your link to Deepnote is not linking to your project, but just to a landing page of Deepnote. Can you possibly update your question with the right link or share the exact code you're working with? It will help greatly with answering this correctly. – Jakub Žitný Feb 15 '23 at 20:34

1 Answers1

2

thanks for the question and welcome to StackOverflow.

It is not 100% clear which direction you need your solution to go, so I am offering two alternatives which I think should get you going.

Pandas way

You seem to be working with Pandas dataframes. The usual way to work with Pandas dataframes is to use Pandas builtin functions. In this case, there is literally a function for filling null values, it's called fillna. We can use it to fill values from another column like this:

df_raw = pd.DataFrame(data={'Column A': ['Cell 1', None], 'Column B': ['Cell 2', 'Cell 4']})

# copying the original dataframe to a clean one
df_clean = df_raw.copy()

# applying the fillna to fill null values from another column
df_clean['Column A'] = df_clean['Column A'].fillna(df_clean['Column B'])

This will make your df_clean look like you need

Column A
Cell 1
Cell 4

Dataframe SQL way

You mentioned "queries" and "where" in your question which seems you might be playing with some combination of Python and SQL world. Enter DuckDB world which supports exactly this, in Deepnote we call these Dataframe SQLs.

You can query e.g. CSV files directly from these Dataframe SQL blocks, but you can also use a previously defined Dataframe.

select * from df_raw

In order to fill the null values like you are requesting, we can use standard SQL querying and a function called coalesce as Paul correctly pointed out.

select coalesce("Column A", "Column B") as "Column A" from df_raw

This will also create what you need in SQL world. In Deepnote, specifically, this will also give you a Dataframe.

Column A
Cell 1
Cell 4

Feel free to check out my project in Deepnote with these examples, and go ahead and duplicate it if you want to iterate on the code a bit. There is also plenty more alternatives, if you're in a real SQL database and want to update existing columns, you would use update statement. And if you are in a pure Python, this is of course also possible in a loop or using lambda functions.

Jakub Žitný
  • 962
  • 1
  • 9
  • 38