How to convert DataFrame.append() to pandas.concat()?

Question

In pandas 1.4.0, append() was deprecated, and the docs say to use concat() instead.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

Codeblock in question:

def generate_features(data, num_samples, mask):
    """
    The main function for generating features to train or evaluate on.
    Returns a pd.DataFrame()
    """
    logger.debug("Generating features, number of samples", num_samples)
    features = pd.DataFrame()

    for count in range(num_samples):
        row, col = get_pixel_within_mask(data, mask)
        input_vars = get_pixel_data(data, row, col)
        features = features.append(input_vars)
        print_progress(count, num_samples)

    return features

These are the two options I've tried, but did not work:

features = pd.concat([features],[input_vars])

and

pd.concat([features],[input_vars])

This is the deprecated line that triggers the warning:

features = features.append(input_vars)

score 16 · Accepted Answer · edited Sep 12 '22 at 07:20

You can store the DataFrames generated in the loop in a list and concatenate them into features once the loop finishes.

In other words, replace the loop:

for count in range(num_samples):
    # .... code to produce `input_vars`
    features = features.append(input_vars)        # remove this `DataFrame.append`

with the one below:

tmp = []                                  # initialize list
for count in range(num_samples):
    # .... code to produce `input_vars`
    tmp.append(input_vars)                        # append to the list, (not DF)
features = pd.concat(tmp)                         # concatenate after loop

You can certainly concatenate inside the loop, but it's more efficient to do it only once, because every concatenation copies all the rows accumulated so far.
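As a minimal, self-contained sketch of the collect-then-concatenate pattern: make_row below is a hypothetical stand-in for the question's get_pixel_data() and is assumed to return a one-row DataFrame. The guard for an empty list is there because pd.concat raises ValueError when given no objects.

```python
import pandas as pd


def make_row(i):
    # Hypothetical stand-in for get_pixel_data(): returns a one-row DataFrame.
    return pd.DataFrame({"row": [i], "value": [i * i]})


def generate_features(num_samples):
    frames = []  # collect the per-sample frames here
    for count in range(num_samples):
        frames.append(make_row(count))  # list.append, not DataFrame.append
    if not frames:
        # pd.concat([]) raises ValueError, so handle the empty case explicitly.
        return pd.DataFrame()
    # Concatenate once; ignore_index=True renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


features = generate_features(4)
print(features)
```

Whether to pass ignore_index=True depends on whether the original row labels matter; the deprecated append kept them by default.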

edited Sep 12 '22 at 07:20

fantabolous

answered Feb 24 '22 at 21:43

From personal experience, each append can individually take almost as long as the entire concat, so the time savings by doing it once at the end can be massive. – fantabolous Sep 12 '22 at 07:27
It is very unfortunate that they are deprecating append for dataframes. With my code, creating the dataframe using the temporary list as shown here results in my code running 10X slower. – Jamie Nov 29 '22 at 13:29

ArchAngelPwn · Answer 2 · 2022-02-24T21:50:46.477

5

This "appends" input_vars to the (initially empty) features DataFrame using concat, which avoids the deprecation warning:

features = pd.concat([features, input_vars])

However, without access to the actual data and data structures, this is hard to test or replicate.
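For illustration, here is a small sketch with made-up data (the column names a and b are assumptions, not the asker's real features). It contrasts the single-list form with the two-list form from the question, which passes the second list as a separate argument and is rejected by pandas.

```python
import pandas as pd

# Made-up frames standing in for the question's `features` and `input_vars`.
features = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
input_vars = pd.DataFrame({"a": [5], "b": [6]})

# Correct: a single list that holds every frame to be stacked.
combined = pd.concat([features, input_vars], ignore_index=True)
print(combined)

# The form tried in the question: two separate lists. The second list is not
# treated as more data, so the call fails instead of concatenating.
try:
    pd.concat([features], [input_vars])
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("rejected:", type(exc).__name__)
```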

edited Feb 24 '22 at 21:50

answered Feb 24 '22 at 21:39

ArchAngelPwn

On the official Pandas docs for the latest release, you will see that .append() was deprecated. https://pandas.pydata.org/docs/whatsnew/v1.4.0.html They say I should use concat() instead, but I can't get it to work. I will keep exploring the pandas docs. – Stephen Stilwell Feb 24 '22 at 21:44
I updated my answer to use the concat thank you for pointing out the docs sorry if I missed them before – ArchAngelPwn Feb 24 '22 at 21:51
This directly fixes the OP's mistake, i.e. [features],[input_vars] should be [features, input_vars]. However, in the case of a loop like the OP's, the other answer is far more efficient. – fantabolous Sep 12 '22 at 07:44

score 0 · Answer 3 · answered Feb 18 '23 at 12:11

For example, suppose you have a list of DataFrames called collector (e.g. one per cryptocurrency), and you want to take the first row of two particular columns from each DataFrame in collector. You can do it as follows:

pd.concat([cap[['Ticker', 'Market Cap']].iloc[:1] for cap in collector])
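A runnable sketch of that one-liner, using invented data: the tickers, market caps and the extra Price column are placeholders, not real figures.

```python
import pandas as pd

# Invented sample data; each frame stands in for one entry of `collector`.
collector = [
    pd.DataFrame(
        {"Ticker": ["AAA", "AAB"], "Market Cap": [100, 90], "Price": [1.0, 2.0]}
    ),
    pd.DataFrame(
        {"Ticker": ["BBB", "BBC"], "Market Cap": [50, 40], "Price": [3.0, 4.0]}
    ),
]

# Take the first row of the two columns from every frame, then stack them.
first_rows = pd.concat(
    [cap[["Ticker", "Market Cap"]].iloc[:1] for cap in collector],
    ignore_index=True,  # optional: renumber rows instead of repeating label 0
)
print(first_rows)
```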

score 0 · Answer 4 · answered Apr 25 '23 at 15:20

There is another unpleasant edge case here: if input_vars is a Series (not a DataFrame) that represents one row to be appended to features, the deprecated features = features.append(input_vars) works fine and adds one row to the DataFrame.

But the concat version, features = pd.concat([features, input_vars]), does something different and produces lots of NaNs: the Series is treated as a new column, and its index labels become extra row labels instead of being matched to the existing columns. To get this to work, you need to convert the Series to a one-row DataFrame:

features = pd.concat([features, input_vars.to_frame().T])

See also this question: Why does concat Series to DataFrame with index matching columns not work?
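The difference can be seen in a small sketch with made-up values (columns a and b are assumptions for illustration):

```python
import pandas as pd

features = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# One "row" expressed as a Series whose index matches the column names.
input_vars = pd.Series({"a": 5, "b": 6})

# Naive concat: the Series is added as a new column, with its labels
# ("a", "b") becoming extra row labels, so the result is padded with NaN.
naive = pd.concat([features, input_vars])
print(naive)

# Converting the Series to a one-row DataFrame first gives the intended result.
fixed = pd.concat([features, input_vars.to_frame().T], ignore_index=True)
print(fixed)
```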

Archimedes Trajano · Answer 5 · 2023-07-10T22:17:02.847

You can bring append back by creating a module that patches it onto DataFrame when it is missing:

import pandas as pd


def my_append(self, x, ignore_index=False):
    # Mirror the removed DataFrame.append: renumber the index only when
    # ignore_index is True; otherwise keep the original index labels.
    return pd.concat([self, x], ignore_index=ignore_index)


# Only patch pandas versions where DataFrame.append no longer exists.
if not hasattr(pd.DataFrame, "append"):
    setattr(pd.DataFrame, "append", my_append)

This adds the implementation, which can be tested as follows:

import pandas as pd
import lib.pandassupport  # the module above; importing it applies the patch


def test_append_ignore_index_is_true():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row, ignore_index=True)
    # With ignore_index=True the rows are renumbered 0..3.
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 3],
        )
    )


def test_append():
    df = pd.DataFrame(
        [
            {"Name": "John", "Age": 25, "City": "New York"},
            {"Name": "Emily", "Age": 30, "City": "San Francisco"},
            {"Name": "Michael", "Age": 35, "City": "Chicago"},
        ]
    )
    new_row = pd.DataFrame([{"Name": "Archie", "Age": 27, "City": "Boston"}])
    df = df.append(new_row)
    # By default the appended row keeps its own index label (0).
    assert df.equals(
        pd.DataFrame(
            [
                {"Name": "John", "Age": 25, "City": "New York"},
                {"Name": "Emily", "Age": 30, "City": "San Francisco"},
                {"Name": "Michael", "Age": 35, "City": "Chicago"},
                {"Name": "Archie", "Age": 27, "City": "Boston"},
            ],
            [0, 1, 2, 0],
        )
    )
