Good afternoon everyone,

I want to filter a DataFrame down to the countries (rows) and years (columns) I am interested in. Since the selection can change based on user input (not shown here), I am using the following `offshore_filter` function:

```python
# Note: 'df' is my DataFrame, with one row per country code (in the 'country' column)
# and the years as column headers

import datetime as d
import pandas as pd

COUNTRIES = [
        'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
        'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
        'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
        'TR', 'UA', 'UK', 'XK'
]

# From 2005 up to (but excluding) the current year
YEARS = list(range(2005, d.datetime.now().year))

def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function filters out the countries
    # and the years not needed in the analysis

    # Keep only the rows for the countries of interest
    # (a boolean mask returns a new DataFrame instead of modifying the caller's df)
    df = df[df['country'].isin(countries)]

    # Keep only the identifier columns and the years of interest;
    # reindex returns a new DataFrame and fills any label it cannot find with NaN
    columns_to_keep = ['country', 'country_name'] + list(years)
    return df.reindex(columns=columns_to_keep)
```

When I pass `years` as a list of integers (matching integer column headers), the code works: the returned DataFrame contains only the columns in the `years` list.
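A minimal, self-contained reproduction of the working case, assuming a toy DataFrame with a `country` column, a `country_name` column and integer year headers (all values below are invented for illustration):

```python
import pandas as pd

# Invented toy data: integer year headers, as in the working case
df = pd.DataFrame({
    'country': ['DE', 'FR', 'US'],
    'country_name': ['Germany', 'France', 'United States'],
    2017: [1.0, 2.0, 3.0],
    2018: [4.0, 5.0, 6.0],
    2019: [7.0, 8.0, 9.0],
})

countries = ['DE', 'FR']
years = [2017, 2018]

# Keep only the rows for the countries of interest
filtered = df[df['country'].isin(countries)]

# Keep the identifier columns plus the requested years; the int labels
# in `years` are equal to the int column headers, so every label is found
columns_to_keep = ['country', 'country_name'] + list(years)
filtered = filtered.reindex(columns=columns_to_keep)

print(filtered)
```

Because each requested label exists in the frame, no NaN columns appear.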

However, if the DataFrame's column headers are strings (e.g. `'2018'` instead of `2018`), changing `[i for i in years]` into `[str(i) for i in years]` does not work: I get columns full of NaN, which, according to the `reindex` documentation, is what happens when a requested label is not found in the DataFrame.
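`reindex` looks up each requested label by exact equality and inserts an all-NaN column for any label it cannot find, so `'2018'` does not match `2018`, and it also does not match a header such as `'2018 '`. One way to see which labels fail is to compare the requested labels against the existing columns before calling `reindex`. A minimal sketch, with invented data whose headers carry a trailing space (an assumption about what might differ, not something the question confirms):

```python
import pandas as pd

# Invented data: string year headers with a trailing space
df = pd.DataFrame({
    'country': ['DE', 'FR'],
    'country_name': ['Germany', 'France'],
    '2017 ': [1.0, 2.0],
    '2018 ': [3.0, 4.0],
})

years = [2017, 2018]
requested = ['country', 'country_name'] + [str(y) for y in years]

# Labels that reindex will not find, and will therefore fill with NaN
missing = [label for label in requested if label not in df.columns]
print(missing)  # ['2017', '2018']

# repr() of the real headers exposes the hidden whitespace
print([repr(c) for c in df.columns])

result = df.reindex(columns=requested)
print(result['2017'].isna().all())  # True
```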

Can you help me spot why?

  • Is it possible to add an example dataset of a couple of rows so we can test it ourselves? – Erfan Mar 11 '19 at 15:12
  • Thanks @Erfan for the suggestion. I uploaded two example files [here](https://drive.google.com/open?id=1Ab6MnTcCrZ9cwOAYFtPjj5bi4Ub6yTJ3). Note: one is a plain Excel file, the other a TSV file. The one that does not work (i.e. the one with string column headers) is the TSV file. – Filippo Antonio Capizzi Mar 11 '19 at 15:22
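Since only the TSV file misbehaves, a quick check is to load it with `pandas.read_csv(..., sep='\t')` and inspect the raw header labels, then strip them. The sketch below uses an in-memory string in place of the real file; its contents, including the trailing spaces, are invented for illustration and are not the uploaded data:

```python
import io
import pandas as pd

# Invented stand-in for the uploaded TSV file; note the trailing spaces
tsv_text = (
    "country\tcountry_name\t2017 \t2018 \n"
    "DE\tGermany\t1.0\t3.0\n"
    "FR\tFrance\t2.0\t4.0\n"
)

df = pd.read_csv(io.StringIO(tsv_text), sep='\t')

# Headers read from a header row are strings; repr() shows stray spaces
print([repr(c) for c in df.columns])

# Strip whitespace from every header so '2017' and '2018' can be found
df.columns = df.columns.str.strip()
result = df.reindex(columns=['country', '2017', '2018'])
print(result)
```

Stripping the headers once, right after loading, keeps `offshore_filter` itself unchanged.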