Good afternoon everyone,
I want to drop from a DataFrame the columns that I am not interested in.
To do that, and since the columns could change based on user input (not shown here), I am using the following code within my offshore_filter function:
# Note: 'df' is my DataFrame, with different country codes as rows and years as columns' headers
import datetime as d
import pandas as pd

COUNTRIES = [
    'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
    'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
    'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
    'TR', 'UA', 'UK', 'XK'
]
YEARS = list(range(2005, int(d.datetime.now().year)))

def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis
    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)
    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp  # This step to avoid the copy vs view complication
    return df
When I pass a years list of integers, the code works well and filters the DataFrame by taking only the columns in the years list.
However, if the DataFrame's column headers are strings (e.g. '2018' instead of 2018), changing [i for i in years] into [str(i) for i in years] doesn't work: I get columns full of NaNs (which, per the reindex documentation, is what happens for labels that are not found in the existing columns).
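For illustration, here is a minimal sketch on made-up data (the toy DataFrame and the names by_int, by_str and label_types are hypothetical, not part of my real code). It shows reindex producing all-NaN columns when the requested labels differ in type from the headers, and one quick way to check what type each header actually is:

```python
import pandas as pd

# Hypothetical toy data: year headers are stored as strings here
df = pd.DataFrame({
    "country": ["AT", "BE"],
    "country_name": ["Austria", "Belgium"],
    "2017": [1.0, 2.0],
    "2018": [3.0, 4.0],
})

years = [2017, 2018]
base = ["country", "country_name"]

# Integer labels do not match the string headers,
# so reindex creates new columns filled with NaN
by_int = df.reindex(columns=base + years)

# String labels match the headers exactly, so the values are kept
by_str = df.reindex(columns=base + [str(y) for y in years])

# Inspect the Python type of every column label
label_types = {col: type(col).__name__ for col in df.columns}
print(label_types)
```

If the string version still gives NaNs on the real data, the printed label types (or stray whitespace in the headers) would probably be the first thing to look at.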
Can you help me spot why?