
I have a large dataset of Airbnb listings from around the world, covering 5,500 cities. I only want to work on 'London', 'Paris' and 'Berlin', so from my original dataset 'df' I want to create a new dataset 'filtered_df' that contains all the data for just these three cities. I have a column 'City', so I tried the code below, but it doesn't work as I want.

df_berlin = df['City']== 'Berlin'

df_paris = df['City']== 'Paris'

df_london = df['City']== 'London'

filtered_df = [df_berlin + df_paris + df_london]
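In the attempt above, each comparison produces a boolean Series (one True/False per row) rather than a DataFrame, and wrapping the combination in square brackets creates a Python list instead of a filtered frame. A minimal sketch of the likely intended fix, assuming `df` is a pandas DataFrame with a `City` column: combine the masks with `|` (element-wise OR) and pass the result back to `df[...]`. The sample rows and the `price` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the real Airbnb dataset
df = pd.DataFrame({
    "City": ["Berlin", "Rome", "Paris", "London", "Madrid", "Paris"],
    "price": [80, 95, 120, 150, 70, 110],
})

# Each comparison returns a boolean Series (a mask), not a DataFrame
df_berlin = df["City"] == "Berlin"
df_paris = df["City"] == "Paris"
df_london = df["City"] == "London"

# Combine the masks with | (element-wise OR), then use them to select rows
filtered_df = df[df_berlin | df_paris | df_london]
print(filtered_df)
```

The rows keep their original index labels, so the result stays in the same order as `df`.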

pandawan
  • It's not surprising that this does not do anything. It's just assigning `False` to all three of your filter variables and adding them up to result in `[0]`. What format is your dataset in and what framework do you use to work with it? There may be a built-in subset method you can use. – Martin Wettstein Jun 10 '21 at 08:52
  • Maybe you can try something like `df_paris = df["Paris"]` – Milos Stojanovic Jun 10 '21 at 08:52
  • My dataset is in CSV format, and I'm working in Jupyter with Python. I get a `KeyError` when using `df_paris = df["Paris"]`, because Paris, London and Berlin are values in the column 'City'. – pandawan Jun 10 '21 at 08:55
  • I'm just beginning with Pandas, too, so I'm not sure if it's right, but does `df[df.City=='Berlin']` do it? – fsimonjetz Jun 10 '21 at 09:04
  • No :/ I just get the column names, but no data about Berlin. – pandawan Jun 10 '21 at 09:11
  • I think we need to know exactly what format the data is in, otherwise it's just guesswork ;/ – fsimonjetz Jun 10 '21 at 09:15
  • @pandawan run `df.head()` and add the output to the question. It will give some insight into your dataset. I assume that you have converted your CSV dataset into a `pandas DataFrame` using `read_csv()`. – nobleknight Jun 10 '21 at 09:19
  • Does this answer your question? [How do I select rows from a DataFrame based on column values?](https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values) – SunilG Jun 10 '21 at 09:43
  • How about `filtered_df = df.loc[df['City'].isin(['Berlin', 'Paris', 'London'])]`? – 0x5453 Jun 10 '21 at 13:17
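A minimal sketch of the workflow the comments point to: load the CSV with `read_csv()`, inspect it with `head()`, then select all three cities with a single `isin()` mask. It assumes the column is named `City` as in the question; the inline CSV text and the `price` column are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents; with a real file use pd.read_csv("path/to/file.csv")
csv_text = """City,price
Berlin,80
Rome,95
Paris,120
London,150
Madrid,70
Paris,110
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows to confirm the column is really named 'City'
print(df.head())

# isin() builds one boolean mask that is True for any of the listed cities
cities = ["London", "Paris", "Berlin"]
filtered_df = df.loc[df["City"].isin(cities)]
print(filtered_df)
```

If a single-city mask such as `df[df.City=='Berlin']` returns no rows, `df['City'].unique()` lists the exact stored spellings; a mismatch such as trailing spaces or different capitalization is one possible cause, though the thread does not confirm it.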

1 Answer

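import pandas as pd

# Select the rows for each city with a boolean mask
df_berlin = df[df.City == 'Berlin']
df_paris = df[df.City == 'Paris']
df_london = df[df.City == 'London']
# DataFrame.append was removed in pandas 2.0, so stack the frames with pd.concat
filtered_df = pd.concat([df_berlin, df_paris, df_london])
# Restore the original row order (mergesort is a stable sort)
filtered_df.sort_index(inplace=True, kind='mergesort')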

I tested this on a small simulated dataset and it worked. Since your dataset is large, I passed `kind='mergesort'` to `sort_index`; I expect it to work on the full data as well, but I haven't tested it at that scale.
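The per-city approach and a single `isin()` mask should select the same rows in the same order. A small check on hypothetical toy data, assuming pandas is installed; it uses `pd.concat` because `DataFrame.append` no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has many more rows and columns
df = pd.DataFrame({
    "City": ["Paris", "Rome", "Berlin", "London", "Paris", "Madrid"],
    "price": [120, 95, 80, 150, 110, 70],
})

df_berlin = df[df.City == "Berlin"]
df_paris = df[df.City == "Paris"]
df_london = df[df.City == "London"]

# Stack the per-city frames, then restore the original row order by index
filtered_df = pd.concat([df_berlin, df_paris, df_london]).sort_index(kind="mergesort")

# A single isin() mask selects the same rows without building three frames
expected = df[df["City"].isin(["Berlin", "Paris", "London"])]
print(filtered_df.equals(expected))  # True
```

Because `sort_index` puts the rows back in their original order, the concatenated result matches the single-mask version; the `isin()` form is shorter and avoids the intermediate per-city frames.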