Dropping Columns From Data frame to only show needed ones

Question

Basically i want to drop some columns that i don't need. And i'm kind of stumped why this is not working

import os
import pandas


def summarise(indir, outfile):
os.chdir(indir)
filelist = ".txt"
dflist = []
colnames = ["DSP Code", "Report Date", "Initial Date", "End Date", "Transaction Type", "Sale Type",
            "Distribution Channel", "Products Origin ID", "Product ID", "Artist", "Title", "Units Sold",
            "Retail Price", "Dealer Price", "Additional Revenue", "Warner Share", "Entity to be billed",
            "E retailer name", "E retailer Country", "End Consumer Country", "Price Code", "Currency Code"]
for filename in filelist:
    print(filename)
    df = pandas.read_csv('SYB_M_20171001_20171031.txt', header=None, encoding='utf-8', sep='\t', names=colnames,
                         skiprows=3)
    df['data_revenue'] = df['Units Sold'] * df['Dealer Price']  # Multiplying Units with Dealer price = Revenue
    df = df.sort_values(['End Consumer Country', 'Currency Code'])  # Sorts the columns alphabetically
    df.to_csv(outfile + r"\output.csv", index=None)
    dflist.append(filename)
    df.drop(columns='DSP Code')


summarise(r"O:\James Upson\Sound Track Your Brand Testing\SYB Test",
      r"O:\James Upson\Sound Track Your Brand Testing\SYB Test Formatted")

I want to drop all the column titles you can see in colnames excluding 'Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code'. I tried to remove one column using df.drop(columns='DSP Code') but this doesn't seem to work.

Any help would be greatly appreciated :)

`df.drop` does not change the original dataframe. Try assign a different variable name. Example: `df1 = df.drop(columns='DSP Code')` — Xiaoyu Lu, Jan 18 '19 at 16:01
Possible duplicate of [Pandas deleting row with df.drop doesn't work](https://stackoverflow.com/questions/38481409/pandas-deleting-row-with-df-drop-doesnt-work) — busybear, Jan 18 '19 at 16:06
Since it seems like you want to `drop` most of the columns, you should instead specify `usecols` when you read in the csv, that way you don't need to read them in and then drop them without ever using them. — ALollz, Jan 18 '19 at 16:17
try adding `usecols = ['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']` as an argument to `.read_csv`. I'm not super familiar on how it works with the ordering of `names` though. Also your `ouptut` file won't have all of the columns, so may not be entirely what you need. — ALollz, Jan 18 '19 at 16:19
SPy, i just added the example in my answers.. using `usecols` as @ALollz illustrated above as well. — Karn Kumar, Jan 18 '19 at 16:23
SOLVED!!! Thank you @ALollz and thank you everyone else for their contribution and feedback :) — James Upson, Jan 18 '19 at 16:27

Karn Kumar · Accepted Answer · 2019-01-18T16:36:37.880

2

You Can do it like :

df.drop(['Col_1', 'col_2'], axis=1, inplace=True)

OR:

df = df.drop(columns=colnames)

As suggested in comment section use usecols which provides a kind of filter to trim down the column section to use only which are require rest columns will not be processes and thus efficiency will be increased and resource consumption will also be less:

df = pandas.read_csv('SYB_M_20171001_20171031.txt', encoding='utf-8', sep='\t', usecols=["col1", "col2", "col3"],skiprows=3)

edited Jan 18 '19 at 16:36

answered Jan 18 '19 at 16:11

Karn Kumar

8,518
3
27
53

No this still doesn't work :( when i run the script the columns are still there – James Upson Jan 18 '19 at 16:13

score 1 · Answer 2 · answered Jan 18 '19 at 16:04

1

df.drop(columns='DSP Code')

this bit is not working cause you are not assigning it to a new df

df = df.drop(columns='DSP Code')

You can also just keep the columns you care about by copying them into a second dataframe.

answered Jan 18 '19 at 16:04

ilamaaa

421
2
8

score 0 · Answer 3 · answered Jan 18 '19 at 16:06

0

According to pandas.DataFrame.drop, it returns a dataframe unless you do the operation inplace.

Returns:
dropped : pandas.DataFrame

inplace : bool, default False

If True, do operation inplace and return None.

Either do it in place: df.drop(columns=['DSP Code'], inplace=True) or store the returned dataframe: df=df.drop(columns=['DSP Code'])

answered Jan 18 '19 at 16:06

Ricky Kim

1,992
1
9
18

Tried both with no luck, columns are still there – James Upson Jan 18 '19 at 16:14
Maybe because your writing to csv before dropping? – Ricky Kim Jan 18 '19 at 16:20
`drop` first and then `to_csv` – Ricky Kim Jan 18 '19 at 16:21

score 0 · Answer 4 · answered Jan 18 '19 at 16:28

0

Just do:

df = df['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']

You keep the ones you want, instead of dropping the others.

answered Jan 18 '19 at 16:28

micric

621
4
15

Dropping Columns From Data frame to only show needed ones

4 Answers4