1

Basically i want to drop some columns that i don't need. And i'm kind of stumped why this is not working

import os
import pandas


def summarise(indir, outfile):
os.chdir(indir)
filelist = ".txt"
dflist = []
colnames = ["DSP Code", "Report Date", "Initial Date", "End Date", "Transaction Type", "Sale Type",
            "Distribution Channel", "Products Origin ID", "Product ID", "Artist", "Title", "Units Sold",
            "Retail Price", "Dealer Price", "Additional Revenue", "Warner Share", "Entity to be billed",
            "E retailer name", "E retailer Country", "End Consumer Country", "Price Code", "Currency Code"]
for filename in filelist:
    print(filename)
    df = pandas.read_csv('SYB_M_20171001_20171031.txt', header=None, encoding='utf-8', sep='\t', names=colnames,
                         skiprows=3)
    df['data_revenue'] = df['Units Sold'] * df['Dealer Price']  # Multiplying Units with Dealer price = Revenue
    df = df.sort_values(['End Consumer Country', 'Currency Code'])  # Sorts the columns alphabetically
    df.to_csv(outfile + r"\output.csv", index=None)
    dflist.append(filename)
    df.drop(columns='DSP Code')


summarise(r"O:\James Upson\Sound Track Your Brand Testing\SYB Test",
      r"O:\James Upson\Sound Track Your Brand Testing\SYB Test Formatted") 

I want to drop all the column titles you can see in colnames excluding 'Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code'. I tried to remove one column using df.drop(columns='DSP Code') but this doesn't seem to work.

Any help would be greatly appreciated :)

  • You need to save your output or set the `inplace=True` – busybear Jan 18 '19 at 16:01
  • `df.drop` does not change the original dataframe. Try assign a different variable name. Example: `df1 = df.drop(columns='DSP Code')` – Xiaoyu Lu Jan 18 '19 at 16:01
  • Possible duplicate of [Pandas deleting row with df.drop doesn't work](https://stackoverflow.com/questions/38481409/pandas-deleting-row-with-df-drop-doesnt-work) – busybear Jan 18 '19 at 16:06
  • 2
    Since it seems like you want to `drop` most of the columns, you should instead specify `usecols` when you read in the csv, that way you don't need to read them in and then drop them without ever using them. – ALollz Jan 18 '19 at 16:17
  • @ALollz how would this be done? – James Upson Jan 18 '19 at 16:18
  • @ALollz, rightly spot on , thats the better way. – Karn Kumar Jan 18 '19 at 16:18
  • @XiaoyuLu Using the second option greys out the df1 – James Upson Jan 18 '19 at 16:19
  • 1
    try adding `usecols = ['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']` as an argument to `.read_csv`. I'm not super familiar on how it works with the ordering of `names` though. Also your `ouptut` file won't have all of the columns, so may not be entirely what you need. – ALollz Jan 18 '19 at 16:19
  • SPy, i just added the example in my answers.. using `usecols` as @ALollz illustrated above as well. – Karn Kumar Jan 18 '19 at 16:23
  • 1
    SOLVED!!! Thank you @ALollz and thank you everyone else for their contribution and feedback :) – James Upson Jan 18 '19 at 16:27
  • @JamesUpson, Excellent you got it , How that fixed :-) – Karn Kumar Jan 18 '19 at 16:28

4 Answers4

2

You Can do it like :

df.drop(['Col_1', 'col_2'], axis=1, inplace=True)

OR:

df = df.drop(columns=colnames)

As suggested in comment section use usecols which provides a kind of filter to trim down the column section to use only which are require rest columns will not be processes and thus efficiency will be increased and resource consumption will also be less:

df = pandas.read_csv('SYB_M_20171001_20171031.txt', encoding='utf-8', sep='\t', usecols=["col1", "col2", "col3"],skiprows=3)
Karn Kumar
  • 8,518
  • 3
  • 27
  • 53
1
df.drop(columns='DSP Code')

this bit is not working cause you are not assigning it to a new df

df = df.drop(columns='DSP Code')

You can also just keep the columns you care about by copying them into a second dataframe.

ilamaaa
  • 421
  • 2
  • 8
0

According to pandas.DataFrame.drop, it returns a dataframe unless you do the operation inplace.

Returns:
dropped : pandas.DataFrame

inplace : bool, default False

If True, do operation inplace and return None.

Either do it in place: df.drop(columns=['DSP Code'], inplace=True) or store the returned dataframe: df=df.drop(columns=['DSP Code'])

Ricky Kim
  • 1,992
  • 1
  • 9
  • 18
0

Just do:

df = df['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']

You keep the ones you want, instead of dropping the others.

micric
  • 621
  • 4
  • 15