1

enter image description here I have file named Example.xls in which i have data in tab sales and purchase.
We have data in both tab from Column A to E. When i import these data through pandas module, i want that result like Column A to F where column F should display the name sheet name. How to display the name of sheet name in pandas module?

I am using code

all= pd.read_excel(Example.xlsx',sheet_name=['Sales','Purchas'])
enter image description here and then

df= pd.concat(All[frame]for fram in All.keys())

and then after i want to put the name of tabs in my data frame "All" in the last column which is F respectively

2 Answers2

2

The below code will solve your problem:

import os
from glob import glob
import pandas as pd

f_mask = r'path\*.xlsx' ## The folder path where your Example.xlsx is stored

df = \
pd.concat([df.assign(file=os.path.splitext(os.path.basename(f))[0],
                     sheet=sheet)
           for f in glob(f_mask)
           for sheet, df in pd.read_excel(f, sheet_name=None).items()],
          ignore_index=True)

The code works in following way:

  1. Check the base folder and take all the .xlsx files in it
  2. Read the files one by one
  3. Make two additional columns, one for file name other for sheet name

This solution will work if you want to do the exercise for more than 1 .xlsx file

Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51
  • Thank you ed 4 mins ago Rahul Agarwal .. i will try .. but what is the role of glob ? – Manoj Kumar Jun 27 '19 at 07:44
  • 1
    `Glob` is basically going to that folder and seeing different .xlsx files. In future if you want to do same exercise for multiple excel files, this will take care of it. – Rahul Agarwal Jun 27 '19 at 07:46
  • 1
    do i need to define all the name of tabs? i am using code all= pd.read_excel(Example.xlsx',sheet_name=['Saales','Purchas']) and then df= pd.concat(All[frame]for fram in All.keys()).. and then after i want to put the name of tabs in my data frame "All" in the last column which is F respectivel – Manoj Kumar Jun 27 '19 at 07:55
  • 1
    For above code to work properly, put all your excel files or 1 in any folder. Give that folder path in `f_mask` and simply run the code. No need to define anything. The code will take care of everything!! – Rahul Agarwal Jun 27 '19 at 07:56
  • 1
    i tried but showing error " Traceback (most recent call last): File "", line 2, in pd.concat([df.assign(file=os.path.splitext(os.path.basename(f))[0], NameError: name 'pd' is not defined" – Manoj Kumar Jun 27 '19 at 08:12
  • 1
    Updated the answer with one more package – Rahul Agarwal Jun 27 '19 at 08:24
  • wow is Awesome, never thought this will with all excel file . @Rahul Agarwal .. but i am not getting the name of tab name in last column . that was my query .. could u pls help. – Manoj Kumar Jun 27 '19 at 08:27
  • 1
    I was getting when i ran the same code...it will be in the column named `sheet`. So, what is the extension of your excel files, if it is `.xls` change the `f_mask` to `f_mask `= r'path\*.xls'` – Rahul Agarwal Jun 27 '19 at 08:30
  • 1
    we its imported all the data but what did not display the tab name which is sales and purchase in last column .. could u pls help – Manoj Kumar Jun 27 '19 at 09:35
  • 1
    in your resultant df is the column name `Sheet` is getting formed or not ? – Rahul Agarwal Jun 27 '19 at 09:39
  • 1
    No i cant see that name as sheet ... :( please check above question .. i have attached the print screen – Manoj Kumar Jun 27 '19 at 09:46
  • 1
    Can you also post like what you have modified in my code!! or you are using it exactly like I posted!! Bcoz same code is working on my system – Rahul Agarwal Jun 27 '19 at 09:52
  • 1
    I am using same code as you mentioned .. there is no change in code . Only i – Manoj Kumar Jun 27 '19 at 16:10
  • 1
    Can you post a screenshot of your `Example.xls` here – Rahul Agarwal Jun 27 '19 at 17:55
  • Rahul pls ...help me – Manoj Kumar Jul 04 '19 at 07:17
  • tell me what you put in `f_mask ` in your solution – Rahul Agarwal Jul 04 '19 at 08:26
  • this is what i did >>> import os >>> from glob import glob >>> import pandas as pd >>> f_mask = r'C:\Users\dell\Desktop\kk\*.xlsx' ## The folder path where your Example.xlsx is stored >>> df = \ pd.concat([df.assign(file=os.path.splitext(os.path.basename(f))[0], sheet=sheet) for f in glob(f_mask) for sheet, df in pd.read_excel(f, sheet_name=None).items()], ignore_index=True) – Manoj Kumar Jul 07 '19 at 16:50
  • Warning (from warnings module): File "__main__", line 6 FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. – Manoj Kumar Jul 07 '19 at 16:50
  • 1
    You can ignore warning...2nd: Make a new folder in desktop lets say `Test`, Put your xl file in this `Test` folder and now your `f_mask` should be `C:\Users\dell\Desktop\Test\*.xlsx` – Rahul Agarwal Jul 08 '19 at 10:24
  • @ Rahul : How are you doing . i have an another query as i have stucked . Could you please help me . its realted with groupby – Manoj Kumar Jun 27 '20 at 18:54
  • you can post the link of the question!! I will see and let you know!! – Rahul Agarwal Jun 29 '20 at 05:52
  • I am sad now .. i have asked the same question again again beacuse i did not get the appropariate answer .. i really dont know how this app works .. Eeveryone is degarding me ..but dont see why i am asking again and again ...https://stackoverflow.com/questions/62628305/no-one-is-able-to-help-for-groupby-by-one-level-and-give-the-rest-dublicate-data.... please help me if you can – Manoj Kumar Jun 29 '20 at 11:19
  • 1
    The above link is not working!! let me know how can I help...even if you want to understand how the website works!! – Rahul Agarwal Jun 30 '20 at 12:14
  • Really thanks alot https://stackoverflow.com/questions/62613892/not-able-to-groupby-by-one-level-in-my-dataframe-by-pandas .. you can find my question – Manoj Kumar Jun 30 '20 at 12:30
  • @ManojKumar: I checked the answer given by Yoben_s in this question https://stackoverflow.com/questions/62612488/how-to-crate-the-group-by-in-pandas-only-in-one-level . It is working perfectly fine for me. Coming the same output as you need. Let me know the exact problem you are getting while implementing above solution – Rahul Agarwal Jul 01 '20 at 17:55
  • No Rahul is not working from Yoben because i am extracting from Excel but he created a dats as list that i cant do while importing a excel file .. could you please show me how did u do that ..even Datanovice also did .. but if u see result Month is not in assending way .. – Manoj Kumar Jul 02 '20 at 08:07
  • I also did from excel...in one sheet I wrote your data..then simply run the code from Yoben , I am getting the month in ascending way. – Rahul Agarwal Jul 02 '20 at 08:25
  • I can only think is..you have older version of pandas to check please run `import pandas as pd print(pd.__version__)`. Let me know what is the pandas version – Rahul Agarwal Jul 02 '20 at 08:25
  • Hey Buddy .. its 0.25.3 – Manoj Kumar Jul 03 '20 at 19:15
  • 1
    Hi...it is updated version only....Don't know why this is happening with you man!! I tried the exact same and is working fine!! – Rahul Agarwal Jul 04 '20 at 19:21
  • Well thanks a ton Rahul.. i am able to do so.... just asking another question ...I have 2 data Frame (df1 and df2) which are having diffrent headers and i want to add into one data Frame as df3 . so when ever df1 ends then after one row df 2 must be displayed.. is it possiable ? – Manoj Kumar Jul 04 '20 at 19:32
  • 1
    Yes..by `df3 = pd.conact[df1,df2]` – Rahul Agarwal Jul 06 '20 at 04:41
  • Thanks Rahul for alwyes help and ssupport .. i already fixed this ... just asking is there any around where i can i use the function add 0 in last number of each cell – Manoj Kumar Jul 10 '20 at 20:11
2

I think this is the simplest way.

import pandas as pd

path = r'path_of_your_file'
workbook = pd.read_excel(path, sheet_name = None)
df= pd.DataFrame()
for sheet_name, sheet in workbook.items():
    sheet['sheet'] = sheet_name
    df = df.append(sheet)

# Reset your index or you'll have duplicates
df = df.reset_index(drop=True)
MarLeo
  • 2,566
  • 2
  • 18
  • 16
  • i have error >>> import pandas as pd >>> path = r'C:\Users\dell\Desktop\kk\' SyntaxError: EOL while scanning string literal >>> path = r'C:\Users\dell\Desktop\kk\Example.xlsx' >>> workbook = pd.read_excel(path, sheet_name = None) >>> df= pd.DataFrame() >>> for sheet_name, sheet in workbook.items(): sheet['sheet'] = sheet_name df = df.append(sheet) SyntaxError: unexpected indent >>> – Manoj Kumar Jul 07 '19 at 16:57
  • This line is your problem: path = r'C:\Users\dell\Desktop\kk\'. the last backslash escapes the closing apostrophe. You don't need that line at all anyway, the path should include the file name. – MarLeo Jul 08 '19 at 10:29
  • Rahul--- It is consolidating all the sheet . but what i want to do is i want to Sheet name also in the last column so that i can read what data belongs which sheet... i y hope u get my issue now .. pls help me... – Manoj Kumar Jul 13 '19 at 19:38
  • @ManojKumar I think you added the comment to the wrong answer but my code here will add the sheet name to the last column as you require. – MarLeo Jul 15 '19 at 08:43
  • hi Rahul -- @Reimar. how are you doing buddy . hope everything will be fine . i have raised one query hope you can help me. really appricates for your support. please check my today questions . – Manoj Kumar Jun 24 '20 at 21:07