How to read .xls file with multiple header in R / Python for data processing?

Question

I have this file which is specified below,

.Xls file with multiple header

As you can see it consists of many header layers, how can i read this file in R / Python so that i could get it in proper format for processing it?

There are more than 2 headers for each column, how to manage that so i will end up having proper 1 header dataset! — Nitz, Feb 27 '17 at 05:39

Batman · Answer 1 · 2017-02-27T06:03:15.313

0

You can manually specify column names when reading with Pandas.

import pandas as pd

file_name = r"/foo/bar/data.xlsx"
columns = ["Foo", "Bar", "Baz"]

df = pd.read_excel(file_name, header=None, skiprows=7, names=columns)

To set mutli-level columns:

df = pd.DataFrame({'Foo':[1,2,3],'Bar':[2,4,6], "Baz": [3, 6, 9]})
columns = [("Cereals", "Rice", "Autumn"), ("Cereals", "Rice", "Summer"), ("Cereals", "Wheat", "Winter")]
df.columns = pd.MultiIndex.from_tuples(columns)

edited Feb 27 '17 at 06:03

answered Feb 27 '17 at 05:45

Batman

8,571
7
41
80

Okay, but each column in dataset comes under different category based upon multiple headers, so how can i be able to retain the hierarchy? for example "Autumn" column is under "Rice" header and which is again under "Cereals and Millets". – Nitz Feb 27 '17 at 05:54
Which is why I asked what you considered to be proper. See the edited answer. – Batman Feb 27 '17 at 06:03
Okay! i am handling this type of file for first time so bit confused, even i don't know what would be the proper format for this. Thanks for your suggestion i will try with this one. – Nitz Feb 27 '17 at 06:25

score 0 · Answer 2 · answered Feb 27 '17 at 06:11

0

In pandas you could look at the Hierarchical indexing (MultiIndex) http://pandas.pydata.org/pandas-docs/stable/advanced.html

But as your after Proper heading then do as "Batman" said above by reading in and applying your own column headings

answered Feb 27 '17 at 06:11

nicktids

163
6

Any solution using R? – Nitz Feb 27 '17 at 06:28

How to read .xls file with multiple header in R / Python for data processing?

2 Answers2