-2

I have this file which is specified below,

.Xls file with multiple header

As you can see it consists of many header layers, how can i read this file in R / Python so that i could get it in proper format for processing it?

alistaire
  • 42,459
  • 4
  • 77
  • 117
Nitz
  • 81
  • 1
  • 4

2 Answers2

0

You can manually specify column names when reading with Pandas.

import pandas as pd

file_name = r"/foo/bar/data.xlsx"
columns = ["Foo", "Bar", "Baz"]

df = pd.read_excel(file_name, header=None, skiprows=7, names=columns)

To set mutli-level columns:

df = pd.DataFrame({'Foo':[1,2,3],'Bar':[2,4,6], "Baz": [3, 6, 9]})
columns = [("Cereals", "Rice", "Autumn"), ("Cereals", "Rice", "Summer"), ("Cereals", "Wheat", "Winter")]
df.columns = pd.MultiIndex.from_tuples(columns)
Batman
  • 8,571
  • 7
  • 41
  • 80
  • Okay, but each column in dataset comes under different category based upon multiple headers, so how can i be able to retain the hierarchy? for example "Autumn" column is under "Rice" header and which is again under "Cereals and Millets". – Nitz Feb 27 '17 at 05:54
  • Which is why I asked what you considered to be proper. See the edited answer. – Batman Feb 27 '17 at 06:03
  • Okay! i am handling this type of file for first time so bit confused, even i don't know what would be the proper format for this. Thanks for your suggestion i will try with this one. – Nitz Feb 27 '17 at 06:25
0

In pandas you could look at the Hierarchical indexing (MultiIndex) http://pandas.pydata.org/pandas-docs/stable/advanced.html

But as your after Proper heading then do as "Batman" said above by reading in and applying your own column headings

nicktids
  • 163
  • 6