I have one excel sheet with right format(Certain number of headers and specific names). Here I have another excel sheet and I have to check this excel sheet for right format or not(have to be the same number of header and same header names, no issue if the values below header will changed.). how can solve this issue ? NLP or any other suitable method is there?
Asked
Active
Viewed 3,352 times
-2
-
Is there any NLP method to solve this? – Learner Oct 18 '18 at 05:42
-
I don't understand why you change the accepted answer when the answer I gave was more precise about what you asked :-) – A. Wolf Oct 23 '18 at 20:24
-
@A.Wolf : **df_out = pd.DataFrame([('string1',1),('string2',2), ('string3',3)], columns=['Name', 'Value']) df_out.to_excel('tmp1.xlsx') ** I can't understand this line ..Can u explain ?\ – Learner Oct 25 '18 at 12:31
-
It creates an Excel file from the dataframe. It was useful only to have a complete working example. – A. Wolf Oct 25 '18 at 12:33
-
with those excels you can see the difference of calling the method on two equals Excel files and on two different Excel files. – A. Wolf Oct 25 '18 at 12:35
2 Answers
0
You can use pandas
for that comparison.
import pandas as pd
f1 = pd.read_excel('sheet1.xlsx')
f2 = pd.read_excel('sheet2.xlsx')
header_threshold = 5 # any number of headers
print(len(f1.columns) == header_threshold)
print(f1.columns) # get the column names as values

iDrwish
- 3,085
- 1
- 15
- 24
-
Can apply any natural language processing methods here ? like train the system that this is the right format excel sheet. – Learner Oct 18 '18 at 05:43
-
The code above didn't check the header names . I have check the header names also – Learner Oct 18 '18 at 06:05
0
If you have to compare two Excel you could try something like this (I add also some example Excels):
def areHeaderExcelEqual(excel1, excel2) :
equals = True
if len(excel1.columns) != len(excel2.columns):
return False
for i in range(len(excel1.columns)):
if excel1.columns[i] != excel2.columns[i] :
equals = False
return equals
And that's an application:
import pandas as pd
#create first example Excel
df_out = pd.DataFrame([('string1',1),('string2',2), ('string3',3)], columns=['Name', 'Value'])
df_out.to_excel('tmp1.xlsx')
#create second example Excel
df_out = pd.DataFrame([('string5',1),('string2',5), ('string2',3)], columns=['Name', 'Value'])
df_out.to_excel('tmp2.xlsx')
# create third example Excel
df_out = pd.DataFrame([('string1',1),('string4',2), ('string3',3)], columns=['MyName', 'MyValue'])
df_out.to_excel('tmp3.xlsx')
excel1 = pd.read_excel('tmp1.xlsx')
excel2 = pd.read_excel('tmp2.xlsx')
excel3 = pd.read_excel('tmp3.xlsx')
print(areHeaderExcelEqual(excel1, excel2))
print(areHeaderExcelEqual(excel1, excel3))
Note: Excel's files are provided just to see the different outputs.
For example, excel1 looks like this:
The idea is the same for the other files. To have more insights, see How to create dataframes.
Here's you're code:
f1 = pd.read_excel('file1.xlsx')
f2 = pd.read_excel('file2.xlsx')
print(areHeaderExcelEqual(f1, f2))

A. Wolf
- 1,309
- 17
- 39
-
Is there any method to train the system by using features of right format and then it can detect the wrong format – Learner Oct 18 '18 at 07:14
-
-
-
I think that if you need a specific number of columns (which can be a parameter) and column names ML and NLP isn't necessary. – A. Wolf Oct 18 '18 at 07:37
-
1I think that you have to explain better your problem if you think NLP can help :-) Otherwise I think that this explanation answers to your question :-) You can accept this answer (which can help other people) and ask a new question citing this question to ask explicitely for NLP. Those are two different questions. – A. Wolf Oct 18 '18 at 07:45
-
Can u explain **([('string1',1),('string2',2), ('string3',3)], columns=['Name', 'Value'** what is string1, 1 and the column =['Name', 'Value' . please explain with a example – Learner Oct 26 '18 at 12:32
-
I edited the answer with the excel output. **columns** is a list with the column names, while [('string1',1),('string2',2), ('string3',3)] is the list of the elements in the dataframe. You can see that each tuple (i.e. elements within brackets) is a line in the Excel file. – A. Wolf Oct 26 '18 at 12:41
-
Dataframe is fundamental concept in Python, you have to learn it. If you're not familiar with it, please refer to the link I added to the answer. – A. Wolf Oct 26 '18 at 12:42
-
Actually I have two excel files , I just need to compare those files. no need to create excel file like you mentioned – Learner Oct 26 '18 at 12:46
-
-
You can start from readExcel(); but in this way, you can copy & paste the code without have any Excel file and have a meaningful output. – A. Wolf Oct 26 '18 at 12:47