I'm trying to convert an Excel sheet into a Doc object using spaCy. I've spent the last couple of days trying to work around it, but it's proving challenging. I have opened the sheet with both openpyxl and pandas, and I can read it and output the content, but I couldn't integrate spaCy to create Doc/Token objects.

Is it possible to process Excel sheets in spaCy's pipeline?

Thank you!

Tech
  • You just need to get the text into a plain string and then pass the string to spaCy. spaCy doesn't know anything about Excel. – polm23 Feb 10 '22 at 05:13

1 Answer

spaCy has no built-in support for Excel. You can use pandas to read the file, whether it is a CSV or an Excel workbook, like

     import pandas as pd
     df = pd.read_csv(file)

or

     df = pd.read_excel(file)

respectively. Then select the text column you need, iterate over its values, and pass each one to spaCy's nlp(), which returns a Doc for every string.
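To make that last step concrete, here is a minimal sketch of one way it could look. The helper names (`clean_texts`, `texts_to_docs`, `excel_column_to_docs`), the file name and the column name are all made up for illustration; the only library calls assumed are `pd.read_excel`, `Series.dropna` and `nlp.pipe`. Reading `.xlsx` files with pandas typically also needs openpyxl installed.

```python
def clean_texts(values):
    """Yield stripped, non-empty strings from an iterable of cell values."""
    for value in values:
        # Skip empty cells: None, or NaN (NaN is the only value not equal to itself).
        if value is None or value != value:
            continue
        text = str(value).strip()
        if text:
            yield text


def texts_to_docs(values, nlp):
    """Run every usable cell value through the spaCy pipeline `nlp`.

    `nlp` is whatever pipeline the caller created, for example with
    spacy.blank("en") or spacy.load(...). nlp.pipe processes the texts
    as a stream, which is usually faster than calling nlp() per row.
    """
    return list(nlp.pipe(clean_texts(values)))


def excel_column_to_docs(path, column, nlp):
    """Hypothetical convenience wrapper: read one column of a sheet into Docs."""
    import pandas as pd  # imported here so the helpers above work without pandas

    df = pd.read_excel(path)  # use pd.read_csv(path) for a CSV file instead
    return texts_to_docs(df[column].dropna(), nlp)


# Example usage (file and column names are placeholders):
#
#   import spacy
#   nlp = spacy.blank("en")
#   docs = excel_column_to_docs("reviews.xlsx", "text", nlp)
#   for doc in docs:
#       print([token.text for token in doc])
```

Each element of the returned list is a regular Doc, so tokens, entities and so on are available as usual, depending on which components the pipeline has. If you need to keep track of which row a Doc came from, you would have to carry the row index along yourself; this sketch does not do that.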