
I just started using Python. I'm setting up a new methodology for reading patent data, which should then be analyzed with TextRazor. I'm interested in extracting the topics and saving them in a term-document matrix. I can already save the output topics, but only in one big cell containing a very long vector. How can I split this long vector so that the topics are saved in different cells of an Excel file?

If you have any ideas about this problem, I'd be grateful for your answer. Feel free to suggest improvements to my code as well.
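For the term-document matrix goal mentioned above, a minimal pandas sketch could look like the following. It assumes the topic labels for each patent have already been collected into a dict of lists; the patent IDs and labels below are made-up example data. `pd.crosstab` counts how often each topic (row) occurs for each patent (column).

```python
import pandas as pd

# Hypothetical example data: topic labels found for each patent.
# In practice these would come from the TextRazor output for each patent.
patent_topics = {
    "patent_1": ["Stairs", "Scrolling", "Input device"],
    "patent_2": ["Scrolling", "Graphical user interface"],
}

# Flatten to one (topic, patent) pair per row ("long" format).
pairs = pd.DataFrame(
    [(label, doc) for doc, labels in patent_topics.items() for label in labels],
    columns=["topic", "patent"],
)

# Cross-tabulate: one row per topic (term), one column per patent (document),
# each cell counting how often that topic occurs for that patent.
tdm = pd.crosstab(pairs["topic"], pairs["patent"])
print(tdm)
```

With an Excel engine such as openpyxl installed, `tdm.to_excel("tdm.xlsx")` would write this matrix with one topic per row and one patent per column; `tdm.T` gives the document-term orientation instead.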

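import pandas as pd
import textrazor

# Read the semicolon-separated patent file; "with" closes the file afterwards
with open('Patentdaten1.csv') as data:
    content = data.read()

# Split into rows and columns by hand
table = []
for row in content.split('\n'):
    table.append(row.split(';'))

# Text of the first patent (row 0 is the header)
patent1 = table[1][1]

textrazor.api_key = "YOUR_API_KEY"  # placeholder: never post a real key publicly

client = textrazor.TextRazor(extractors=["entities", "categories", "topics"])
client.set_classifiers(['textrazor_newscodes'])

# Note: this analyzes the whole file; pass patent1 to analyze a single patent
response = client.analyze(content)
topics = response.topics()

# Wrapping the list in [...] puts ALL topics into one single cell
df = pd.DataFrame({'topic': [topics]})
df.to_csv('test.csv')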
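The one-big-cell result comes from `pd.DataFrame({'topic': [topics]})`: wrapping the list in another list creates a single row whose only cell holds the entire list. A minimal sketch of the alternative builds one row per topic instead. It assumes each object returned by `response.topics()` exposes `label` and `score` attributes; `FakeTopic` is a hypothetical stand-in so the example runs without an API key.

```python
from dataclasses import dataclass

import pandas as pd


# Hypothetical stand-in for the objects returned by response.topics();
# the real TextRazor topics are assumed to have .label and .score too.
@dataclass
class FakeTopic:
    label: str
    score: float


topics = [
    FakeTopic("Stairs", 0.9),
    FakeTopic("Graphical user interface", 0.7),
    FakeTopic("Scrolling", 0.5),
]

# One row per topic: each label and score lands in its own cell.
df = pd.DataFrame(
    {"label": [t.label for t in topics], "score": [t.score for t in topics]}
)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a filename, e.g. `df.to_csv('test.csv', index=False)`, writes the same table to disk instead of returning it as a string.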
Fabián Montero
  • I think you should also use a library like pandas to open the CSV file; then you already have the data in the correct format. – Uli Sotschok Aug 22 '19 at 08:08
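Following this suggestion, here is a sketch of reading the file with pandas instead of splitting lines by hand. The in-memory sample and its column names (`id`, `abstract`) are hypothetical; in practice you would pass `'Patentdaten1.csv'` to `pd.read_csv` in place of `sample`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Patentdaten1.csv: semicolon-separated,
# matching the question's split(';'). Column names are made up.
sample = io.StringIO(
    "id;abstract\n"
    "P1;A staircase with integrated lighting.\n"
    "P2;A portable media player with a scroll wheel.\n"
)

# read_csv handles the header row and the ';' delimiter in one call.
patents = pd.read_csv(sample, sep=";")

# Equivalent of the question's table[1][1]: first data row, second column.
patent1 = patents.iloc[0, 1]
print(patent1)
```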

1 Answer


It's a bit difficult to see exactly what the problem is without an example input and/or output, but saving data to Excel via pandas removes any need for intermediate processing: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

For instance:

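import pandas

# Each dict key becomes a column; each list item becomes a row (one value per cell)
data = pandas.DataFrame.from_dict({"patents": ["p0", "p1"], "authors": ["a0", "a1"]})
# Writing .xlsx requires an Excel engine such as openpyxl to be installed
data.to_excel("D:\\test.xlsx")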

Output: [screenshot of the resulting Excel sheet]

Artur
  • Thank you for your reply. The input is a large patent text, which is analyzed with the "textrazor" package. The output of this code is a CSV file (opened in Excel) in which the information is saved as: ,topic 0,"[TextRazor Topic 0 with label Stairs, TextRazor Topic 1 with label Graphical user interface, TextRazor Topic 2 with label Portable media player, TextRazor Topic 3 with label Scrolling, TextRazor Topic 4 with label Input device,...] Now I want to split this long vector into short vectors, so that the different topics are saved in different cells. Thank you! – Dogan Kirhan Aug 22 '19 at 08:49
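If only the already-saved CSV is available, the single cell can be split after the fact. The sketch below uses only the standard library and assumes every entry follows the `... with label <LABEL>` pattern shown in the comment above, and that no label itself contains a comma or `]`. Re-running the analysis and writing the `label` values directly would likely be more robust.

```python
import csv
import io
import re

# The single cell written by the question's code (as quoted in the comment).
cell = (
    "[TextRazor Topic 0 with label Stairs, "
    "TextRazor Topic 1 with label Graphical user interface, "
    "TextRazor Topic 2 with label Portable media player, "
    "TextRazor Topic 3 with label Scrolling, "
    "TextRazor Topic 4 with label Input device]"
)

# Capture the text after "with label " up to the next comma or closing bracket.
labels = re.findall(r"with label ([^,\]]+)", cell)

# Write one label per row so each topic lands in its own spreadsheet cell.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["topic"])
for label in labels:
    writer.writerow([label])

print(out.getvalue())
```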