I'm using the Python tool in Alteryx to do some NLP with spaCy. I want to extract noun chunks from a set of comments, but I can't tell which comment each chunk came from, and I'll be processing hundreds or even thousands of comments.

For example, I have these two comments:

| Row | Comment                        | Id# |
| 1   | The dog is brown.  He is soft. | 245 |
| 2   | The black cat is soft.         | 763 |

Current output:

| 1     | 2   |
| dog   |     |
| he    |     |
| black | cat |

Desired output:

| 1    | 2     | 3   |
| row1 | dog   |     |
| row1 | he    |     |
| row2 | black | cat |

After importing spacy, dframcy, and pandas in the Alteryx Python tool, this is my code, starting from the .csv import:

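# Assumes `nlp` is an already loaded spaCy pipeline (e.g. from spacy.load(...))
df = Alteryx.read("#1")        # read the incoming .csv data as a DataFrame
Txt = df[["Comment"]]          # keep only the comment column
String = Txt.to_string()       # flatten every comment into one big string
dframcy = DframCy(nlp)
Txt = dframcy.nlp(String)      # a single spaCy Doc covering all comments

output1 = pd.DataFrame(data=Txt.noun_chunks)
Alteryx.write(output1, 1)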

I know Alteryx very well, but I'm very new to NLP and spaCy, and I have only beginner-level knowledge of Python.

I know I have to import the file as a data frame in Alteryx, convert it to a string for NLP, and then convert the result back to a data frame for the Alteryx output. That last step is the hard part for me, because most solutions I've found don't convert anything back to a data frame or attach row data to the results (this is for readers who know Python/spaCy but not Alteryx). I need the row number (or Id number) on each result because I want to join other data from the csv on it.
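
To make the round trip described above concrete, here is a minimal sketch of one way it might look: each comment is processed on its own instead of as one combined string, so every noun chunk can carry the Id of its source row. The helper name `noun_chunk_rows` is hypothetical, the column names `Comment` and `Id#` are taken from the example table, and `nlp` is assumed to be any loaded spaCy pipeline (the `en_core_web_sm` model in the comments is only an assumption). It returns one row per chunk rather than the wide layout shown in the desired output.

```python
def noun_chunk_rows(records, nlp, id_key="Id#", text_key="Comment"):
    """Return one dict per noun chunk, tagged with the Id of its source row.

    records: a list of dicts, e.g. df.to_dict("records") from a pandas DataFrame.
    nlp:     a callable that turns text into a spaCy Doc (a loaded pipeline).
    """
    rows = []
    for rec in records:
        # Process each comment separately so its chunks stay tied to its Id.
        doc = nlp(str(rec[text_key]))
        for chunk in doc.noun_chunks:
            rows.append({id_key: rec[id_key], "noun_chunk": chunk.text})
    return rows


# Possible use inside the Alteryx Python tool (assumed setup, not run here):
#   from ayx import Alteryx
#   import pandas as pd
#   import spacy
#   nlp = spacy.load("en_core_web_sm")   # assumes this model is installed
#   df = Alteryx.read("#1")
#   rows = noun_chunk_rows(df.to_dict("records"), nlp)
#   Alteryx.write(pd.DataFrame(rows, columns=["Id#", "noun_chunk"]), 1)
```

Because the Id travels with every chunk, the output can be joined back to the other csv columns with a regular Join tool in Alteryx.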
