
I am trying to use Microsoft's Cognitive Services with Azure Data Lake and have run into a problem while trying to extract key phrases and sentiment from the text in one column of a CSV file.

I have checked that the file is formatted correctly and is being read correctly (I have run a few basic operations, such as copying it, to make sure it is workable).

I have also made sure that the column I am interested in (Description) contains only text (string) when it is extracted by itself.

The input file and output folder are in my Azure Data Lake, and I am running the script from my Data Lake Analytics account on Azure. I have not tried running it locally in Visual Studio.

I used Key Phrases Extraction (U-SQL) and Sentiment Analysis (U-SQL) as my references and followed the directions there, including installing the plugins.

Each time I submit a job, I get an error that I cannot find a way around. Below is the code I used for each, along with the error I get when running it.

Key Phrase Code

REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@myinput =
EXTRACT 
    Modified_On string,
    _Name string,
    Description string,
    Customer string,
    Category string,
    Target_Market string,
    Person_Responsible string,
    Status string,
    _Region string,
    Modified_On_2 string,
    Created_On string,
    _Site string,
    _Team string    
FROM "/userData/fromSharepoint/Game_Plans"
USING Extractors.Csv(skipFirstNRows:1);

@keyphrase =
PROCESS @myinput
PRODUCE 
    Description,
    KeyPhrase string
READONLY
    Description
USING new Cognition.Text.KeyPhraseExtractor();

OUTPUT @keyphrase
    TO "/userData/testingCognitive/tesing1.csv"
    USING Outputters.Csv();

Key Phrase Error Message

[Screenshot of the key phrase error message]

Sentiment Code

REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@myinput =
EXTRACT 
    Modified_On string,
    _Name string,
    Description string,
    Customer string,
    Category string,
    Target_Market string,
    Person_Responsible string,
    Status string,
    _Region string,
    Modified_On_2 string,
    Created_On string,
    _Site string,
    _Team string    
FROM "/userData/fromSharepoint/Game_Plans"
USING Extractors.Csv(skipFirstNRows:1);

@sentiment =
PROCESS @myinput
PRODUCE 
    Description,
    sentiment string,
    conf double
READONLY
    Description
USING new Cognition.Text.SentimentAnalyzer(true);

OUTPUT @sentiment
    TO "/userData/testingCognitive/tesing1.csv"
    USING Outputters.Csv();

Sentiment Error Message

[Screenshot of the sentiment error message]

Any assistance on how to solve this would be much appreciated.

Alternatively, if anyone has these functions working and can share some scripts to test with, plus links to sample input files to download, that would be awesome.

Cris Luengo
Daniel

1 Answer

I can't reproduce your exact error (can you post some simple sample data?), but I can get these libraries to work. I think the KeyPhraseExtractor expects columns called Text and KeyPhrase by default, so if you use different names you have to pass your column names in as arguments, e.g.:

@keyphrase =
    PROCESS @myinput
    PRODUCE Description,
            KeyPhrase string
    READONLY Description
    USING new Cognition.Text.KeyPhraseExtractor("Description", "KeyPhrase");

UPDATE: There are some invalid characters in your sample file, immediately after the word "Bass". They are non-breaking spaces (U+00A0), and I don't think you'll be able to import them (happy to be corrected). I removed them manually and was then able to import the file. You could pre-process the file to strip them out before loading it.
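A pre-processing step like the one suggested above could look something like the following minimal Python sketch, run locally before uploading the file to the Data Lake. It assumes the only problem characters are non-breaking spaces (U+00A0) and that you know which encoding the export was saved in: the default here is UTF-8, and passing `cp1252` for a Windows/SharePoint export is only a guess. The script name `clean_nbsp.py` and the helpers `clean_bytes` and `clean_csv` are hypothetical, not part of any Azure tooling.

```python
import sys
from pathlib import Path

# The character found after "Bass" in the sample file.
NBSP = "\u00a0"


def clean_bytes(data: bytes, src_encoding: str = "utf-8") -> tuple[bytes, int]:
    """Decode `data`, replace each U+00A0 with a plain space, re-encode as UTF-8.

    Returns the cleaned bytes and how many characters were replaced. Working on
    bytes (not text-mode files) leaves the original line endings untouched.
    """
    text = data.decode(src_encoding)
    count = text.count(NBSP)
    return text.replace(NBSP, " ").encode("utf-8"), count


def clean_csv(src: Path, dst: Path, src_encoding: str = "utf-8") -> int:
    """Write a cleaned UTF-8 copy of `src` to `dst`; return the replacement count."""
    cleaned, count = clean_bytes(src.read_bytes(), src_encoding)
    dst.write_bytes(cleaned)
    return count


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python clean_nbsp.py <input> <output> [source-encoding]
    encoding = sys.argv[3] if len(sys.argv) > 3 else "utf-8"
    replaced = clean_csv(Path(sys.argv[1]), Path(sys.argv[2]), encoding)
    print(f"replaced {replaced} non-breaking space(s)")
```

For example, `python clean_nbsp.py Game_Plans Game_Plans_clean cp1252` would write a UTF-8 copy with ordinary spaces that you could upload in place of the original. If the file is actually UTF-8 already, leave the encoding argument off; decoding UTF-8 bytes as cp1252 would garble the text instead of fixing it.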

[Screenshot: the invalid characters in the sample file]

wBob

  • I have just tried the code from my original question with the modifications you suggested. I have made some dummy data here: [dummy data csv](https://1drv.ms/u/s!At0AYxnCavH1gddWT4Bz1sIWoOuUeQ). Using that dummy data, I reproduced the same error with both my original code and your suggested changes. Thanks so much for your help. – Daniel Apr 04 '18 at 09:49
  • After manually removing them, were you able to reproduce the error? – Daniel Apr 04 '18 at 21:55
  • No, the code ran successfully for me after manually removing the invalid characters. – wBob Apr 04 '18 at 22:05
  • Would you be able to send me the edited data and the exact, complete code you used? Then I can test with that and see if it works. If not, I might need to try spinning up a new Data Lake and Data Lake Analytics account, in case there is some setting I have gotten wrong. Once again, thanks so much for your help. – Daniel Apr 04 '18 at 22:39
  • Just delete the two records in your sample file that contain the text "Yancey Bass". Use the local emulator to save yourself spinning up any more resources in Azure. – wBob Apr 04 '18 at 23:11
  • I have given that a go. Here is the data that I used: [manually cleaned data](https://1drv.ms/u/s!At0AYxnCavH1gddfXemT1Mp_QYxQNg), and these are the error messages I get when running slight iterations of the code: [error1](https://1drv.ms/u/s!At0AYxnCavH1gddgGITUMDsuNWIypQ) [error2](https://1drv.ms/u/s!At0AYxnCavH1gddhhUp3kuV6Z7gNMQ) [error3](https://1drv.ms/u/s!At0AYxnCavH1gddiU_OGyj5BJiPI3A) – Daniel Apr 05 '18 at 01:54
  • Try updating the version of the sample code from the portal. – wBob Apr 05 '18 at 08:41
  • Could you point me to a reference for what you mean by this? – Daniel Apr 05 '18 at 08:55
  • Try [here](https://blogs.msdn.microsoft.com/azuredatalake/2017/02/20/enabling-u-sql-advanced-analytics-for-local-execution/) but you must have done this at least once already? – wBob Apr 05 '18 at 12:33
  • Yes, I've tried that, but it still doesn't seem to work. I have tried spinning up a whole new Azure environment and still get the "Cannot implicitly convert type 'Cognition.Text.KeyPhraseExtractor' to 'Microsoft.Analytics.Interfaces.IProcessor'" error. – Daniel Apr 08 '18 at 20:58
  • Have you tried this in the local emulator with your corrected file? – wBob Apr 08 '18 at 21:52
  • Do you have the latest Cognitive bits? See [Registering Cognitive Extensions in U-SQL](https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/cognitive-capabilities-in-u-sql#registeringExtensions) and follow the steps if you are unsure. – David Paul Giroux Apr 09 '18 at 16:48