
Why are there significant differences in Sentiment/Emotion scores between supplying a URL as input and supplying the text taken directly from the same URL?

For example:

  • URL: http://economictimes.indiatimes.com/markets/stocks/news/greed-could-turn-into-fear-anytime-keep-strict-stop-losses-for-long-positions-jimeet-modi/articleshow/53569552.cms.

  • Text (extracted from the URL above):

    The Nifty50 opened with high spirit at the beginning of the week, dipped mid-week, but managed to bounce back supported by liquidity gush in the system. The PMI data points towards an acceleration in the economy. The macro indicator suggests whopping 5.2 percent expansion in July as against 2.8 percent in May and 2.8 percent in the corresponding previous year. Markets too are continuously discounting the encouraging macroeconomic numbers. July automobiles growth numbers surprised the Street. Passenger vehicles registered on an average 12 per cent growth signaling a loud and clear economic robustness in the system. Path breaking new laws will spearhead the country to become the second-largest economy in the world by the turn of this decade. Key events of the week: Foundation for the historic path-breaking tax reform has been laid down last week. Now the superstructure will be built over a period of time through the state approvals etc. GST will truly lead India to accelerated corruption free inclusive growth for the masses in the country. Far reaching amendments were cleared by the lawmakers for speedy and hassle free debt recovery in a time bound manner further enhancing the Bankruptcy Code for making India Bad Debt free economy. Potentially now, the ecosystem for PSU Banks will change permanently and they too will emerge as profitable as their private sector peers. We recommend this Video for youADSPARC PTY LTDRecommended By Colombia Technical Outlook: The Nifty50 has renewed the upward momentum amid overextended rally. However, the rally is not supported by the momentum indicators. But, markets can remain at overbought levels for extended periods of time during liquidity driven rallies. Greed is keeping the markets at alleviated levels. However, the sentiments can change from greed towards fear, overnight on the occurrence of some negative news, causing corrections to begin. 
Traders should trail their stops on their long positions and investors should stay on the sidelines till the market comes to touch the lower level of the regression channel which comes at around 8300-8400 levels in Nifty50. Long term trend is firmly intact but short term is ripe for a correction. Expectations for the week: The market is mesmerised in hopes of macro factors being in favour for further economic growth and expansion. The market will show a lot of activity in the mid cap space and therefore the front line index may not show the underlying volatility in the mid-cap space. Companies operating in the industry wherein lot of unorganized player operate will get benefitted out of GST. Favourable monsoon and coming festive season will keep the market at alleviated levels. Any correction should be utilized for building long-term portfolios. Traders should play the momentum stocks and trail the profits. The Nifty50 closed higher by 0.52 percent at 8,683.

German Attanasio
Mahfooz
  • Yeah, I just forgot to add an example. Let's take this URL and the text from this article: http://economictimes.indiatimes.com/markets/stocks/news/greed-could-turn-into-fear-anytime-keep-strict-stop-losses-for-long-positions-jimeet-modi/articleshow/53569552.cms – Mahfooz Aug 06 '16 at 17:52

2 Answers


When using a URL, AlchemyLanguage tries to extract the important information from a web page, removing navigation links, advertisements, and other undesired content. In this case, I think the extracted text differs from the text you supplied manually through the text endpoint.

If you use TEXT, you send exactly the text you want analyzed, so no irrelevant page content gets included, as can happen with a URL.


AlchemyLanguage allows you to see the extracted text when using a URL. Just add showSourceText=1 to the request. That will show you the text that was used during the analysis.

See: http://www.ibm.com/watson/developercloud/alchemy-language/api/v1/#emotion_analysis
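As a rough illustration of adding showSourceText=1 to a URL request, here is a minimal sketch that only builds the request URL and does not send it. The base URL, the URLGetEmotion call name, and the apikey and outputMode parameter names are assumptions; only showSourceText comes from this answer, so check the API reference linked above before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed base URL and call name; verify against the API reference.
BASE = "https://gateway-a.watsonplatform.net/calls"


def build_url_emotion_request(page_url, api_key, show_source_text=True):
    """Return a GET URL for emotion analysis of a web page.

    With show_source_text=True the request carries showSourceText=1,
    asking the service to return the text it extracted from the page.
    """
    params = {
        "apikey": api_key,        # assumed parameter name
        "url": page_url,
        "outputMode": "json",     # assumed parameter name
    }
    if show_source_text:
        params["showSourceText"] = 1
    return BASE + "/url/URLGetEmotion?" + urlencode(params)


request_url = build_url_emotion_request("http://example.com/article", "YOUR_API_KEY")
print(request_url)
```

The returned source text can then be compared with the text you were going to submit manually.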

German Attanasio
  • Thanks @german-attanasio. I thought Watson could extract the most relevant input data from the URL. I just need to be careful. – Mahfooz Aug 07 '16 at 00:33
  • @German : AlchemyLanguage will not extract "ALL" the text from the HTML. It tries to extract the important information from a webpage, removing navigation links, advertisements, and other undesired content. – RAVI Aug 07 '16 at 19:43
  • Looks like Alchemy's "Text Extraction" is flawed. In some cases it is not even able to extract the complete text from the URL. Take this as an example: http://timesofindia.indiatimes.com/tech/tech-news/Quantum-computing-gets-a-boost-from-new-form-of-light/articleshow/53582513.cms . When I gave the URL as input, it did not extract the last 3 paragraphs. I think this should be easy to fix, as many Chrome extensions and Safari do it successfully. (I am using https://alchemy-language-demo.mybluemix.net/ for testing.) – Mahfooz Aug 08 '16 at 23:36
  • @Mahfooz : A trick for timesofindia: replace articleshow with articleshowprint in the URL - http://timesofindia.indiatimes.com/tech/tech-news/Quantum-computing-gets-a-boost-from-new-form-of-light/articleshowprint/53582513.cms – RAVI Aug 10 '16 at 23:12
  • @Ravi Nice trick. Thanks. Can be used in automated text extraction. – Mahfooz Aug 12 '16 at 01:50
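
For automated extraction, the articleshow to articleshowprint substitution from the comments above could be sketched as follows. The helper name to_print_url is hypothetical, and the sketch assumes the print view stays available at that path; it only rewrites the URL and does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit


def to_print_url(article_url):
    """Swap the 'articleshow' path segment for 'articleshowprint'.

    Only an exact segment match is replaced, so a URL that already
    points at the print view is returned unchanged.
    """
    parts = urlsplit(article_url)
    segments = [
        "articleshowprint" if segment == "articleshow" else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(parts._replace(path="/".join(segments)))


print_url = to_print_url(
    "http://timesofindia.indiatimes.com/tech/tech-news/"
    "Quantum-computing-gets-a-boost-from-new-form-of-light/articleshow/53582513.cms"
)
print(print_url)
```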

Watson tries to extract the most relevant input data from the URL, but in some cases the result may not match your own definition of the main content.

In your case, the URL extraction included the last paragraph of the article, which your manually supplied text left out. (That last paragraph is ambiguous: some would consider it part of the article, others would not.)

Last paragraph extracted from the URL:

(The author is CEO, SAMCO Securities. Views and recommendations expressed in this section are his own and do not represent those of ETMarkets.com. Please consult your financial advisor before taking any position.)

Because that last paragraph contains entities, keywords, and tokens that can affect the overall sentiment score, you will see some difference between the two scores.
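
To see exactly which sentences the URL extraction added, you could diff the source text returned by the service against the text you submitted manually. The following is a minimal sketch using Python's standard difflib; the function name extra_sentences is hypothetical, the sentence splitter is deliberately naive, and the two strings are shortened excerpts from the article above.

```python
import difflib
import re


def extra_sentences(manual_text, extracted_text):
    """Return sentences that appear in extracted_text but not in manual_text."""
    def split(text):
        # Naive splitter: break after ., ! or ? when followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    diff = difflib.ndiff(split(manual_text), split(extracted_text))
    # ndiff prefixes lines found only in the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


manual = (
    "Traders should play the momentum stocks and trail the profits. "
    "The Nifty50 closed higher by 0.52 percent at 8,683."
)
extracted = manual + (
    " (The author is CEO, SAMCO Securities. Views and recommendations "
    "expressed in this section are his own and do not represent those of "
    "ETMarkets.com. Please consult your financial advisor before taking "
    "any position.)"
)

extra = extra_sentences(manual, extracted)
for sentence in extra:
    print(sentence)
```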

You can check the online demo for more information: Online Demo

For the API, see the showSourceText and sourceText parameters.

Ref: Alchemy Sentiment API

RAVI