
I'm trying to parse an RSS feed from a URL using feedparser in Python.

>>> import feedparser 
>>> d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801')
>>> d
{'feed': {'summary': u'<span><h1>Server Error in \'/mobile\' Application.<hr color="silver" size="1" width="100%" /></h1>\n\n            
<h2> <i>Attempted to divide by zero.</i> </h2></span>\n\n            <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\n\n            <b> Description: </b>An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.\n\n            <br /><br />\n\n            <b> Exception Details: </b>System.DivideByZeroException: Attempted to divide by zero.<br /><br />\n\n            
<b>Source Error:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code>\n\nAn unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.</code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            <b>Stack Trace:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code><pre>\n\n[DivideByZeroException: Attempted to divide by zero.]\n   System.Decimal.FCallDivide(Decimal&amp; d1, Decimal&amp; d2) +0\n   System.Decimal.Divide(Decimal d1, Decimal d2) +17\n   Martjack.CMS.PageControlsModelComp.GetPluginDataEnt(PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, PageControlModel&amp; objPageControlModel, ProductEnt_RE ProductEnt, String MobileVersion) +2324\n   
Martjack.CMS.PageControlsModelComp.GetPageControlOutputData(PageModel pagemodel, PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, ProductEnt_RE ProductEnt, String siteurl) +694\n   Martjack.CMS.PageControlsModelComp.GetPageControlModels(PageModel Pagemodel, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, DNDPageControlViewCollection objDNDPageControlViewCollection, Boolean isdndrequest, Int64 pgcontrolid, String siteurl) +919\n   Martjack.CMS.PageModelComp.GetPageModel(MerchantENT MerchantEnt, Int32 predefinedPageId, Boolean isPredefined, ChannelType channel, String seocid, String Bid, String combiType, String MobileVersion, Boolean isDndRequest, 
DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +1717\n   MartJack.Facade.CMSFacade.GetPageModel(MerchantENT MerchantEnt, Int32 PageId, Boolean isPredefined, ChannelType channel, String seocid, String bid, String combitype, String mobileversion, Boolean isDndRequest, DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +119\n   MobileECommerce.MobileECommerce.ProductsController.GetPageModelByRequest(String seoid, String bid) +227\n   MobileECommerce.MobileECommerce.ProductsController.Index(String id, String seobrand, String category, String categoryparent) +54\n   lambda_method(Closure , ControllerBase , Object[] ) +272\n   
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters) +17\n   System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +212\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +239\n   System.Web.Mvc.&lt;&gt;c__DisplayClass15.&lt;InvokeActionMethodWithFilters&gt;b__12() +56\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation) +282\n   System.Web.Mvc.&lt;&gt;c__DisplayClass17.&lt;InvokeActionMethodWithFilters&gt;b__14() +20\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +201\n   System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +351\n   System.Web.Mvc.Controller.ExecuteCore() +99\n   System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +94\n   System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +10\n   
System.Web.Mvc.&lt;&gt;c__DisplayClassb.&lt;BeginProcessRequest&gt;b__5() +43\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass1.&lt;MakeVoidDelegate&gt;b__0() +21\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass8`1.&lt;BeginSynchronous&gt;b__7(IAsyncResult _) +12\n   System.Web.Mvc.Async.WrappedAsyncResult`1.End() +53\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +28\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +15\n   System.Web.Mvc.&lt;&gt;c__DisplayClasse.&lt;EndProcessRequest&gt;b__d() +34\n   System.Web.Mvc.SecurityUtil.&lt;GetCallInAppTrustThunk&gt;b__0(Action f) +7\n   System.Web.Mvc.SecurityUtil.ProcessInApplicationTrust(Action action) +23\n   System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +68\n   
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +9\n   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +714\n   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously) +240\n</pre></code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            
<hr color="silver" size="1" width="100%" />\n\n            <b>Version Information:</b>\xa0Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272\n\n            </font>'}, 'status': 302, 'version': u'', 'encoding': u'utf-8', 'bozo': 1, 'headers': {'content-length': '11348', 'x-powered-by': 'ASP.NET', 'set-cookie': 'SERVERID=HAS14; path=/', 'originserver': 'HAS14', 'server': 'Microsoft-IIS/7.5', 'connection': 'close', 'cache-control': 'private', 'date': 'Tue, 16 Apr 2013 08:03:59 GMT', 'content-type': 'text/html; charset=utf-8', 'x-aspnet-version': '4.0.30319'}, 'href': 
u'http://www.shop.inonit.in/mobile/Products//NA/NA/0', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('not well-formed (invalid token)',)}

I get nothing useful in the output, while if you open the link (http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801) in a browser it shows a whole lot of data. Maybe I am being redirected to some other page that doesn't exist. I ran into the same thing when I tried crawling individual pages of this website with scrapy: I was redirected to a non-existent URL.

Any help on this would be great. Thanks!

user_2000
  • What do you mean by "nothing in the output"? `len(d['feed']['summary'])` returns 5601, and there is a nice "divide by zero" message in there. – Steven Almeroth Apr 16 '13 at 15:17
  • Ah, I'm sorry. By "nothing" I meant nothing relevant, as in the feed elements (title, price, etc.). Clearly it's not able to read the feed, but if you open the link you'll see all the data. – user_2000 Apr 16 '13 at 19:35

1 Answer


Are you using a proxy? If so, pass a proxy handler to feedparser like this:

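import urllib2  # Python 2; on Python 3 use urllib.request instead
import feedparser

# Replace "proxy:port" with your proxy's host and port.
proxy = urllib2.ProxyHandler({"http": "proxy:port"})
d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', handlers=[proxy])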
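
Whether or not a proxy is involved, it may help to confirm what actually came back before looking for entries. The following is only a rough sketch, not part of feedparser: `diagnose` is a hypothetical helper, and `sample` is a plain dict filled with values copied from the output in the question. It only looks at keys that appear in that output (`bozo`, `bozo_exception`, `status`, `href`, `headers`, `entries`).

```python
def diagnose(result):
    """Return human-readable warnings for a feedparser-style result mapping.

    Hypothetical helper for illustration; it only reads keys that are
    visible in the output shown in the question.
    """
    problems = []
    if result.get('bozo'):
        # feedparser sets bozo when the document was not well-formed XML.
        problems.append('not well-formed: %r' % (result.get('bozo_exception'),))
    content_type = result.get('headers', {}).get('content-type', '')
    if content_type.startswith('text/html'):
        problems.append('server returned HTML, not a feed: ' + content_type)
    if result.get('status') in (301, 302, 303, 307, 308):
        problems.append('redirected to ' + str(result.get('href')))
    if not result.get('entries'):
        problems.append('no entries parsed')
    return problems


# Values copied from the output in the question (exception shown as text).
sample = {
    'bozo': 1,
    'bozo_exception': 'not well-formed (invalid token)',
    'status': 302,
    'href': 'http://www.shop.inonit.in/mobile/Products//NA/NA/0',
    'headers': {'content-type': 'text/html; charset=utf-8'},
    'entries': [],
}

for line in diagnose(sample):
    print(line)
```

With a real result you would call `diagnose(d)` on the value returned by `feedparser.parse(...)`. For the output in the question, all four checks fire, which suggests the request was redirected to an HTML error page rather than to the feed itself.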
Jayanth