I am scraping fields from http://164.100.47.132/LssNew/psearch/QResult16.aspx?qref=15844. All fields are returned correctly on the console, but with the usual HTML tags still in them. I need to write these fields to a CSV file (CsvItemExporter). If I store the HTML response in a temporary variable and apply the converter in a second step, when assigning to the item field, I get a different set of error messages.
I tried the BeautifulSoup get_text and html2text solutions from "Is it possible that Scrapy to get plain text from raw html data directly instead of using xPath selectors?" and "How can I get all the plain text from a website with Scrapy?". Those solutions print fine but fail when I assign the result to the respective item fields.
Applying any converter to the response (converter(response +extract)) leads either to errors such as "str object has no attribute 'get_text'" (html2text) or to text with stray \r\n sequences inserted (BeautifulSoup). I suspect the latter comes from hard carriage returns in the original text, which the original author may have added to keep things aligned. How do I get around this problem? I am using Python 2.7 on 32-bit Windows.
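
To make the goal concrete, here is a minimal sketch of the conversion I am after: take one extracted HTML string, drop the tags, and collapse the hard line breaks into single spaces. It uses only the standard library instead of BeautifulSoup or html2text, and the names `_TextCollector`, `html_to_text` and `sample` are made up for illustration. It assumes the input is a single plain string, not the list that `.extract()` returns.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.chunks.append(data)


def html_to_text(fragment):
    """Strip tags from one HTML string and collapse whitespace runs."""
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.chunks)
    # Hard line breaks (\r\n) and repeated spaces become a single space.
    return re.sub(r"\s+", " ", text).strip()


# Made-up sample resembling a cell with hard carriage returns in it.
sample = "<td><b>Question</b>\r\n   text with\r\nhard breaks</td>"
print(html_to_text(sample))
```

In the spider this would presumably be applied to each string before it is assigned to the item field, for example to every element of the extracted list, so that the CSV exporter only ever sees tag-free, single-line text.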