Questions tagged [html-parser]

HTML Parser is a Java library for parsing HTML. It features filters, visitors, custom tags, and easy-to-use JavaBeans.

211 questions
0
votes
1 answer

HtmlParser cannot parse "tbody" in Java

In org.htmlparser, I want to get a tbody node by id: Parser htmlParser = Parser.createParser("
idleman
  • 1
  • 1
0
votes
1 answer

Finding all anchor tags with a PDF file as the source in PHP using regex or an HTML parser

How can I find all the anchor tags whose source is a PDF? $string="hello this is a dummy text i need only abc.pdf in the string variable
Bipin Chandra Tripathi
  • 2,550
  • 4
  • 28
  • 45
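
The question above asks about PHP; as a rough illustration of the parser-based approach (rather than regex), here is a minimal sketch in Python using the standard-library `html.parser`. The class name `PdfLinkCollector` and the helper `find_pdf_links` are hypothetical names chosen for this example, and the rule "href ends in .pdf, ignoring query string and fragment" is an assumption about what counts as a PDF link.

```python
from html.parser import HTMLParser


class PdfLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point at .pdf files."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Assumption: ignore any fragment or query string before checking the extension.
        path = href.split("#")[0].split("?")[0]
        if path.lower().endswith(".pdf"):
            self.pdf_links.append(href)


def find_pdf_links(html_text):
    """Return the hrefs of all PDF anchors found in html_text."""
    collector = PdfLinkCollector()
    collector.feed(html_text)
    return collector.pdf_links
```

Calling `find_pdf_links(page_source)` returns the matching `href` values in document order; anchors without an `href` are skipped.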
0
votes
1 answer

Getting exact symbol using HTMLParser

HTMLParser.unescape behaves like this:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.unescape('alpha &lt; &beta;')
u'alpha < \u03b2'
What should I do to get the exact beta symbol instead of \u03b2? Thanks
bdhar
  • 21,619
  • 17
  • 70
  • 86
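
A short sketch of what is likely going on in the question above, assuming Python 3, where `html.unescape` replaces the older `HTMLParser.unescape` method: the returned string already contains the beta character, and the `\u03b2` form only appears in an ASCII-safe representation of it. The input string with `&lt;` and `&beta;` entities is an assumption about what the asker passed in.

```python
import html

# Decode the named entities into the characters they stand for.
text = html.unescape("alpha &lt; &beta;")

# print() writes the characters themselves, including the Greek letter.
print(text)

# ascii() shows an escaped, ASCII-only representation of the same string.
print(ascii(text))
```

In other words, the escape sequence is a display form of the same string, so printing or writing the value (with a suitable encoding) is usually all that is needed.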
0
votes
1 answer

Why does Jsoup parse my HTML code incorrectly?

I'm trying to parse a web page and extract a piece of text from it, but Jsoup gives me a wrong Document when I call the Jsoup.parse() and Jsoup.connect().get() methods. This is a piece of the web page and my code. The doc variable has a wrong DOM. …
Beni
  • 45
  • 1
  • 6
0
votes
2 answers

How to find the error line in HTML when HTMLParserError occurs

I am writing a web crawler in Python, but sometimes it throws HTMLParserError: junk characters in start tag: u'\u201dTPL_password_1\u201d\r\n\t\t', at line 21285, column 6. It said the error was found at line 21285; does it mean that the error…
Searene
  • 25,920
  • 39
  • 129
  • 186
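
One way to inspect the location named in a message like the one above is to pull that line out of the downloaded source and mark the column. The helper below is a hypothetical sketch, not part of any library; it assumes a 1-based line number and a 0-based column offset (the convention of `HTMLParser.getpos()`), so a 1-based column taken from an error message would need one subtracted.

```python
def show_error_location(html_text, line, column, width=40):
    """Return the reported source line with a caret under the given column.

    Assumptions: `line` is 1-based, `column` is a 0-based offset into that line,
    and lines are separated by "\n" (the character the parser counts).
    """
    lines = html_text.split("\n")
    source_line = lines[line - 1]
    # Show at most `width` characters on either side of the column.
    start = max(0, column - width)
    snippet = source_line[start:column + width]
    caret = " " * (column - start) + "^"
    return snippet + "\n" + caret
```

Printing `show_error_location(page_source, 21285, 5)` would show the neighbourhood of the offending start tag, which is often enough to spot stray characters such as the curly quotes in the message.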
0
votes
2 answers

java.lang.NoClassDefFoundError: org/htmlparser/util/ParserException

I'm trying to make this http://htmlparser.sourceforge.net/ code run in Eclipse. The instructions are simply "To use the library, you will need to add either the htmllexer.jar or htmlparser.jar to your classpath when compiling and running." I've…
itgiawa
  • 1,616
  • 4
  • 16
  • 28
-1
votes
1 answer

How to get a Twitter profile name using the Python BeautifulSoup module?

I'm trying to get a Twitter profile name from the profile URL with BeautifulSoup in Python, but whatever HTML tags I use, I'm not able to get the name. What HTML tags can I use to get the profile name from a Twitter user page? url =…
mac
  • 863
  • 3
  • 21
  • 42
-1
votes
1 answer

AttributeError: 'NoneType' object has no attribute 'text' - BeautifulSoup to CSV

I am trying to make a web scraper that looks through multiple pages to make a CSV list for me. When I run the basic version of the code it works, but when I have it iterate over multiple pages I get a…
-1
votes
1 answer

Update text in an HTML page using a parser

I always get an error at middlebitparent.replaceWith(nodespan); in the following code, which is written with jsoup to navigate the HTML doc and change the background color of the word "In": Elements divs= doc.select("div"); for(Element div :…
Bachayer
  • 37
  • 2
  • 7
-1
votes
1 answer

How to use html.parser

Hi everyone, I am new to Python and trying to use its html.parser module. I want to scrape this website and fetch the URLs, deal names and prices, which are present inside an li…
user13723363
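
As a starting point for the kind of task described above, here is a minimal sketch of subclassing `html.parser.HTMLParser` to collect links that sit inside `<li>` elements. The class name `ListItemLinkParser` and the sample markup are assumptions made for illustration; the real site's structure is not shown in the excerpt.

```python
from html.parser import HTMLParser


class ListItemLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a href=...> tags nested inside <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0          # how many <li> elements are currently open
        self.current_href = None   # href of the <a> being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            # Anchors without an href leave current_href as None and are skipped.
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((self.current_href, text))
            self.current_href = None


# Hypothetical sample markup standing in for the real page.
sample = (
    '<ul><li><a href="/deal/1">Deal one <b>$10</b></a></li>'
    '<li>no link</li></ul><a href="/out">outside</a>'
)
parser = ListItemLinkParser()
parser.feed(sample)
```

After `feed`, `parser.links` holds one tuple per link found inside a list item. The sketch relies on explicit `</li>` closing tags; pages that omit them would need extra handling.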
-1
votes
1 answer

Extract nth tags from HTML after specific tag with beautifulsoup

Using BeautifulSoup, I have some HTML in a variable like this: ....<\head> my title <\h2> text text text <\p> text2 text2 <\p> my title 2<\h2> text text text <\p> I want to extract every and the next tags. Example i…

-1
votes
1 answer

How to crawl through HTML string content (tag by tag) using Python

I have an HTML string and would like to find the text elements and replace them with tokens. I used BeautifulSoup to get the data, but get_text gives only the text, not the corresponding elements. How to go through the HTML string from root node to last node and…
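
One possible way to walk an HTML string tag by tag and swap text for tokens is the event-based `html.parser.HTMLParser` from the standard library, sketched below. The token format `[[T0]]`, the class name `TextTokenizer` and the helper `tokenize_text` are all hypothetical choices for this example; comments and doctype declarations are dropped in this simplified version.

```python
from html.parser import HTMLParser


class TextTokenizer(HTMLParser):
    """Rebuild the markup while replacing each non-blank text node with a token."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.output = []   # pieces of the rebuilt document, in order
        self.tokens = {}   # token -> original text

    def handle_starttag(self, tag, attrs):
        # Reuse the start tag exactly as it appeared in the source.
        self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.output.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            token = f"[[T{len(self.tokens)}]]"
            self.tokens[token] = data
            self.output.append(token)
        else:
            # Keep whitespace-only text as it is.
            self.output.append(data)


def tokenize_text(html_text):
    """Return (markup with tokens in place of text, mapping of token to text)."""
    parser = TextTokenizer()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.output), parser.tokens
```

The returned mapping makes it possible to put the original (or translated) text back later by replacing each token in the rebuilt markup.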
-1
votes
3 answers

RegEx for capturing an attribute value in an HTML element

I have a problem extracting text from an HTML tag using regex. I want to extract the text from the following HTML code. Google The result: TEXTDATA I want to…
elevaku
  • 7
  • 2
-1
votes
1 answer

Want to add a line at a particular location in HTML code

I want to add a line of code to some HTML at a particular location. I want to know which library will be more helpful: BeautifulSoup or html parser? I just want to add a new line and then write a line of code there. Please help out. I need…
deep5459
  • 35
  • 8
-1
votes
1 answer

How to find only the content of a specific HTML tag using C++?

I'm writing a program in which a string holds the code of an HTML page. Now I need to get the text between the article tags. My HTML page contains more than one article tag, so I need to get the text of each of the different article tags. An example of…
Silvia B
  • 45
  • 3
  • 10