Questions tagged [tidy]

Tidy is a C library for cleaning up "bad" HTML. Don't use this tag for questions about keeping your code tidy.

Tidy is a library written in C for converting syntactically incorrect HTML into correct HTML or XHTML. It is especially useful when you scrape web pages with curl and then process them with XML parsing functions, because XML parsers do not accept bad HTML. Tidy extensions are available for PHP and Perl. The PHP Tidy extension provides functions to convert bad HTML to XHTML, with many options, such as dropping deprecated tags like the font tag, hiding comments, dropping proprietary tags, and dropping empty paragraphs.
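
The point about XML parsers is easy to check. The following minimal Python sketch uses only the standard library's xml.etree.ElementTree and does not call Tidy at all. The "cleaned" string is written by hand, as an assumption about the kind of well-formed XHTML Tidy aims to produce, and the names BAD_HTML, XHTML and parses_as_xml are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative inputs (not actual Tidy output).
BAD_HTML = "<p>Fish & chips<br></p>"     # bare '&' and an unclosed <br>
XHTML = "<p>Fish &amp; chips<br /></p>"  # the same content as well-formed XHTML


def parses_as_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it raises ParseError."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(parses_as_xml(BAD_HTML))    # False: the parser rejects the bare '&'
    print(parses_as_xml(XHTML))       # True
    print(ET.fromstring(XHTML).text)  # Fish & chips
```

In a scraping pipeline, the idea is to run Tidy over input like BAD_HTML first, so that the XML parsing step receives something closer to XHTML.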

571 questions
0 votes, 1 answer

tidy_parse_string expects exactly 1 parameter, 2 given

I have the HTML Tidy extension on my home computer using PHP 5.2.11 (Windows - WAMP), and I use it to clean up HTML: $data = tidy_parse_string($data, array( 'clean' => TRUE, 'indent' => 0, 'output-xhtml' => true, 'wrap' => 7000, …
mwieczorek
0 votes, 1 answer

How to get the rows of an html table with tidylib in C?

As I ask in the title, I can't get the rows of an HTML table in C using tidylib. I have read the documentation at http://tidy.sourceforge.net/docs/api/ but I really can't find what I want. After a CURL call, I save the result in a file or in a…
Drew
0 votes, 1 answer

parse .htm file/url into .xml file

I am trying to transform a .htm webpage into an .xml file using JTidy and will need to extract some data/anchor elements from the .xml file. However, the transforming step always results in an error file and tells me Warning: unknown attribute…
Zzz...
0 votes, 1 answer

Clean up HTML keeping custom tags

I have a string like the following: LUSAKA (AP) -- X&Y Ltd. & M.K. Ltd will be merged. How can I make it valid XML so my etree.XMLParser does not throw an error? I need to convert it to something like: LUSAKA
Shiplu Mokaddim
0 votes, 0 answers

extract headers (h1) from html document

I want to use tidypp, which is based on tidylib. Running the following example works fine, but it extracts hrefs, which means tag = "tidytag_A" and attribute = "href". I have no idea how I can extract tags like <h1>, <h2>, ... The example…

Redaa
0 votes, 1 answer

Camel: using unmarshal().tidyMarkup() from multiple threads

It's apparently thread-safe. However, can anyone tell me whether it locks a single instance of TidyMarkupDataFormat or creates separate instances for separate threads? I mean, do we have multiple parsers (one per thread) or a single parser…
Archer
0 votes, 1 answer

tidy -clean is not removing tags

Hi, I am using Tidy on the command line. Everything is working fine, but -clean y is not working. My command is tidy -m sampleon.html --clean y --doctype "strict". After conversion there is not much difference; it still has a lot of tags in it.
AahladParadigm
0 votes, 1 answer

ImportError: No module named mxTidy

I have installed egenix_mx_experimental-3.0.0-py2.7, but when I run the program, this error occurs: Traceback (most recent call last): File "E:/python/pyCharm/131113.py", line 21, in from mx import Tidy File…
lina
0 votes, 1 answer

PHP Tidy removes the closing tag incorrectly

I was testing my PHP Tidy config one day and found that it fails to process any page from The Guardian. My config is: $tidy_config = array( 'new-blocklevel-tags' => 'article aside audio figure footer header nav section source track video svg', …
bitinn
0 votes, 2 answers

Is it possible to run Tidy on Google App Engine PHP?

It seems like Google App Engine PHP doesn't have the Tidy extension. My question is: is there any way to use Tidy on Google App Engine PHP? Thanks,
aserww106
0 votes, 3 answers

Tidying up jQuery

I have a page which uses a lot of jQuery code, but it is becoming unmanageable. I want to tidy it up by placing related parts into separate .js files and including them in the page with script tags. I seem to be able to do this by creating…
0 votes, 2 answers

How to use tidy with vim without unix linebreak in quickfix window and how to correct only the errors?

After a lot of searching and trying, I found this to make Tidy work with Vim: :set makeprg=tidy\ -e\ --gnu-emacs\ yes :set shellpipe=2> :set errorformat=%f:%l:%c:\ %m :make % :copen But why does the output in the quickfix window have an ^M unix…
Reman
0 votes, 0 answers

Convert php files with html4 tags to xhtml using tidy

I'm trying to use Tidy to convert a lot of HTML4 tags in PHP files to XHTML, like this on Debian: tidy -indent -o xhtmltestnew.php -asxhtml xhtmltest.php I have two problems. 1) Can I make Tidy ignore my PHP code? It messes a lot of it up by…
0 votes, 1 answer

How to stop TIDYCom from deleting opening tags that don't have closing tags during cleanup

TidyCom removes unclosed tags during cleanup. For example, the tag with a missing closing tag is wiped out when compared with the source file. How can it be retained instead? Here is my code: Dim tid As New TidyObject() tid.Options.Doctype =…

0 votes, 2 answers

Jtidy - Shouldn't display encoding characters (â„¢) for TM in page source code?

I'm using Jtidy to render news information. When the news information has TM in it, the page source shows it as '™', which is invalid... Here is my code: InputStream is = new ByteArrayInputStream(description.getBytes()); OutputStream…
TP_JAVA