Questions tagged [tidy]

Tidy is a C library for cleaning up "bad" HTML. Don't use this tag for questions about keeping your code tidy.

Tidy is a library written in C that converts syntactically incorrect HTML into correct HTML or XHTML. It is especially useful when scraping web pages with curl and XML parsing functions, because XML parsers do not accept malformed HTML. Tidy extensions are available for PHP and Perl. The PHP Tidy extension provides functions to convert bad HTML to XHTML, with options such as dropping deprecated tags (for example, the font tag), hiding comments, dropping proprietary tags, and dropping empty paragraphs, among many others.
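
The point about XML parsers rejecting bad HTML can be illustrated with a minimal sketch. It uses only Python's standard library and does not call Tidy itself. The sample strings and the helper name parses_as_xml are assumptions made up for this illustration, and the "clean" string is hand-written to resemble what a cleanup tool might emit, not actual Tidy output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <p> and <br> are never closed, which browsers tolerate.
BAD_HTML = "<html><body><p>Hello<br>world</body></html>"

# Hand-written well-formed equivalent (an assumption, not real Tidy output).
CLEAN_XHTML = "<html><body><p>Hello<br/>world</p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for mismatched or unclosed tags, among other errors.
        return False


if __name__ == "__main__":
    print("bad HTML parses as XML:", parses_as_xml(BAD_HTML))
    print("clean XHTML parses as XML:", parses_as_xml(CLEAN_XHTML))
```

A strict XML parser rejects the first string because of the mismatched tags, which is why scraped pages are often run through a cleanup step before being handed to XML tooling.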

571 questions
5
votes
2 answers

Hebrew characters processed by HTML Tidy turn into gibberish

I'm using HTML Tidy Online (http://infohound.net/tidy/) to tidy up a very old and messed-up HTML file that contains some Hebrew characters. Whenever the page is processed by Tidy, the output turns the Hebrew characters into gibberish, even after…
Charles
  • 157
  • 1
  • 10
5
votes
2 answers

tidyr separate only last n instances

I have a data.frame in R, which, for simplicity, has one column that I want to separate. The following sample snippet, using tidyr::separate, almost does the job: tmp2 <- data.frame( varTreatName = c( "resp_Nadd_belowCanopy",…
Thomas Wutzler
  • 255
  • 1
  • 9
5
votes
2 answers

No linebreak after tags in tidy

Hi, I have the following input:

Hi you

I'd like to treat this as XML. I run Tidy on the cmd-line with the following options: input-xml: yes output-xml: yes indent: no My output is this:

Hi you

However…
Aaron
  • 3,249
  • 4
  • 35
  • 51
5
votes
1 answer

Validate HTML5 Document in PHP using Tidy

I am trying to clean up an HTML string and create an HTML5 document using Tidy and PHP; however, I am creating an HTML 3.2 document instead. As seen, I am getting a "Config: missing or malformed argument for option: doctype" error. I am operating PHP Version…
user1032531
  • 24,767
  • 68
  • 217
  • 387
5
votes
0 answers

Installed TidyManaged NuGet but can't run due to missing libtidy.dll

I installed the TidyManaged NuGet package and wrote some basic code to convert an HTML file to XHTML but when I run it I get the following error: An unhandled exception of type 'System.DllNotFoundException' occurred in TidyManaged.dll Additional…
Matthew Verstraete
  • 6,335
  • 22
  • 67
  • 123
5
votes
1 answer

how to make syntastic with html tidy aware of ionic tags?

I'm trying to edit an ionic application with vim that has syntastic enabled using html tidy. Unfortunately, I'm getting a load of errors. How can I make html tidy aware of ionic tags, or failing that make it ignore them so that I don't receive…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
5
votes
3 answers

Proper usage of JTidy to purify HTML

I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello world" end up as "helloworld" after tidying. I wanted to show what I'm doing…
ragebiswas
  • 3,818
  • 9
  • 38
  • 39
5
votes
1 answer

HTML Tidy stripping space at the start

File.html word ratti Command $ tidy File.html Output wordratti Desired output word ratti Where's the space? Log line 1 column 1 - Warning: missing declaration line 1 column 1 - Warning:…
Chankey Pathak
  • 21,187
  • 12
  • 85
  • 133
5
votes
7 answers

Configuring and Using HTML Tidy

I would like to use Textmate's built-in Tidy (Ctrl+Shift+H) functionality to indent my HTML 'without modifying anything' in the code. I write pretty neat HTML already, I just need Tidy to indent my code with Soft-tabs. Currently it breaks a lot of…
eozzy
  • 66,048
  • 104
  • 272
  • 428
5
votes
2 answers

Configuring HTML Tidy to indent tags and nothing else in Notepad++

All I want HTMLTidy to do is indent my HTML document's tags, but it currently also changes the doctype, adds an xmlns attribute to the html tag, changes
tags, and probably does some other stuff. How do I make it so that HTMLTidy in Notepad++…
Max
  • 115
  • 1
  • 3
  • 6
5
votes
2 answers

php tidy strange behaviour

I'm using PHP's tidy library to "clean and repair" some HTML coming from user input. Everything works fine, but I'm running into a problem whose cause I can't figure out. My code is like this: $tidy = new tidy(); $tidy_options =…
CdB
  • 4,738
  • 7
  • 46
  • 69
5
votes
1 answer

Pure Python Tidy-like application/library

I'm looking for a pure Python library which works like Tidy. Please kindly advise. Thank you.
Viet
  • 17,944
  • 33
  • 103
  • 135
4
votes
2 answers

Beautiful Soup and uTidy

I want to pass the results of utidy to Beautiful Soup, ala: page = urllib2.urlopen(url) options = dict(output_xhtml=1,add_xml_decl=0,indent=1,tidy_mark=0) cleaned_html = tidy.parseString(page.read(), **options) soup =…
jldugger
  • 2,339
  • 6
  • 22
  • 24
4
votes
2 answers

Good JavaScript-based tidying for HTML, CSS and JS?

Can anyone recommend a JavaScript-based code reformatter for HTML, CSS and JS? I'm making a web-based IDE and I'd like to be able to tidy up some code without having to refresh the page or wait for an ajax request. Note: I need it to be able to…
mpen
  • 272,448
  • 266
  • 850
  • 1,236
4
votes
1 answer

Using HTML Tidy in Visual C++ 2010 Windows Forms project

I am using VC++ 2010 Express and I am attempting to include HTML Tidy to perform cleanup on HTML code strings. What I want to do is process the HTML as a string (NOT from a file) and save the processed cleaned HTML to a string (NOT to a file). The…
Jason
  • 236
  • 1
  • 3
  • 9