Questions tagged [tidy]

Tidy is a C library for cleaning up "bad" HTML. Don't use this tag for questions about keeping your code tidy.

Tidy is a library written in C that converts syntactically incorrect HTML into correct HTML or XHTML. It is especially useful when you scrape web pages with curl and then process them with XML parsing functions, because XML parsers reject malformed HTML. Tidy extensions are available for PHP and Perl. The PHP Tidy extension provides functions that convert bad HTML to XHTML, with options such as dropping deprecated tags (for example, the font tag), hiding comments, dropping proprietary tags, and dropping empty paragraphs, among others.
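As a rough illustration of the cleanup described above, the sketch below pipes a fragment of bad HTML through the `tidy` command-line tool from Python. It assumes a `tidy` binary is on the PATH; the helper names `build_tidy_command` and `tidy_to_xhtml` are hypothetical, and the options shown (`hide-comments`, `drop-empty-paras`) are only a small sample of Tidy's configuration.

```python
import shutil
import subprocess


def build_tidy_command(options=None):
    """Return an argv list asking the tidy CLI for XHTML output (hypothetical helper)."""
    cmd = ["tidy", "-quiet", "-asxhtml"]
    # Each configuration option is passed as "--name value".
    for name, value in (options or {}).items():
        cmd += [f"--{name}", value]
    return cmd


def tidy_to_xhtml(html, options=None):
    """Pipe html through tidy; return its output, or None if tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(
        build_tidy_command(options),
        input=html,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # return code alone does not mean that no output was produced.
    return result.stdout


# Sample options: strip comments and discard empty paragraphs.
OPTIONS = {"hide-comments": "yes", "drop-empty-paras": "yes"}
```

Calling `tidy_to_xhtml("<p>unclosed <b>bold<!-- note -->", OPTIONS)` should return a complete XHTML document when Tidy is installed. When the input has errors that Tidy cannot repair, output is suppressed unless `force-output` is set to `yes`.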

571 questions
4
votes
2 answers

JTidy Node.findBody() — How to use?

I'm trying to do XHTML DOM parsing with JTidy, and it seems to be a rather counterintuitive task. In particular, there's a method to parse HTML: Node Tidy.parse(Reader, Writer) And to get the body of that Node, I assume, I should use Node…
ansgri
4
votes
2 answers

How to best use JTidy with a Spring servlet container?

I have a Java servlet container using the Spring Framework. Pages are generated from JSPs using Spring to wire everything up. The resulting HTML sent to the user isn't as, well, tidy as I'd like. I'd like to send the HTML to Tidy right before…
Dean J
4
votes
1 answer

HTML Tidy: how to set 'force-output' to 'yes'?

I am using HTML Tidy in the command line environment for Windows. I need to force the conversion of some html files to xml, even if there are errors. I do the following steps: create a file "conf.txt", whose content is: force-output: yes type the…
4
votes
2 answers

PHP Tidy removes valid tags

I'm using php extension tidy-html to clean up php output. I know that tidy removes invalid tags and can't even handle HTML5 doctype, but I'm using tag which used to be in HTML specifications. However, it gets changed for
    anyway. Oddly…
Amunak
4
votes
2 answers

How to set up Dave Raggett's HTML Tidy on Windows?

I'm having a hard time finding easy-to-understand instructions to download, set up and use this HTML Tidy by Dave Raggett. Please help, I need this kind of tool that can almost perfectly scan html errors. If there is a GUI version for the…
fiberOptics
4
votes
2 answers

HTML Tidy. Please don't add end tags

I have three files. header.php index.php footer.php The header file contains from to
The index file contains page content The footer file contains
to Together they contain a normal HTML file with PHP When I…
Hedam
3
votes
6 answers

Clean up PHP/HTML pages

Does anybody know of a good tool that cleans up files with php and html in it? I've used Tidy before but it doesn't do a good job at leaving the php code alone. I know there are various implementations of tidy but does any tool reign champion…
RDW
3
votes
1 answer

tidying html with ckeditor

Hi, I've got a small problem with ckeditor: basically I need to make the editor run its html cleanup command. Is there any way of doing this? At present it doesn't seem to run after I type some stuff into the source and then press save I would like…
Richard Housham
3
votes
1 answer

Whitespace remover for rails ERB templates?

After googling around with no success, I'm trying here. I'm looking for a Rails gem that removes whitespace from rendered ERB templates, so that code structure etc. gets removed, leaving just compressed HTML code. Any tips? Thanks in advance.
trnc
3
votes
5 answers

Force HTML Tidy to output XML (instead of XHTML), or force XSLTproc to parse XHTML files

I have a large number of HTML files that I need to process with XSLT, using an XML file to choose which HTML files, and what we're doing with them. I tried: Use HTML Tidy to convert HTML -> XHTML / XML Use document(filename) in XSLT to read in…
Adam
3
votes
3 answers

R: How to extract the factor levels as numeric from a column and assign it to a new column using tidyverse?

Suppose I have a data frame, df df = data.frame(name = rep(c("A", "B", "C"), each = 4)) I want to get a new data frame with one additional column named Group, in which Group element is the numeric value of the corresponding level of name, as shown…
Wang
3
votes
1 answer

How to tidy up malformed xml in ruby

I'm having issues tidying up malformed XML code I'm getting back from the SEC's edgar database. For some reason they have horribly formed xml. Tags that contain any sort of string aren't closed and it can actually contain other xml or html…
hadees
3
votes
1 answer

tidy function cannot be used within future_map?

I have the R code below. For the last row, when I used the map() function, it worked well. However, when I changed to the future_map() function, I got the following error message: "Error: Problem with mutate() column model. i model = future_map(splits, fun1). x…
YanLi
3
votes
3 answers

How to split a column into multiple (non equal) columns in R

I'm working with a string that is a list of elements separated by commas. I want to separate the string so that each element has its column. But I'm having trouble because there are a different number of elements per list. X1 <- "a,b,c" X2 <-…
Sharif Amlani
3
votes
2 answers

xsltproc html documents

I'm trying to clean some HTML files. I have converted them to XHTML with tidy: $ tidy -asxml -i -w 150 -o o.xml index.html The resulting XHTML ends up having named entities. When trying xsltproc on those XHTML files, I keep getting errors. $ xsltproc…
vangop