Questions tagged [tidy]

Tidy is a C library for cleaning up "bad" HTML. Don't use this tag for questions about keeping your code tidy.

Tidy is a library written in C that converts syntactically incorrect HTML into correct HTML or XHTML. It is especially useful when you scrape web pages with curl and then process them with XML parsing functions, because XML parsers don't accept bad HTML. Tidy extensions are available for PHP and Perl. The PHP Tidy extension provides functions to convert bad HTML to XHTML, with options such as dropping deprecated tags like the font tag, hiding comments, dropping proprietary tags, dropping empty paragraphs, and many more.

571 questions
7
votes
1 answer

PHP: Gmail's messages contain invalid HTML and random jargon

I'm creating an email-based CMS with PHP, and I'm required to use Gmail as the email service. The script is insanely simple for now, and the only problem I'm having is dealing with Gmail's email syntax. I was expecting something a bit more…
Blender
  • 289,723
  • 53
  • 439
  • 496
7
votes
2 answers

Regex to remove xml declaration from a string

First of all, I know this is a bad solution and I shouldn't be doing this. Background: Feel free to skip However, I need a quick fix for a live system. We currently have a data structure which serialises itself to a string by creating "xml"…
xan
  • 7,440
  • 8
  • 43
  • 65
7
votes
1 answer

How to configure ob_tidyhandler dynamically?

The PHP tidy extension has a function ob_tidyhandler that works with PHP output buffering as a callback, e.g.: ob_start('ob_tidyhandler'); I know that Tidy has a lot of configuration settings, however I am hitting a road block to setup…
hakre
  • 193,403
  • 52
  • 435
  • 836
6
votes
2 answers

Tidy HTML output with JavaScript

I have a large chunk of HTML. In order for it to fit a certain container, I crop the HTML (not just the text) at, let’s say, 200 characters. Obviously, some of the tags will remain unclosed in this case. Is there a way, except writing the cleaner…
spliter
  • 12,321
  • 4
  • 33
  • 36
6
votes
1 answer

How can I find how many locations near a radius of 250 meters

I have the following dataframe: df <- tribble(~ id, ~ lon, ~ lat, 1, -56.2112038, -34.8358207, 2, -55.96403429999999, -34.7260945, 3, -56.155449, -34.9030824, …
Paula
  • 497
  • 2
  • 8
6
votes
0 answers

Parse HTML5 with xmllint invalid tag

I'm trying to parse html5 with xmllint, and it's generating errors on certain tags. To make sure it's valid I piped the output through tidy first, but it generated the same errors. I only want to extract the text. Is there any way to read these…
Matts
  • 1,301
  • 11
  • 30
6
votes
5 answers

HTML Indentation in the mVc World

Here is a question that has been bugging me for a while, nowadays it's considered a good practice to indent HTML code but I've some reservations indenting code in a MVC pattern, here is a (silly) example: HTML Code:
Alix Axel
  • 151,645
  • 95
  • 393
  • 500
6
votes
3 answers

Is there a utility to tidy VBScript?

I'm wanting a tool to tidy VBScript code. I'm looking for something to do the same job as perltidy for Perl, or astyle for C++ and Java code. I've looked, but failed to find anything here or via Google. Open Source software would be preferred. Can…
Bobby
  • 199
  • 5
6
votes
2 answers

PHP Tidy class not found, error

I was writing some code to repair an HTML string. I read some nice solutions which work with the Tidy PHP class but I had some troubles with it. What is written in this post is exactly what I want, but I need to install / load the PHP Tidy…
Roberto Rizzi
  • 1,525
  • 5
  • 26
  • 39
6
votes
1 answer

PHP DOM append HTML to existing document without DOMDocumentFragment::appendXML

I need to load some arbitrary HTML into an existing DOMDocument tree. Previous answers suggest using DOMDocumentFragment and its appendXML method to handle this. As @Owlvark indicates in the comments, xml is not html and therefore this is not a good…
wmarbut
  • 4,595
  • 7
  • 42
  • 72
6
votes
1 answer

Indenting by spaces using HTMLTidy in Notepad++

How do I make it so that instead of indenting my elements by multiples of 2 spaces (when formatting without wrapping is selected), HTMLTidy indents them by multiples of tabs (4 spaces long but only 1 byte in size)?
Max
  • 115
  • 1
  • 3
  • 6
5
votes
2 answers

jTidy returns nothing after tidying HTML

I have come across a very annoying problem when using jTidy (on Android). I have found jTidy works on every HTML Document I have tested it against, except the following:
Henry Thompson
  • 2,441
  • 3
  • 23
  • 31
5
votes
2 answers

Rhub CRAN check keeps giving HTML note on Fedora test - no command 'tidy' found

I'm developing an R package to be uploaded to CRAN and I keep getting a NOTE when I run devtools::check_rhub() The results I'm getting are: > On fedora-clang-devel (r-devel) checking HTML version of manual . . . NOTE Skipping checking HTML…
5
votes
5 answers

Screenscraping the ugliest HTML you've ever seen in your life

I'm using PHP and libtidy to attempt to screen scrape what might possibly be the most horrendous and malformed use of HTML tables in history. The site closes few table, tr, td, font, or bold tags and consistently nests many different layers of…
Andy Baird
  • 6,088
  • 4
  • 43
  • 63
5
votes
2 answers

How to get xdmp:tidy() to tidy up HTML5?

With the new doctype and elements that are part of HTML5, how do you get xdmp:tidy() to recognize those in HTML5? If I have an html page that contains something like:
blah
blah
and…
RyanS
  • 51
  • 2