Questions tagged [tidy]

Tidy is a C library for cleaning up "bad" HTML. Don't use this tag for questions about keeping your code tidy.

Tidy is a C library that converts syntactically incorrect HTML into correct HTML or XHTML. It is especially useful when you scrape web pages with curl and then process them with XML parsing functions, because XML parsers reject malformed HTML. Tidy extensions are available for PHP and Perl. The PHP Tidy extension provides functions that convert bad HTML to XHTML, with options such as dropping deprecated tags (for example, the font tag), hiding comments, dropping proprietary tags, and dropping empty paragraphs, among many others.
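
To illustrate why a cleanup step matters, here is a minimal sketch in Python using only the standard library. It does not call Tidy itself; it only shows that a strict XML parser rejects tag-soup HTML but accepts the kind of well-formed XHTML a cleanup tool is meant to produce. The two sample strings and the helper name `parses_as_xml` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: the same fragment as tag-soup HTML and as well-formed XHTML.
bad_html = "<html><body><p>First<br><p>Second</body></html>"
xhtml = "<html><body><p>First<br/></p><p>Second</p></body></html>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Unclosed or mismatched tags are fatal errors for an XML parser.
        return False


print("bad HTML accepted:", parses_as_xml(bad_html))
print("XHTML accepted:", parses_as_xml(xhtml))
```

The first string fails because `<br>` and `<p>` are never closed, which browsers tolerate but XML parsers do not.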

571 questions
-1
votes
1 answer

Is there a good, general approach to convert semi-structured data to tibble/dataframe in R?

I am new to R programming and most of my experience thus far is with using highly structured rectangular data from a .csv or .xlsx. But now I've been handed about 30 spreadsheets of budget data that look like this: And in order to work with them,…
ScottyJ
  • 945
  • 11
  • 16
-1
votes
2 answers

R creating a comprehensive table of correlation between combinations of columns

Here is a look at my dataset. I'm looking at baseball data. structure(list(INDEX = 1:6, TARGET_WINS = c(39L, 70L, 86L, 70L, 82L, 75L), TEAM_BATTING_H = c(1445L, 1339L, 1377L, 1387L, 1297L, 1279L), TEAM_BATTING_2B = c(194L, 219L, 232L, 209L, 186L,…
hachiko
  • 671
  • 7
  • 20
-1
votes
1 answer

How Can I Bind and Graph Two Books for Similarity of Word Frequency?

I am using Text Mining with R: A Tidy Approach by Julia Silge & David Robinson to try to bind and graph two books, the first by Jane Austen (Persuasion, for which read "persua"), the second by Charlotte Bronte (for which read "janeyre"), in order to…
-1
votes
1 answer

Tidy multivariate data in R

I have a dataset with the following structure: the rows are participants in an experiment, and the columns are questions they answered. All the columns titled EC belong to one type of task, all those titled ART belong to another etc. After reading…
Maria Gold
  • 59
  • 3
  • 6
-1
votes
1 answer

Get body without tags using tidy

http://php.net/manual/en/tidy.body.php will return the body content wrapped with the tag. How do I get the body content without the tag? I've come up with a couple possible solutions, however, they are not very elegant. $tidy = new…
user1032531
  • 24,767
  • 68
  • 217
  • 387
-1
votes
2 answers

How to tidy up things like ú in xml?

I have the following XML file: I want to convert ú to ú. The following way of calling tidy does not work. Does anybody know what is…
user1424739
  • 11,937
  • 17
  • 63
  • 152
-1
votes
2 answers

Format XML using command line

I have an HTML text file and I want to format it so that paragraphs are always on the same line, e.g.

paragraph info here

instead of

paragraph info here

Is there a tool that enables me to do this?
rurounisuikoden
  • 269
  • 1
  • 4
  • 16
-1
votes
2 answers

Can you believe php -m says yes and phpinfo() says no?

This is my php.ini extension_dir = "ext" extension=php_tidy.dll [Tidy] ;tidy.default_config = /usr/local/lib/php/default.tcfg tidy.clean_output = Off Can you believe when I type php -m, I get tidy in the list but when I check the phpinfo()…
Baghera
  • 53
  • 6
-2
votes
2 answers

How to move data up and get rid of NA?

photo of current data The data shows NA for some points but the information is right below it. It is the same UPC, Store, and Week. How do I group my data to avoid redundancy and the NA data? This is my code so far: `library(tidyverse) RD <-…
-2
votes
1 answer

extract column names in data.table

I have a data set: N <- 10 dt <- data.table(jan = rnorm(N), feb=rnorm(N), mar = rnorm(N), apr = rnorm(N), may = rnorm(N), jun = rnorm(N), jul= rnorm(N), aug= rnorm(N),sep= rnorm(N), aug = rnorm(N), sep= rnorm(N), oct= rnorm(N),nov=…
tobinz
  • 305
  • 1
  • 7
-2
votes
1 answer

Reshape wide format (87 items) into long format 4 variables

I have a data frame in a wide format (four variables that are rated from 1 to 7) which are repeated for 87 items. My data frame looks like this Subject| Variable 1 for item 1| Variable 2 for item 1| Variable 3 for item 1| Variable 4 for item 1 …
-2
votes
1 answer

How can I convert HTML to XML (which conforms with XML schema or DTD)

I'm trying to convert some HTML files to XML format on ubuntu and they should conform to a specific XML schema or DTD. I guess Tidy should do that but I don't understand the syntax for it. Or if there are other tools, I'd be glad to try them out.…
TheSoldier
  • 484
  • 1
  • 5
  • 25
-2
votes
2 answers

How to take variables in a column and make them into numerous columns

So I have this dataset which I have been cleaning for someone else but they want a specific column made into several columns by type of observation. For example this is a column of diagnoses and she wants this column to be expanded so it is one…
googleplex101
  • 195
  • 2
  • 13
-2
votes
1 answer

python, xmlrpc, tidy & unicode issues

I've been trying to work around an issue I'm facing for two days now. The final goal is to migrate the content of an apple wiki server to foswiki/twiki markup. I found an xslt stylesheet that does most of the work, and does it reasonably well, and…
tink
  • 14,342
  • 4
  • 46
  • 50
-3
votes
1 answer

R: Convert a complex time series dataframe to long

This is for R date <- seq(as.Date("2020/03/11"), as.Date("2020/03/16"), "day") x_pos_a <- c(1, 5, 4, 9, 0) x_pos_b <- c(2, 6, 9, 5, 4) like so [...] I have a timeseries dataframe with 69 time points. The rows in the dataframe are dates. Four…
RTERG
  • 1
  • 2