Questions tagged [xidel]

Xidel is a command-line tool to download and extract data from HTML/XML pages as well as JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern templates. It can also edit or create new XML/HTML/JSON documents.

Xidel supports:

Extract expressions

  • CSS 3 Selectors: to extract simple elements
  • XPath 3.0: to extract values and calculate things with them
  • XQuery 3.0: to create new documents from the extracted values
  • JSONiq: to work with JSON APIs
  • Templates: to extract several expressions in an easy way using an annotated version of the page for pattern-matching
  • XPath 2.0/XQuery 1.0: compatibility mode for the old XPath/XQuery versions

Following

  • HTTP Codes: Redirections like 30x are automatically followed, while keeping things like cookies
  • Links: It can follow all links on a page as well as some extracted values
  • Forms: It can fill in arbitrary data and submit the form

Output formats

  • Adhoc: just prints the data in a human readable format
  • XML: encodes the data as XML
  • HTML: encodes the data as HTML
  • JSON: encodes the data as JSON
  • bash/cmd: exports the data as shell variables

Connections

  • HTTP / HTTPS, as well as local files and stdin

Systems

  • Windows (using wininet), Linux (using synapse+openssl), Mac (synapse)
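
As a rough illustration of the extract expressions and output formats listed above, here is a minimal sketch. The sample page, its file name and the `ext` class are made up for this example, and the xidel calls are skipped when xidel is not on the PATH. Exact output depends on the xidel version, so none is shown.

```shell
# Hypothetical sample page; the directory, file name and markup are
# illustration-only assumptions, not taken from any question below.
workdir="$(mktemp -d)"
sample="$workdir/sample.html"
cat > "$sample" <<'EOF'
<html><body>
  <a class="ext" href="https://example.com/a">First</a>
  <a class="ext" href="https://example.com/b">Second</a>
</body></html>
EOF

if command -v xidel >/dev/null 2>&1; then
  # XPath: print the href attribute of every link (-s silences status messages).
  xidel -s "$sample" -e '//a/@href'
  # CSS selector: print the text of each matching element.
  xidel -s "$sample" --css 'a.ext'
  # Same XPath extraction, encoded as JSON instead of the default adhoc format.
  xidel -s "$sample" -e '//a/@href' --output-format=json-wrapped
else
  echo "xidel not found on PATH; skipping the extraction examples" >&2
fi
```

The same expressions should work against a URL or stdin in place of the local file, since those are listed under Connections.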
81 questions
2
votes
2 answers

Xidel extract data inside the tag -- raw output

Pleased to be a member of Stack Overflow, a long-time lurker here. I need to parse the text between two tags. So far I've found a wonderful tool called Xidel. I need to parse the text in between
Text. Also tags. More…
RomanM
2
votes
2 answers

How to have always the same number of results in xpath even if some tags are not present?

I am trying to crawl data from a website. The targets are pages where not all details are always given. For example, one profile gives a name and birthday, while another gives only a name. I am now trying to grab these tags with xidel and XPath, which would work like a…
Fuzzyma
2
votes
1 answer

Xpath expression returns empty output

My xidel command is the following: xidel "https://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d" -e '//div[@class="consulentdetail"]' This should extract all data in the divs with class consulentdetail. Nothing special I…
Fuzzyma
2
votes
1 answer

How to add line break to Xidel output?

I have a batch file that grabs links using xidel, but the output HTML doesn't contain line breaks to separate the links from one another: @echo off for /f "delims=" %%a in ('wmic OS Get localdatetime ^| find "."') do set dt=%%a set YYYY=%dt:~0,4% set…
M. A.
2
votes
2 answers

Xidel: Parse attributes into new object

Given is a verbose GC log from any Java virtual machine (could be any xml, so not tagging with java):
Benjamin Marwell
2
votes
2 answers

Using xidel to extract a key-value pair

I have multiple tables on a website like so:
Name foo
Count 15
Date 2014-11-17
Simon
1
vote
4 answers

How to extract an embedded link from an HTML document saved as text, or how to use xidel to extract the correct link?

I am on Windows and I am using the "Git for Windows" tools in batch files. My extracted code from the HTML site looks like this:
areich1976
1
vote
3 answers

xidel: wrong order of results on hacker news

To scrape Hacker News, I use: xidel -e '//span[@class="titleline"]/a/@href|//span[@class="titleline"]' https://news.ycombinator.com/newest But the output is not in the expected order: the URLs come after the text, so it's very difficult to…
1
vote
1 answer

xidel: is it possible to declare a variable for an XPath expression?

Like xmlstarlet sel --var xp 'xpathExpression' -t -v '$xp' file.xml is it possible to use internal variables in xidel? I know I can use shell and "double quotes", but that's not the question.
Gilles Quénot
1
vote
2 answers

xidel: how to retrieve value from JSON key containing dots(.)?

I am trying to retrieve 1 from: $ cat object.json { "apiVersion": "apps/v1", "kind": "Deployment", "metadata": { "annotations": { "deployment.kubernetes.io/revision": "1" } } } $ xidel -e…
Gilles Quénot
1
vote
1 answer

Xidel print the original url

I am trying to extract URLs from a webpage and follow them, skipping 4XX and 5XX responses. Is it possible to print the URL of each request that returns 200 using xidel? xidel -s --error-handling=4xx=skip,5xx=skip "URL" -e "PRINT…
Sanchu Varkey
1
vote
3 answers

How to filter an XML file and save filtered results as new XML file using XMLStarlet / XMLint / XSLT / Xidel / Grep

I have been searching for a solution to a very simple task: filter XML results based on multiple criteria and save them as a new XML file. By filtering, I mean selecting the values for the output. So, only output the XML that meets the conditions of…
1
vote
3 answers

XPath-3 CSV generation

I'm trying to convert the following XML to CSV using XPath 3.0 (xidel --xpath): A B C
Fravadona
1
vote
1 answer

Xidel output to bash variable

According to the Xidel documentation, I'm under the impression that the following code should work, and should produce an output that I can access in the BASH variable "bar": #!/bin/bash TEST='test' xidel $TEST -e…
Cody S
1
vote
1 answer

How do I optionally check for @href in xpath

I'm trying to optionally set $_url to "not-found" if there's no href. xidel --trace-stack --ignore-namespaces 'https://www.uline.com/Grp_41/Peanuts-and-Dispensers' --user-agent 'Mozilla/5.0 Firefox/94.0.1' \ -f…
chovy