I would like to extract the processing instructions (particularly xml-model) from an XML file; yet both (n)xml-parse-file
as well as libxml-parse-xml-region
do not recognize processing instructions.
Is there a clean way to extract processing instructions or do I have to regex search for PIs?
edit: Here is a first draft of the functionality I was looking for:
(cl-defun extract-processing-instructions (&rest processing-instructions)
"Extracts all/only the specified xml processing instructions from the current buffer and returns them as a list of string."
(interactive)
(let ((pi-re
(format "<\\?\\(%s\\).*\\?>" (string-join processing-instructions "\\|")))
(result))
(save-excursion
(goto-char (point-min))
(while (re-search-forward pi-re nil t)
(push (match-string 0) result)))
(nreverse result)))
(cl-defun pi-str2sexp (pi-str)
"Takes a processing instruction as a string and transforms it to a sexp-structure (in the style of xml-parse-*)."
(let (sexp attr-alist)
(save-match-data
;; get and push pi-element-name
;; (string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(string-match "<\\?\\([[:alnum:]-]*\\)" pi-str)
(push (make-symbol (match-string 1 pi-str)) sexp)
;; construct attribute alist
(while (string-match "\\([[:alnum:]-]*\\)=\"\\([^ ]*\\)\""
pi-str (match-end 0))
(push (cons (make-symbol (match-string 1 pi-str))
(match-string 2 pi-str))
attr-alist)))
;; finally: push attr alist and return sexp
(push (nreverse attr-alist) sexp)
(nreverse sexp)))
edit 2: Turns out advicing/generally building upon xml-parse-* in this matter (like suggested by @Tom Regner) is a huge pain. :(
The thing I came up with was a context manager, the idea was to use it to around-advice string-parse-tag-1 (which is at the heart of xml-parse-* (of course stand-alone use is also an option):
(cl-defmacro --replace-first-group (regex-replace-alist)
`(save-excursion
(dolist (expression ,regex-replace-alist)
(goto-char (point-min))
(replace-regexp (car expression) (cadr expression)))))
(cl-defmacro with-parsable-pi (buffer &body body)
"Context manager that treats xml processing instructions in BUFFER as normal elements."
(declare (indent defun))
`(let ((old-buffer ,buffer))
(with-temp-buffer
(insert-buffer-substring old-buffer)
(goto-char (point-min))
(--replace-first-group '(("\\(\\?\\)>" "/>") ("<\\(\\?\\)" "<")))
,@body)))
This e.g. allows calls like
(with-parsable-pi (current-buffer)
(xml-parse-tag-1))
so it is at least possible to get an element at a time; but since the XML exposed in the context manager isn't actually valid and xml-parse-* (rightfully) errors if invalid XML is encountered, it isn't possible to process more than one element at a time.
I was thinking of maybe introducing a pseudo root element or something, but the kludge spiral is ghastly enough as it is.
Another idea of course would be to run an xpath query to extract processing instructions. If there only was a solid xpath solution in Emacs Lisp..