
I want to load an HTML file (using fs.readFile), build a DOM from it with jsdom, and then change the text of the nodes in the body (via jQuery). Then I want to save the edited DOM as an HTML file. Is there a way to do this? The code I am using is the following:

fs.readFile(file, 'utf8', function(error, data) {
    jsdom.env(data, [], function (errors, window) {
        var $ = require('jquery')(window);
        $(document.body.getElementsByTagName("*")).each(function () {
            var content = $(this).text();
            var word = "\\b"+wordSuggestions.word+"\\b";
            var re = new RegExp(word, "g");
            content = content.replace(re, wordSuggestions.suggestion);
            $(this).text(content);
        });

        fs.writeFile(file, data, function (error){ // saving the new HTML file? What should I put instead of data? Window?
        });
    });
});
Erik Philips
Eva
  • I think there are two wrong things about `$(document.body.getElementsByTagName("*"))`. First, it's a combination of jQuery and javascript selectors. It should be `$("*")` in jQuery, *or* `document.body.getElementsByTagName("*")` in javascript. Secondly, it's server-side code, so jQuery is not available here, unless I don't know about server-side implementation of jQuery. – Jeremy Thille May 08 '15 at 09:47
  • I forgot to include var $ = require('jquery')(window) in this post, so I edited it to include it. When using document.body.getElementsByTagName("*") (without jquery), can I just edit the text of each node? And then save the edited html? – Eva May 08 '15 at 10:48
  • Still, `$(document.body.getElementsByTagName("*"))` doesn't make sense and is probably invalid, even with jQuery loaded. So you're telling me you're running jQuery *server*-side? I searched around and I can see no place where this is used, or even saying it's possible at all. jQuery is made for DOM manipulations, and there's no DOM on a server. There's something I don't understand here. `fs` is a SERVER module, and `read` and `write` can only be achieved from a server, not a browser. Please confirm this is a *server*-side script ( = NOT run in a browser) and you're trying to use jQuery in it. – Jeremy Thille May 08 '15 at 10:54
  • I got that from another stackoverflow question. This is indeed a server side script. I am using the npm module for jquery for this (https://www.npmjs.com/package/jquery). – Eva May 08 '15 at 10:57
  • This is a module to build your own version of jQuery, not use it server-side. Still, jQuery is _client_-side, it doesn't run on a server. Your server script must use plain javascript. `document.body.getElementsByTagName` does make _no sense_ at all, because on a server, there is no DOM, there are no elements, and no tag (with no name). This just can't work, you're mixing front-end and back-end code. It's like mixing javascript and PHP together, hoping it will work, it's just impossible. – Jeremy Thille May 08 '15 at 11:00
  • In your _front-end_ code, query and manipulate the DOM with jQuery. Then, send data to the server with ajax. Then, in a completely different file , write your _server-side_ script, that will receive this data and write it to the server's disk. A browser does not have access to the file system (for security reasons). – Jeremy Thille May 08 '15 at 11:03
  • jsdom (https://www.npmjs.com/package/jsdom) loads the DOM. Could you maybe suggest a non-jquery way to iterate over the nodes in the loaded html file and edit them? – Eva May 08 '15 at 11:03
  • This is a Node.js program that loads an html file that it has to alter. The alterations are not to be done front-end, since the alterations are done on a file that is extracted from the file input. Node does have access to the file system using the fs module. – Eva May 08 '15 at 11:07
  • PHP has built in engine to parse DOM, search for **PHP XML DOM Parser**. – skobaljic May 08 '15 at 11:07
  • Changing to PHP is not an option, I am working in Node.js. – Eva May 08 '15 at 11:08
  • I know it does have access to file system, that's what I've been saying since the beginning :) It's kind of the point of a server, isn't it? But I think you're trying to do things I've never heard of (manipulating DOM server-side), I didn't even know it was possible, so I may be of no help. – Jeremy Thille May 08 '15 at 11:09
  • Jeremy Thille, I was commenting on the suggestion by skobaljic. – Eva May 08 '15 at 11:11
  • Oh ^^' Sorry. I removed this comment. – Jeremy Thille May 08 '15 at 11:12

2 Answers


Here's an example of how to do it. I've based it on your code but simplified it so that it runs on its own and illustrates the approach. The following code reads foo.html, appends the text " modified!" to every p element, and writes the result to out.html. The main thing you were missing is window.document.documentElement.outerHTML.

var jsdom = require("jsdom");
var fs = require("fs");

fs.readFile('foo.html', 'utf8', function(error, data) {
    if (error) throw error;
    jsdom.env(data, [], function (errors, window) {
        // Bind jQuery to the jsdom window.
        var $ = require('jquery')(window);
        // Append " modified!" to the text of every p element.
        $("p").each(function () {
            var content = $(this).text();
            $(this).text(content + " modified!");
        });

        // Write out the modified DOM. Note that outerHTML does not
        // include the doctype.
        fs.writeFile('out.html', window.document.documentElement.outerHTML,
                     function (error){
            if (error) throw error;
        });
    });
});
Louis
  • Is there also a way to select all text nodes instead of all p elements? – Eva May 08 '15 at 12:12
  • There's no jQuery function that returns all text nodes (in the DOM sense of the term "text node"). It is possible to walk the DOM tree and process all text nodes one by one but this does not involve jQuery. In case you'd want to edit your question here to prompt readers to solve the "select all text nodes" issue, I *strongly* advise against it. You've already asked a question which was framed as being about saving modified data from jsdom, and I've answered it. Adding a new issue to the question after getting an answer that addresses the original issue is not well regarded by the community. – Louis May 08 '15 at 12:41
  • Another thing is that if you were to combine the jsdom issue with the issue of how you should process the text nodes you are really combining issues that can in fact be solved independently. So rather than benefiting from the input of everybody who knows jsdom to solve the jsdom issue and the input of everybody who knows the DOM (and perhaps jQuery) to solve the text node issue, you have to rely on the *intersection* of those who know both jsdom *and* jQuery, which is a smaller set. You also make the question less likely to be useful to others and get upvotes. – Louis May 08 '15 at 12:43
  • Thanks for the suggestions. I will not edit the question and work with all p-nodes for now. – Eva May 08 '15 at 13:33
  • This will strip the doctype. Try `` await promisify(fs.writeFile)('out.html', `<!DOCTYPE html>${dom.window.document.documentElement.outerHTML}`, 'utf-8') `` – Benny Powers Feb 02 '19 at 23:55
  • It's best to use [`serialize()`](https://github.com/jsdom/jsdom#serializing-the-document-with-serialize) to get the result. – x-yuri Mar 13 '19 at 03:18
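
The DOM walk mentioned in the comments above (processing text nodes one by one, without jQuery) could look roughly like the sketch below. It is not part of either answer. The helper name `replaceWordInTextNodes` is made up for illustration, and the regex escaping step is an addition that the question's code does not have. It relies only on the standard DOM properties `nodeType`, `nodeValue` and `childNodes`. Unlike calling `$(this).text(content)` on every element, which replaces an element's child elements with plain text, this touches only the text nodes and leaves the markup in place.

```javascript
// Node.TEXT_NODE is 3 in the DOM standard; hard-coded here so the helper
// does not depend on a global `Node` object being available.
const TEXT_NODE = 3;

// Hypothetical helper (not a jsdom or jQuery API): visits every text node
// under `root` and replaces whole-word occurrences of `word` with `suggestion`.
function replaceWordInTextNodes(root, word, suggestion) {
    // Escape regex metacharacters so `word` is matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const re = new RegExp('\\b' + escaped + '\\b', 'g');

    (function walk(node) {
        if (node.nodeType === TEXT_NODE) {
            // A replacer function keeps `$` sequences in `suggestion` literal.
            node.nodeValue = node.nodeValue.replace(re, () => suggestion);
            return;
        }
        // Copy the child list first so the loop works on a stable snapshot.
        for (const child of Array.from(node.childNodes || [])) {
            walk(child);
        }
    })(root);
}
```

With jsdom, this would presumably be called as `replaceWordInTextNodes(dom.window.document.body, wordSuggestions.word, wordSuggestions.suggestion)` before writing out `dom.serialize()`. It also rewrites text inside `script` and `style` elements, so you may want to skip those.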

There's no jsdom.env() anymore, and I think this example is easier to understand:

const fs = require('fs');
const jsdom = require('jsdom');
const jquery = require('jquery');

fs.readFile('1.html', 'utf8', (err, data) => {
    if (err) throw err;
    const dom = new jsdom.JSDOM(data);
    const $ = jquery(dom.window);
    // Example modification: empty the body.
    $('body').html('');
    // serialize() returns the whole document, including the doctype.
    fs.writeFile('2.html', dom.serialize(), err => {
        if (err) throw err;
        console.log('done');
    });
});
x-yuri