
I have a variable with the HTML code:

let htmlDocument = '<div id="buildings-wrapper"> \
    <div id="building-info"> \
    <h2><span class="field-content">Britney Spears\' House</span></h2> \
    <div class="building-field"> \
    <div class="field-content">9999 Hollywood Blvd</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div> \
    </div> \
    </div> \
    <div id="building-image"> \
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div> \
        </div> \
        </div>';

I want to traverse the variable and store this section of HTML in a separate variable:

<div class="field-content">9999 Hollywood Blvd</div>

This is what I have so far:

public traverseHTML(htmlDocument: any): any {
    let htmlBlock: any;
    let divs: any = htmlDocument.getElementsByTagName('div');
    for (var i = 0; i < divs.length; i++) {
        if (divs[i].getAttribute("id") == "field-content") {
            htmlBlock = divs[i];
        }
    }
    return htmlBlock;
}

I'm sure there are all sorts of issues with my function, but I can't get to them because I can't even get past the second line: I get an error saying `htmlDocument.getElementsByTagName` isn't a function. How do I iterate through the HTML by `div`?

Please note I can't use jQuery due to project specs.

EDIT:

I get `document is not defined` when I try `document.createElement('div')`, and `DOMParser is not defined` when I try to create a `DOMParser`. Am I setting up the class incorrectly? This is the code for the entire class:

import parse5 = require('parse5');
import {ASTNode} from 'parse5';

export default class DSController {
    //private parser: DOMParser;

    constructor() {
        //this.parser = new DOMParser();
    }

    public traverseHTML(htmlDocument: any): any {
        let parser = new DOMParser();
        let parsed: any = parser.parseFromString(htmlDocument, "text/html");
        // '.field-content' needs the leading dot to match by class
        let selectParsed: any = parsed.querySelectorAll('.field-content')[1];
        console.log(selectParsed);

        return selectParsed;

        /* let element = document.createElement("div");
        element.innerHTML = htmlDocument;
        console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>
        */
    }

    public parseHTML(): any {
        //let document: parse5.ASTNode;
        return;
    }
}
falafel
  • You can't traverse a string. You can only do it on an actual DOM – Nitzan Tomer Nov 06 '16 at 23:51
  • @NitzanTomer Oh. :( If I have a file called BRITNEYSPEARS in my project folder that contains the HTML code above, how would I reference it so I can traverse it? – falafel Nov 06 '16 at 23:56
  • "The specs of the project" that stop you from using jquery are almost certainly going to vanish if you explain to your customer/boss how many hours it will take to do something like this well, which you can solve in minutes with the correct tools. Unless of course your client enjoys paying people for reinventing wheels. – Paul Nov 07 '16 at 00:01
  • @Paul this is for school so whatever the project description says goes, unfortunately. – falafel Nov 07 '16 at 00:08
  • Ah, it wasn't clear from your post that this is for school. My apologies, but I've seen enough work projects where some ignorant person says "no third-party libraries" or the like to have a visceral reaction. – Paul Nov 07 '16 at 00:10
  • Why don't you just use a backtick for the multiline string? – Azamantes Nov 07 '16 at 00:32
  • @Azamantes I didn't know about backticks for multi-line strings till now! – falafel Nov 07 '16 at 00:40
  • @Paul There is no reason whatsoever to bring in jQuery or any other third-party library to deal with this problem. –  Nov 07 '16 at 03:24
  • Where did this huge HTML-as-string thing come from? What is the rule governing which `div` you want to grab? –  Nov 07 '16 at 03:25
  • @torazaburo I made it. I'm supposed to traverse an entire HTML page but I thought it would be easier to start small. I need to grab specific pieces of information from the page: the name of the building, the building's address, etc. – falafel Nov 07 '16 at 03:37
  • @torazaburo yes – falafel Nov 07 '16 at 04:09
  • Then please tag as node.js. –  Nov 07 '16 at 04:12
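Following up on the backtick suggestion in the comments, here is a minimal sketch of the same HTML (abbreviated) written as a template literal. Backticks let the string span lines without trailing `\` characters and let the apostrophe in "Britney Spears' House" appear unescaped. It also illustrates the first comment on the question: the value is still a plain string, so it has no `getElementsByTagName` method until something parses it.

```typescript
// A template literal (backticks) can span several lines without trailing "\"
// and can contain both ' and " unescaped. (HTML abbreviated from the question.)
const htmlDocument = `<div id="buildings-wrapper">
  <div id="building-info">
    <h2><span class="field-content">Britney Spears' House</span></h2>
    <div class="building-field">
      <div class="field-content">9999 Hollywood Blvd</div>
    </div>
  </div>
</div>`;

// It is still just a string: no DOM methods exist on it until it is parsed.
console.log(typeof htmlDocument); // prints: string
console.log(typeof (htmlDocument as any).getElementsByTagName); // prints: undefined
```

This is why `htmlDocument.getElementsByTagName(...)` fails with "isn't a function": the method lookup on a string yields `undefined`.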

2 Answers


You can create an element, insert this string into it as HTML, and then query that element for what you're looking for:

let htmlDocument = '<div id="buildings-wrapper"> \
    <div id="building-info"> \
    <h2><span class="field-content">Britney Spears House</span></h2> \
    <div class="building-field"> \
    <div class="field-content">9999 Hollywood Blvd</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div> \
    </div> \
    <div class="building-field"> \
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div> \
    </div> \
    </div> \
    <div id="building-image"> \
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div> \
        </div> \
        </div>';

let element = document.createElement("div");
element.innerHTML = htmlDocument;

console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>

(code in playground)
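For comparison, the loop from the question can also work, provided it runs on a parsed element such as `element` above rather than on the raw string. Two further fixes are needed: `field-content` is a class, not an `id`, and the loop should stop at the first match instead of overwriting the result on every hit (in this HTML the last `div.field-content` is the image, not the address). This is a sketch under assumptions: `findFirstDivWithClass` is a hypothetical name, and the small structural interfaces are there only so it compiles without the DOM lib; a real browser `Element` satisfies them.

```typescript
// Minimal structural types (an assumption for this sketch); a browser
// Element returned by getElementsByTagName fits these shapes.
interface ClassedElement {
  classList: { contains(token: string): boolean };
}

interface ElementContainer<T extends ClassedElement> {
  getElementsByTagName(tagName: string): ArrayLike<T>;
}

// The question's loop, fixed: test the class (not the id) and return the
// first matching <div> instead of overwriting the result on each match.
function findFirstDivWithClass<T extends ClassedElement>(
  root: ElementContainer<T>,
  className: string
): T | undefined {
  const divs = root.getElementsByTagName("div");
  for (let i = 0; i < divs.length; i++) {
    if (divs[i].classList.contains(className)) {
      return divs[i];
    }
  }
  return undefined; // no <div> carries the class
}
```

In a browser, `findFirstDivWithClass(element, "field-content")` should return the address `div`: `getElementsByTagName` lists elements in document order, and the earlier `field-content` inside the `h2` is a `span`, not a `div`.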

Nitzan Tomer
  • Thanks. That's exactly what I needed. Do I need to import something to make `document` work? My IDE says `Reference error: document is not defined`. – falafel Nov 07 '16 at 00:07
  • No, there shouldn't be any need to import anything to use `document`. Is this intended to be run on a browser or node? – Nitzan Tomer Nov 07 '16 at 00:23
  • I'm not sure what you mean by node. The app is for browser. My intention is to use this method to grab a larger section of HTML and then parse it into a data structure for querying in the browser. I probably gave you some irrelevant info but better to overshare... – falafel Nov 07 '16 at 00:27
  • You should be able to compile this code without importing anything else; it works in the playground link I posted. What IDE are you using? – Nitzan Tomer Nov 07 '16 at 00:46
  • Yes, I saw that document is part of `lib.d.ts` which comes standard. I'm using Webstorm. – falafel Nov 07 '16 at 03:36

You can also use DOMParser:

new DOMParser().parseFromString(htmlDocument, "text/html")
  .querySelectorAll('.field-content')[1]
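Both `document` and `DOMParser` are browser globals, which is why the code in the question's edit fails under Node.js with "not defined". A minimal sketch, assuming the caller might run in either environment: the hypothetical helper `selectFromHtml` looks `DOMParser` up on `globalThis` and throws a descriptive error when it is missing, instead of failing with a bare `ReferenceError`.

```typescript
// Returns the element at `index` among matches for `selector` in an HTML
// string, using the browser's DOMParser. globalThis is cast to `any` so the
// sketch compiles even without the "dom" lib in tsconfig.
function selectFromHtml(html: string, selector: string, index = 0): unknown {
  const Parser = (globalThis as any).DOMParser;
  if (typeof Parser !== "function") {
    // Node.js has no DOMParser global; a DOM library is needed there.
    throw new Error(
      "DOMParser is unavailable: this code is not running in a browser, " +
        "so a DOM library is needed to parse HTML."
    );
  }
  const doc = new Parser().parseFromString(html, "text/html");
  // In a browser this is an Element, or null when the index is out of range.
  return doc.querySelectorAll(selector)[index] ?? null;
}
```

In a browser, `selectFromHtml(htmlDocument, ".field-content", 1)` does the same as the one-liner above; under Node.js it throws, and a DOM library is required instead.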
  • I tried to create a DOMParser and got `DOMParser is not defined`. Can you look at my edit in the OP and see if anything is wrong with it? Is it because I'm using node.js? Thanks. – falafel Nov 07 '16 at 04:06
  • For node.js, you will need some kind of DOM library. –  Nov 07 '16 at 04:12
  • They suggested parse5. I guess I've been confused because I thought I was supposed to traverse the HTML and then parse it... but should I parse the entire HTML file and then traverse it instead? – falafel Nov 07 '16 at 04:17
  • You're confusing "parsing" and "extracting". "Parsing" refers to analyzing a string according to some grammar (such as HTML) and producing a processable representation (such as the DOM). When you have done that, and only then, can you "extract" something from the representation, such as the `div` element you are interested in. –  Nov 07 '16 at 04:20
  • Thanks, that's really helpful. I've been trying to extract then parse the entire day. D: – falafel Nov 07 '16 at 04:25
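To make the parse-then-extract distinction concrete, here is a dependency-free sketch of the second step only. It assumes parsing has already happened and produced a tree; the `HtmlNode` shape, `findByClass`, and `textOf` are hypothetical and much simpler than the node types a real parser such as parse5 returns, and the tree is built by hand from a simplified slice of the question's HTML.

```typescript
// Hypothetical, deliberately tiny node shape standing in for a parser's output.
interface HtmlNode {
  tag?: string;          // element name; undefined for text nodes
  classes?: string[];    // values from the class attribute
  text?: string;         // content of a text node
  children?: HtmlNode[];
}

// "Extracting": walk the parsed tree depth-first, in document order,
// collecting every node that carries `className`.
function findByClass(node: HtmlNode, className: string, out: HtmlNode[] = []): HtmlNode[] {
  if (node.classes?.includes(className)) {
    out.push(node);
  }
  for (const child of node.children ?? []) {
    findByClass(child, className, out);
  }
  return out;
}

// Concatenates the text of a node and all of its descendants.
function textOf(node: HtmlNode): string {
  return (node.text ?? "") + (node.children ?? []).map(textOf).join("");
}

// "Parsing" is assumed to have produced this tree (built by hand here).
const tree: HtmlNode = {
  tag: "div",
  children: [
    { tag: "h2", children: [
      { tag: "span", classes: ["field-content"], children: [{ text: "Britney Spears' House" }] },
    ] },
    { tag: "div", classes: ["building-field"], children: [
      { tag: "div", classes: ["field-content"], children: [{ text: "9999 Hollywood Blvd" }] },
    ] },
  ],
};

const matches = findByClass(tree, "field-content");
console.log(textOf(matches[1])); // prints: 9999 Hollywood Blvd
```

Matches come back in document order, as with `querySelectorAll`, which is why index `[1]` picks the address here just as it does in the browser answers. With a real parser, only the node type and the way class names are read would change.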