
I’d like to clean up a bloated images directory for a website of mine. I’m a grade just above novice with JavaScript, and it seems like it might be possible to achieve a solution using JavaScript…

The solution I’m searching for would, in essence, crawl the entire directory of HTML/ASP/CSS files (all in one directory) and scrape any and all image file names. It would then output those file names as a delimited list (or a facsimile of one) so that I could compare that “scraped” list to the images directory listing, thereby identifying the unused images by process of elimination.

This is a crude script (obviously missing a lot of code). I've used // to mark the logic I think will work, but it is obviously pseudocode.

    var URL_LENGTH = document.SITEMAP.getElementsByName("URL").length;
    var i = 1;
    var z = 1;
    var list = [];

    var NUMBER = 1;
    var URLX = 'P' + NUMBER; // NUMBER must be defined before it is used here
    var PAGE;
    var MINE;
    var PAT1 = /(\.gif|\.jpg|\.png)/g; // escape the dots so they match a literal "."
    var IMGNAME;

    for (i=1;i<=URL_LENGTH;i++)
        {
        PAGE = document.SITEMAP.getElementById(URLX).innerHTML;
        MINE = document.[PAGE].match(PAT1).length;
        for (z=1;z<=MINE;z++)
            {
            //global for string ending with (.gif|.jpg|.png)
            //find beginning of the image name string by looking for /( |'|"|/)/ that come before (.gif|.jpg|.png)
            //inserting image name string into value for var IMGNAME
            list.push(IMGNAME);
            }
        NUMBER=NUMBER+1;
        }

With the SITEMAP looking something like this:

    <div name="URL" id="P1">page1</div>
    <div name="URL" id="P2">page2</div>
    <div name="URL" id="P3">page3</div>
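
The commented steps in the inner loop (find each string ending in .gif, .jpg, or .png, then back up to the preceding space, quote, or slash) could probably be collapsed into a single regular expression. The following is only a sketch under assumptions: `extractImageNames` is a hypothetical name that is not part of the script above, and it lowercases every name on the assumption that file names will be compared case-insensitively.

```javascript
// Hypothetical helper: returns the unique image file names found in a string.
function extractImageNames(text) {
  // A run of characters that are not whitespace, quotes, slashes,
  // parentheses, "=", "<" or ">", followed by .gif, .jpg, or .png.
  var pattern = /[^\s'"\/\\()=<>]+\.(?:gif|jpg|png)/gi;
  var matches = text.match(pattern) || []; // match() returns null when nothing is found
  var seen = {};
  var names = [];
  for (var i = 0; i < matches.length; i++) {
    var name = matches[i].toLowerCase(); // assumption: case-insensitive comparison
    if (!seen[name]) {
      seen[name] = true;
      names.push(name);
    }
  }
  return names;
}

var sample = '<img src="images/Logo.png"> <div style="background:url(/images/bg.jpg)"></div>';
console.log(extractImageNames(sample).join(','));
// prints "logo.png,bg.jpg"
```

Because the path separators are excluded from the match, only the bare file name is kept, which is the form needed for comparing against a directory listing.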
  • That looks strange, but normally you'd use a serverside language to scrape the contents on the server, not JS! – adeneo Feb 28 '13 at 22:05
  • Along the lines of @adeneo's comment, how were you planning to keep the state of your results on the client side? – Brad Feb 28 '13 at 22:08
  • I'm an artist who knows HTML and low-level JavaScript. I'm sure there is some sort of jQuery or VB script that would be more ideal, but that's punching above my weight class. I'm hoping to get this done with the scripting I have a better handle on rather than trying to learn a whole new syntax. – Alistar Spaceman Feb 28 '13 at 22:10
  • I'm imagining that this will be done in a browser: I create the sitemap file and place it in the main directory, create a separate file with the JS, also in the main directory, and fire it up with FF or IE... – Alistar Spaceman Feb 28 '13 at 22:11
  • You don't really have access to the filesystem with a browser. It would be a lot easier to do on the serverside IMO, and I have no idea how you'd check the entire server structure for images. To just get a sitemap file you could probably use ajax, and then parse that file etc. but as far as I know there is no `document.sitemap` object available? – adeneo Feb 28 '13 at 22:14
  • I'm using 'SITEMAP' as the document name. I also have access to the filesystem as it is my website. – Alistar Spaceman Feb 28 '13 at 22:18
  • This has to be done on the server side. For security reasons you can't access the filesystem with JavaScript in a browser; you can achieve this with Node.js if you want to do it in JavaScript. – supernova Feb 28 '13 at 22:29
