
With the whole COVID-19 crisis happening around the world, I decided to embark on a nerdy little project.

I am trying to make digital copies of Magic: The Gathering cards in bulk for a game called Tabletop Simulator. I am a little rusty, but I wanted to jump back into programming, so why not?

Where I am right now: I have written a program (source below) that is supposed to extract all images with common extensions from a web page.

EDIT: The image URL I get doesn't include a suggestive file name, and I don't understand how to extract the image from the way it is presented to me. ImageIO.read(imageUrl) returns null for some reason.

The page source looks like this:

<a href="../Card/Details.aspx?multiverseid=482864" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImageLink" onclick="return CardLinkAction(event, this, 'SameWindow');">

<img src="../../Handlers/Image.ashx?multiverseid=482864&amp;type=card" id="ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ctl00_listRepeater_ctl00_cardImage" style="border-radius:6px;-webkit-border-radius:6px;-moz-border-radius:6px;" width="95" height="132" alt="Abandoned Sarcophagus" border="0">

</a>

This link is what pulls up the card image. I noticed that the format, which is new to me, is ".jfif", which I imagine is a new version of ".jpeg". I got this format by downloading the image directly from my browser. How do I go about extracting it from the page?

The code is not my own idea; I got it from an older post.

Edited code:

        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(br, htmlDoc, 0);

        for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.IMG); iterator.isValid(); iterator.next()) {
            AttributeSet attributes = iterator.getAttributes();
            String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);

            System.out.println(imgSrc);
            if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg")
                    || imgSrc.toLowerCase().endsWith("type=card")
                    || imgSrc.endsWith(".jfif")
                    || imgSrc.endsWith(".png")
                    || imgSrc.endsWith(".jpeg")
                    || imgSrc.endsWith(".bmp")
                    || imgSrc.endsWith(".ico"))) {
                System.out.println(imgSrc);
                try {
                    downloadImage(webUrl, imgSrc);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }

        }
    private static void downloadImage(String url, String imgSrc) throws IOException {
        BufferedImage image = null;
        try {
            if (!(imgSrc.startsWith("http"))) {
                url = url + imgSrc;
            } else {
                url = imgSrc;
            }
            imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
            String imageFormat = null;
            imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
            String imgPath = null;
            imgPath = "/img depository" + imgSrc + "";
            URL imageUrl = new URL(url);
            image = ImageIO.read(imageUrl); // null is returned here!!
            if (image != null) {
                File file = new File(imgPath);
                ImageIO.write(image, imageFormat, file);
                System.out.println("Success!");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

CONSOLE output:

../../Handlers/Image.ashx?multiverseid=482864&type=card
../../Handlers/Image.ashx?multiverseid=482826&type=card
../../Handlers/Image.ashx?multiverseid=482827&type=card
../../Handlers/Image.ashx?multiverseid=482793&type=card
../../Handlers/Image.ashx?multiverseid=482828&type=card
../../Handlers/Image.ashx?multiverseid=482700&type=card
../../Handlers/Image.ashx?multiverseid=484896&type=card
../../Handlers/Image.ashx?multiverseid=482829&type=card
../../Handlers/Image.ashx?multiverseid=484713&type=card
../../Handlers/Image.ashx?multiverseid=482701&type=card
../../Handlers/Image.ashx?multiverseid=482702&type=card
../../Handlers/Image.ashx?multiverseid=482771&type=card
../../Handlers/Image.ashx?multiverseid=482757&type=card
../../Handlers/Image.ashx?multiverseid=482703&type=card
../../Handlers/Image.ashx?multiverseid=482794&type=card
../../Handlers/Image.ashx?multiverseid=482865&type=card
../../Handlers/Image.ashx?multiverseid=482830&type=card
../../Handlers/Image.ashx?multiverseid=482831&type=card
../../Handlers/Image.ashx?multiverseid=482883&type=card
../../Handlers/Image.ashx?multiverseid=482704&type=card
../../Handlers/Image.ashx?multiverseid=484869&type=card
../../Handlers/Image.ashx?multiverseid=482884&type=card
../../Handlers/Image.ashx?multiverseid=482866&type=card
../../Handlers/Image.ashx?multiverseid=482705&type=card
../../Handlers/Image.ashx?multiverseid=482885&type=card
../../Handlers/Image.ashx?multiverseid=482795&type=card
../../Handlers/Image.ashx?multiverseid=482796&type=card
../../Handlers/Image.ashx?multiverseid=482886&type=card
../../Handlers/Image.ashx?multiverseid=482887&type=card
../../Handlers/Image.ashx?multiverseid=484914&type=card
../../Handlers/Image.ashx?multiverseid=484887&type=card
../../Handlers/Image.ashx?multiverseid=482888&type=card
../../Handlers/Image.ashx?multiverseid=482867&type=card
../../Handlers/Image.ashx?multiverseid=482706&type=card
../../Handlers/Image.ashx?multiverseid=484711&type=card
../../Handlers/Image.ashx?multiverseid=482758&type=card
../../Handlers/Image.ashx?multiverseid=484870&type=card
../../Handlers/Image.ashx?multiverseid=482889&type=card
../../Handlers/Image.ashx?multiverseid=484905&type=card
../../Handlers/Image.ashx?multiverseid=482772&type=card
../../Handlers/Image.ashx?multiverseid=484871&type=card
../../Handlers/Image.ashx?multiverseid=482707&type=card
../../Handlers/Image.ashx?multiverseid=482708&type=card
../../Handlers/Image.ashx?multiverseid=482709&type=card
../../Handlers/Image.ashx?multiverseid=482890&type=card
../../Handlers/Image.ashx?multiverseid=484712&type=card
../../Handlers/Image.ashx?multiverseid=482773&type=card
../../Handlers/Image.ashx?multiverseid=482774&type=card
../../Handlers/Image.ashx?multiverseid=482775&type=card
../../Handlers/Image.ashx?multiverseid=482736&type=card
../../Handlers/Image.ashx?multiverseid=482891&type=card
../../Handlers/Image.ashx?multiverseid=482710&type=card
../../Handlers/Image.ashx?multiverseid=482711&type=card
../../Handlers/Image.ashx?multiverseid=482832&type=card
../../Handlers/Image.ashx?multiverseid=482776&type=card
../../Handlers/Image.ashx?multiverseid=482892&type=card
../../Handlers/Image.ashx?multiverseid=482868&type=card
../../Handlers/Image.ashx?multiverseid=482777&type=card
../../Handlers/Image.ashx?multiverseid=482833&type=card
../../Handlers/Image.ashx?multiverseid=482834&type=card
../../Handlers/Image.ashx?multiverseid=482797&type=card
../../Handlers/Image.ashx?multiverseid=484868&type=card
../../Handlers/Image.ashx?multiverseid=484878&type=card
../../Handlers/Image.ashx?multiverseid=482798&type=card
../../Handlers/Image.ashx?multiverseid=482737&type=card
../../Handlers/Image.ashx?multiverseid=484906&type=card
../../Handlers/Image.ashx?multiverseid=484888&type=card
../../Handlers/Image.ashx?multiverseid=482893&type=card
../../Handlers/Image.ashx?multiverseid=482835&type=card
../../Handlers/Image.ashx?multiverseid=484889&type=card
../../Handlers/Image.ashx?multiverseid=482759&type=card
../../Handlers/Image.ashx?multiverseid=482712&type=card
../../Handlers/Image.ashx?multiverseid=482836&type=card
../../Handlers/Image.ashx?multiverseid=484879&type=card
../../Handlers/Image.ashx?multiverseid=482713&type=card
../../Handlers/Image.ashx?multiverseid=484897&type=card
../../Handlers/Image.ashx?multiverseid=482714&type=card
../../Handlers/Image.ashx?multiverseid=482894&type=card
../../Handlers/Image.ashx?multiverseid=482895&type=card
../../Handlers/Image.ashx?multiverseid=482896&type=card
../../Handlers/Image.ashx?multiverseid=482897&type=card
../../Handlers/Image.ashx?multiverseid=482837&type=card
../../Handlers/Image.ashx?multiverseid=482715&type=card
../../Handlers/Image.ashx?multiverseid=482898&type=card
../../Handlers/Image.ashx?multiverseid=482760&type=card
../../Handlers/Image.ashx?multiverseid=484872&type=card
../../Handlers/Image.ashx?multiverseid=482838&type=card
../../Handlers/Image.ashx?multiverseid=482738&type=card
../../Handlers/Image.ashx?multiverseid=484890&type=card
../../Handlers/Image.ashx?multiverseid=482899&type=card
../../Handlers/Image.ashx?multiverseid=482778&type=card
../../Handlers/Image.ashx?multiverseid=482839&type=card
../../Handlers/Image.ashx?multiverseid=482900&type=card
../../Handlers/Image.ashx?multiverseid=484880&type=card
../../Handlers/Image.ashx?multiverseid=482779&type=card
../../Handlers/Image.ashx?multiverseid=482716&type=card
../../Handlers/Image.ashx?multiverseid=484881&type=card
../../Handlers/Image.ashx?multiverseid=482761&type=card
../../Handlers/Image.ashx?multiverseid=482799&type=card
../../Handlers/Image.ashx?multiverseid=482901&type=card
/images/Redesign/Shadow.png
//media.wizards.com/2018/images/magic/gatherer/footerbanner.jpg
/images/Redesign/hasbro_logo.png
/images/Redesign/wizards_logo.png
  • This is not a precise enough error description for us to help you. *What* doesn't work? *How* doesn't it work? What trouble do you have with your code? Do you get an error message? What is the error message? Is the result you are getting not the result you are expecting? What result do you expect and why, what is the result you are getting and how do the two differ? Is the behavior you are observing not the desired behavior? What is the desired behavior and why, what is the observed behavior, and in what way do they differ? – Jörg W Mittag Apr 16 '20 at 04:55
  • Also, please make sure to construct a [mre]. Note that all three of those words are important: it should be an *example* only, you should not post your entire actual code, rather you should create a simplified example that demonstrates your problem. Also, it should be *minimal*, i.e. it should not contain anything that is not absolutely required to demonstrate the problem. (Most beginner problems can be demonstrated in less than 5 short simple lines of code.) And it should be *reproducible*, which means that if I copy&paste and run the code, I should see the exact same problem you see. – Jörg W Mittag Apr 16 '20 at 04:56
  • My desired result is downloading all the images on a given URL that end in .jpeg, .png, etc. To my surprise, it doesn't download anything, but it doesn't produce errors either. All it does is print the file names. I'll edit the question down to the parts of the code that I think are most pertinent. – Harley Fioretti Apr 16 '20 at 05:11
  • I don't really know the conventions for using Stack Overflow, so I apologise. – Harley Fioretti Apr 16 '20 at 05:20

1 Answer

I don't know where you saw .jfif in relation to that link, because I don't see that anywhere.

What I see is a link URL:
https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card
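That absolute URL is what the relative `src` from the question resolves to. Plain string concatenation such as `url + imgSrc` does not collapse the `../` segments, so the request may well go to a different address than the one shown here. Below is a minimal sketch of resolving the `src` with `java.net.URI`. The class name `ResolveImageUrl` and the page URL in `main` are assumptions for illustration; use the URL of the page you actually scraped.

```java
import java.net.URI;

final class ResolveImageUrl {
    // Resolves a (possibly relative) src attribute against the URL of the page it came from.
    static URI resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src);
    }

    public static void main(String[] args) {
        // Assumed page URL, chosen only because it is two directories deep like the "../../" in src.
        String pageUrl = "https://gatherer.wizards.com/Pages/Search/Default.aspx";
        String src = "../../Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(resolve(pageUrl, src));
    }
}
```

The same call also handles the other forms in the console output, such as root-relative paths (`/images/Redesign/Shadow.png`) and scheme-relative ones (`//media.wizards.com/...`).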

When opened in a web browser (Firefox for me), I see the server response has the following HTTP headers:

Cache-Control: public
Content-Type: image/jpeg
Expires: Fri, 16 Apr 2021 04:30:35 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Thu, 16 Apr 2020 04:30:35 GMT
Content-Length: 170170

The important part is Content-Type with value image/jpeg, telling you the content is a JPEG image.

Unfortunately, the server doesn't provide a suggestive file name, which would have been a header like this:

Content-Disposition: attachment; filename="filename.jpg"

Without that suggestion from the server, but knowing and understanding the URL, you could, for example, write code that builds the file name from the URL and the Content-Type header, giving card482864.jpeg.
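A minimal sketch of that naming idea follows. It assumes the URL always carries `multiverseid` and `type` query parameters as in the example above; the class `CardFileName`, its method names, and the fallback values are hypothetical. In a real download the Content-Type string could come from `URLConnection.getContentType()`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class CardFileName {
    // Splits a query such as "a=1&b=2" into a map; assumes simple keys and values.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new HashMap<>();
        String query = URI.create(url).getQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Builds a name like "card482864.jpeg" from the URL and a Content-Type like "image/jpeg".
    static String fileName(String url, String contentType) {
        Map<String, String> params = queryParams(url);
        // Drop any parameters after ';' and keep the subtype after the '/'.
        String mime = contentType.split(";")[0].trim();
        String extension = mime.substring(mime.indexOf('/') + 1);
        // Fallback values are placeholders for URLs that lack the expected parameters.
        return params.getOrDefault("type", "image")
                + params.getOrDefault("multiverseid", "unknown")
                + "." + extension;
    }

    public static void main(String[] args) {
        String url = "https://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=482864&type=card";
        System.out.println(fileName(url, "image/jpeg")); // prints card482864.jpeg
    }
}
```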

Andreas
  • I got the format from simply downloading the image from my browser. Could you give an example on how to name the file from the URL? Do I just tack on .jpeg somewhere in the string? – Harley Fioretti Apr 16 '20 at 04:49
  • @HarleyFioretti Look at the URL. Do you see the values `card` and `482864` somewhere? As I said, because you know the URL, you can write code to extract those two values from the URL to build the file name, and you set file extension to `.jpeg` because the `Content-Type` header is `image/jpeg`. – Andreas Apr 16 '20 at 04:52
  • I think my problem lies in using ImageIO as well... .png files aren't downloading either – Harley Fioretti Apr 16 '20 at 05:06
  • @HarleyFioretti Actually, my web browser said that the file was PNG, even though the `Content-Type` said `image/jpeg`. Web browsers know that web servers are unreliable, so they auto-detect a lot of stuff. Click the link to view the image, then right-click and select "image info". My browser says `Type: PNG Image` and `Dimensions: 265px x 370px` – Andreas Apr 16 '20 at 05:32
  • Those are the right dimensions. Microsoft Edge only wants to save the image as .jfif, so my assumption was that that was the format, but perhaps there is an unseen conversion the browser does that I can't replicate with my current code... – Harley Fioretti Apr 16 '20 at 05:39
  • @HarleyFioretti Edge likely just sees the `image/jpeg` and a URL without a file extension, then adds `.jfif` as an attempt to help you save a file that can be double-clicked to be viewed, since that works by file extension. – Andreas Apr 16 '20 at 05:42
  • I think I just need to explore some different image readers... I'll do some research and come back tomorrow – Harley Fioretti Apr 16 '20 at 05:49
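The auto-detection mentioned in the comments can be approximated by looking at the first bytes of the response: PNG files start with the signature `89 50 4E 47 0D 0A 1A 0A`, and JPEG files start with `FF D8 FF`. The sketch below is one possible approach, not necessarily what any browser does. It saves the downloaded bytes unchanged instead of decoding them with `ImageIO`, which returns null when no registered reader recognises the data. The class `ImageSniffer` and its method names are hypothetical, and `readAllBytes` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

final class ImageSniffer {
    // Returns "png" or "jpeg" based on the leading signature bytes, or null if neither matches.
    static String sniff(byte[] head) {
        if (head.length >= 8
                && (head[0] & 0xFF) == 0x89 && head[1] == 'P' && head[2] == 'N' && head[3] == 'G'
                && head[4] == 0x0D && head[5] == 0x0A && head[6] == 0x1A && head[7] == 0x0A) {
            return "png";
        }
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xD8 && (head[2] & 0xFF) == 0xFF) {
            return "jpeg";
        }
        return null;
    }

    // Downloads the bytes as-is and names the file after the detected format.
    static Path download(String imageUrl, Path directory, String baseName) throws IOException {
        byte[] data;
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            data = in.readAllBytes();
        }
        String extension = sniff(data);
        if (extension == null) {
            // Likely an HTML error page or another format; report it instead of saving it.
            throw new IOException("Not a PNG or JPEG: " + imageUrl);
        }
        Files.createDirectories(directory);
        return Files.write(directory.resolve(baseName + "." + extension), data);
    }
}
```

A call such as `ImageSniffer.download(absoluteUrl, Path.of("img"), "card482864")` would then write either `card482864.png` or `card482864.jpeg`, depending on what the server actually sent.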