I have a nodejs app where a user can supply a url that is an external url, like www.google.com, or one that points to one of the webapp's pages, like A/B/C.aspx.

To make sure that the url is valid, I have been doing:

const request = require("request");
request(url, (err, response) => {
    if (err || (response && response.statusCode !== 200)) {
        // not valid, though maybe I should be more lenient and allow codes < 400
    }
    else {
        // valid
    }
});

However, I came across an issue where a url that pointed to a webapp page that didn't exist would pass. This, I found out, was because the invalid url was being redirected to a valid url, so response.statusCode was set to 200.

To fix this, I turned off redirection by replacing the url parameter with the object {url: url, followRedirect: false}, but this just causes response.statusCode to be set to 302.

Not only do I want it not to redirect, I also want it to validate the supplied url, so that the statusCode is 404 if the page isn't found, rather than 302.

Can this be done?

pushkin
  • What does your last sentence mean? You don't control what status code is returned. If the target server wants to return a 302, that's what it is going to do. You just have to decide what to do when you get a 302. You can either decide you don't like that as a valid response, you can look at what it's being redirected to and decide based on that (like redirected to the same domain, just a different path) or you can then go validate the URL that it is being redirected to. That's all up to you. – jfriend00 Jul 19 '17 at 22:42
  • @jfriend00 You're right that I can't tell it what to return, but I was wondering if the `request` module had a way of pretending that no redirection is requested, so instead of returning the 302, it would return a 404 or 200. In my case, redirection should be valid, but I also want to make sure that the supplied url points to a valid page. – pushkin Jul 19 '17 at 22:45
  • No. I don't think the request module has that option. You code it yourself in the response handler. If you want to allow redirects, but want to make sure the redirect page is valid, can't you just let it follow the redirection and then either get a 200 page or some error status? – jfriend00 Jul 19 '17 at 22:47
  • @jfriend00 The problem is that the invalid url, for reasons that I'm not in control of, redirects to a valid url, so if I let it redirect, I get a `statusCode` of 200, whereas I want it to be 404, since the supplied url was in fact invalid. – pushkin Jul 19 '17 at 22:51
  • You've got me totally confused. You either want to allow 302 redirects if they redirect to a valid page or not. A URL that successfully redirects to another URL is NOT an invalid URL on its own. It's a valid 302 redirect. So, here are your four cases: 1) Get 2xx response, success. 2) Get 400 or higher which you treat as an error. 3) Get 3xx and the URL it redirects to gets 2xx. 4) Get 3xx and the URL it redirects to does not get 2xx. When you've outlined what result you want for each of those four cases and put it in your question, then perhaps I could help you more. – jfriend00 Jul 19 '17 at 22:57
  • I'm not as interested in the url it redirects to as I am in the url that is redirected *from*. If there is no redirection, then it's simple. Something that comes back as <400 we'll treat as ok. Otherwise, it's an error. However if we're redirected from a page that doesn't exist (i.e. invalid url) to a page that *does* exist, I want to treat that as invalid. Right now, the result is either 200 if I allow redirection, or 302 if I don't allow redirection. I want it to be 200 if the *original* url is valid and 404 if it's not. – pushkin Jul 20 '17 at 03:02
  • You don't seem to understand that there is no such thing as a 302 where the original URL is not valid. No such thing. A 302 means that the server responded to that URL, thus it is being handled - it is a valid URL and not an error. And, likewise, there is no such thing as a 302 where there is some page associated with that original URL. I asked you to edit your question and state what behavior you want for the four conditions I listed. You either just don't understand what a 302 is or won't specify desired behavior which makes further interactions pointless. I tried, didn't work. – jfriend00 Jul 20 '17 at 03:12
  • @jfriend00 Thank you for the previous explanation, I *think* I was confused about what 302 implies. However, I'll first note that my terminology could have been clearer. When I say "invalid url", I'm not talking about an invalid format (e.g. "asd#@#$.3423#$"), I'm really talking about a url that points to a page that doesn't exist. Could that still get redirected? If it gets redirected, does that imply that the page does exist? – pushkin Jul 20 '17 at 03:22
  • A 302 means the server is not going to give you a page for that URL right now. Instead, it wants you to go to another URL. What that actually means is totally up to the server. It could be that this is just an old version of what used to serve a page and they are now referring you to the new location of that page. Or, it could mean you aren't logged in and you have to go to the login page first. Or, it could be a catch all handler for pages that never existed and they just want to show you some other content at some other URL instead. As a client, you have no way of knowing which it is. – jfriend00 Jul 20 '17 at 03:27
  • @jfriend00 I see. So your last scenario is the likely one. The page doesn't exist and I get redirected. But I want to be able to figure out if that page doesn't exist. If I do a request, I guess the client will just get redirected if the server says so. So rather, I guess I need to ask the *server* if the page exists. – pushkin Jul 20 '17 at 03:37
  • There is no such thing as ask a server if the page exists. You issue a GET request for the page. The server either gives you some content, gives you some sort of error or gives you a redirect. Those are your three outcomes. When you get a redirect, there's no way to ask if there's actually a page behind that original URL sometimes or not. As of this moment, there is not - there's only a redirect. It seems for your purposes, you should just count 3xx status codes as "no page there" and call it a day. – jfriend00 Jul 20 '17 at 04:08
  • If, down the road, you find some exception to that, then you can figure out how to detect that exception, but there's no generic way to find if there's sometimes an actual page behind this URL that is now giving me a 302. The only response you know about for that original URL is the 302 referral to another URL. That's it. – jfriend00 Jul 20 '17 at 04:10

1 Answer

I'll try to roll all my comments into an answer in an attempt to wrap up this question.

When you request a page and the server responds with a 302 status and a redirect URL, that can mean any one of these things:

  1. Rather than show you the page content at the requested URL, the server wants you to first go to this other URL (such as when you are not yet logged in). Once logged in, a request for that URL very well may show you regular content.

  2. The content for that URL may have temporarily been moved to a different URL so the server wants the browser to go to that other URL and fetch the content there.

  3. The server may have once supported that URL, but now no longer does and wants to send the browser to a generic page describing that issue (technically the server probably should use a 404 for this, but not all will).

  4. The server may actually have a catch-all handler for unsupported URLs and, rather than giving you a generic 404 page, it redirects you to somewhere else on the site.

When you get a 302 status back, you have no way of knowing which of these it is. Which of these cases apply (possibly all of them) depends entirely on how the server is coded.

So, when you're testing out a URL and getting a 302 back, you just have to make your own policy decision about how you want to characterize that particular URL. At that point in time, that URL does not have specific page content. Instead, it consists of a referral to another URL. It is a valid server and request URL and you do get a valid response from the server, but it is only a referral to another URL, not page content itself.
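If part of your policy is to look at where the referral points (for example, same host but a different path), one way to do that is to inspect the `Location` header of the 3xx response without following it. The following is only a sketch: `describeRedirect` and its parameter names are hypothetical, and it assumes you already have the status code and `Location` header from a request made with redirects turned off.

```javascript
// Hypothetical helper: summarize where a 3xx response points, without following it.
// originalUrl    - the absolute URL that was requested
// statusCode     - the status code returned for that URL
// locationHeader - the value of the response's Location header, if any
function describeRedirect(originalUrl, statusCode, locationHeader) {
    // Not a redirect, or a 3xx with no target (such as 304 Not Modified).
    if (statusCode < 300 || statusCode >= 400 || !locationHeader) {
        return { isRedirect: false };
    }
    const from = new URL(originalUrl);
    // Location may be relative, so resolve it against the requested URL.
    const to = new URL(locationHeader, from);
    return {
        isRedirect: true,
        target: to.href,
        sameHost: to.host === from.host,
        samePath: to.pathname === from.pathname,
    };
}
```

This tells you only where the server is sending you. It still cannot tell you why the server issued the redirect.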

I think you have four general cases to deal with:

  1. You get a 2xx response status with page content. I assume you want to characterize that as a valid URL.

  2. You get a response status of 400 or higher. I assume you want to characterize that as NOT a valid URL.

  3. You get a response status of 3xx (like 302) and the URL that it redirects to gives you a 2xx response status with page content. This is your own app's policy decision how you want to characterize that. Without understanding everything your app is trying to do that is related to characterizing URLs, we cannot help you here. Decide what is in the best interests of your app.

  4. You get a response status of 3xx (like 302) and the URL that it redirects to does not give you a 2xx response status with page content. I assume you would want to classify this as NOT a valid URL. It generated a referral to a bad page.

So, it appears to me that how you would want to handle cases 1, 2 and 4 is pretty clear. That leaves only case #3 for you to decide what is best for your app.
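The four cases can be written down as a small decision function. This is a sketch under assumptions, not a definitive implementation: `classifyUrl` and the `acceptRedirectToGoodPage` flag are hypothetical names, `firstStatus` is assumed to be the status you get with `followRedirect: false`, and `finalStatus` the status you get when redirects are followed.

```javascript
// Hypothetical classifier for the four cases above.
// firstStatus              - status for the supplied URL, redirects NOT followed
// finalStatus              - status after following any redirects
// acceptRedirectToGoodPage - your app's policy decision for case 3
function classifyUrl(firstStatus, finalStatus, acceptRedirectToGoodPage) {
    const is2xx = (s) => s >= 200 && s < 300;
    const is3xx = (s) => s >= 300 && s < 400;

    if (is2xx(firstStatus)) return "valid";       // case 1: page content served directly
    if (firstStatus >= 400) return "invalid";     // case 2: error status
    if (is3xx(firstStatus)) {
        if (!is2xx(finalStatus)) return "invalid"; // case 4: referral to a bad page
        // case 3: referral to a good page, decided by your own policy
        return acceptRedirectToGoodPage ? "valid" : "invalid";
    }
    return "invalid"; // 1xx or anything else unexpected
}
```

For the situation in the question, where a redirect from a non-existent page should count as invalid, you would pass `false` for the policy flag, so `classifyUrl(302, 200, false)` yields `"invalid"`.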


It appears that you started out with the notion that there's a 302 that has page content and a 302 that does not have page content and you somehow wanted to know the difference between those two. That is simply not the case. A 302 means that right now, this server will not offer you any page content for that URL, but would rather you go to a different URL. You have no idea why. You have no idea if that's just a temporary condition. All you know is that right now, the server is responding to that URL, but is giving the client a referral to go elsewhere, not serving content directly from that URL.

It's kind of like you call your friend up on the phone and you get a recorded message that your friend can now be reached at a new and different number (that's like a 302). Without some outside context, you have no way of knowing if this is just a temporary condition or if this is a permanent condition. And, without trying the new number and successfully reaching your friend, you don't even know if the new number actually works to reach your friend.

jfriend00