
I have used URL shortener services, such as goo.gl or bit.ly, to shorten long URLs in my applications using their respective APIs. These APIs are very convenient, but unfortunately I have noticed that the long URL gets hit as soon as it is shortened. Let me explain the issue. Say I want users to validate something (an email address, or a confirmation) and I propose a link in my application for them to visit in order to complete the validation. I take this long URL and use the API to shorten it. The target link (a PHP script, for example) gets hit the moment I call the shorten API, which defeats the purpose of the validation.

One solution would be to put an intermediate button on the target page that the user has to click to confirm, but that adds another step to the validation process, which I would like to keep as simple as possible.

I would like to know if anyone has already encountered this problem, or if anyone has a clue how to solve it.

Thanks for any help.

Raphael C
  • You should be validating all data passed to the PHP script anyway, not just allowing data from anywhere – ggdx Sep 28 '14 at 10:11

1 Answer


I can't speak for Google, but at Bitly we crawl a portion of the URLs shortened via our service to support various product features (spam checking, title fetching, etc.), which is the cause of the behavior you are seeing.

In this type of situation we make two recommendations:

  1. Use robots.txt to mark the relevant paths as "disallowed". This is a light form of protection, since nothing forces clients to respect robots.txt, but well-behaved bots like BitlyBot or GoogleBot will honor it (see the sketch after this list).
  2. As mentioned by dwhite.me in a comment, and as you acknowledged in your post, it is usually best not to perform any state-changing actions in response to GET requests (a minimal sketch of this also follows below). As always, it is a judgement call weighing the risks involved against the added complexity of a safer approach.
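As a sketch of the first recommendation, a robots.txt file served from the site root could disallow the validation path; /validate.php below is a hypothetical stand-in for wherever the script actually lives:

    # Hypothetical robots.txt at the site root.
    # Well-behaved crawlers (BitlyBot, GoogleBot, ...) skip any URL
    # matching a disallowed path; bots that ignore robots.txt are
    # not stopped by this.
    User-agent: *
    Disallow: /validate.php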
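For the second recommendation, here is a minimal PHP sketch, assuming a hypothetical validate.php with an already-configured PDO connection $pdo and a users table with token and validated columns: the GET request only renders a confirmation form, and the state change happens on POST, so a crawler that merely fetches the shortened URL changes nothing.

    <?php
    // Hypothetical validate.php: GET renders a form, POST performs the change.
    // $pdo is assumed to be an existing PDO connection.
    $token = isset($_REQUEST['token']) ? $_REQUEST['token'] : '';

    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        // The state change happens only here, never on a plain GET.
        $stmt = $pdo->prepare('UPDATE users SET validated = 1 WHERE token = ?');
        $stmt->execute(array($token));
        echo 'Your address has been validated.';
    } else {
        // Crawlers fetching the short URL land here and change nothing.
        echo '<form method="post">'
           . '<input type="hidden" name="token" value="' . htmlspecialchars($token) . '">'
           . '<button type="submit">Confirm</button>'
           . '</form>';
    }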
SeanOC
  • Thanks for your answer. I will try implementing robots.txt as you recommended. Concerning changing state on a GET request: in this particular situation, the time between the moment the URL is shortened and the moment the user visits it is supposed to be fairly short, and a cron job deletes and invalidates these links periodically. But as I said earlier, adding a confirmation button on that page just weighs the validation process down. Saving a click is important here. – Raphael C Sep 29 '14 at 10:57
  • I have also thought of setting a cookie on the client side and checking for that cookie upon validation, which constrains users to accept the cookie and to validate from the same device. – Raphael C Sep 29 '14 at 10:58
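A rough sketch of the cookie idea from that last comment, with hypothetical names throughout: the application sets a cookie when it generates the long URL (before shortening it), and the validation script accepts the hit only when that cookie matches the token in the URL, so Bitly's crawler, which carries no cookie, cannot trigger the validation.

    <?php
    // 1) When generating the validation link, mark the user's browser.
    //    random_bytes() requires PHP 7+; openssl_random_pseudo_bytes()
    //    is an alternative on older versions.
    $token = bin2hex(random_bytes(16));
    setcookie('pending_validation', $token, time() + 86400, '/');
    // ...embed $token in the long URL, then shorten it via the API...

    // 2) In the validation script, accept the hit only when the same
    //    browser presents the matching cookie.
    if (isset($_COOKIE['pending_validation'], $_GET['token'])
            && hash_equals($_COOKIE['pending_validation'], $_GET['token'])) {
        // Same browser and matching token: safe to mark as validated here.
    } else {
        // No cookie (another device, or a bot): fall back to a confirm button.
    }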