I'm trying to use Python's requests module to access a website, but I keep getting redirected to a different page. I'm new to this, so I have no idea where to begin with fixing the problem.

My code is:

    import requests

    # Fetch the portal page and print the body of the final response
    r = requests.get('https://swiftlink.enbridge.com/portal/')
    print(r.text)

However, instead of printing the text on that page, I get an output starting with:

    <html lang="en-US"><head><script>
    /*
    ** Copyright (c) 2008, Oracle and/or its affiliates. All rights reserved.
    */

    /**
     * This is the loopback script to process the url before the real page loads. It introduces
     * a separate round trip. During this first roundtrip, we currently do two things:
     * - check the url hash portion, this is for the PPR Navigation.
     * - do the new window detection
     * the above two are both controled by parameters in web.xml
     *
     * Since it's very lightweight, so the network latency is the only impact.
     *
     * here are the list of will-pass-in parameters (these will replace the param in this whole
     * pattern:
     *        viewIdLength                           view Id length (characters),
     *        loopbackIdParam                        loopback Id param name,
     *        loopbackId                             loopback Id,
     *        loopbackIdParamMatchExpr               loopback Id match expression,
     *        windowModeIdParam                      window mode param name,
     *        windowModeParamMatchExpr               window mode match expression,
     *        clientWindowIdParam                    client window Id param name,
     *        clientWindowIdParamMatchExpr           client window Id match expression,
     *        windowId                               window Id,
     *        initPageLaunch                         initPageLaunch,
     *        enableNewWindowDetect                  whether we want to enable new window detection
     *        jsessionId                             session Id that needs to be appended to the redirect URL
     *        enablePPRNav                           whether we want to enable PPR Navigation
     *
     */

After a bit of Googling, I found out that I'm getting redirected to an Oracle ADF loopback script page, but I haven't found a way to get around it. I've tried disabling redirects in my code, but then I only get a response stating that the page I'm looking for has been moved temporarily. I know that the URL I'm using is valid, since it takes me to the right page in a browser.

Politank-Z
lerpyderpy
  • Inspect all the network requests triggered by this GET using Firebug or Google Chrome Developer Tools; that will show you what is happening under the hood. The HTML you posted also contains JavaScript that performs another redirect with certain parameters. You will need to understand that call and its parameters, then make it manually after your original requests call. You also need to use a [Requests Session](http://docs.python-requests.org/en/latest/user/advanced/#session-objects) object for this. – Vikas Ojha Jun 19 '15 at 07:20
  • Possible duplicate of [how to bypass Oracle ADF loopback script for scripting website using php cURL library?](https://stackoverflow.com/questions/53995849/how-to-bypass-oracle-adf-loopback-script-for-scripting-website-using-php-curl-li) – Ashwin Prabhu Mar 01 '19 at 09:24
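
As a rough sketch of what the first comment suggests, the helper below lists every hop that requests followed, which makes the HTTP redirect chain visible without a browser. `describe_redirects` is a name made up for this illustration; it relies only on the `history`, `status_code` and `url` attributes of a requests response.

```python
def describe_redirects(response):
    """Return one "status url" line per hop, ending with the final response.

    `response` is expected to be a requests.Response. describe_redirects is a
    hypothetical helper name, not part of the requests library.
    """
    # response.history holds the intermediate responses, oldest first
    hops = list(response.history) + [response]
    return ["{} {}".format(hop.status_code, hop.url) for hop in hops]


# Usage (performs a real network request, so it is left commented out):
#
#   import requests
#   with requests.Session() as session:   # a Session keeps cookies between calls
#       response = session.get('https://swiftlink.enbridge.com/portal/')
#       print('\n'.join(describe_redirects(response)))
```

Redirects triggered by JavaScript, such as the loopback script in the question, happen in the browser and will not appear in this list; those still have to be found in the browser's network tab.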

3 Answers

The right way to crawl ADF pages is to add the query parameter

    org.apache.myfaces.trinidad.outputMode=webcrawler

to every GET request the script makes. Keep in mind that pages look different in crawler mode, since that output is not meant for human consumption, but it should still contain all the raw details you would want to crawl.
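
A minimal sketch of this with requests, assuming the server accepts the parameter as an ordinary query-string parameter. `fetch_in_crawler_mode` is a hypothetical helper name, and `session` is expected to be a `requests.Session` created by the caller.

```python
# Query parameter quoted in the answer above
CRAWLER_PARAMS = {"org.apache.myfaces.trinidad.outputMode": "webcrawler"}


def fetch_in_crawler_mode(session, url, timeout=30):
    """GET `url` with the crawler-mode parameter added and return the body text.

    Hypothetical helper: `session` should behave like a requests.Session.
    """
    # requests appends `params` to any query string already present in `url`
    response = session.get(url, params=CRAWLER_PARAMS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text


# Usage (performs a real network request, so it is left commented out):
#
#   import requests
#   with requests.Session() as session:
#       print(fetch_in_crawler_mode(session, 'https://swiftlink.enbridge.com/portal/'))
```

Passing the session in keeps cookies shared across calls, which matters if several pages are crawled in a row.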

Although this is an old question and the OP may have long since moved on, I'm answering here to help anyone else who hits the same problem.

Ashwin Prabhu

This is not a direct answer to your question, but bear in mind that HTTP is a client-server protocol: the client requests data, and the server either responds with that data, indicates that a redirect is required, or flags an error. The HTTP server is under no obligation to respond with the requested data.

I noticed the protocol in the URL is https. I am not familiar with the requests module, but you may want to check that it handles the SSL socket wrapper correctly, as that could be a source of error.

uname01

An ADF page requires additional parameters, so you need to extract them from the response to the first call and then append them to subsequent requests in order to see the page.

Here is a post on configuring JMeter that shows which parameters are needed.

Configuring Jmeter
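
The extract-then-append idea could be sketched roughly as follows. The parameter names and regular expressions are not taken from the linked post; they have to be worked out by inspecting the loopback response for your own site, so the `patterns` argument and both helper names are placeholders for illustration.

```python
import re


def extract_params(html, patterns):
    """Return {name: value} for every regex in `patterns` that matches `html`.

    `patterns` maps a parameter name to a regular expression whose first
    capturing group is the value. Hypothetical helper, not an ADF or requests API.
    """
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, html)
        if match:
            found[name] = match.group(1)
    return found


def fetch_with_extracted_params(session, url, patterns, timeout=30):
    """Request `url`, pull parameters out of the reply, then request it again.

    `session` should behave like a requests.Session so that cookies set by
    the first response are sent with the second request.
    """
    first = session.get(url, timeout=timeout)
    params = extract_params(first.text, patterns)
    second = session.get(url, params=params, timeout=timeout)
    second.raise_for_status()
    return second.text


# Usage (real network request; the pattern below is a made-up placeholder):
#
#   import requests
#   patterns = {'someParam': r'someParam=(\w+)'}
#   with requests.Session() as session:
#       print(fetch_with_extracted_params(
#           session, 'https://swiftlink.enbridge.com/portal/', patterns))
```

If the second response is still the loopback page, the set of parameters is probably incomplete, and comparing against the requests a browser makes is the next step.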

Ramandeep Nanda