
I've been trying to log in to this website with my credentials in order to scrape my profile name using Google Apps Script. The status code is 200, and I can see that the script receives cookies. However, I get `undefined` as the result instead of the profile name.

This is how I'm trying:

function loginAndParseProfile() {
  var link = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f';

  var options = {
    "method": "get",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
    }

  };
  var res = UrlFetchApp.fetch(link, options);
  var $ = Cheerio.load(res.getContentText());
  var fkey = $("input[name='fkey']").first().attr('value');

  var payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': 'emailaddress',
    'password': 'password',
    'oauth_version': '',
    'oauth_server': ''
  };

  var options = {
    "method": "post",
    "payload": payload,
    "muteHttpExceptions": true,
    "headers": {
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
    }
  };

  var loginURL = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f";

  var resp = UrlFetchApp.fetch(loginURL,options);
  console.log(resp.getResponseCode());
  console.log(resp.getAllHeaders()['Set-Cookie']);
  var $ = Cheerio.load(resp.getContentText());
  var item = $('a.my-profile > [class^="gravatar-wrapper"]').first().attr('title');
  console.log(item);
}

How can I make the script work?

TheMaster
robots.txt
  • What is the end goal of this request? Have you considered using the [Stack Exchange API](https://api.stackexchange.com/docs/authentication)? – Kessy Jan 17 '22 at 16:16
  • The end goal of this request is to log in to that website and scrape the profile name @Kessy. – robots.txt Jan 17 '22 at 17:28
  • This question is linked to [chat](https://chat.stackoverflow.com/rooms/217630/google-apps-script-chat-community) – TheMaster Jan 20 '22 at 22:54
  • Actually, the request *fails* - if you inspect the HTML closely, you get back the logged out page - this is why the element you are looking for is not there. Nor is there an `acct` cookie set (this is what should be used to verify success). With that said... I dunno why the login fails - you seem to be doing everything by the book - it might be something out of our control – Oleg Valter is with Ukraine Jan 21 '22 at 11:56
  • Related: https://stackoverflow.com/questions/19567105 https://stackoverflow.com/questions/28794290 – TheMaster Jan 24 '22 at 19:13
  • Tracker: https://issuetracker.google.com/issues/36754794 – TheMaster Jan 24 '22 at 19:28

1 Answer

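  1. Disable redirects by setting followRedirects to false, so that UrlFetchApp returns the response to the POST itself (with its Set-Cookie headers) rather than the page it redirects to: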

    var options = {
      "method" : "post",
      'payload': payload,
      'muteHttpExceptions': true,
      "headers": {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
      },
      'followRedirects': false
    };
    
  2. Grab the acct cookie from the response to the POST /users/login request:

    const acct = resp.getAllHeaders()['Set-Cookie']
      .find(cookie => cookie.includes('acct=t='))
      .match(/(acct=t=.*?)\s/)[1];
    
  3. Make a GET / request supplying the acct cookie and grab your profile name:

    const profileRequest = UrlFetchApp.fetch('https://stackoverflow.com', {
      method: 'get',
      headers: {
        Cookie: acct
      }
    });
    
    const $main = Cheerio.load(profileRequest.getContentText());
    const myName = $main('a.my-profile > [class^="gravatar-wrapper"]').first().attr('title');
    console.log(myName);
    

If your credentials are correct, this should output your profile name (robots.txt in this case).
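The cookie extraction in step 2 assumes that `Set-Cookie` comes back as an array and that an `acct` cookie is always present. As a minimal sketch, the hypothetical helper below (`extractAcctCookie` is not part of the original answer or of any library) accepts either a single string or an array, since `getAllHeaders()` returns an array only when a header has multiple values, and returns `null` when no `acct` cookie was set, which, as noted in the comments, can be treated as a failed login.

```javascript
/**
 * Hypothetical helper: returns the "acct=..." name=value pair from a
 * Set-Cookie header value, or null when no acct cookie is present.
 * Accepts a single string or an array of strings.
 */
function extractAcctCookie(setCookie) {
  if (!setCookie) return null;
  const cookies = Array.isArray(setCookie) ? setCookie : [setCookie];
  for (const cookie of cookies) {
    // The name=value pair is everything before the first ';'.
    const pair = cookie.split(';')[0].trim();
    if (pair.startsWith('acct=')) return pair;
  }
  return null;
}
```

It could replace the `find`/`match` chain in step 2, for example `const acct = extractAcctCookie(resp.getAllHeaders()['Set-Cookie']);`, followed by a check such as `if (!acct) throw new Error('Login failed: no acct cookie');` before making the GET request.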

double-beep