
I am trying to use Python's BeautifulSoup library to extract HTML from my LinkedIn "Recently Added Connections" Page. Specifically, I want the name of the most recent connection - it appears towards the top of the page.

When I inspect the HTML for this specific section, what I see wrapping the content is:

<span class="mn-connection-card__name t-16 t-black t-bold">
      Bob McBobface
    </span>

However, the HTML I get back with BeautifulSoup is disappointing:

{"request":"/voyager/api/configuration","status":200,"body":"bpr-guid-3322365"}

{"status":401}

I've tried fiddling with the Requests library, but to no avail. I'm a beginner, so I'm hoping I don't need to spend a few weeks learning about OAuth or Selenium.

Here's my code:

from bs4 import BeautifulSoup
import urllib.request

url = "https://www.linkedin.com/mynetwork/invite-connect/connections/"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)
content_list = soup.find_all('span',class_="mn-connection-card__name t-16 t-black t-bold")
print(content_list)

Running this returns an empty list: [], whereas I would expect: "Bob McBobface".

When I print(soup), it just returns a short HTML blurb with the 401-Error notice you see above.

Any advice?

1 Answer

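LinkedIn requires you to be logged in to access that page, and your request doesn't include any authentication. HTTP 401 (Unauthorized) means the request lacked valid credentials, which lines up with the response you're seeing. The empty list follows from that: the response you received doesn't contain the connection markup, so find_all has nothing to match.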

This question answers how to authenticate properly with LinkedIn.
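To confirm that the parsing side isn't the problem, here is a minimal sketch that runs the lookup against the two snippets from the question instead of a live request. The helper name first_connection_name and the two sample strings are made up for illustration, and it assumes the mn-connection-card__name class from your inspector is still what LinkedIn serves. Note that even a logged-in request may not return this span if the page builds it with JavaScript.

```python
from bs4 import BeautifulSoup

# Markup copied from the question's browser inspector (what a logged-in browser shows).
SAMPLE_HTML = """
<span class="mn-connection-card__name t-16 t-black t-bold">
      Bob McBobface
    </span>
"""

# Body copied from the question's unauthenticated response.
LOGIN_WALL = '{"status":401}'


def first_connection_name(html):
    """Return the first connection name found in html, or None if absent.

    Hypothetical helper: matches on the single stable-looking class rather
    than the full "t-16 t-black t-bold" string, so styling changes matter less.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("span.mn-connection-card__name")
    return tag.get_text(strip=True) if tag else None


print(first_connection_name(SAMPLE_HTML))  # Bob McBobface
print(first_connection_name(LOGIN_WALL))   # None
```

If the second call is what you get for the real page, the fix is on the request side (sending an authenticated session), not in the BeautifulSoup code.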

Erik Overflow
  • this is more of a comment – SuperStew Aug 22 '19 at 18:01
  • The question seemed to ask "Why is this happening?", and not "How to authenticate?", but I've included a link to a similar question/answer on how to authenticate to make it more complete. Thanks. – Erik Overflow Aug 22 '19 at 18:05
  • Hi Erik, I actually linked to that in my post. The problem is this bit of code doesn't work: "csrf = soup.find(id="loginCsrfParam-login")['value']". The HTML for that webpage doesn't include "loginCsrfParam-login". – HappySpaceBoy Aug 23 '19 at 12:40