
I'm using facebook_scraper to try to scrape a closed group that I'm a member of.

After logging in with my credentials, the scraper works on my private groups that are not searchable, but not on private groups that are searchable. By "searchable" I mean that a non-member who types the group's name into the search bar will see the group and some information about it, but not its posts or discussion. Unsearchable groups don't show up at all, and if you enter their URL directly you are redirected to the login page.

I think the reason is that when the URL of an unsearchable group is requested you are immediately redirected to a login page, so the login takes effect, whereas searchable groups show some public information and the scraper just scrapes that.
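One way to check this hypothesis is to compare the URL that was requested with the URL the response finally ended up at. The sketch below uses only the standard library; `describe_redirect` is a hypothetical helper name, not part of facebook_scraper, and it assumes the final URL can be read from the response (requests-style responses expose it as `.url`).

```python
from urllib.parse import urlsplit, parse_qs


def describe_redirect(requested_url: str, final_url: str) -> dict:
    """Summarise how the final URL differs from the requested one."""
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    # keep_blank_values=True so that bare flags such as "_rdr" are kept
    requested_params = set(parse_qs(requested.query, keep_blank_values=True))
    final_params = set(parse_qs(final.query, keep_blank_values=True))
    return {
        "changed": requested_url != final_url,
        "same_path": requested.path.rstrip("/") == final.path.rstrip("/"),
        "added_params": sorted(final_params - requested_params),
    }


info = describe_redirect(
    "https://m.facebook.com/groups/618892088578525",
    "https://m.facebook.com/groups/618892088578525?_rdr",
)
print(info)
```

If `changed` is false for the searchable group, the request was served directly and never went through the redirect that the unsearchable group triggers.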

Following the GitHub repo for facebook-scraper I think I've located the problem at:

facebook_scraper.page_iterators.generic_iter_pages(url, GroupPageParser, FacebookScraper.get)

The issue becomes more apparent at:

facebook_scraper.page_iterators.GroupPageParser(url).get_html().find('article')

This returns an empty list if the group is searchable and a populated list if it isn't.
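To see what kind of page actually came back, independently of the library's parser, the raw markup can be probed directly. This is a minimal sketch using the standard library's `html.parser`; `PageProbe` and `probe_html` are hypothetical names, and treating a password input as the sign of a logged-out page is an assumption about Facebook's markup, not something confirmed here.

```python
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Counts <article> tags and notes whether a password field is present."""

    def __init__(self):
        super().__init__()
        self.article_count = 0
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_count += 1
        elif tag == "input" and dict(attrs).get("type") == "password":
            # Assumption: a password input suggests a login form on the page
            self.has_password_field = True


def probe_html(markup: str) -> dict:
    probe = PageProbe()
    probe.feed(markup)
    probe.close()
    return {"articles": probe.article_count, "login_form": probe.has_password_field}


# Made-up sample markup, only to illustrate the two cases
member_view = "<div><article>post 1</article><article>post 2</article></div>"
preview_view = "<div><h1>Group name</h1><form><input type='password'/></form></div>"
print(probe_html(member_view))
print(probe_html(preview_view))
```

Assuming the response object exposes the raw markup as `.text`, as requests-style responses do, the same helper could be run on the responses for both groups to compare what was served.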

A full example of my stalled debugging with the two actual groups I'm in:

>>> from facebook_scraper import FacebookScraper, page_iterators

>>> scraper = FacebookScraper()
>>> parser = page_iterators.GroupPageParser

>>> credentials = ('myemail@email.com', 'mypassword')
>>> scraper.login(credentials[0], credentials[1])  # Login itself works; the problem only appears with searchable groups

>>> searchable_url = 'https://m.facebook.com/groups/1401745746503709'
>>> unsearchable_url = 'https://m.facebook.com/groups/618892088578525'

>>> searchable_get = scraper.get(searchable_url)
>>> unsearchable_get = scraper.get(unsearchable_url)

>>> searchable_html = parser(searchable_get).get_html()
>>> unsearchable_html = parser(unsearchable_get).get_html()

>>> searchable_get.html
<HTML url='https://m.facebook.com/groups/1401745746503709'>  # This stays the same
>>> unsearchable_get.html
<HTML url='https://m.facebook.com/groups/618892088578525?_rdr'>  # This URL is changed by a redirect to login

>>> len(searchable_html.find('article'))
0
>>> len(unsearchable_html.find('article'))
21

I'm trying to figure this out in Python, but I'm not at all familiar with HTML, so it's been tricky. Any help would be greatly appreciated. Thank you.

liam