
I use the Internet Archive to save tweets, not only because some are my favorites, but also because I am concerned about what happens to tweets posted by an account that has not been logged into for 6 months.

If you have GoodTwitter and Link Gopher, you can collect a large number of tweet links very easily: scroll down (which loads more tweets, each of which links to its own tweet page) and then run Link Gopher.

The problem is this: GoodTwitter and many other browser extensions that force Twitter to use the old layout will break on June 1, because they merely trick Twitter into thinking you are running IE. They do not contain any JavaScript that controls the rendering on the user's side.

If you try to extract links (either manually from the HTML or with Link Gopher), you only get links from the tweets that are currently loaded. The new Twitter layout unloads tweets once they are scrolled offscreen, so they are not caught by Link Gopher and are no longer present in the HTML (they disappear from the page source shown in devtools).

I am looking for an extension that logs every change to the page's HTML (on the fly as I scroll down and load more tweets, similar to the devtools network monitor with “Preserve log” or “Persist Logs” checked) and writes everything to a single txt file. Each detected change would copy the entire HTML, which contains the links of the currently loaded tweets (the status/(tweetID) part). From there, I can simply use Notepad++ to search for all those tweet links.
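To make the idea concrete, here is a rough sketch of how such a logger might look as a userscript rather than a ready-made extension. It is only an illustration under a few assumptions: that tweet links appear in the HTML as `/(username)/status/(tweetID)`, that usernames consist of letters, digits and underscores, and that the script is compiled to JavaScript before being loaded by something like Greasemonkey. The names `collectStatusLinks`, `toTextFile`, `watchPage` and `tweetLinks` are made up for this example. Instead of dumping the whole HTML on every change, it keeps only the tweet links it has not seen before.

```typescript
// Assumed link shape: /<username>/status/<tweetID>, relative or inside an absolute URL.
const STATUS_LINK = /\/([A-Za-z0-9_]+)\/status\/(\d+)/g;

// Scan one chunk of HTML and append every tweet link not already in `links`.
function collectStatusLinks(html: string, links: string[]): void {
  STATUS_LINK.lastIndex = 0; // a global regex keeps its position between calls
  let match: RegExpExecArray | null;
  while ((match = STATUS_LINK.exec(html)) !== null) {
    const url = "https://twitter.com/" + match[1] + "/status/" + match[2];
    if (links.indexOf(url) === -1) {
      links.push(url); // first time this tweet is seen
    }
  }
}

// One URL per line, ready to be saved as a .txt file.
function toTextFile(links: string[]): string {
  return links.join("\n");
}

// Browser-only part: re-scan the page every time the DOM changes while scrolling.
function watchPage(links: string[]): void {
  const g = globalThis as any; // avoids needing DOM typings outside a browser
  if (!g.document || !g.document.body || !g.MutationObserver) {
    return; // not running inside a page, nothing to watch
  }
  const scan = () => collectStatusLinks(g.document.body.innerHTML, links);
  scan(); // pick up tweets that are already loaded
  new g.MutationObserver(scan).observe(g.document.body, {
    childList: true, // nodes added or removed
    subtree: true,   // anywhere below <body>
  });
}

const tweetLinks: string[] = [];
watchPage(tweetLinks);
```

Because links are remembered as soon as they show up, it should not matter that the new layout unloads tweets once they scroll offscreen. When scrolling is done, the string returned by `toTextFile(tweetLinks)` is the list to save. Re-reading the whole body on every change is the simplest approach but probably slow on long timelines; scanning only the newly added nodes would be a likely refinement.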

AAsomb113
  • You are overcomplicating it. You don't want to see all changes in the HTML file because e.g. some javascript rendered SVG animation is going to fill it up with junk real quick. The rendered HTML is constantly changing. What you should be looking at is a crawler, a script in Greasemonkey or better the official Twitter API. – Tin Nguyen May 15 '20 at 08:14
  • @TinNguyen I was somewhat thinking of a crawler too, reading this: https://support.archive-it.org/hc/en-us/articles/360000343186-What-is-Brozzler- that it can extract javascript-loaded links (something that devharsh's link extractor couldn't do because it only gets the main HTML). I'll look into that. – AAsomb113 May 15 '20 at 20:18

0 Answers