5

Looking to grab all the comments from a given video, rather than go one page at a time.

from gdata import youtube as yt
from gdata.youtube import service as yts

username = 'username@gmail.com'  # placeholder credentials
pwd = 'a_password'

client = yts.YouTubeService()
client.ClientLogin(username, pwd)  # the pwd might need to be application-specific, FYI

comments = client.GetYouTubeVideoCommentFeed(video_id='the_id')
a_comment = comments.entry[0]

The above code will let you grab a single comment (likely the most recent one), but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?


Relevant reading: the YouTube API docs for comments, the comment feed docs, and the Python API docs.

TankorSmash
  • 12,186
  • 6
  • 68
  • 106
  • This was answered [here](http://stackoverflow.com/questions/10941803/using-youtube-api-to-get-all-comments-from-a-video-with-the-json-feed) with a solution utilizing PHP, since the YouTube PHP API has a call that allows it. I don't think a pure Python answer is out there. – Ken Bellows Oct 10 '12 at 19:23
  • @KenB I saw that too. That's a shame. The video in question has 9k comments and I don't think making 360 `GetNextLink` calls is the best way. – TankorSmash Oct 10 '12 at 19:26
  • 1
    The URL `www.youtube.com/all_comments?v=video_id` has a parseable comment list, but it's a long load time. Suppose I could try that. – TankorSmash Oct 10 '12 at 19:33
  • Regardless of what method you use, if you're returning 9k comments all at once, you will have a long load time. I think this is why the API doesn't offer it; you usually wouldn't want to do it all at once. – Ken Bellows Oct 10 '12 at 20:26

2 Answers

7

The following achieves what you asked for using the Python YouTube API:

from gdata.youtube import service

USERNAME = 'username@gmail.com'
PASSWORD = 'a_very_long_password'
VIDEO_ID = 'wf_IIbT8HGk'

def comments_generator(client, video_id):
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService()
client.ClientLogin(USERNAME, PASSWORD)

for comment in comments_generator(client, VIDEO_ID):
    author_name = comment.author[0].name.text
    text = comment.content.text
    print("{}: {}".format(author_name, text))

Unfortunately the API limits the number of entries that can be retrieved to 1000. This is the error I got when I tried a tweaked version with a hand-crafted GetYouTubeVideoCommentFeed URL parameter:

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle should apply to retrieve entries in other feeds of the API.

If you want to hand craft the GetYouTubeVideoCommentFeed URL parameter, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following restrictions apply: start-index <= 1000 and max-results <= 50.
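Since building that URI by hand is easy to get wrong, here is a minimal sketch of a helper that formats it and enforces the two restrictions. The helper's name and the validation logic are illustrative only, not part of the gdata API; the URI format and limits come from the answer above.

```python
def comment_feed_uri(video_id, start_index=1, max_results=50):
    """Build a hand-crafted comment feed URI for GetYouTubeVideoCommentFeed.

    Enforces the documented restrictions: start-index <= 1000 and
    max-results <= 50. (Illustrative helper, not part of the gdata API.)
    """
    if not 1 <= start_index <= 1000:
        raise ValueError('start-index must be between 1 and 1000')
    if not 1 <= max_results <= 50:
        raise ValueError('max-results must be between 1 and 50')
    return ('https://gdata.youtube.com/feeds/api/videos/'
            '{0}/comments?start-index={1}&max-results={2}'
            .format(video_id, start_index, max_results))

# e.g. the second page of 50 comments for the example video
print(comment_feed_uri('wf_IIbT8HGk', start_index=51, max_results=50))
```

The returned string can then be passed as the first (`uri`) argument to `GetYouTubeVideoCommentFeed`.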

Pedro Romano
  • 10,973
  • 4
  • 46
  • 50
  • 1
    Great. Do you know if there's a way to manually set the `start_index` or `items_per_page`? Setting it on the first set of comments doesn't seem to do anything. – TankorSmash Oct 10 '12 at 21:06
  • 1
    You just need to pass an URL with the following format to `GetYouTubeVideoCommentFeed`: `https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}`. The following restrictions apply: `start-index <= 1000` and `max-results <= 50`. – Pedro Romano Oct 10 '12 at 21:10
  • Note that to pass the URI you need to pass `uri` as a kwarg: `yt_service.GetYouTubeVideoCommentFeed(uri='https://gdata.youtube.com/feeds/...')` – Roshambo Jun 26 '13 at 23:15
  • 1
    @Roshambo: `uri` is the first positional argument, so specifying it as a kwarg is redundant and not required. What makes you say that it must be a kwarg? – Pedro Romano Jun 27 '13 at 13:19
  • @PedroRomano ah, didn't realize it was the first argument. The PyDocs are so dense I must’ve missed that. – Roshambo Jun 27 '13 at 15:36
2

This is the only solution I've got for now. It doesn't use the API, and it gets slow when there are several thousand comments.

import bs4, urllib2

# grab the page source for the video
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id')  # example: XhFtHW4YB7M

# pull out the comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})

# do something with them, e.g. count them
print len(cmnts)

Note that because `class` is a reserved word in Python, you can't pass it as a regular keyword argument to `findAll`; you have to pass it inside an `attrs` dict, which also means you can't do the usual `startswith`-style matching via a regex or lambda the way you can with regular parameters. It also gets pretty slow because of BeautifulSoup, but I had to use it because etree and minidom don't find the matching tags for some reason, even after `prettify()`ing with bs4.
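If you'd rather avoid BeautifulSoup entirely, the same class-based extraction can be sketched with the standard library's `html.parser`. The `CommentExtractor` class and the sample markup below are illustrative only (YouTube's real page is more complex, and void tags like `<br>` inside a comment would confuse the simple depth counter):

```python
from html.parser import HTMLParser

class CommentExtractor(HTMLParser):
    """Collect the text of elements whose class attribute starts with 'comment'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching element
        self.comments = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get('class') or ''
        if self.depth:
            self.depth += 1             # nested tag inside a comment
        elif classes.startswith('comment'):
            self.depth = 1              # entering a new comment element
            self.comments.append('')

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.comments[-1] += data

sample = ('<div class="comment yt-tile-default">first comment</div>'
          '<p>unrelated</p>'
          '<div class="comment yt-tile-default">second comment</div>')
parser = CommentExtractor()
parser.feed(sample)
print(len(parser.comments))  # prints 2
```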

TankorSmash
  • Hi, interesting answer, but I think that the html structure has changed. Do you use an alternative tag instead of `comment yt-tile-default`? Thank you! – Thoth Jun 28 '14 at 21:30
  • @Thoth I haven't made use of this in a while, but open up the dev tools and edit my answer if you find out – TankorSmash Jun 28 '14 at 22:21