
I'm using Py-StackExchange to get a list of questions from Cross Validated. I need to keep only the questions whose titles include the word "keras".

This is my code. It takes a very long time to run and, in the end, returns nothing.

import stackexchange

# user_api_key holds my Stack Exchange API key
cv = stackexchange.Site(stackexchange.CrossValidated, app_key=user_api_key, impose_throttling=True)
cv.be_inclusive()

for q in cv.questions(pagesize=100):
    if "keras" in q.title:
        print('--- %s ---' % q.title)
        print(q.creation_date)

I checked the same query manually with a search and obtained the list of questions very quickly.

How can I do the same using Py-StackExchange?

double-beep
Fluxy
  • You have two options: use the API or SEDE. The API has real-time data, but you'd have to make a few calls to get the questions. SEDE (Stack Exchange Data Explorer) is updated weekly (every Sunday), but you can fetch all the questions at once. Which one would you like? – double-beep Sep 26 '20 at 07:10

1 Answer


You have two options:

  1. Use this SEDE query. This will give you all questions which contain keras in their title on Cross Validated. However, note that SEDE is updated weekly.

  2. Use the Stack Exchange API's /search/advanced method. This method has a title parameter which accepts:

    text which must appear in returned questions' titles.

    I haven't used Py-StackExchange before, so I don't know how it works. Therefore, in this example I'm going to use the StackAPI library (docs):

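    from stackapi import StackAPI

    q_filter = '!4(L6lo9D9ItRz4WBh'  # custom filter controlling which fields are returned
    word_to_search = 'keras'
    SITE = StackAPI('stats')  # 'stats' is the API site name of Cross Validated
    # Ask the API for questions whose titles contain the search word
    keras_qs = SITE.fetch('search/advanced',
                          filter=q_filter,
                          title=word_to_search)
    print(keras_qs['items'])
    print(f"Found {len(keras_qs['items'])} questions.")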
    

    The filter I'm using here (q_filter in the code) is optional: you can change it or leave it out entirely. You don't need to provide an API key (the library uses one) unless you have a specific reason to.

double-beep
  • Thanks. Does it retrieve all questions with the keyword "keras"? Even without filters I get a very small number of questions (fewer than 10), which is unrealistic. I need to retrieve all questions with the keyword in the title. – Fluxy Sep 28 '20 at 15:27
  • Thanks for the suggestions. Yes, I specified `SITE`. I've been struggling with this tuning for a while, and I always get an unrealistic subset. Could you please give a complete example with pagination that is supposed to retrieve a complete data set (at least for one year, so that I can slice over years)? – Fluxy Sep 28 '20 at 16:02
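
As a follow-up to the pagination request above, here is a minimal sketch of paging through /search/advanced one calendar year at a time. It uses only the Python standard library rather than Py-StackExchange or StackAPI. The helper names (`year_window`, `fetch_page`, `search_titles`) are made up for this illustration, and the `2.3` API version in the URL is an assumption. It relies on the `has_more` and `backoff` fields of the API's response wrapper to decide when to stop and when to pause; it has not been tuned for quota limits or error handling.

```python
import gzip
import json
import time
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: version 2.3 of the Stack Exchange API.
API_URL = "https://api.stackexchange.com/2.3/search/advanced"


def year_window(year):
    """Return (fromdate, todate) Unix timestamps covering one UTC calendar year."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp()) - 1


def fetch_page(params):
    """Fetch a single page from the API and return the decoded JSON wrapper."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    # The API compresses its responses; decompress when the gzip magic bytes are present.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def search_titles(word, site, year, fetch=fetch_page, pagesize=100):
    """Collect every question on `site` created in `year` whose title contains `word`."""
    fromdate, todate = year_window(year)
    page = 1
    items = []
    while True:
        data = fetch({
            "site": site,
            "title": word,
            "fromdate": fromdate,
            "todate": todate,
            "sort": "creation",
            "order": "asc",
            "page": page,
            "pagesize": pagesize,
        })
        items.extend(data.get("items", []))
        # Honour the server's request to wait before the next call, if any.
        if data.get("backoff"):
            time.sleep(data["backoff"])
        # `has_more` is false on the last page of results.
        if not data.get("has_more"):
            return items
        page += 1
```

Calling `search_titles("keras", "stats", 2019)` should then return the matching questions created in 2019, and looping over several years and concatenating the lists gives the full set. Because the `title` match happens on the server, only matching questions are transferred, unlike the loop over `cv.questions()` in the question, which walks through every question on the site.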