I want to search Wikipedia for a list of keywords and extract the text of the matching pages, which I will later feed into a tf-idf step for a text classification program. I am currently looping through a pandas DataFrame containing the keywords and querying Wikipedia for each one, but I am getting an error saying the page does not exist. Here is my code:
import wikipedia
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Raw string, so single backslashes are enough in the Windows path
df_wiki_pages = pd.read_csv(r'C:\Users\jason\Downloads\Categories.csv', usecols=[0])
df_wiki_pages = df_wiki_pages.dropna()
print(df_wiki_pages)

wikipages = []
tokenized_texts = []
for index, row in df_wiki_pages.iterrows():
    currentRow = row['Categories']
    print("Now testing: " + currentRow)
    # pass the variable, not the literal string 'currentRow'
    wiki = wikipedia.page(currentRow)
    wikipages.append(wiki)
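For context, this is roughly how I plan to use the collected page texts afterwards. This is only a sketch with placeholder strings standing in for the real page contents; `TfidfVectorizer` is scikit-learn's tf-idf counterpart to the `CountVectorizer` I imported above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder texts standing in for the .content of each fetched page
texts = [
    "customer advocacy focuses on the customer",
    "text classification assigns labels to text",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(texts)

# One row per document, one column per vocabulary term
print(tfidf_matrix.shape)
```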
It gives me this error once the loop reaches the keyword "Customer_advocacy": PageError: Page id "customer advocate" does not match any pages. Try another id!
I do not understand why it is searching for 'customer advocate' when my query is 'Customer_advocacy', and I do not understand why the query gets changed on its own: the page 'Customer_advocacy' exists, while 'customer advocate' does not. Am I doing something wrong in my query?
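One thing I noticed while reading the library's docstring: `wikipedia.page()` has an `auto_suggest` parameter that defaults to `True`, which apparently lets the library substitute its own suggested title for the one you pass. I have not confirmed this is what is happening here, but a call like the following might force an exact lookup (a sketch only; `fetch_exact` is a helper name I made up):

```python
def fetch_exact(title):
    # auto_suggest=False asks the wikipedia library to look up the title
    # exactly as given instead of replacing it with its own suggestion.
    # (Import deferred so the sketch reads even without the package installed.)
    import wikipedia
    return wikipedia.page(title, auto_suggest=False)
```

Would this be the right way to stop the library from rewriting my query?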