I've been wracking my brain on this issue for a couple of days now and so wanted to get the community to help. I'm using Python 3 in a Jupyter Notebook but want to eventually make this a script.
Problem
I have a Pandas DataFrame that has three columns (query, URL, Trend). All of the data is working. I'm even able to produce the plot images for each query; however, I can't get the images of the Trend plots to display in the Trends column of my DataFrame. It just shows AxesSubplot(0.125,0.125;0.775x0.755)
. There are no error messages (although, I've rangled a few).
What I've tried
I've taken a look at fig.savefig() and this seems to be close to what I need but not 100% sure how to make that work in my situation as it outputs all the plots and I need one plot per query.
I've also given this solution on Stack a go but I haven't been able to implement it for my code (it's close though).
This article is also very close to the implementation I'm looking for results-wise (and my current implementation) but I don't know how to get the HTML URL of the image of a plot I just created. As well, ideally, the image doesn't need to be saved as this is likely to be a daily report going out via email.
Results so far
Dataframe
Query URL Trend
0 Margaret Trudeau https://montreal.ctvnews.ca/margaret-trudeau-d... AxesSubplot(0.125,0.125;0.775x0.755)
1 Nick Cordero https://people.com/theater/nick-cordero-doctor... AxesSubplot(0.125,0.125;0.775x0.755)
And, so on.
These are the Trend plots that come out of the code just below the DataFrame.
And, so on.
It appears everything works but...
Expected result
The DataFrame should be the same as above but I want AxesSubplot(0.125,0.125;0.775x0.755)
to be an image of the trend plot.
Current Code
from googlesearch import search
import pandas as pd
import matplotlib.pyplot as plt
from pytrends.request import TrendReq
pytrend = TrendReq()
def trending_searches(geo):
geo_lower = str.lower(geo) # Need geolocation to be lowercase to work
df = pytrend.trending_searches(pn=geo_lower)
queries = [] # Hold all processed queries
for index, row in df.iterrows():
i = str(row)
j = str.strip(i, '0 ')
k = j.split("\n", 1)[0]
queries.append(k)
# Gets the trend data
trend = pd.DataFrame(get_trend(queries))
# Gets the first URL from each query on Google
urls = pd.DataFrame(get_urls(queries))
# column lable
trend.columns = ["Trend"]
urls.columns = ["URL"]
df.columns = ['Query']
#Concat all into one dataframe
result = pd.concat([df, urls, trend], axis=1)
#html = result.to_html() #Convert to HTML for emails
return result #html
def get_trend(kw: list) -> list:
query = kw
# Get trend data
my_results_list = []
for j in query:
pytrend.build_payload(kw_list=[j])
df = pytrend.interest_over_time()
my_results_list.append(df)
# Plot trend data
plots = []
for i in range(len(my_results_list)):
if my_results_list[i].empty == True: # To mitigate queries that have no data
plots.append("No data avaiable")
else:
plot = my_results_list[i].plot(kind='line', figsize=(5,1), sharex=True)
plots.append(plot)
return plots
def get_urls(kw: list) -> list:
query = kw
my_results_list = []
for j in query:
for i in search(j, # The query you want to run
tld = 'ca', # The top level domain
lang = 'en', # The language
num = 10, # Number of results per page
start = 0, # First result to retrieve
stop = 1, # Last result to retrieve
pause = 2.0, # Lapse between HTTP requests
):
my_results_list.append(i)
return my_results_list
trending_searches('Canada')