2

I've been wracking my brain on this issue for a couple of days now and so wanted to get the community to help. I'm using Python 3 in a Jupyter Notebook but want to eventually make this a script.

Problem

I have a Pandas DataFrame that has three columns (query, URL, Trend). All of the data is working. I'm even able to produce the plot images for each query; however, I can't get the images of the Trend plots to display in the Trends column of my DataFrame. It just shows AxesSubplot(0.125,0.125;0.775x0.755). There are no error messages (although, I've rangled a few).

What I've tried

I've taken a look at fig.savefig() and this seems to be close to what I need but not 100% sure how to make that work in my situation as it outputs all the plots and I need one plot per query.

I've also given this solution on Stack a go but I haven't been able to implement it for my code (it's close though).

This article is also very close to the implementation I'm looking for results-wise (and my current implementation) but I don't know how to get the HTML URL of the image of a plot I just created. As well, ideally, the image doesn't need to be saved as this is likely to be a daily report going out via email.

Results so far

Dataframe

Query URL Trend

0 Margaret Trudeau https://montreal.ctvnews.ca/margaret-trudeau-d... AxesSubplot(0.125,0.125;0.775x0.755)

1 Nick Cordero https://people.com/theater/nick-cordero-doctor... AxesSubplot(0.125,0.125;0.775x0.755)

And, so on.

These are the Trend plots that come out of the code just below the DataFrame.

Image of trend plot output

Image of trend plot output

And, so on.

It appears everything works but...

Expected result

The DataFrame should be the same as above but I want AxesSubplot(0.125,0.125;0.775x0.755) to be an image of the trend plot.

Current Code

from googlesearch import search

import pandas as pd
import matplotlib.pyplot as plt
from pytrends.request import TrendReq

pytrend = TrendReq()

def trending_searches(geo):
    geo_lower = str.lower(geo) # Need geolocation to be lowercase to work
    df = pytrend.trending_searches(pn=geo_lower)

    queries = [] # Hold all processed queries

    for index, row in df.iterrows():
        i = str(row)
        j = str.strip(i, '0    ')
        k = j.split("\n", 1)[0]
        queries.append(k)

    # Gets the trend data 
    trend = pd.DataFrame(get_trend(queries))

    # Gets the first URL from each query on Google
    urls = pd.DataFrame(get_urls(queries))

    # column lable
    trend.columns = ["Trend"]
    urls.columns = ["URL"]
    df.columns = ['Query']

    #Concat all into one dataframe
    result = pd.concat([df, urls, trend], axis=1)

    #html = result.to_html() #Convert to HTML for emails

    return result #html

def get_trend(kw: list) -> list:

    query = kw

    # Get trend data
    my_results_list = []
    for j in query:    
        pytrend.build_payload(kw_list=[j])
        df = pytrend.interest_over_time()
        my_results_list.append(df)

    # Plot trend data
    plots = []
    for i in range(len(my_results_list)):
        if my_results_list[i].empty == True: # To mitigate queries that have no data
            plots.append("No data avaiable")
        else:
            plot = my_results_list[i].plot(kind='line', figsize=(5,1), sharex=True)
            plots.append(plot)

    return plots

def get_urls(kw: list) -> list:

    query = kw

    my_results_list = []
    for j in query:    
        for i in search(j,        # The query you want to run
                    tld = 'ca',  # The top level domain
                    lang = 'en',  # The language
                    num = 10,     # Number of results per page
                    start = 0,    # First result to retrieve
                    stop = 1,  # Last result to retrieve
                    pause = 2.0,  # Lapse between HTTP requests
                   ):

            my_results_list.append(i)

    return my_results_list

trending_searches('Canada')
zsltg
  • 725
  • 5
  • 14
  • 23

1 Answers1

2

Once you have the fig already built, you have to transform it to binary, and then store into a var as a value

buf = io.BytesIO()
fig.savefig(buf, format='png')
buf.seek(0)
string = base64.b64encode(buf.read())

Then you have to aggregate the HTML tags which corresponds to output an image

uri = 'data:image/png;base64,' + urllib.parse.quote(string)
html = '<img src = "%s"/>' % uri

To simulate your dataframe I've created 3 lists and appended each result to it as follows:

for i in range(10):
    ...
    query.append(i+1)
    url.append("google.com")
    trend.append(html)

df = pd.DataFrame({"query": query,"url": url,"trend":trend})
df.head()

enter image description here

Then I did the dataframe.to_html(escape=False) and it did result in the HTML with the images.

enter image description here

Obviously I am using here jupyter, but with the HTML built you should be able to print it as you were doing without further issues.

So finally, I arrived at the code below:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import urllib, urllib.parse, base64, io, base64
from IPython.core.display import display, HTML

# simulate dataframe
query = []
url = []
trend = []
for i in range(10):

    #save fig
    plt.plot(np.random.rand(5));
    fig = plt.gcf();

    #store it as binary
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    string = base64.b64encode(buf.read())

    #complement with HTML tags
    uri = 'data:image/png;base64,' + urllib.parse.quote(string)
    html = '<img src = "%s"/>' % uri

    #clear matplotlib's cache
    plt.clf()

    #append results
    query.append(i+1)
    url.append("google.com")
    trend.append(html)

#build df
df = pd.DataFrame({"query": query,"url": url,"trend":trend})
#parse into html
html = df.to_html(escape=False)

Could you check whether it worked? :)

Caio Oliveira
  • 1,243
  • 13
  • 22
  • 1
    Hey Caio! This is really helpful. I'm having a similar issue to my original problem. So, I tried your code as stand-alone, which worked! I then integrated it into my code. It's now putting the value into the dataframe but won't display the image. After a few trials, I added plt.show() to ensure a plot was being created for each row. Yup, it is. Then tried calling the function wrapped in the display HTML. display(HTML(trending_searches('Canada'))). Not sure what I'm missing but we're obviously close! – Jordan Stevens Apr 30 '20 at 13:35
  • 1
    I figured it out. Need to add escape=False when calling the to_html() – Jordan Stevens Apr 30 '20 at 13:40
  • How to save it as .xlsx? – Hamza usman ghani Jan 20 '21 at 10:42
  • From Pandas dataframe there's the option to save as an excel `df.to_excel('path/file.xlsx')`. Not sure though whether it will properly display the images. – Caio Oliveira Jan 21 '21 at 21:53