2

I have a tool that processes data in-memory with pandas DataFrames, and I'd like to be able to use Spanner as a data source for some of that processing. How can I use Python to run a query in Spanner and then download all the query results to a pandas DataFrame?

Tim Swast
  • 14,091
  • 4
  • 38
  • 61

3 Answers3

1

A quick and dirty way to get spanner results into a pandas data frame.

import pandas as pd
from google.cloud import spanner

# Initialize client
client = spanner.Client()

# Get a Cloud Spanner instance by ID.
instance = client.instance('instance-name')

# Get a Cloud Spanner database by ID.
database = instance.database('database-name')

with database.snapshot() as snapshot:
    result = snapshot.execute_sql("SELECT * FROM somewhere")
    
    # Stream in rows
    rows = list()
    for row in result:
        rows.append(row)

    # Get column names
    cols = [x.name for x in result.fields]

    # Convert to pandas dataframe
    result_df = pd.DataFrame(rows, columns = cols)

This likely won't scale and you may run into issues with Spanner types vs pandas types, but it will solve the immediate problem of "I want to analyze data from Spanner in pandas."

bsauce
  • 624
  • 4
  • 12
1

Pandas lib is using sqlalchemy, so we can use this doc: https://cloud.google.com/spanner/docs/use-sqlalchemy

pip install sqlalchemy-spanner

then in python code (if sqlalchemy version >= 1.4):

import pandas as pd

url = 'spanner+spanner:///projects/project-id/instances/instance-id/databases/database-id'
sql = 'SELECT * FROM my_table;'

df = pd.read_sql(sql, url, index_col='id_column')

or in case it's sqlalchemy version 1.3:

...
url = 'spanner:///projects/project-id/instances/instance-id/databases/database-id'
...
0

To use Python to query your data in Cloud Spanner yoou need to install and use the Python Cloud Spanner client library.

As of now there is no a straightforward way to download data from Spanner to Pandas DataFrame.

I would suggest to use the "StreamedResultSet API" to export your data to Pandas.

Also please take a look at this post about streaming data from Cloud Spanner to Panda dataframe, as it may be proven helpful implementing your use case as well.

tzovourn
  • 1,293
  • 8
  • 18