Hello, I'm using IBM Bluemix with an Apache Spark notebook and loading data from dashDB. I'm trying to build a visualization, but it displays only the column names, not the rows.

def get_file_content(credentials):
    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    props = {}
    props['user'] = credentials['username']
    props['password'] = credentials['password']

    # fill in table name
    table = credentials['username'] + "." + "BATTLES"

    data_df = sqlContext.read.jdbc(credentials['jdbcurl'], table, properties=props)
    data_df.printSchema()

    return StringIO.StringIO(data_df)

When I run this command:

data_df.take(5)

I get the first 5 rows of data, with both the column names and the row values. But when I do this:

content_string = get_file_content(credentials)
BATTLES_df = pd.read_table(content_string)

I get this error:

ValueError: No columns to parse from file

And when I try to look at .head() or .tail(), only the column names are displayed.

Does anyone see what the problem might be? I have very little experience with Python. Thank you in advance.

Saraida

2 Answers

This is the solution that worked for me. I replaced BATTLES_df = pd.read_table(content_string)

with

BATTLES_df=data_df.toPandas()
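
A likely explanation for the original symptom, assuming the notebook runs Python 2: StringIO.StringIO() calls str() on anything that is not already a string, and the string form of a Spark DataFrame is a one-line schema summary rather than the table contents. pandas then reads that single line as a header and finds no data rows. The sketch below reproduces this with pandas only; the schema text and its column names (NAME, YEAR) are made up for illustration and are not the real BATTLES schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for str(data_df): a Spark DataFrame's string form
# is a one-line schema summary, not the rows of the table.
schema_text = "DataFrame[NAME: string, YEAR: int]"

# pandas treats the only line as the header, so no data rows are parsed.
parsed = pd.read_table(io.StringIO(schema_text))

print(list(parsed.columns))  # the whole summary line becomes one column name
print(len(parsed))           # number of data rows parsed
```

toPandas() avoids the text round trip entirely by collecting the rows to the driver, so for a large table it may be worth reducing the data first, for example with data_df.limit(n).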

Thank you

Saraida
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

Then go to your Spark directory:

cd ~/spark-1.6.1-bin-hadoop2.6/

./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_scalaversion:spark_version-M1

Then you can write the following code:

import pandas as pd
Beyhan Gul