
There are good examples of how to use it, as here, but in spark-shell I get "... createDF is not a member of org.apache.spark.sql.SparkSession".

PS: using Spark v2.2.


EDIT: sorry all, it is an external lib. A small change to the question: how do I import a GitHub lib in a spark-shell session?

Peter Krauss

1 Answer


createDF() is not a SparkSession method; it is a spark-daria method. You need to install the dependency and import the spark-daria library, then you should be able to use it. The article below is for your reference.

https://medium.com/@mrpowers/manually-creating-spark-dataframes-b14dae906393
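
For reference, a minimal sketch of such a session, assuming the shell was started with the package on the classpath; the extension import path is the one spark-daria documents, and the sample data mirrors the linked article:

// start the shell with the dependency, e.g.:
//   spark-shell --packages mrpowers:spark-daria:0.35.0-s_2.11

// this spark-daria import is what adds createDF to SparkSession
import com.github.mrpowers.spark.daria.sql.SparkSessionExt._
import org.apache.spark.sql.types.{IntegerType, StringType}

// createDF takes the row data and a (name, type, nullable) column spec
val someDF = spark.createDF(
  List(
    (8, "bat"),
    (64, "mouse"),
    (-27, "horse")
  ), List(
    ("number", IntegerType, true),
    ("word", StringType, true)
  )
)
someDF.show()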


how do I import a GitHub lib in a spark-shell session?

You can use this alias, filling in the appropriate "etc" parts of the properties-file value:

alias sshell_daria='export SPARK_MAJOR_VERSION=2; spark-shell --packages mrpowers:spark-daria:0.35.0-s_2.11 --properties-file /opt/_etc_/_etc2_/conf/sparkShell.conf'

but it does not always work; spark-shell stops working after these messages:

SPARK_MAJOR_VERSION is set to 2, using Spark2
Ivy Default Cache set to: /home/_etc_/.ivy2/cache
The jars for the packages stored in: /home/_etc_/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.6.4.0-91/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
mrpowers#spark-daria added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]

You can download the current version as a JAR file at dl.bintray.com and use the --jars option instead of --packages. So the correct alias in this case is

alias sshell_daria='export SPARK_MAJOR_VERSION=2; spark-shell --jars _your_path_/spark-daria-0.35.0-s_2.12.jar  --properties-file /opt/_etc_/_etc2_/conf/sparkShell.conf'
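
Once the shell comes up, one way to confirm the jar was actually picked up is to list the registered jars. A sketch, assuming Spark 2.x, where SparkContext.listJars() is available:

// inside a spark-shell started via sshell_daria
spark.sparkContext.listJars().filter(_.contains("spark-daria")).foreach(println)

// the extension import is still needed once per session
import com.github.mrpowers.spark.daria.sql.SparkSessionExt._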
Manish
  • Hi @Manish! Thanks for the post and sorry for my mistake (it is an external library, spark-daria)... Well, can you explain how to import https://github.com/MrPowers/spark-daria with spark-shell? – Peter Krauss Oct 02 '19 at 20:54
  • Download the spark-daria jar and copy it to the node. Assuming the Scala version is 2.11, spark-shell should be started with the command below: `spark-shell --packages mrpowers:spark-daria:0.35.0-s_2.11` – Manish Oct 02 '19 at 21:06
  • Just FYI, jar download link: http://dl.bintray.com/spark-packages/maven/mrpowers/spark-daria/0.35.0-s_2.12/spark-daria-0.35.0-s_2.12.jar – Manish Oct 02 '19 at 21:22
  • Hi, after all that, using `sshell_daria` and the example `val someDF = spark.createDF(...)` from the illustration link, I get *"error: value createDF is not a member of org.apache.spark.sql.SparkSession"* – Peter Krauss Oct 03 '19 at 15:25
  • PS: to run `createDataFrame()` of the [illustration link](https://medium.com/@mrpowers/manually-creating-spark-dataframes-b14dae906393), I used `sshell_daria` and `import org.apache.spark.sql.Row`. The error at the `val someSchema` declaration was *"error: not found: value StructField"* (the missing import is sketched below). – Peter Krauss Oct 03 '19 at 15:30
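
The "not found: value StructField" error in the last comment is just a missing import: StructField lives in org.apache.spark.sql.types. A minimal sketch of the plain createDataFrame() path from the linked article, with all required imports (no spark-daria needed):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// StructField comes from org.apache.spark.sql.types
val someSchema = List(
  StructField("word", StringType, true)
)

val someData = Seq(Row("bat"), Row("mouse"))

// createDataFrame is the stock SparkSession method: an RDD of Rows plus a StructType
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)
someDF.show()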