17

I am seeing errors when starting spark-shell, using spark-1.6.0-bin-hadoop2.6. This is new behavior that just arose.

The upshot of the failures shown in the log messages below is that sqlContext is not available (though sc is).

Is there some kind of Derby lock that could be released? Another instance of Derby may have already booted the database /root/spark-1.6.0-bin-hadoop2.6/bin/metastore_db.

<console>:16: error: not found: value sqlContext
         import sqlContext.implicits._
                ^
<console>:16: error: not found: value sqlContext
         import sqlContext.sql

16/05/25 11:00:00 ERROR Schema: Failed initialising database.
Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@c2191a8, see the next exception for details.
org.datanucleus.exceptions.NucleusDataStoreException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@c2191a8, see the next exception for details.


16/05/25 11:06:02 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient


16/05/25 11:06:02 ERROR Schema: Failed initialising database.
Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@372e972d, see the next exception for details.
org.datanucleus.exceptions.NucleusDataStoreException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@372e972d, see the next exception for details.

Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@c2191a8, see the next exception for details.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 134 more
Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database /root/spark-1.6.0-bin-hadoop2.6/bin/metastore_db.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
        ... 131 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/spark-1.6.0-bin-hadoop2.6/bin/metastore_db.
slachterman
  • Yes, there is a Derby lock file that prevents two unrelated apps from trying to update the same embedded database simultaneously. – Bryan Pendleton Jun 01 '16 at 20:16
  • Would it make sense to modify or delete that file in order to resolve this behavior? – slachterman Jun 03 '16 at 09:52
  • No, that would lead to a corrupted database. The right approach is to figure out which are the two applications that are independently trying to open the same embedded database simultaneously, and why they are both trying to run at once. Either give them each their own DB, don't run them concurrently, or re-configure your system so they can share the DB by using the Derby server rather than the Derby embedded DB configuration. – Bryan Pendleton Jun 03 '16 at 13:43
  • Thanks Bryan. This appears to happen when spark-shell does not exit gracefully (as in a hung session), and then a new session invokes spark-shell. Is there anything one can do in this case to "reset" the embedded DB, i.e. close the hung connection? – slachterman Jun 07 '16 at 16:09
  • When Derby is running in embedded mode, the Derby code is running directly in the containing JVM. There is no way to communicate with the Derby code other than from Java code in that JVM, and no way to stop the Derby engine without stopping the containing JVM. The "connection" to the DB is wholly within that containing JVM. There are other Derby configurations (e.g., client-server) which behave differently, but for the embedded configuration this is the reality of the situation. – Bryan Pendleton Jun 07 '16 at 21:46
  • Is your problem solved? – whatsinthename Dec 27 '17 at 05:51
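To make the comments above concrete: a hung spark-shell can only release the embedded Derby lock when its JVM exits, so the practical "reset" is to find and stop that JVM. A rough sketch (the process names are assumptions; adjust the grep pattern to your setup):

```shell
# List running Java processes; a hung spark-shell typically shows
# SparkSubmit or spark-shell somewhere in its command line.
ps -eo pid,args | grep -i '[s]park' || true

# Once you have identified the hung session's PID, stopping that JVM
# releases the embedded Derby lock on metastore_db:
# kill <pid>     # replace <pid> with the actual process id
```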

6 Answers

36

I had a similar problem in a Spark 2.0.0 shell when trying to create a DataFrame; removing metastore_db/dbex.lck fixed it.
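For reference, a minimal sketch of that cleanup, assuming spark-shell was launched from the directory containing metastore_db and no JVM is still using the metastore (Derby also writes a db.lck alongside dbex.lck, so both are removed here):

```shell
# Remove the embedded Derby lock files left behind by a crashed session.
# Only safe when no JVM still has the metastore open; stop it first otherwise.
rm -f metastore_db/dbex.lck metastore_db/db.lck
```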

bachr
  • Better not to do this, for the sake of data safety. – Pengfei.X Dec 10 '16 at 05:03
  • @Pengfei.X why? and what do you suggest? – bachr Dec 12 '16 at 12:12
  • It worked for me too; not sure why this error suddenly appeared. Thanks. – Rashmit Rathod Aug 12 '17 at 07:08
  • I deleted the metastore_db directory and the problem was solved. The directory is temporary; if you delete it, it will be recreated. – Ivan Lee Sep 28 '17 at 03:00
  • Deleting dbex.lck didn't work for me (Spark 2.3.0). I also tried deleting all of metastore_db, but that didn't work either. Running spark-shell with sudo works, as andy said below, but this doesn't seem to be an ideal solution. – Logan Apr 05 '18 at 17:31
2

This issue happens because of metastore_db, which is created when the Spark shell looks for a database and starts registering it. You can remove metastore_db completely, since it is recreated every time. If you are not able to delete it, first delete metastore_db/dbex.lck; then you will be able to delete the metastore_db folder. I used Spark 2.1.10 and faced the same issue earlier.

Sheel
2

Similar to andy's answer: I had the same issue on Windows, and here is the solution:

  1. Run cmd on Windows as administrator
  2. Navigate to the Spark home directory
  3. Open spark-shell:

    c:\spark\bin> spark-shell

Wael Almadhoun
1

The best way to resolve the problem is to first restart your system, then go to the Spark home directory and from there run spark-shell as a sudo user:

sudo bin/spark-shell

or, if you want to use a pyspark instance, type

sudo bin/pyspark

The problem mainly arises due to insufficient privileges on metastore_db.
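Rather than running every session under sudo, one alternative (a sketch; it assumes metastore_db and Derby's derby.log sit in the directory you launch spark-shell from) is to hand ownership of those files back to your own user once:

```shell
# Make the current user the owner of the metastore directory and the
# Derby log, so future spark-shell runs no longer need elevated privileges.
# Prefix chown with sudo if the files are currently owned by root.
for f in metastore_db derby.log; do
  if [ -e "$f" ]; then chown -R "$(whoami)" "$f"; fi
done
```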

andy
1

In my case, Hive had also been started along with Spark, so I closed the Hive server and restarted the Spark shell to make it work.

RushHour
-3

The issue appears to have been ephemeral, as we are no longer experiencing this behavior.

slachterman