
I use Spark 1.6.

We have an HDFS write method that wrote to HDFS using SQLContext. We now need to switch over to using HiveContext. After the switch, the existing unit tests no longer run and give the error

Error XSDB6: Another instance of Derby may have already booted the database <local path>\metastore_db

This happens whether I run a single test via the IntelliJ test runner or via Maven on the command line.

As I understand it, the issue happens when multiple HiveContexts or multiple processes try to access the metastore_db. However, I am running a single test and no other jobs on my local machine, so I fail to understand where the multiple processes are coming from.
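For illustration, here is a minimal sketch (hypothetical, not my actual code) of the single-JVM pattern that can trigger XSDB6: two HiveContexts over one SparkContext, each trying to boot its own embedded Derby metastore.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object XsDb6Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repro"))

    // The first HiveContext boots an embedded Derby metastore (./metastore_db).
    val hc1 = new HiveContext(sc)
    hc1.sql("SHOW TABLES").collect()

    // A second HiveContext in the same JVM attempts to boot Derby again
    // and can fail with XSDB6 because the first instance holds the lock.
    val hc2 = new HiveContext(sc)
    hc2.sql("SHOW TABLES").collect()
  }
}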

Satyam
  • I doubt executing a **single** test would give you the exception. You can create a separate project with just a single test and check it yourself. How many tests do you have in your project? How do you execute them? Is this Maven or sbt? – Jacek Laskowski Jun 02 '17 at 07:11
  • The project has many tests, but I restricted it to running one test using Maven from the command line with the -Dtest=XYZ option and still faced the issue. – Satyam Jun 02 '17 at 07:25
  • Use `lsof` to figure out which processes have the Derby lock file open. – Bryan Pendleton Jun 02 '17 at 13:49

3 Answers


When a HiveContext is instantiated, it creates a metastore directory named metastore_db in your test path, so deleting this directory after your test allows you to create a HiveContext again.

Java:

import java.io.File;
import org.apache.commons.io.FileUtils;

// Commons IO deleteDirectory expects a java.io.File, not a Hadoop Path.
FileUtils.deleteDirectory(new File("<path of metastore_db>"));
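If your tests are in Scala, a minimal sketch of the same cleanup as a ScalaTest hook (assuming ScalaTest's BeforeAndAfter and Apache Commons IO are on the test classpath, and that the metastore is created in the working directory):

import java.io.File
import org.apache.commons.io.FileUtils
import org.scalatest.{BeforeAndAfter, FunSuite}

// Mix this into suites that instantiate a HiveContext.
class WithMetastoreCleanup extends FunSuite with BeforeAndAfter {
  after {
    // Delete the embedded Derby metastore so the next test can boot a fresh one.
    FileUtils.deleteDirectory(new File("metastore_db"))
  }
}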
mkl

I figured out why I was getting the error. In the unit test, we were writing data to ORC on the local file system and then reading it back to verify the write was done properly.

The write and read methods were creating their own HiveContexts in the same process, which resulted in the lock on the metastore. I am guessing that with SQLContext this wasn't a blocker, since a local metastore was not needed.

We have now moved to creating the HiveContext when we construct our persistence service; semantically, that makes more sense. This option was chosen over creating and destroying a new SparkContext (and thereby a new HiveContext) for every test, since that would add considerable overhead to our test suite without providing much benefit (please do correct me if you have a different opinion).
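A minimal sketch of the shared-context approach (names like OrcPersistenceService and TestFixture are hypothetical, not our actual classes):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// The persistence service receives the one HiveContext instead of building
// its own, so write and read share a single embedded metastore.
class OrcPersistenceService(hiveContext: HiveContext) {
  def write(df: DataFrame, path: String): Unit =
    df.write.format("orc").save(path)

  def read(path: String): DataFrame =
    hiveContext.read.format("orc").load(path)
}

object TestFixture {
  // One SparkContext/HiveContext for the whole suite, created lazily on first use.
  lazy val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tests"))
  lazy val hiveContext = new HiveContext(sc)
  lazy val persistence = new OrcPersistenceService(hiveContext)
}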

Satyam

I was getting the same error too, though in my case I was running a test suite.

I could run each individual test file successfully, but when I ran the suite a few tests kept failing. There were many tests doing I/O on the local file system using SparkSession.

In that situation, use an after method in every test file (in my case, it was missing in one or two files) to stop the session:

after {
  sparkSession.stop()
}
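For context, a minimal self-contained version of that pattern (assuming a ScalaTest 3.0-style FunSuite with BeforeAndAfter and the Spark 2.x SparkSession this answer refers to):

import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfter, FunSuite}

class LocalIoTest extends FunSuite with BeforeAndAfter {
  var sparkSession: SparkSession = _

  before {
    sparkSession = SparkSession.builder()
      .master("local[2]")
      .appName("LocalIoTest")
      .getOrCreate()
  }

  // Stopping the session releases its resources (including any embedded
  // metastore lock) so the next test file in the suite can start cleanly.
  after {
    sparkSession.stop()
  }

  test("session is usable within a single test file") {
    assert(sparkSession.range(10).count() == 10L)
  }
}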
Pardeep