
Cloudera claims to offer a Quick Start approach, but it is not working for me.

When I invoke spark-shell I get:

... WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version

I find this confusing: it is, after all, a Quick Start, and this warning looks odd.

So:

  1. I see that MySQL is running with a metastore database, and I can access it fine.

  2. Do I need to start the Hive metastore service if MySQL is used as the Hive metastore database? I think so, but ...

  3. Do I need HiveServer2 to run locally now, or can I run without it?

  4. The Hive tab in Cloudera Manager tells me I am using MySQL, and I see an auto-generated hive-site.xml.
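As a way to check point 4 independently of Cloudera Manager, here is a minimal Python sketch that reads a hive-site.xml and reports which kind of metastore it points at. It relies on two standard Hive properties, `hive.metastore.uris` and `javax.jdo.option.ConnectionURL`; the function names and the sample XML (including the host name) are made up for illustration. The idea behind it, stated as an assumption rather than a diagnosis: when a client such as spark-shell does not pick up a hive-site.xml at all, it typically falls back to an embedded Derby `metastore_db` in the current directory, which would match the Derby error in the logs.

```python
import xml.etree.ElementTree as ET


def read_hive_site(xml_text):
    """Return the <property> name/value pairs of a hive-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def metastore_mode(props):
    """Describe which metastore the given properties point at."""
    uris = props.get("hive.metastore.uris", "")
    if uris:
        # Clients talk to a separately running Hive metastore service over Thrift.
        return "remote metastore service at " + uris
    jdbc = props.get("javax.jdo.option.ConnectionURL", "")
    if jdbc.startswith("jdbc:derby"):
        return "embedded Derby (" + jdbc + ")"
    if jdbc:
        # The client opens the backing database (e.g. MySQL) directly.
        return "direct JDBC connection (" + jdbc + ")"
    return "nothing configured; clients fall back to embedded Derby"


# Hypothetical sample, not taken from a real QuickStart VM.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://quickstart.cloudera:9083</value>
  </property>
</configuration>"""

print(metastore_mode(read_hive_site(SAMPLE)))
```

To inspect a real file, pass its contents instead of `SAMPLE`, for example `read_hive_site(open(path, "rb").read())`, where `path` is wherever the auto-generated hive-site.xml lives on the VM. If the result differs from what Cloudera Manager shows, or the file is not in the configuration directory that spark-shell reads, that mismatch is a reasonable first thing to investigate.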

I am not sure how to proceed to fix this. One of the logs reports a failure to create a Derby database, e.g. ...

Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.

In short I am seeking guidance on how to fix this.

Before one of the numerous crashes I have had, I had an sbt assembly of a Spark / Scala application working fine against a remote MySQL database. So I am wondering whether that is the way to go, and whether spark-shell and the local Cloudera VM are simply too unstable.

Seeking guidance amidst frustration. Databricks works like a dream.

Thanks in advance.

thebluephantom
  • The Hive metastore is a standalone process. It needs to run on top of the active MySQL process. HiveServer2 is what you would connect to with Hive JDBC, but Spark doesn't need it... But `metastore_db` is usually an embedded Derby database, not a MySQL table name – OneCricketeer Mar 27 '18 at 17:34
  • Understood regarding `metastore_db`, but when I look in Cloudera Manager it states that I am accessing / intending to access MySQL as the Hive metastore – thebluephantom Mar 27 '18 at 17:38
  • Every release things work less well than before – thebluephantom Mar 27 '18 at 19:28
  • I would recommend asking the Cloudera Forums about those problems. I haven't used the quickstart vm in years – OneCricketeer Mar 27 '18 at 20:20
  • I have lost faith: these worked in the past, then I used CM to update and now none of it works. A shame, thx. I looked at those forums, but to no avail – thebluephantom Mar 27 '18 at 21:10
  • Installed Cloudera QuickStart 5.13 and no such issues, but other issues. Tip: SKIP 5.12 – thebluephantom Mar 27 '18 at 21:38
  • Cool thanks. Like I said, I have a script to actually install CDH cluster from scratch, so I don't use the bloated quickstart VM – OneCricketeer Mar 28 '18 at 03:39
  • That's also a lot of work. I did that, but there are many considerations. The CM changes simply do not take effect and many bugs keep recurring. Can you point me to the best tutorial / book for installation? – thebluephantom Mar 28 '18 at 08:08

1 Answer


Install 5.13: there are other problems, but these ones disappear. I did, however, note what the cause is.

After a clean install, running

sudo jps

shows that all Hadoop services are up and working. I checked this.

What I then noticed is that the Cloudera Manager console (CMS) never shows. The advice on the Internet is to run the command that launches CM Express.

Once you do that, the CMS shows, but many Hadoop services need to be (re-)started. The point is that spark-shell then goes haywire and the metastore is no longer accessible. All in all, a sorry mess for which the solution is not obvious.

A manual install of Hadoop may well be the best option, but a definitive integrated spec is needed. There are then also issues with Spark 2.x not being supported, Kudu not being included, and parcels vs. packages.

thebluephantom