Questions tagged [google-hadoop]

The open-source Apache Hadoop framework can be run on Google Cloud Platform for large-scale data processing, using Google Compute Engine VMs and Persistent Disks and optionally incorporating Google's tools and libraries for integrating Hadoop with other cloud services like Google Cloud Storage and BigQuery.

71 questions
1 vote · 1 answer

Adding or removing nodes from an existing GCE hadoop/spark cluster with bdutil

I'm getting started with running a Spark cluster on Google Compute Engine, backed by Google Cloud Storage, deployed with bdutil (from the GoogleCloudPlatform GitHub). I am doing this as follows: ./bdutil -e…
Gavin
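For the bdutil workflow in the question above, here is a minimal sketch of a deploy/teardown cycle, not the asker's exact setup: the spark_env.sh extension path comes from the bdutil distribution, while the project, bucket, zone, and worker-count values are placeholders. bdutil does not advertise an in-place resize command, so adding or removing workers generally means redeploying with a different -n.

    # Deploy a Spark-on-GCE cluster with 4 workers (placeholder values).
    ./bdutil -p my-project \
             -b my-config-bucket \
             -z us-central1-a \
             -n 4 \
             -e extensions/spark/spark_env.sh \
             deploy

    # Tear the cluster down; re-run deploy with a different -n to change the size.
    ./bdutil -p my-project -b my-config-bucket -e extensions/spark/spark_env.sh delete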
1 vote · 1 answer

Using ignoreUnknownValues from Hadoop BigQuery Connector

I'm piping unstructured event data through Hadoop and want to land it in BigQuery. I have a schema that includes most of the fields, but there are some fields I want to ignore or don't know about. BigQuery has a configuration field called…
tmandry
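The option the question refers to exists on the BigQuery load API as ignoreUnknownValues. As a point of comparison only (the bq CLI, not the Hadoop connector itself), the same switch is exposed as --ignore_unknown_values; dataset, table, bucket, and schema names here are placeholders.

    # Load newline-delimited JSON from GCS, dropping fields that are not in the schema.
    bq load \
      --source_format=NEWLINE_DELIMITED_JSON \
      --ignore_unknown_values \
      mydataset.events \
      'gs://my-bucket/events/*.json' \
      schema.json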
1 vote · 1 answer

Google Cloud Hadoop Nodes not yet sshable error

I ran the following commands on Cygwin, referring to https://cloud.google.com/hadoop/setting-up-a-hadoop-cluster: gsutil.cmd mb -p [projectname] gs://[bucketname] ./bdutil -p [projectname] -n 2 -b [bucketname] -e hadoop2_env.sh …
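When bdutil reports that nodes are "not yet sshable", one basic check, added here as an assumption rather than something from the question, is whether the VMs it created accept SSH at all. The instance name below follows bdutil's default hadoop prefix and, like the project and zone, is a placeholder.

    # List the instances bdutil created and try to reach the master directly.
    gcloud compute instances list --project my-project
    gcloud compute ssh hadoop-m --project my-project --zone us-central1-a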
1 vote · 1 answer

Hadoop on Google Compute Engine

I am trying to set up a Hadoop cluster in Google Compute Engine through the "Launch click-to-deploy software" feature. I have created 1 master and 1 slave node and tried to start the cluster using the start-all.sh script from the master node, and I got an error…
1 vote · 1 answer

Strange errors when running a Spark job

I am running a Spark cluster with 80 machines. Each machine is a VM with 8 cores and 50 GB of memory (41 seems to be available to Spark). I am running on several input folders, and I estimate the size of the input to be ~250 GB gz-compressed. I get errors in the…
Yaniv Donenfeld
1 vote · 1 answer

Fail to run Spark job when using globStatus and Google Cloud Storage bucket as input

I am using Spark 1.1. I have a Spark job that looks for a certain pattern of folders only under a bucket (i.e. folders that start with...) and should process only those. I achieve this by doing the following: FileSystem fs = FileSystem.get(new…
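A quick way to check the same glob outside the job, added here as a sketch with a placeholder bucket and prefix, is to run it through the connector from the command line; if this listing works but globStatus in the job does not, the problem is more likely in how the Path is built than in the connector itself.

    # Glob over a GCS bucket through the Hadoop GCS connector.
    hadoop fs -ls 'gs://my-bucket/folders-that-start-with*'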
1 vote · 1 answer

Issues with the Google Cloud Storage connector on Spark

I am trying to install the Google Cloud Storage connector for Spark on Mac OS to do local testing of my Spark app. I have read the following document (https://cloud.google.com/hadoop/google-cloud-storage-connector). I have added…
poiuytrez
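For local testing as described above, a hedged sketch of wiring the connector into spark-shell: the fs.gs.* and google.cloud.auth.* properties are the ones documented for recent connector releases (older releases used a p12 keyfile property instead), and the jar path, project id, and key file locations are placeholders.

    # Start a local spark-shell with the GCS connector on the classpath (placeholder paths).
    spark-shell \
      --jars /path/to/gcs-connector-hadoop2-latest.jar \
      --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
      --conf spark.hadoop.fs.gs.project.id=my-project \
      --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
      --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/key.json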
1 vote · 1 answer

Maintaining persistent HDFS in Google Cloud

I am having my students use bdutil to create a Google Compute Engine cluster with persistent disks and HDFS as the default filesystem. We want to have persistent disks so that the students can work on projects over a period of weeks. However, HDFS…
1 vote · 3 answers

Unable to SSH into VM causing problems with Hadoop install using bdutil

I have been through most of the questions surrounding this issue on this site, but nothing seems to have helped me. Basically, what I am trying to do is instantiate a Hadoop instance on my VM via the bdutil script supplied by Google, however the…
0 votes · 1 answer

Hive external table location in google cloud storage is ignoring subdirectories

I have a bunch of large csv.gz files in Google Cloud Storage that we got from an external source. We need to bring these into BigQuery so we can start querying, but BigQuery cannot directly ingest gzipped CSV files larger than 4 GB. So, I decided to…
jatinw21
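Two settings commonly used to make Hive read files in subdirectories of an external table's LOCATION, offered as a sketch under the assumption that this matches the layout in the question; the table name is a placeholder.

    # Enable recursive input listing, then query the external table.
    hive -e "
      SET mapreduce.input.fileinputformat.input.dir.recursive=true;
      SET hive.mapred.supports.subdirectories=true;
      SELECT COUNT(*) FROM my_external_table;
    "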
0 votes · 1 answer

Google BigQuery Spark Connector: How to ignore unknown values on append

We use the Google BigQuery Spark Connector to import data stored in Parquet files into BigQuery. Using custom tooling we generated a schema file needed by BigQuery and reference that in our import code (Scala). However, our data doesn't really…
0 votes · 1 answer

Google Hadoop Filesystem Encryption

In normal operation, one can provide encryption keys to the Google Storage API to encrypt a given bucket/blob: https://cloud.google.com/compute/docs/disks/customer-supplied-encryption Is this possible for the output of Spark/Hadoop jobs "on the…
0 votes · 1 answer

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with, but although the instances are created, some failures prevent it from working. I'm following the steps in https://cloud.google.com/hadoop/downloads So far I'm using (as of now) the latest…
0 votes · 1 answer

Google Cloud connector for Hadoop doesn't work with Pig

I'm using Hadoop with HDFS 2.7.1.2.4 and Pig 0.15.0.2.4 (Hortonworks HDP 2.4) and trying to use the Google Cloud Storage Connector for Spark and Hadoop (bigdata-interop on GitHub). It works correctly when I try, say, hadoop fs -ls gs://bucket-name, but…
sckol
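One documented difference that can affect tools layered on Hadoop, offered here as an assumption about the Pig case rather than a confirmed fix: applications that go through the FileContext API need the AbstractFileSystem binding for gs:// in addition to fs.gs.impl. Both properties go inside the <configuration> element of core-site.xml.

    <!-- GCS connector bindings for the gs:// scheme (core-site.xml fragment) -->
    <property>
      <name>fs.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    </property>
    <property>
      <name>fs.AbstractFileSystem.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    </property>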
0 votes · 1 answer

Never successfully built a large hadoop&spark cluster

I was wondering if anybody could help me with this issue in deploying a Spark cluster using the bdutil tool. When the total number of cores increases (>= 1024), it fails every time for the following reasons: some machines are never SSHable, like…
Parthus