Questions tagged [google-hadoop]

The open-source Apache Hadoop framework can be run on Google Cloud Platform for large-scale data processing, using Google Compute Engine VMs and Persistent Disks and optionally incorporating Google's tools and libraries for integrating Hadoop with other cloud services like Google Cloud Storage and BigQuery.

71 questions
2 votes, 2 answers

Where is the source of datastore-connector-latest.jar? Can I add it as a Maven dependency?

I got the connectors from https://cloud.google.com/hadoop/datastore-connector but I'm trying to add the datastore-connector (and the bigquery-connector too) as a dependency in the pom... I don't know if this is possible. I could not find the right…
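
A hedged sketch of what such a dependency could look like: the GCS and BigQuery connectors were published to Maven Central under the com.google.cloud.bigdataoss group, so the BigQuery side can be declared roughly as below (the version is a placeholder; whether the Datastore connector was ever published there is not confirmed here):

    <!-- pom.xml sketch; version is a placeholder, check Maven Central -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>bigquery-connector</artifactId>
      <version>hadoop2-0.13.4</version>
    </dependency>
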
2 votes, 1 answer

NullPointerException running a Spark job

I am running a job on Spark in standalone mode, version 1.2.0. The first operation I am doing is taking an RDD of folder paths and generating an RDD of file names, composed of the files residing in each folder: JavaRDD<String> filePaths =…
Yaniv Donenfeld
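
A minimal sketch of that expansion in the Spark 1.x Java API, assuming the folders live on a Hadoop-visible file system. A common source of a NullPointerException here is capturing a driver-side FileSystem or Configuration in the closure; creating them inside the function avoids that:

    import java.net.URI;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    // folderPaths is an existing JavaRDD<String> of folder URIs.
    JavaRDD<String> filePaths = folderPaths.flatMap(
        new FlatMapFunction<String, String>() {
          @Override
          public Iterable<String> call(String folder) throws Exception {
            // Build the FileSystem handle on the worker, not the driver.
            FileSystem fs = FileSystem.get(URI.create(folder), new Configuration());
            List<String> files = new ArrayList<String>();
            for (FileStatus status : fs.listStatus(new Path(folder))) {
              if (!status.isDirectory()) {
                files.add(status.getPath().toString());
              }
            }
            return files;
          }
        });
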
2 votes, 1 answer

Spark - "too many open files" in shuffle

Using Spark 1.1, I have two datasets. One is very large and the other was reduced (using roughly 1:100 filtering) to a much smaller scale. I need to reduce the large dataset to the same scale by joining only those items from the smaller list with their…
Yaniv Donenfeld
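
Two standard Spark 1.x mitigations for this error, offered as assumptions rather than the accepted fix: raise the worker OS open-file limit (ulimit -n), and cut the number of intermediate shuffle files via configuration:

    import org.apache.spark.SparkConf;

    // Hedged sketch: fewer shuffle files under the Spark 1.x hash shuffle,
    // or switch to the sort-based shuffle manager introduced in Spark 1.1.
    SparkConf conf = new SparkConf()
        .set("spark.shuffle.consolidateFiles", "true")
        .set("spark.shuffle.manager", "sort");
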
2 votes, 1 answer

Getting 'sudo: unknown user: hadoop' and 'sudo: unable to initialize policy plugin' errors on Google Cloud Platform while running a Hadoop cluster

I am trying to deploy the sample Hadoop app provided by Google at https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop on Google Cloud Platform. I followed all the setup instructions given there step-by-step. I…
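
A hedged guess at the usual cause: the 'hadoop' user was never created on the VM, so any 'sudo -u hadoop ...' step in the setup scripts fails. Creating the user by hand (the home directory path is an assumption) is one way to test that theory:

    # Hypothetical fix: create the missing user, then re-run the setup step.
    sudo useradd -m -d /home/hadoop hadoop
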
2 votes, 1 answer

Hadoop 2.4.1 and Google Cloud Storage connector for Hadoop

I am trying to run Oryx on top of Hadoop using Google's Cloud Storage connector for Hadoop (https://cloud.google.com/hadoop/google-cloud-storage-connector). I prefer to use Hadoop 2.4.1 with Oryx, so I use the hadoop2_env.sh setup for the hadoop…
Rich
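
For reference, wiring the connector into a plain Hadoop 2.x install usually comes down to putting the hadoop2 flavor of the connector jar on the classpath and declaring the gs:// scheme in core-site.xml. A sketch with the standard connector property names (the project id is a placeholder):

    <!-- core-site.xml sketch; values are placeholders -->
    <property>
      <name>fs.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    </property>
    <property>
      <name>fs.AbstractFileSystem.gs.impl</name>
      <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    </property>
    <property>
      <name>fs.gs.project.id</name>
      <value>my-project-id</value>
    </property>
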
2 votes, 1 answer

How to enable the Snappy codec on a Hadoop cluster on Google Compute Engine

I am trying to run a Hadoop job on Google Compute Engine against our compressed data, which is sitting in Google Cloud Storage. While trying to read the data through SequenceFileInputFormat, I get the following…
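
Enabling Snappy generally requires the native libraries (libsnappy and Hadoop's libhadoop.so) on every node plus the codec registered in core-site.xml; a sketch of the registration step using the stock Hadoop codec class (nothing here is GCE-specific):

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
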
1 vote, 2 answers

GCS - Global Consistency with delete + rename

My issue may be a result of my misunderstanding of global consistency in Google Cloud Storage, but since I did not experience this issue until just recently (mid November), and it now seems easily reproducible, I wanted some clarification. The issue…
lukeforehand
1 vote, 1 answer

GoogleHadoopFileSystemBase.setTimes() not working

I have a reference to the GoogleHadoopFileSystemBase in my Java code, and I’m trying to call setTimes(Path p, long mtime, long atime) to modify the timestamp of a file. It doesn’t seem to be working though, even though other FileSystem APIs work…
Alvin C
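
A minimal way to observe the behavior, hedged: in GCS connector releases of that era setTimes() was effectively a no-op, so an unchanged mtime may be working-as-implemented rather than a bug. The bucket and object names below are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), new Configuration());
    Path p = new Path("gs://my-bucket/some/file");   // hypothetical object
    long before = fs.getFileStatus(p).getModificationTime();
    fs.setTimes(p, System.currentTimeMillis(), -1);  // -1 leaves atime unchanged
    long after = fs.getFileStatus(p).getModificationTime();
    System.out.println("mtime before=" + before + ", after=" + after);
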
1 vote, 1 answer

Spark - Can't read files from Google Cloud Storage when configuring gcs connector manually

I have a Spark cluster deployed using bdutil on Google Cloud. I installed a GUI on my driver instance to be able to run IntelliJ from it, so that I can try to run my Spark processes in interactive mode. The first issue I faced was that the…
Gouffe
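
A hedged sketch of registering the connector on a manually built context. The fs.gs.* names are the standard connector properties; the google.cloud.auth.* keys are from later connector releases (older ones used fs.gs.auth.* keys instead), and the project id and keyfile path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf sparkConf = new SparkConf().setAppName("gcs-test");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    Configuration hconf = sc.hadoopConfiguration();
    hconf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    hconf.set("fs.gs.project.id", "my-project-id");  // placeholder
    hconf.set("google.cloud.auth.service.account.enable", "true");
    hconf.set("google.cloud.auth.service.account.json.keyfile",
              "/path/to/key.json");                  // placeholder
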
1 vote, 0 answers

Loading data into a Google Datastore kind from local HDFS (local machine) using the google-datastore-connector for Hadoop?

I have used the google-cloud-storage-connector for Hadoop and was able to run a MapReduce job that takes input from my local HDFS (Hadoop running on my local machine) and places the result in a Google Cloud Storage bucket. Now I want to run a MapReduce job…
1 vote, 0 answers

Want help running MapReduce programs on Google Cloud Storage

I am using Google Cloud Storage with Hadoop 2.3.0 via the GCS connector. I have added GCS.jar to the lib directory of my Hadoop installation and added the path to the GCS connector in the hadoop-env.sh file as: export…
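
The usual wiring, sketched with a placeholder path rather than the asker's truncated export line: the connector jar goes on Hadoop's classpath in hadoop-env.sh, and the gs:// scheme is declared in core-site.xml as in the Hadoop 2.4.1 entry above:

    # hadoop-env.sh sketch; the jar path is a placeholder
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/gcs-connector-hadoop2.jar
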
1 vote, 1 answer

Connect a Hadoop cluster to multiple Google Cloud Storage buckets in multiple Google projects

Is it possible to connect my Hadoop cluster to multiple Google Cloud projects at once? I can easily use any Google Cloud Storage bucket in a single Google project via the Google Cloud Storage connector, as explained in this thread: Migrating 50TB data from local…
1 vote, 2 answers

Google Compute Engine: libsnappy not installed error during command-line installation of Hadoop

I'm trying to install a custom Hadoop implementation (>2.0) on Google Compute Engine using the command line option. The modified parameters of my bdutil_env.sh file are as…
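
A hedged guess: this error usually means the native Snappy library is absent from the VM images, so installing it on each node before the Hadoop build step may resolve it (Debian-style package names assumed):

    # Debian/Ubuntu package names assumed; adjust for the actual image
    sudo apt-get update
    sudo apt-get install -y libsnappy1 libsnappy-dev
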
1 vote, 1 answer

What is the number of reducer slots on GCE Hadoop worker nodes?

I am testing the scaling of some MapReduce jobs on Google Compute Engine's Hadoop cluster, and I am finding some unexpected results. In short, I've been told this behavior may be explained by having multiple reducer slots per worker…
Rich
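
On classic MR1 clusters the per-node slot counts are TaskTracker settings, so the answer is a function of mapred-site.xml; a sketch of the relevant properties (the values shown are illustrative, not bdutil's actual defaults):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>
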
1 vote, 1 answer

Hive queries of external tables stored on Google Cloud Storage extremely slow

I have begun testing the Google Cloud Storage connector for Hadoop. I am finding it incredibly slow for Hive queries run against it. It seems a single client must scan the entire file system before starting the job; with tens of thousands of files this takes…
Sean