Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/


452 questions
0
votes
1 answer

How can I use SQL-style LIKE and AND clauses in Elasticsearch?

I read this document to understand the SQL equivalents in Elasticsearch (https://taohiko.wordpress.com/2014/07/18/query-dsl-elasticsearch-vs-sql/). I developed an Elasticsearch application that builds indexes from my data. If I call the POST query below…
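A minimal sketch of how a SQL `WHERE … = … AND … LIKE …` condition maps onto the Elasticsearch Query DSL: a `bool`/`must` clause plays the role of AND, and `wildcard` approximates LIKE. The field names `status` and `name` are hypothetical placeholders, not from the question.

```python
# SQL: SELECT * FROM t WHERE status = 'active' AND name LIKE 'jo%'
# Equivalent Elasticsearch Query DSL body (as a Python dict, ready to
# POST to /index/_search). Field names are illustrative only.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"status": "active"}},  # equality (=)
                {"wildcard": {"name": "jo*"}},   # LIKE 'jo%' -> wildcard 'jo*'
            ]
        }
    }
}
```

`bool.must` requires every clause to match, which is what SQL's AND does; `wildcard` uses `*`/`?` instead of SQL's `%`/`_`.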
0
votes
4 answers

Read from HDFS and write to HBase

The Mapper reads files from two places: 1) articles visited by users (sorted by country), and 2) country-wise statistics. The output of both Mappers is Text, Text. I am running the program on an Amazon cluster. My aim is to read data from two different…
Ankush Singh
  • 560
  • 7
  • 17
0
votes
1 answer

How to free up resources on AWS EMR cluster?

I have a common problem where I start an AWS EMR cluster, log in via SSH, and then run spark-shell to test some Spark code. Sometimes I lose my internet connection and PuTTY throws an error that the connection was lost. But it seems the Spark…
V. Samma
  • 2,558
  • 8
  • 30
  • 34
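One common workaround for this situation (a sketch, not taken from the question): when the SSH session drops, the spark-shell's YARN application can keep holding executors, and it can be killed from a new session with `yarn application -kill <appId>` or through the ResourceManager REST API (`PUT /ws/v1/cluster/apps/{appid}/state`). Below is a minimal Python helper that only builds that REST request; the host name and application id are hypothetical placeholders.

```python
def yarn_kill_request(rm_host, app_id):
    """Build the (url, json_body) pair for killing a YARN application
    via the ResourceManager REST API:
      PUT http://<rm>:8088/ws/v1/cluster/apps/<appid>/state
    with body {"state": "KILLED"}.
    """
    url = f"http://{rm_host}:8088/ws/v1/cluster/apps/{app_id}/state"
    return url, {"state": "KILLED"}

# Example (placeholder host and application id):
url, body = yarn_kill_request("master-node", "application_1234_0001")
```

The request would then be sent with any HTTP client from a machine that can reach the master node's port 8088.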
0
votes
1 answer

Which YARN configuration parameters are re-read on each application?

I've got one job that's much bigger than the other 50 or so that run in my daily workflow. I'd like the property yarn.app.mapreduce.am.resource.mb to be larger for just the big job. Am I in luck? How can I tell which properties require a complete…
Judge Mental
  • 5,209
  • 17
  • 22
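For context: `yarn.app.mapreduce.am.resource.mb` is a MapReduce job property read at submission time, so it can generally be overridden for a single job with a `-D` flag on the command line, whereas daemon-side settings (e.g. NodeManager memory limits) are only read when the daemon starts. A sketch of building such a per-job override; the jar name, class, and paths are hypothetical.

```python
# Sketch: a per-application override via -D at submit time.
# Only job-level properties (read by the client/ApplicationMaster when
# the job is submitted) respond to this; cluster daemon settings do not.
def hadoop_cmd(jar, main_class, overrides, args):
    """Build a `hadoop jar` command line with per-job -D overrides."""
    d_flags = [f"-D{k}={v}" for k, v in overrides.items()]
    return ["hadoop", "jar", jar, main_class, *d_flags, *args]

cmd = hadoop_cmd(
    "bigjob.jar", "com.example.BigJob",                  # hypothetical
    {"yarn.app.mapreduce.am.resource.mb": "4096"},       # this job only
    ["s3://in/", "s3://out/"],                           # hypothetical
)
```

The other ~50 jobs in the workflow would simply be submitted without the override and keep the cluster default.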
0
votes
0 answers

Hadoop MapReduce MultipleOutputs - one in Mapper, one in Reducer

I want to use multiple outputs in a Hadoop job in Elastic MapReduce. So, I set up MultipleOutputs in the main() method like so: MultipleOutputs.addNamedOutput(hadoopJob, "One", TextOutputFormat.class, NullWritable.class,…
John Chrysostom
  • 3,973
  • 1
  • 34
  • 50
0
votes
1 answer

Hive query throwing exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null

I just upgraded the Hive version to 2.1.0 for both hive-exec and hive-jdbc. But because of this, some queries that previously worked fine started failing. Exception - Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while…
devsda
  • 4,112
  • 9
  • 50
  • 87
0
votes
0 answers

Dealing with LARGE data in MongoDB

This is going to be a "general-ish" question, but I have a reason for that: I am not sure what kind of approach to take to make things faster. I have a MongoDB server running on a big AWS instance (r3.4xlarge, 16 vCPU…
SRC
  • 2,123
  • 3
  • 31
  • 44
0
votes
1 answer

How can I write MapReduce code in Python to implement a matrix transpose?

Assume the input file is a .txt, and I am trying to run it on a cluster (like EMR on AWS) to test.
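A sketch of the transpose in Hadoop Streaming style, written as plain Python functions so it can run locally; on EMR, the mapper and reducer would each read stdin and write stdout, with the framework doing the sort between them. The mapper emits each cell keyed by its column index; the reducer reassembles each column as a row of the output.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit (column_index, (row_index, value)) for every cell of a
    tab-separated matrix."""
    for row_idx, line in enumerate(lines):
        for col_idx, value in enumerate(line.split("\t")):
            yield col_idx, (row_idx, value)

def reducer(pairs):
    """Group pairs by column index; each group, ordered by row index,
    becomes one row of the transposed matrix."""
    for col_idx, group in groupby(sorted(pairs), key=itemgetter(0)):
        cells = sorted(rv for _, rv in group)  # order by original row
        yield "\t".join(value for _, value in cells)

rows = ["1\t2\t3", "4\t5\t6"]
print(list(reducer(mapper(rows))))  # ['1\t4', '2\t5', '3\t6']
```

In a real streaming job the shuffle replaces the explicit `sorted`/`groupby`, and very wide matrices may need a combiner or a secondary sort, which this sketch ignores.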
0
votes
1 answer

API to get count of task instance group instances in AWS EMR

I want to get the count of instances in the task instance groups of an AWS EMR cluster. For this, I used CloudWatch to check the heartbeat of each task instance group's instances. But I think, in the end, EMR is a framework that uses Hadoop, and Hadoop's master must have…
devsda
  • 4,112
  • 9
  • 50
  • 87
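Rather than polling per-instance CloudWatch heartbeats, the EMR API itself reports running counts per instance group (boto3's `list_instance_groups`). A sketch; the counting is factored into a pure function over the response, and the cluster id shown is a placeholder.

```python
def count_task_instances(instance_groups):
    """Sum RunningInstanceCount over TASK instance groups, given the
    'InstanceGroups' list from an emr.list_instance_groups() response."""
    return sum(
        g.get("RunningInstanceCount", 0)
        for g in instance_groups
        if g.get("InstanceGroupType") == "TASK"
    )

# With a live client it would be used roughly like this:
#   import boto3
#   emr = boto3.client("emr")
#   resp = emr.list_instance_groups(ClusterId="j-XXXXXXXXXXXXX")  # placeholder id
#   n = count_task_instances(resp["InstanceGroups"])
```

Each group in the response also carries `RequestedInstanceCount`, which can be compared against the running count to detect groups that have not fully scaled.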
0
votes
1 answer

Hadoop Access Control Exception: Permissions

Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE,…
0
votes
1 answer

Is it possible to access the underlying org.apache.hadoop.mapreduce.Job from a Scalding job?

In my Scalding job, I have code like this: import org.apache.hadoop.mapreduce.lib.input.FileInputFormat class MyJob(args: Args) extends Job(args) { FileInputFormat.setInputPathFilter(???, classOf[MyFilter]) // ... rest of job ... } class…
0
votes
1 answer

How can I make my Scalding job operate recursively on its input bucket?

I have a Scalding job which runs on EMR. It runs over an S3 bucket containing several files. The source looks like this: MultipleTextLineFiles("s3://path/to/input/").read /* ... some data processing ... */ .write(Tsv("s3://paths/to/output/")) I…
0
votes
0 answers

Can a Scalding source select a subset of the files in an S3 bucket to process?

I have a Scalding job which operates on all the files in a particular timestamped S3 bucket. It looks like this: JsonLine("s3://path/to/timestampedbuckets/2016-02-03/", ('key1, 'key2)).read I want to alter the job to operate on the files in several…
0
votes
2 answers

Project file name field using '-tagFile' option, LOAD USING PigStorage '-tagFile', Pig 0.14

Amazon EMR 4.5, Hadoop 2.7.2, Pig 0.14. I would like to project the file name field, together with selected fields, into a new relation after loading with the -tagFile option. The results do not seem to make sense. Examples: tagfile-test.txt (tab-delimited) AAA …
chillvibes
  • 39
  • 2
0
votes
0 answers

Getting error when invoking Elasticsearch from Spark

I have a use case where I need to read messages from Kafka and, for each message, extract data and invoke the Elasticsearch index. The response will then be used for further processing. I am getting the below error when invoking…