Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It uses a hosted Hadoop framework running on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

Synonymous tag: emr

452 questions

1 answer

HBase HFile Corruption on AWS S3

I am running HBase on S3 on an EMR cluster (emr-5.7.0). We are using the 'ImportTsv' and 'CompleteBulkLoad' utilities to import data into HBase. During our process, we have intermittently observed failures stating that…

hadoop amazon-s3 mapreduce hbase elastic-map-reduce

asked Dec 27 '17 at 21:41

Sridher

1 answer

Clear data from HDFS on AWS EMR in Hadoop 1.0.3

For various reasons I'm running some jobs on EMR with AMI 2.4.11/Hadoop 1.0.3. I'm trying to run a cleanup of HDFS after my jobs by adding an additional EMR step. Using boto: step = JarStep( 'HDFS cleanup', 'command-runner.jar', …

amazon-web-services hadoop hdfs elastic-map-reduce

asked Aug 07 '17 at 14:07

Chet

Reputation 21,375; badges 10, 40, 58

2 answers

When can we initialize resources for a Hadoop Mapper?

I have a small SQLite database (postcode -> US city name) and a big S3 file of users. I would like to map every user to the city name associated with their postcode. I am following the well-known WordCount.java example, but I'm not sure how MapReduce…

hadoop elastic-map-reduce

asked Apr 24 '17 at 09:18

Thomas

Reputation 8,306; badges 8, 53, 92
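In the Java API, `Mapper.setup(Context)` runs once per map task before any call to `map()`, which makes it the usual place to load a small lookup table. With Hadoop Streaming, the equivalent is code that runs at the start of the mapper script, before it begins reading stdin. A minimal Python sketch of that pattern, assuming a hypothetical SQLite table `postcodes(code, city)`, tab-separated `user_id<TAB>postcode` input lines, and a database file shipped with the job (for example via the streaming `-files` option). The function names are illustrative only.

```python
import sqlite3
import sys


def load_postcodes(db_path):
    """Read the small lookup table into memory once, before any mapping.

    Assumes a hypothetical table: postcodes(code TEXT, city TEXT).
    """
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT code, city FROM postcodes"))
    finally:
        conn.close()


def map_users(lines, postcodes):
    """Yield 'user_id<TAB>city' for each 'user_id<TAB>postcode' input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        user_id, postcode = fields[0], fields[1]
        # Unknown postcodes get an empty city rather than failing the task.
        yield f"{user_id}\t{postcodes.get(postcode, '')}"


def run_mapper(db_path, stdin, stdout):
    # "Setup": runs once per mapper process, like Mapper.setup() in Java.
    lookup = load_postcodes(db_path)
    # "Map": stream the large user file record by record.
    for out in map_users(stdin, lookup):
        stdout.write(out + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects the path of the shipped database file as the first argument.
    run_mapper(sys.argv[1], sys.stdin, sys.stdout)
```

Loading the whole table into a dict keeps per-record work to a single in-memory lookup, which only makes sense because the database is small.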

0 answers

Running a MapReduce program in Eclipse, but it keeps spilling

I have written a MapReduce program. At first it ran fine, but after I changed something, my computer suddenly reported that it had run out of memory. I then realized that the job was using a lot of memory, and I don't know why. And …

hadoop mapreduce elastic-map-reduce

asked Apr 21 '17 at 14:38

JEUDominic

1 answer

Amazon EMR MapReduce progress rollback?

I ran into a strange problem: I run Java MapReduce jobs on EMR. The data was about 1 TB, and I used 1 master + 8 slave nodes. All of the instances are r2.2xlarge. Initially, everything looks fine, as below: INFO mapreduce.Job: map 0% reduce…

amazon-web-services hadoop mapreduce amazon-emr elastic-map-reduce

asked Apr 13 '17 at 17:00

rz.He

1 answer

Elasticsearch master/slave configuration

How do I configure Elasticsearch on a master node and on a data node? What is the difference between these two types of Elasticsearch node? What are the benefits of using Elasticsearch with Hadoop?

apache-spark elasticsearch elasticsearch-plugin elastic-map-reduce

asked Mar 30 '17 at 18:53

xyz_scala

2 answers

On EMR Spark, JDBC load fails the first time, then works

I'm using spark-shell with Spark 2.1.0 on AWS Elastic MapReduce 5.3.1 to load data from a Postgres database. loader.load always fails the first time and then succeeds. Why would this happen? [hadoop@[SNIP] ~]$ SPARK_PRINT_LAUNCH_COMMAND=1 spark-shell…

hadoop apache-spark apache-spark-sql emr elastic-map-reduce

asked Feb 28 '17 at 17:29

rcrogers

Reputation 2,281; badges 1, 17, 14

1 answer

Can I force YARN to use the master node for the Application Master container?

My big ol' master node hardware is doing practically nothing during my Hadoop/Spark runs because YARN uses a random slave node for its AM on each task. I like the old Hadoop 1 way better; lots of log chasing and ssh pain was avoided that way when…

hadoop apache-spark hadoop-yarn elastic-map-reduce

asked Jan 27 '17 at 23:08

Judge Mental

Reputation 5,209; badges 17, 22

1 answer

In AWS EMR, how do I log the classpath to debug classloader issues?

I am in classloader hell: Hadoop (up to 2.7.2) uses an outdated version of HttpClient (4.2.5) (see https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/dependency-analysis.html). This clashes with the version of…

java amazon-web-services hadoop elastic-map-reduce

asked Sep 14 '16 at 15:25

kellyfj

Reputation 6,586; badges 12, 45, 66

1 answer

How to create an Elasticsearch template for a dynamic index type

I am trying to create an Elasticsearch dynamic template based on index type (a new index is created for each date, i.e. indices are partitioned by date). My sample index URL will be…

elasticsearch elastic-map-reduce

asked Sep 09 '16 at 11:32

Ratan Kumar Nath

1 answer

.persist() line sometimes leads to Java Out of Heap Space error

As far as I know, calling .persist() only sets the persistence level; the next action in the script is what triggers the actual persistence work. However, sometimes, seemingly depending on the…

python apache-spark pyspark elastic-map-reduce

asked Aug 30 '16 at 18:06

Kristian

Reputation 21,204; badges 19, 101, 176

2 answers

How to manually make an AWS EMR step fail

I came across a problem that raised a question I could not find a good answer to: how can I deliberately make an AWS EMR step fail? I have a Spark Scala script that is added as a Spark step with some command-line arguments, and the output…

scala aws-sdk emr amazon-emr elastic-map-reduce

asked Aug 23 '16 at 11:27

V. Samma

Reputation 2,558; badges 8, 30, 34
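One common approach to the question above is to have the step's process exit with a non-zero status when a check fails (in Scala, `sys.exit(1)` or an exception thrown out of `main`). A minimal Python sketch of that idea, assuming the step runner propagates the driver's exit status so that a non-zero code marks the step as failed; `check_output` and its `min_records` threshold are hypothetical.

```python
import sys


def check_output(record_count, min_records=1):
    """Return an exit status: 0 if the output looks valid, 1 otherwise.

    min_records is a hypothetical validation rule for illustration.
    """
    if record_count < min_records:
        print(f"expected at least {min_records} records, got {record_count}",
              file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # In a real step the count would come from the job itself (for example,
    # counting the final dataset); here it is passed as the first argument.
    # A non-zero exit status is what should mark the step as FAILED.
    sys.exit(check_output(int(sys.argv[1])))
```

Keeping the check in a function that returns a status, rather than calling `sys.exit` deep inside the job, makes the failure condition easy to test on its own.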

1 answer

Amazon S3 Error Code: 400 while running an MR job on EMR

I got this error while running a custom JAR on EMR. Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID:…

amazon-web-services hadoop mapreduce elastic-map-reduce

asked Jul 20 '16 at 12:11

Sulabh Kumar

2 answers

Mapping a range of warc.gz files, EMR

I have been running a streaming step in AWS EMR with a mapper and reducer written in Python to map some of the Common Crawl archives for sentiment analysis. I am moving from the older Common Crawl textData format to the newer warc.gz format and…

python hadoop elastic-map-reduce

asked Jul 07 '16 at 15:51

DataGuy

Reputation 1,695; badges 4, 22, 38
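One way to map only a range of WARC files is to turn a slice of the crawl's `warc.paths.gz` listing (one relative WARC path per line) into the step's input, so each mapper receives paths to fetch rather than archive contents. A minimal Python sketch, assuming a local copy of the listing; `warc_path_range`, `write_mapper_input`, and the `prefix` parameter are hypothetical names.

```python
import gzip
from itertools import islice


def warc_path_range(listing_path, start, stop):
    """Return non-blank entries [start, stop) from a gzipped path listing.

    Assumes the listing holds one relative WARC path per line.
    """
    with gzip.open(listing_path, "rt", encoding="utf-8") as f:
        paths = (line.strip() for line in f if line.strip())
        return list(islice(paths, start, stop))


def write_mapper_input(paths, out_path, prefix=""):
    """Write one (optionally prefixed) WARC path per line as mapper input."""
    with open(out_path, "w", encoding="utf-8") as out:
        for p in paths:
            out.write(prefix + p + "\n")
```

`islice` stops reading as soon as the range is covered, so only the needed part of the listing is decompressed. How that input is split across mappers (for example with `NLineInputFormat`) is a separate choice not shown here.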

0 answers

--jars from different locations causes different JDBC behavior

When I load a MySQL JDBC driver by first copying it to the driver node and then including it via --jars /path/to/jdbc/driver.jar, referencing that JDBC driver and loading data into a DataFrame succeeds. $ pyspark --jars /path/to/jdbc/driver.jar >>>…

apache-spark pyspark elastic-map-reduce

asked Jun 16 '16 at 03:39

Kristian

Reputation 21,204; badges 19, 101, 176
