Questions tagged [amazon-emr]

Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

3368 questions

1 answer

Why can't I change the "spark.driver.memory" value in AWS Elastic MapReduce?

I want to tune my Spark cluster on AWS EMR, but I can't change the default value of spark.driver.memory, which causes every Spark application to crash because my dataset is big. I tried editing the spark-defaults.conf file manually on the master…

asked Apr 11 '19 at 15:43

yassidhbi
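
On EMR, Spark defaults are normally set when the cluster is created, through a configuration object with the spark-defaults classification, rather than by hand-editing spark-defaults.conf on the master. A minimal sketch, assuming Python and that the printed JSON is passed to aws emr create-cluster with --configurations file://spark-config.json (the file name and the "8g" value are illustrative placeholders, not recommendations):

```python
import json


def spark_defaults_configuration(driver_memory: str) -> list:
    """Build an EMR "Configurations" list that sets spark.driver.memory.

    Each entry pairs a Classification (here spark-defaults, which maps to
    spark-defaults.conf) with a Properties map of string keys and values.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {"spark.driver.memory": driver_memory},
        }
    ]


if __name__ == "__main__":
    # Save the printed JSON to a file and pass it at cluster creation, e.g.
    # aws emr create-cluster ... --configurations file://spark-config.json
    print(json.dumps(spark_defaults_configuration("8g"), indent=2))
```

For a single application, spark-submit --driver-memory 8g overrides the default instead. In client deploy mode the driver JVM is already running when application code builds its SparkConf, so setting spark.driver.memory from inside the application has no effect there.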

0 answers

Unable to load S3 parquet with postgresql driver in spark-shell

I am trying to load a Parquet file from S3 in the EMR spark-shell. Commands: // to start spark: spark-shell --driver-class-path postgresql-42.2.5.jar --jars postgresql-42.2.5.jar // to read…

amazon-web-services apache-spark amazon-s3 amazon-emr

asked Apr 11 '19 at 10:16

bob (4,595 reputation; 2 gold, 25 silver, 35 bronze badges)

1 answer

Access a cross-region S3 endpoint through a private subnet

I have an EMR cluster that spins up in a private subnet in eu-west-1. I have defined a gateway endpoint for S3 in the route table. I have to access this public bucket/location exposed by AWS:…

amazon-s3 amazon-emr amazon-vpc vpc private-subnet

asked Apr 10 '19 at 08:05

ishan3243 (1,870 reputation; 4 gold, 30 silver, 49 bronze badges)

1 answer

Proper way to check if a folder exists in AWS S3 from AWS EMR?

Before calling this a duplicate, please read my question. I have found two methods of checking whether a folder exists in S3 from EMR, but I wonder which one is correct. To get the credentials of the EMR machine (e.g. from a Spark application) to access…

amazon-web-services apache-spark amazon-s3 amazon-emr

asked Apr 09 '19 at 16:01

belka (1,480 reputation; 1 gold, 18 silver, 31 bronze badges)
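
S3 has no real directories, so one common check is to list at most one key under the folder prefix. A minimal sketch, assuming Python with a boto3-style S3 client; FakeS3Client is a hypothetical in-memory stand-in for local testing, not part of boto3:

```python
def s3_prefix_exists(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if any object key in `bucket` starts with "<prefix>/".

    A "folder" exists only when at least one key begins with that prefix.
    MaxKeys=1 keeps the listing call cheap.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    folder = prefix if prefix.endswith("/") else prefix + "/"
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=folder, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


class FakeS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client, for local tests only."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def list_objects_v2(self, Bucket, Prefix="", MaxKeys=1000):
        # Mimics only the fields s3_prefix_exists reads; Bucket is ignored.
        matches = [key for key in self._keys if key.startswith(Prefix)][:MaxKeys]
        return {"KeyCount": len(matches), "Contents": [{"Key": key} for key in matches]}
```

On the cluster, s3_prefix_exists(boto3.client("s3"), "my-bucket", "path/to/folder") should pick up the instance profile credentials through boto3's default credential chain (bucket and path are hypothetical). Appending "/" avoids false positives such as "data/20" matching "data/2019/"; a zero-byte "folder/" marker object created by the S3 console also counts as a match.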

0 answers

Python modules not available on worker nodes in AWS EMR

I am doing an ML project on AWS EMR clusters and use a bootstrap action to set up my environment. I am running into a very common problem where my modules (in this case a .py file I built) are not installed on my worker nodes. My workflow is to code in a .py…

amazon-web-services amazon-emr

asked Apr 08 '19 at 13:52

J Doe
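
One common alternative to installing your own modules through a bootstrap action is to ship them with the job: package them into a zip archive and pass it with spark-submit --py-files deps.zip (or SparkContext.addPyFile). A minimal sketch of the packaging step, assuming Python and that the modules live under one local directory; the function name and paths are hypothetical:

```python
import os
import zipfile


def build_py_files_zip(source_dir: str, zip_path: str) -> list:
    """Zip every .py file under source_dir, keeping paths relative to it.

    Returns the archive member names so callers can check what was packed.
    """
    members = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue
                full_path = os.path.join(root, name)
                # Relative, "/"-separated names keep packages importable from the zip root.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                archive.write(full_path, arcname)
                members.append(arcname)
    return members
```

Because Python can import directly from a zip on sys.path, a module packed as helpers.py at the archive root becomes importable as helpers on the executors once Spark distributes the file.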

1 answer

Tez mapper resource request

We recently migrated from MapReduce to Tez for executing Hive queries on EMR. We are seeing cases where the exact same Hive query launches a very different number of mappers. See the Map 3 phase below. On the first run it requested 305 resources and on…

hive amazon-emr apache-tez

asked Apr 05 '19 at 02:22

kvb

0 answers

Where is information from YARN applications stored on AWS EMR (Application history)?

Context: I run Spark applications on an Amazon EMR cluster. These applications are orchestrated by YARN. I didn't define yarn.nodemanager.log-dirs, spark.yarn.historyServer.address or other configurations. In the Application history tab there is…

amazon-web-services apache-spark amazon-emr

asked Apr 04 '19 at 13:13

Tan4ek

1 answer

AWS EMR dependencies

I am trying to translate the Java code in "End-to-End Amazon EMR Java Source Code Sample" to Scala. I am using SBT for dependency management. Here are my current relevant dependencies in build.sbt: //…

scala amazon-emr

asked Apr 03 '19 at 15:53

Paul Reiners (8,576 reputation; 33 gold, 117 silver, 202 bronze badges)

1 answer

Problem executing a shell script on the host using docker exec

I'm trying to execute a script on the master node of an AWS EMR cluster. The intention is to create a new conda env and link it to Jupyter. I'm following this doc from AWS. The problem is, whatever the content of the script, I'm getting the same error:…

bash docker amazon-emr

asked Apr 03 '19 at 11:36

Bitswazsky (4,242 reputation; 3 gold, 29 silver, 58 bronze badges)

1 answer

Adding S3 sync step in EMR

After performing all the steps, I want to execute a last step to copy S3 data to another bucket. I didn't find any supported script for running shell commands (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-commandrunner.html). s3-dist-cp is…

amazon-web-services amazon-s3 amazon-emr

asked Apr 03 '19 at 09:12

Dev (13,492 reputation; 19 gold, 81 silver, 174 bronze badges)
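
command-runner.jar, from the page linked above, runs an arbitrary command as an EMR step, so a final copy step can call aws s3 sync. A minimal sketch, assuming Python, boto3's add_job_flow_steps step shape, and that the AWS CLI is available on the node running the step; the step name and bucket paths are hypothetical:

```python
def s3_sync_step(source: str, destination: str,
                 name: str = "Copy output to another bucket") -> dict:
    """Build an EMR step that runs `aws s3 sync` through command-runner.jar."""
    return {
        "Name": name,
        # CONTINUE leaves the cluster running if the copy fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["aws", "s3", "sync", source, destination],
        },
    }
```

Submitted last, for example with boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[s3_sync_step("s3://source-bucket/output/", "s3://target-bucket/output/")]), it runs after the earlier steps. Swapping the Args for ["s3-dist-cp", "--src", source, "--dest", destination] uses s3-dist-cp instead, which copies in parallel across the cluster.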

1 answer

WITH statements inside an INSERT statement (Hive, EMR, AWS)

Hive does not recognize my WITH statement inside an INSERT command. How can I make Hive understand this? I've created the external Hive tables to store all of the data referenced in this query. That all executes fine and the data is available.…

sql amazon-web-services hive hdfs amazon-emr

asked Apr 02 '19 at 19:01

Fish357

2 answers

Get the IP of the EMR master node from the YARN CLI

To get a list of the IP addresses of the EMR slave nodes, one can run: yarn node -list 2>/dev/null | sed -n "s/^\(ip[^:]*\):.*/\1/p". yarn node -list also happens to print the IP of the master node to stderr: 19/04/02…

bash amazon-web-services sed hadoop-yarn amazon-emr

asked Apr 02 '19 at 19:01

Walrus the Cat (2,314 reputation; 5 gold, 35 silver, 64 bronze badges)
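
The sed expression above keeps the part of each line that starts with "ip" up to the first ":". A minimal Python sketch of the same filter, plus a hypothetical helper that turns a default EC2 private DNS name (ip-A-B-C-D...) into a dotted IP; it assumes that naming scheme and does not parse the master's stderr line:

```python
import re
from typing import List, Optional

# Same pattern as the sed expression: a line starting with "ip",
# followed by anything up to the first ":" (the node's port).
NODE_LINE = re.compile(r"^(ip[^:]*):")


def node_hosts(yarn_node_list_stdout: str) -> List[str]:
    """Return the host part of each node ID line printed by `yarn node -list`."""
    hosts = []
    for line in yarn_node_list_stdout.splitlines():
        match = NODE_LINE.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts


def host_to_ip(host: str) -> Optional[str]:
    """Convert e.g. ip-10-0-0-12.ec2.internal to 10.0.0.12; None if not EC2-style."""
    match = re.match(r"^ip-(\d+)-(\d+)-(\d+)-(\d+)", host)
    return ".".join(match.groups()) if match else None
```

Feeding the captured stdout of yarn node -list through node_hosts and then host_to_ip yields the node IPs without relying on sed.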

1 answer

Call multiple Spark jobs within a single EMR cluster

I want to call multiple Spark jobs using spark-submit within a single EMR cluster. Does EMR support this? How can I achieve this? I currently use AWS Lambda to invoke an EMR job for my Spark job, but we would like to extend this to multiple spark…

apache-spark aws-lambda amazon-emr

asked Mar 31 '19 at 04:46

Ankur Shrivastava
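
One way to run several applications on one cluster is to submit each spark-submit call as its own EMR step through command-runner.jar. A minimal sketch, assuming Python and boto3's step shape; the script paths are hypothetical, and by default EMR runs steps one at a time in submission order:

```python
from typing import Iterable, List


def spark_submit_steps(scripts: Iterable[str], deploy_mode: str = "cluster",
                       action_on_failure: str = "CONTINUE") -> List[dict]:
    """Build one EMR step per application, each running spark-submit via command-runner.jar."""
    steps = []
    for script in scripts:
        steps.append({
            "Name": f"spark-submit {script}",
            "ActionOnFailure": action_on_failure,
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", deploy_mode, script],
            },
        })
    return steps
```

From the existing Lambda function, the list can be passed in a single call such as boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=spark_submit_steps(["s3://my-bucket/job_a.py", "s3://my-bucket/job_b.py"])), where cluster_id and the S3 paths are placeholders.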

3 answers

Create A record in CloudFormation for EMR master node private IP address

I would like to know if there is a way to declare an AWS::Route53::RecordSet in a CloudFormation config that points to the private IP address of the master node of an EMR cluster that is also defined in the same configuration? The CloudFormation…

amazon-web-services aws-cloudformation amazon-emr amazon-route53

asked Mar 27 '19 at 20:07

James Wierzba (16,176 reputation; 14 gold, 79 silver, 120 bronze badges)
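
An A record needs an IP address, while AWS::EMR::Cluster exposes the master's DNS name through the MasterPublicDNS attribute. Pointing a CNAME at that attribute is one possible workaround. A minimal sketch that builds the Resources fragment as a Python dict, assuming (unverified here) that for a private-subnet cluster this attribute holds the private DNS name, as the EMR API's MasterPublicDnsName does; the logical IDs, zone and record names are hypothetical:

```python
import json


def master_cname_resources(cluster_logical_id: str, zone_name: str,
                           record_name: str) -> dict:
    """Return a CloudFormation "Resources" fragment with a CNAME for the EMR master.

    Fn::GetAtt on the cluster makes CloudFormation create the record only
    after the cluster exists.
    """
    # Route 53 hosted zone names are fully qualified, i.e. end with a dot.
    if not zone_name.endswith("."):
        zone_name += "."
    return {
        "EmrMasterDnsRecord": {  # hypothetical logical ID
            "Type": "AWS::Route53::RecordSet",
            "Properties": {
                "HostedZoneName": zone_name,
                "Name": record_name,
                "Type": "CNAME",
                "TTL": "300",
                "ResourceRecords": [{"Fn::GetAtt": [cluster_logical_id, "MasterPublicDNS"]}],
            },
        }
    }


if __name__ == "__main__":
    # Merge the output into the template's Resources next to the AWS::EMR::Cluster.
    print(json.dumps(master_cname_resources(
        "EmrCluster", "example.internal", "emr-master.example.internal"), indent=2))
```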

1 answer

Sqoop Import Error "Could not load db driver class" with Amazon EMR Service

I have created an EMR cluster with Hadoop, Sqoop and Spark configured. I am trying a Sqoop import but getting the error "Could not load db driver class: com.mysql.jdbc.Driver". My question is: in which location do we put the MySQL driver? I have…

hadoop sqoop amazon-emr

asked Mar 26 '19 at 12:01

Rahul Goyal
