Questions tagged [elastic-map-reduce]

Amazon Elastic MapReduce is a web service that enables the processing of large amounts of data.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

http://aws.amazon.com/elasticmapreduce/

452 questions
0 votes, 1 answer

How to enable AWS EMR CloudTrail logging?

We have a team-shared AWS account, and sometimes things are hard to debug. In particular, EMR API throttling happens regularly, so it would be nice to have CloudTrail logs show who is overusing the EMR APIs. I think our CloudTrail…
0 votes, 1 answer

NullPointerException in ObjectMapper in Spark Cluster Mode on AWS EMR

I am getting a NullPointerException on this line when running Spark in cluster mode (YARN) on AWS EMR, but it runs fine in client mode (with master as local): Map json = (Map) mapper.readValue(line, Map.class); This is the…
0 votes, 3 answers

DUMP command in PIG not working

I wrote a simple Pig program as follows to analyze a small, modified version of the Google n-grams dataset on AWS. The data looks something like this:
I am 1936 942 90
I am 1945 811 5
I am 1951 47 12
very cool 1923 118 10
very cool 1980 320…
thegreatcoder • 2,173 • 3 • 19 • 28
0 votes, 1 answer

AWS EMR script-runner access error

I'm running emr-5.12.0, with Amazon 2.8.3, Hive 2.3.2, Hue 4.1.0, Livy 0.4.0, Spark 2.2.1 and Zeppelin 0.7.3, on one m4.large as the master node and one m4.large as the core node. I am trying to execute a bootstrap action that configures some parts of the…
0 votes, 1 answer

AWS EMR - Hive creating new table in S3 results in AmazonS3Exception: Bad Request

I have a Hive script I'm running in EMR that is creating a partitioned Parquet table in S3 from a ~40GB gzipped CSV file also stored in S3. The script runs fine for about 4 hours but reaches a point (pretty sure when it is just about done creating…
Marty • 2,104 • 2 • 23 • 42
0 votes, 1 answer

Getting list of EMR Release labels via Amazon API

I need to retrieve the list of available EMR release labels in order to run my Java application, which starts an EC2 instance and executes a Hadoop job. The main problem here is that EMR release labels are specific to each region and I need to get this…
0 votes, 1 answer

Parquet Data Ingestion in Druid Error in Timestamp parsing using Joda

Context: I am able to submit a MapReduce job from the Druid overlord to an EMR cluster. My data source is in S3 in Parquet format. The timestamp field value is in the format "2017-09-01 21:14:11:552 IST". The error occurs while parsing the timestamp. The stack trace is:…
Shiva Achari • 955 • 1 • 9 • 18
0 votes, 1 answer

How to subtract in Map Reduce paradigm

I have the following dataset:

s1, s2, count
1, 2, x1
1, 3, x2
1, 4, x3
2, 1, y1
2, 3, y2
2, 4, y3
3, 1, z1
3, 2, z2

I want to get the following output:

s1, s2, count
1, 2, x1-y1
1, 3, x2-z1
1, 4, x3
2, 3, y2-z2
2, 4, y3

The idea is that s1 is an…
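
One way to read the example above is that each unordered pair (a, b) with a < b should end up with count(a, b) minus count(b, a). The excerpt is truncated, so that reading is an assumption. Under it, a minimal sketch in plain Python (not Hadoop code) is to have the map step emit the sorted pair as the key with a signed count, and have the reduce step sum the values per key. The function names and the numeric counts below are hypothetical stand-ins for x1, y1, z1 and so on.

```python
from collections import defaultdict

def map_record(s1, s2, count):
    """Map step: key by the sorted pair; reverse-direction counts get a minus sign."""
    if s1 < s2:
        return (s1, s2), count
    return (s2, s1), -count

def reduce_pairs(mapped):
    """Shuffle and reduce step: sum the signed counts for each key."""
    totals = defaultdict(int)
    for key, signed_count in mapped:
        totals[key] += signed_count
    return dict(totals)

# Hypothetical numbers standing in for x1..x3, y1..y3, z1..z2.
records = [
    (1, 2, 10), (1, 3, 20), (1, 4, 30),
    (2, 1, 4), (2, 3, 50), (2, 4, 60),
    (3, 1, 5), (3, 2, 7),
]
result = reduce_pairs(map_record(*r) for r in records)
```

Because the reduce step is a plain sum, the same function could also serve as a combiner. A pair that only appears in the reverse direction would come out negative in this sketch, which may or may not be what the asker intends.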
0 votes, 1 answer

Query DynamoDB Data with EMR

I am looking for a way to query AWS DynamoDB data with SQL syntax using Amazon EMR. I have my DynamoDB table set up and ready. How can I import/query the data using Hue? The table in DynamoDB has a size of around 8GB.
Hendrik • 4,849 • 7 • 46 • 51
0 votes, 1 answer

Multiple Filtering in PySpark

I have imported a data set into a Jupyter notebook / PySpark to process through EMR, for example: data sample. I want to clean up the data with the filter function before using it. This includes: Removing rows that are blank or '0' or NA cost or…
lseactuary • 45 • 1 • 1 • 7
0 votes, 1 answer

Lowercase response from Elasticsearch whereas uppercase is expected

I am trying to fetch data from Elasticsearch with Java using the method .addAggregation(terms(term)). The JSON response that I am expecting is { "key" : "TEST" }, but I am getting { "key" : "test" }, which is in lower case. I…
0 votes, 0 answers

Identical code works in pyspark shell but not via spark-submit

So I have a PySpark project with the following structure:
main.py: doing the real stuff (imports PySpark UDFs from utils.py and stuff from common.py)
utils.py: some utility functions (imports from common.py)
common.py: some params
Inside a PySpark…
Rex911 • 19 • 4
0 votes, 2 answers

Elasticsearch to query across multiple indices and multiple types

I am a newbie to Elasticsearch. I am using an AWS Elasticsearch 5.1.1 instance. I have a requirement where I need to specify multiple indices and types in the request body of an Elasticsearch search operation. Is it possible? What is the simplest way to…
SSG • 1,265 • 2 • 17 • 29
0 votes, 1 answer

AWS Elasticsearch : URL encoding for search across multiple indices and types

I am using AWS Elasticsearch with AWS Signature V4 to communicate with the instance. Simple queries to create/search indexes are working fine. But I want to be able to search across multiple indices and…
MMT • 81 • 1 • 3
0 votes, 0 answers

I want to perform a partial match or an exact match in Elasticsearch

Suppose we have two entries in the index "Phones":
1] iphone 6
2] iphone 7
1] If I search for "iphone 6", an exact match will return one record.
2] If I search for "iphone", a partial match will return both records.
So I want to toggle between the above methods based on…
MMT • 81 • 1 • 3
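
For the exact versus partial toggle described in the last question, a minimal sketch is to switch between a `term` query on an unanalyzed keyword field and a `match` query on the analyzed text field. The helper name `build_query` and the field name `name` are hypothetical, and the sketch assumes the mapping exposes a `name.keyword` sub-field, which may not hold for the asker's index. It only builds the request body and does not call a cluster.

```python
def build_query(text, exact, field="name"):
    """Build an Elasticsearch search body for an exact or a partial match.

    Assumes `field` is an analyzed text field with a `.keyword` sub-field;
    both names are illustrative, not taken from the question.
    """
    if exact:
        # term: compares against the whole stored value, without analysis
        return {"query": {"term": {f"{field}.keyword": text}}}
    # match: analyzes the text and matches documents containing any token
    return {"query": {"match": {field: text}}}

exact_body = build_query("iphone 6", exact=True)
partial_body = build_query("iphone", exact=False)
```

With the two sample entries, the exact body would be expected to match only "iphone 6" and the partial body both entries, provided the mapping assumption holds. Note that a `term` query on a keyword field is case-sensitive.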