Questions tagged [mapr]

MapR is a commercial data platform that offers an HDFS-compatible distributed file system, a NoSQL database that stores data in wide-column (HBase-style) tables or JSON documents, and a streaming platform for publish/subscribe messaging. MapR exposes the APIs of open-source tools such as Hadoop, Kafka, and HBase, but provides a proprietary implementation written in C that is optimised for performance.

MapR is a complete enterprise-grade distribution for Apache Hadoop. The MapR Converged Data Platform has been engineered to improve Hadoop’s reliability, performance, and ease of use.

The MapR distribution provides a full Hadoop stack that includes the MapR File System (MapR-FS), the MapR-DB NoSQL database management system, MapR Streams, the MapR Control System (MCS) user interface, and a full family of Hadoop ecosystem projects. You can use MapR with Apache Hadoop, HDFS, and MapReduce APIs.
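Because MapR-FS is API-compatible with HDFS, the standard `hadoop fs` shell commands work against it unchanged. A minimal sketch, assuming a hypothetical cluster name and paths (the `to_maprfs_uri` helper is illustrative, not part of MapR):

```shell
#!/usr/bin/env bash
# MapR-FS is API-compatible with HDFS, so the usual HDFS shell commands
# work unchanged. The cluster name and paths below are hypothetical.

# Build a cluster-qualified maprfs:// URI from a cluster name and a path.
to_maprfs_uri() {
    local cluster="$1" path="$2"
    printf 'maprfs://%s%s\n' "$cluster" "$path"
}

to_maprfs_uri my.cluster.com /user/alice/data

# On a node with the MapR client configured, the standard commands apply:
#   hadoop fs -mkdir -p /user/alice/data
#   hadoop fs -put local.csv /user/alice/data/
#   hadoop fs -ls "$(to_maprfs_uri my.cluster.com /user/alice/data)"
```

Plain paths (without a scheme) resolve against the cluster's default file system, which on a MapR node is MapR-FS rather than HDFS.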

MapR supports the Hadoop 2.x architecture and YARN (Yet Another Resource Negotiator). Hadoop 2.x and YARN make up a resource management and scheduling framework that distributes resource management and job management duties.


There are three MapR editions.

  • MapR Community Edition (formerly M3)
    • Free community edition.
  • MapR Enterprise Edition (formerly M5)
    • Adds high availability and data protection, including multi-node NFS.
  • MapR Enterprise Database Edition (formerly M7)
    • Adds structured table data natively in the storage layer and provides a flexible NoSQL database.

MapR can be installed on many versions of Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, and SUSE. A full matrix of supported Linux operating systems can be found in the MapR documentation.

To install MapR, the following requirements must be met.

  • A 64-bit CPU.
  • One of the supported operating systems listed above (Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, or SUSE).
  • A minimum of 8 GB of RAM.
  • At least one unformatted disk.
  • A resolvable hostname.
  • A common user on each server you wish to install MapR on.
  • Java 1.7.0 or higher.
  • Other: NTP, Syslog, PAM.
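The checks above can be sketched as a small pre-install script. This is a minimal illustration, not an official MapR tool; the thresholds mirror the stated requirements, and the helper function names are my own:

```shell
#!/usr/bin/env bash
# Hypothetical pre-install sanity checks mirroring the requirements above.

MIN_RAM_KB=$((8 * 1024 * 1024))   # 8 GB expressed in kilobytes

ram_ok() {    # ram_ok <total-kb>: succeeds if the node has at least 8 GB
    [ "$1" -ge "$MIN_RAM_KB" ]
}

arch_ok() {   # arch_ok <uname -m output>: succeeds on a 64-bit CPU
    case "$1" in x86_64|aarch64) return 0 ;; *) return 1 ;; esac
}

# Live checks, run on the node you plan to install on:
arch_ok "$(uname -m)" && echo "CPU: 64-bit OK"
ram_ok "$(awk '/MemTotal/ {print $2}' /proc/meminfo)" && echo "RAM: >= 8 GB OK"
hostname -f >/dev/null 2>&1 && echo "Hostname: resolvable OK"
java -version 2>&1 | head -1   # confirm Java 1.7.0 or higher manually
```

Unformatted disks and the common user are easier to verify by eye (`lsblk -f` and `id <user>` on each node), so they are left out of the sketch.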



Try MapR

Download the MapR Sandbox for VMware or VirtualBox for free.

OR

Install MapR on your own. Check that the installer supports your OS, and make sure you meet the prerequisites for a successful installation.

Get the mapr-setup script from the MapR repository.

wget http://package.mapr.com/releases/installer/mapr-setup.sh

Run the mapr-setup script to start the installation.

bash ./mapr-setup.sh -y

Open the web UI at the following URL.

https://<installer-node-hostname-or-IP>:9443

Follow the prompts and you will be on your way to installing MapR.
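Once the installer finishes, a quick sanity check is to confirm that the cluster nodes are registered. A hedged sketch using `maprcli node list` (the `count_nodes` helper and the sample column layout are illustrative assumptions, not MapR-defined):

```shell
#!/usr/bin/env bash
# Hypothetical post-install check: count registered nodes in the output
# of `maprcli node list`, skipping the header row.

count_nodes() {   # count_nodes <node-list output>: prints the data-row count
    printf '%s\n' "$1" | awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# On a live cluster (requires the MapR packages installed and running):
#   maprcli node list -columns hostname,health
#   count_nodes "$(maprcli node list -columns hostname,health)"
```

If the count matches the number of nodes you installed on, the cluster came up; otherwise the MCS web UI is the place to investigate per-node service state.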

There is also manual installation available. Full instructions can be viewed here.

Extensive documentation can be found on MapR's documentation site. http://maprdocs.mapr.com/home/



The Stack Overflow tag [mapr] can be used for questions about issues you have with the MapR platform.

381 questions

  • Dynamic output path for partitioned parquet files in Spark (1 vote, 0 answers, by ChernikovP)
    We're using MapR FS with rolling volumes and there's a necessity to align partitioned output parquet files with corresponding volumes. df .write .partitionBy("year", "month", "day", "hour") …
  • Extract TDE file from Tableau server fails under MapR (1 vote, 0 answers, by mbauhardt)
    I want to extract a TDE file via Java new Extract(fileName) but I get the following error message: Caused by: com.tableausoftware.TableauException: server did not call us back at com.tableausoftware.extract.Extract.(Unknown Source) I read…
  • Create temporary SparkSession with enableHiveSupport (1 vote, 2 answers, by Ryan)
    I am working on connecting to data in Hadoop that allows dynamic data type connections. I need to be able to connect to Hive Thrift Server A, pull in some data, and then connect to Hive Thrift Server B and pull in more data. To my understanding…
  • Java + Spark - temp folder not getting cleaned (1 vote, 1 answer, by Anuj Mehra)
    We are using Spark + Java in our project, and the Hadoop distribution being used is MapR. In our Spark jobs we persist data (at disk level). After the job completes, there is lot of temp data inside the /tmp/ folder. How can we ensure that /tmp/…
  • Spark dataframe insertinto hive table fails since some of the staging part files created with username mapr (1 vote, 1 answer, by Shasankar)
    I am using Spark dataframe to insert into a hive table. Even though the application is being submitted using the username 'myuser', some of the hive staging part files gets created with username 'mapr'. So the final write into the hive table fails…
  • multiple column in "IN" clause with Hive (1 vote, 2 answers, by Rup)
    does hive support query with multiple column in "IN" clause like below ? select * from address where (se10,ctry_nm) IN (44444444,"USA"); I am getting below error with this query - at…
  • Spark Application Not reading log4j.properties present in Jar (1 vote, 1 answer, by AJm)
    I am using MapR5.2 - Spark version 2.1.0 And i am running my spark app jar in Yarn CLuster mode. I have tried all the available options that i found But unable to succeed. This is our Production environment. But i need that for my particular spark…
  • Which node to edit hadoop .xml files on? (1 vote, 2 answers, by lampShadesDrifter)
    When editing hadoop .xml config files (eg. hdfs-site.xml), which node of the hadoop cluster should be the one used to edit the files? Ie. with a cluster of many nodes, all of them having a hadoop folder containing .xml and .properties files, which…
  • Spark Executor Custom Logs (1 vote, 0 answers, by user123)
    I've been supplying custom log4j properties to spark-submit in below manner: spark-submit --master yarn --queue qqqq \ --driver-java-options "-Dlog4j.configuration=file:/absolute path/to properties file/driver-log4j.properties" \ --conf…
  • Stream data to Apache Phoenix using flume (1 vote, 0 answers)
    When I am trying to stream data to Phoenix using flume I am getting the following error ERROR client.ZooKeeperSaslClient: Exception while trying to create SASL client java.security.PrivilegedActionException: javax.security.sasl.SaslException:…
  • what is difference between Mapr nfs and HDFS nfs? (1 vote, 0 answers, by bittu)
    What is difference between Mapr nfs and HDFS nfs. My understanding is as following- Mapr nfs is read/write but HDFS nfs is read only. Mapr nfs don't use any intermediate file system but HDFS nfs stores file in an intermediate file system(ex-…
  • pyspark split load uniformly across all executors (1 vote, 1 answer)
    I have a 5 node cluster.I am loading a 100k csv file to a dataframe using pyspark and performing some etl operations and writing the output to a parquet file. When I load the data frame how can divide the dataset uniformly across all executors os…
  • Unable to start Hive CLI Hadoop(MapR) (1 vote, 1 answer, by user2159301)
    I am trying to access hive CLI. However, it is failing to start with the following AccessControl issue. Strangly enough, I am able to query hive data from Hue without the AccessControl issue. However, hive CLI is not working. I am on a MapR cluster.…
  • mapr stream api causing java fatal crash (1 vote, 0 answers)
    platform : MapR 5.2 on sandbox JAVA FATAL CRASH WHEN trying to write using producer { public static void configureProducer() { Properties props = new Properties(); props.put("acks", "all"); props.put("retries", 0); …
  • How can I identify the Input Formats in MapReduce Program (1 vote, 1 answer, by Harsh)
    I just started learning Hadoop and there are various formats of input types. I have few programs to study and my main question is how can I identify if the input format is TextInputFormat or KeyValueTextInputFormat or any other. Your help is really…