Questions tagged [mapr]

MapR is a commercial data platform that offers an HDFS-compatible distributed file system, a database that can store data in BigTable-style wide-column tables or as JSON documents, and a streaming platform for messaging. MapR exposes the APIs of open source tools such as Hadoop, Kafka and HBase, but implements them with proprietary code written in C and optimised for performance.
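
To make the two database models more concrete, here is a minimal Python sketch using plain data structures. It is not the MapR-DB or OJAI API. It takes a hypothetical customer record and shows the same data two ways: as a JSON document, which MapR-DB JSON tables identify by an `_id` field, and as a BigTable-style row, where a row key maps to untyped byte values addressed by `family:qualifier` column names. The column family name `cf`, the helper `to_wide_column`, and the dotted qualifier naming are illustrative assumptions.

```python
import json

# Hypothetical sample record; field names and values are illustrative only.
record = {
    "_id": "cust001",
    "name": "Alice",
    "address": {"city": "San Jose", "zip": "95110"},
    "orders": 3,
}


def to_wide_column(doc, family="cf"):
    """Flatten a nested JSON-style document into a BigTable-style row.

    Returns (row_key, cells), where cells maps "family:qualifier" to bytes.
    Nested keys are joined with "." to form the qualifier (an illustrative
    convention, not something MapR-DB does automatically).
    """
    row_key = doc["_id"]
    cells = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, inner in value.items():
                walk(f"{prefix}.{key}", inner)
        else:
            # Wide-column cells hold raw bytes, so type information is lost here.
            cells[f"{family}:{prefix}"] = str(value).encode("utf-8")

    for key, value in doc.items():
        if key != "_id":
            walk(key, value)
    return row_key, cells


if __name__ == "__main__":
    # JSON model: one nested, typed document.
    print(json.dumps(record))
    # Wide-column model: a row key plus flat byte cells.
    row_key, cells = to_wide_column(record)
    print(row_key, cells)
```

The point of the comparison: the JSON form keeps the nesting of `address` and the integer type of `orders`, while the wide-column form stores every value as bytes, so the application has to decode values itself.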

MapR is a complete enterprise-grade distribution for Apache Hadoop. The MapR Converged Data Platform has been engineered to improve Hadoop’s reliability, performance, and ease of use.

The MapR distribution provides a full Hadoop stack that includes the MapR File System (MapR-FS), the MapR-DB NoSQL database management system, MapR Streams, the MapR Control System (MCS) user interface, and a full family of Hadoop ecosystem projects. You can use MapR with Apache Hadoop, HDFS, and MapReduce APIs.

MapR supports the Hadoop 2.x architecture and YARN (Yet Another Resource Negotiator), the resource management and scheduling framework of Hadoop 2.x. YARN separates resource management, handled by a cluster-wide ResourceManager, from job management, handled by a per-application ApplicationMaster.

There are three MapR editions.

  • MapR Community Edition (formerly M3)
    • Free community edition.
  • MapR Enterprise Edition (formerly M5)
    • Adds high availability and data protection, including multi-node NFS.
  • MapR Enterprise Database Edition (formerly M7)
    • Adds structured table data natively in the storage layer and provides a flexible NoSQL database.

MapR can be installed on many versions of Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, and SUSE. A full matrix of supported Linux operating systems is available in the MapR documentation.

To install MapR, you need the following:

  • A 64-bit CPU.
  • One of the operating systems mentioned above (Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, or SUSE).
  • A minimum of 8GB of RAM.
  • At least one unformatted disk.
  • A resolvable hostname.
  • A common user on each server where you want to install MapR.
  • Java 1.7.0 or higher.
  • Other
    • NTP, Syslog, PAM
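
Several of the requirements above can be checked from a script before running the installer. The following is a minimal Python sketch, assuming a Linux host with Python 3.7+. It checks only the CPU word size, RAM, hostname resolution, and Java version; it does not check disks, the OS version, NTP, syslog, or PAM. The function names and the 64-bit heuristic (looking for "64" in the machine name) are assumptions for illustration, not part of MapR's tooling.

```python
import os
import platform
import re
import socket
import subprocess

MIN_RAM_GB = 8         # "A minimum of 8GB of RAM"
MIN_JAVA = (1, 7, 0)   # "Java 1.7.0 or higher"


def parse_java_version(text):
    """Return (major, minor, patch) from `java -version` output, or None.

    Handles the legacy "1.7.0_80" style and the newer "11.0.2" / "17" styles.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?', text)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def check_requirements(machine, ram_gb, hostname_resolves, java_version):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if "64" not in machine.lower():  # heuristic: x86_64, amd64, aarch64, ...
        problems.append(f"64-bit CPU required, found {machine!r}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM required, found {ram_gb:.1f} GB")
    if not hostname_resolves:
        problems.append("hostname does not resolve")
    if java_version is None or java_version < MIN_JAVA:
        problems.append(f"Java 1.7.0 or higher required, found {java_version}")
    return problems


def gather_facts():
    """Collect the inputs for check_requirements from the local host (Linux)."""
    machine = platform.machine()
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    try:
        socket.gethostbyname(socket.getfqdn())
        resolves = True
    except OSError:
        resolves = False
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        java_version = parse_java_version(result.stderr)
    except OSError:  # java is not installed or not on PATH
        java_version = None
    return machine, ram_gb, resolves, java_version


if __name__ == "__main__":
    issues = check_requirements(*gather_facts())
    for issue in issues:
        print("FAIL:", issue)
    if not issues:
        print("CPU, RAM, hostname and Java checks passed")
```

Run it on each node you plan to use. The RAM figure comes from the kernel and can be slightly below the installed amount, so a machine with exactly 8 GB may be reported as failing; treat that result with some tolerance.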



Try MapR

Download the MapR Sandbox for VMware or VirtualBox for free.

OR

Install MapR on your own. Check whether the installer supports your OS.

You must meet the prerequisites listed above for a successful installation.

Get the mapr-setup script from the MapR repository:

wget http://package.mapr.com/releases/installer/mapr-setup.sh

Run the mapr-setup script to start the installation.

bash ./mapr-setup.sh -y

Open the installer web UI at the following URL:

https://<Installer node hostname/IPaddress>:9443
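
Before opening the browser, it can help to confirm that the installer is listening. The sketch below, assuming Python 3 and the port 9443 given above, builds the installer URL from a hostname or IP address and tests whether a TCP connection to that port succeeds. The helper names and the 3-second timeout are illustrative choices; a successful connection only shows that something is listening on the port, not that it is the MapR installer.

```python
import socket
import sys

INSTALLER_PORT = 9443  # installer web UI port from the instructions above


def installer_url(host, port=INSTALLER_PORT):
    """Build the installer web UI URL for a hostname or IP address."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}"


def port_is_open(host, port=INSTALLER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable
        return False


if __name__ == "__main__":
    node = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    if port_is_open(node):
        print("Installer reachable at", installer_url(node))
    else:
        print(f"Nothing is listening on {node}:{INSTALLER_PORT}; is mapr-setup.sh running?")
```

For example, `python check_installer.py node1.example.com` (a hypothetical script and host name) prints the URL to open if the port accepts connections.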

Follow the prompts to install MapR.

Manual installation is also available; full instructions are in the MapR documentation.

Extensive documentation can be found on MapR's documentation site: http://maprdocs.mapr.com/home/



The Stack Overflow tag [mapr] can be used for questions about issues you have with the MapR platform.

381 questions
1 vote, 0 answers

ls command: how can I get a recursive full-path listing, one line per file, filtering by permissions?

My goal is to be able to identify all of the paths to Streams (files) within a MapR cluster filesystem. Working through the problem I've identified that within a MapR cluster, Streams are stored as links to MapR Tables with read-only…
dijikul
1 vote, 0 answers

MapR and Java bug

I am using Java, Spring Boot and MapR 5.2. QueryCondition cond = MapRDB.newCondition().in("propertyName", searchedStrings).build(); List docs = jsonStore.query(cond); That query is working fine, returning the…
saferJo
1 vote, 3 answers

Convert org.apache.avro.generic.GenericRecord to org.apache.spark.sql.Row

I have a list of org.apache.avro.generic.GenericRecord and an Avro schema. Using these, we need to create a DataFrame with the help of the SQLContext API; to create a DataFrame it needs an RDD of org.apache.spark.sql.Row and the Avro schema. Pre-requisite to create DF is we…
Sagar balai
1 vote, 1 answer

Impala scan MapR-FS slow

I recently installed Impala on a 3-node MapR cluster. When I run a simple query, the performance is not as good as Impala + HDFS. Here is the query: SELECT * FROM ft_test, ft_wafer WHERE ft_test_parquet.id = ft_wafer_parquet.id and month = 1 and day…
Jesse
1 vote, 1 answer

tSqoopImport component of Talend Open Studio for BigData(5.6.2) throws error when connecting to MySQL database on MapR cluster

Use case: Need to connect Talend's bigdata component i.e. tSqoopImport to MySQL DB residing on MapR cluster. Talend Open studio for Big-data(5.6.2) resides on my workstation. MySQL (5.5) database installed on 5 node MapR (M3-edition) cluster. …
1 vote, 1 answer

Python spark: IndexError: tuple index out of range

I'm working on Spark and Python. When I call any action on a CSV file, it gives me IndexError: tuple index out of range. Here is the code snippet: test_rdd = sc.textFile("/mapr/data/airflow-test.csv").map(lambda line:…
Mubin
1 vote, 0 answers

Error connecting Phoenix to Secure HBase on MapR cluster

Whenever I try connecting to secure HBase, I get the following error. started Phoenix using command: ./sqlline.py :5181:/hbase:: Port used is 5181 because it is MapR hadoop. HBase version is 1.1.1 and Phoenix version is 4.8.1. Phoenix worked without…
1 vote, 1 answer

Hadoop, hive -> get list of sql being run against the cluster

So we have a group of people hitting our cluster and would like to monitor every SQL statement being run via hive/odbc. The job history server web page will give me part of the SQL but not everything. Is there a way to retrieve the full SQL of…
MikeKulls
1 vote, 1 answer

Store documents (.pdf, .doc and .txt files) in MaprDB

I need to store documents such as .pdf, .doc and .txt files in MaprDB. I saw one example in HBase where files are stored in binary and retrieved as files in Hue, but I'm not sure how it could be implemented. Any idea how a document can be stored in…
Amu
1 vote, 1 answer

Will Drill leverage Hive indexing?

If we index a table in Hive, will Drill leverage the index while querying the Hive table with the Hive plugin in Drill? It's because we have a partitioned table in Hive and the analytics query has a partitioned and a non-partitioned column in the where clause, so we…
Ragzz
1 vote, 1 answer

(, ValueError('need more than 1 value to unpack',), ) NULL NULL NULL

I have a Hive table like this in MapR. The data was separated with commas at the back end. I am trying to use a custom MapReduce script in Python. Here is the Python code. import sys import datetime try: for line in sys.stdin: line = line.strip() …
subro
1 vote, 1 answer

Why ExceptionInInitializerError when submitting Spark application in YARN cluster mode?

I am using spark "Spark 1.6.1-mapr-1604 " version. My job in local mode executes successfully but when I launch same job in yarn cluster mode it throws ExceptionInInitializerError. Local mode command: spark-submit --class…
Rahul Sharma
1 vote, 1 answer

Map transformation performance spark dataframe vs RDD

I have a four node hadoop cluster(mapr) with 40GB memory each. I need to 'apply' a function on one of the fields of the big dataset (500million rows). The flow of my code is that I read the data from hive table as a spark dataframe and apply the…
Mike
1 vote, 1 answer

HDFS directory on a MAPR cluster

I need to save my Spark Streaming checkpoint files in an HDFS directory. I can access a remote cluster which has MapR installed on it, but I am not sure which path on MapR corresponds to an HDFS directory. Is it opt/mapr/..?
Mahdi
1 vote, 2 answers

Hive on Spark in Mapr Distribution

Currently we are working on Hive, which by default uses map reduce as processing framework in our MapR cluster. Now we want to change from map reduce to spark for better performance. As per my understanding we need to set…
Puneeth Kumar