Questions tagged [hadoop]

Hadoop is an open-source solution that provides a distributed/replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase to get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete set of commercial solutions.

Standard components and available complementary additions include:

Hadoop Distributed File System (standard)
The map-reduce architecture (standard)
Hive, which provides a SQL-like interface to the map-reduce architecture
HBase, a distributed key-value service

Recommended reference sources:

Hive Language Reference

261 questions

1 answer

Redirecting Ambari-Server backup file creation to a different location

I am taking a backup of my Ambari server using the command ambari-server backup. This creates the backup file in the location /var/lib/ambari-server/. I want the backup to go to a different location, but I cannot find a way to do it. The help…

backup hadoop

asked Jan 13 '18 at 13:27

Gautam Somani

0 answers

Does an Ambari cluster need SSH access between the ambari-server machine and all other hosts?

We installed an Ambari cluster with 3 master machines. The Ambari server is installed on the master02 Linux machine. The cluster also includes 25 DataNode machines and 5 Kafka machines. Does ambari-server need SSH access to all other machines in the…

linux ssh hadoop

asked Dec 21 '17 at 16:14

shalom

0 answers

CDH Community Edition Upgrade from 5.7 to 5.13 without Cloudera Express or Cloudera Enterprise

I have a cluster like the one below: 3 HBase Masters (1 active & 2 standby), 4 Region Servers, 4 Data Nodes, 1 Primary & 1 Secondary Name Node, 3 Journal Nodes, 4 NodeManagers, 3 Resource Managers (1 active & 2 standby). Query 1: What should be the order of…

hadoop hbase cloudera

asked Dec 06 '17 at 09:41

tuk

1 answer

kafka + how to revert topic deletion

Just in case we delete the wrong topic, as in /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper hdpmaster01:2181 --delete --topic gtom.poli.pri.proc, the topic gtom.poli.pri.proc is already marked for deletion, and now we want to revert it. What…

linux hadoop kafka

asked Dec 03 '17 at 09:14

jango

0 answers

How to prevent arbitrary executable execution on hadoop cluster

I am involved with configuring a Hadoop cluster for complete auditability and security. I am new to the Hadoop ecosystem, but I have a decent idea of the basics. I have a few concerns for which I hope someone might be able to point me in the right…

hadoop hdfs

asked Nov 15 '17 at 16:50

STN

0 answers

YARN AM logs report a different timestamp from what is shown in the terminal and the Spark Scala shell

I am trying to understand why the following occurred: I have a Docker container with Yarn and Spark running fine except that the timestamp of that container was minus X hours of what I wanted it to be. So when I was running date it was returning a…

java docker hadoop jvm

asked Aug 22 '17 at 23:14

rudimuse

0 answers

Need to set 000 permissions on a specific HDFS data block from the command line

I am trying to set “000” permissions on a specific block. I used the command below to find the block information: su - hdfs -c "hdfs fsck -locations -files -blocks /user/rohit/partition_filter_table/india.25.20.101.95000" Now, I want to set…

hadoop hdfs

asked Oct 19 '16 at 09:22

Rohit Agrawal

1 answer

Hadoop FileAlreadyExistsException: Output directory hdfs://:9000/input already exists

I have Hadoop set up in fully distributed mode with one master and 3 slaves. I am trying to execute a jar file named Tasks.jar, which takes arg[0] as the input directory and arg[1] as the output directory. In my Hadoop environment, I have the input files…

ubuntu hadoop mapreduce

asked Oct 14 '16 at 02:37

Harinarayanan Mohan

0 answers

Why is Hadoop TeraSort not using all cluster nodes

Question: Regarding the TeraSort demo in Hadoop, please advise whether the symptom is expected or whether the workload should be distributed. Symptom: Started Hadoop (3 nodes in a cluster) and ran the TeraSort benchmark as below in Executions. I expected all…

hadoop distribution

asked Sep 26 '16 at 21:11

mon

0 answers

Is it possible to configure HDFS in federation mode and in HA mode at the same time?

I don't understand whether it is possible to configure HDFS in both modes at the same time. Does it make sense? Can somebody show a simple configuration of HDFS in both modes? (nameNode1, nameNode2, nameNodeStandby1, nameNodeStandby2)

high-availability hadoop distributed-filesystems hdfs federated

asked Aug 15 '16 at 16:49

Oleksandr

1 answer

zfs for Hadoop cloud instead of ext4

Right now I have a couple of Linodes with ext4. I have a Hadoop setup. What benefit would I get if I migrated my file system from ext4 to ZFS? Will there be any benefit in response times? Any speed optimization while data gets exchanged in local lan…

zfs ext4 hadoop

asked Apr 20 '16 at 06:01

M-BoB

0 answers

Zombie process blocking port when restarting Hadoop (Secondary) Namenode

I'm having weird issues with the Hadoop Namenode and Secondary Namenode. Our HDFS cluster runs smoothly most of the time. But every now and then, either the Primary Namenode freezes (crashing the whole cluster) or the Secondary Namenode freezes and…

ubuntu hadoop hdfs zombie

asked Feb 29 '16 at 16:41

Janek Bevendorff

1 answer

/usr/bin/env: python2.5: No such file or directory

I'm trying to play with Cloudera's Distribution for Hadoop on my EC2 account. To configure it I'm using THIS tutorial. Everything seems fine to me, but when I try to run hadoop-ec2 I get the following…

python hadoop

asked Oct 12 '09 at 22:23

Maksim

0 answers

Hadoop Hive, Impala, Pig, and more — SQL access to Hadoop?

It appears that Hive, Impala, Pig, and others all provide SQL or SQL-like access to data stored on Hadoop clusters. They all seem to have support for HDFS, S3, and other forms. So why are there so many different ways for accessing Hadoop…

sql hadoop

asked Oct 31 '15 at 17:27

vy32

1 answer

hadoop fs commands not giving any output

I am running Hadoop 1.2.1 on Ubuntu 14.04 LTS in pseudo-distributed mode, and the fs commands are not doing anything, i.e. neither the prompt is returned nor is any error message shown. What is the problem? Thanks in advance

hadoop

asked Oct 24 '15 at 12:58

ojas mohril
