Questions tagged [hadoop]

Hadoop is an open-source solution providing a distributed, replicated file system and a production-grade map-reduce system, with a series of complementary additions such as Hive, Pig, and HBase to get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation sponsored project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project documents a more complete set of commercial offerings.

Available complementary additions to Hadoop include:

  • HDFS, the Hadoop distributed file system (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce layer
  • HBase, a distributed key-value store
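
As an illustrative sketch of how these pieces fit together (the paths, file name, and table name below are hypothetical), data is loaded into HDFS with the `hdfs` CLI, and Hive then queries it through its SQL-like layer:

```shell
# Hypothetical example; requires the hdfs CLI on the PATH.
command -v hdfs >/dev/null 2>&1 || { echo "hdfs CLI not found; skipping"; exit 0; }

hdfs dfs -mkdir -p /user/demo/logs         # create a directory in HDFS
hdfs dfs -put access.log /user/demo/logs/  # upload a local file into it
hdfs dfs -ls /user/demo/logs               # confirm the upload

# Hive can then expose such data via SQL-like queries, for example:
#   SELECT status, COUNT(*) FROM access_logs GROUP BY status;
```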

261 questions
0
votes
2 answers

Same IP and port for source and destination

Yesterday, while debugging on a Hadoop cluster, I noticed something strange: # netstat -taupen | grep 54310 tcp 0 0 10.0.12.209:54310 10.0.12.209:54310 TIME_WAIT You can see that the source ip:port is the same as the destination ip:port. How is it…
pradeepchhetri
  • 2,698
  • 6
  • 37
  • 47
0
votes
1 answer

Using the same password for Kerberos and OpenLDAP

We have a working structure for our Hadoop deployment, where OpenLDAP is used for authentication with the structure below, along with Ranger and Knox. OpenLDAP root: dn: dc=abchadoop,dc=com,dc=za Subtree inside OpenLDAP like below: dn:…
anwaar_hell
  • 101
  • 2
0
votes
1 answer

Bare metal to Big Data: Can all of these operate together on the same cluster?

I am a VERY new sysadmin (Class of '16) and I've been asked to create a big data cluster with 3 bare-metal PowerEdge servers. I have the following requested to be put on the cluster: *Hadoop2 *YARN *Java 7&8 *Spark *SBT *Maven *Scala *P7zip *Pig *Hive…
Beth L
  • 3
  • 1
0
votes
0 answers

Using OpenLDAP as a back end for Kerberos

We want to integrate our existing security setup (Apache Knox, OpenLDAP, Apache Ranger) with Kerberos. From some blogs, I understood that we can use OpenLDAP as a back end for the Kerberos database, but we are facing some issues and confusion…
anwaar_hell
  • 101
  • 2
0
votes
1 answer

Does setting jumbo frames on Red Hat Linux influence OS performance?

We set the MTU to 9000 on all our Linux machines (Red Hat version 7.3); the machines are part of Hadoop clusters. We want to know whether setting the MTU to 9000 can be negative for OS performance. Does setting another jumbo frame value as…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

How to determine the yarn.scheduler.maximum-allocation-vcores value in an Ambari cluster

We have an Ambari cluster (version 2.6) with 3 worker machines; each worker machine has 16 CPU cores (see picture below) and 32G of memory. According to: yarn.nodemanager.resource.cpu-vcores: Set to the appropriate number in…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

Why does the Ambari agent insist on creating another repository file?

We are installing the new Hadoop version, 2.6.3.0, on Ambari 2.6.0. In the ambari-agent log we see the following: Writing File['/etc/yum.repos.d/ambari-hdp-51.repo'] because contents don't match Why does Ambari create the file ambari-hdp-51.repo? Is…
shalom
  • 461
  • 13
  • 29
0
votes
0 answers

DataNode machine disk sizes

Is it important that the (worker) DataNode machines' disks all be the same size? For example, we have an Ambari cluster with 3 worker machines (DataNode machines); each DataNode machine has 10 disks (7 disks with 50G and 3 disks with 48G…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

What is the safe and best way to delete Kafka topic folders?

On all our Kafka machines (production machines), we see that there is no free space: df -h /var/kafka Filesystem Size Used Avail Use% Mounted on /dev/sdb 11T 11T 2.3M 100% /var/kafka and under /var/kafka/kafka-logs we see all topic…
jango
  • 59
  • 2
  • 3
  • 12
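
One commonly recommended alternative to deleting topic folders by hand (which can corrupt broker state) is to lower the topic's retention temporarily and let Kafka purge old segments itself. A sketch for older, ZooKeeper-based Kafka releases; the ZooKeeper host and topic name are hypothetical:

```shell
# Hypothetical ZooKeeper host and topic name; requires Kafka's bundled CLI tools.
command -v kafka-configs.sh >/dev/null 2>&1 || { echo "Kafka CLI not found; skipping"; exit 0; }

# Temporarily keep only the last minute of data so old segments get deleted:
kafka-configs.sh --zookeeper zk-host:2181 --alter \
  --entity-type topics --entity-name big-topic \
  --add-config retention.ms=60000

# Once disk space is reclaimed, restore the topic's default retention:
kafka-configs.sh --zookeeper zk-host:2181 --alter \
  --entity-type topics --entity-name big-topic \
  --delete-config retention.ms
```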
0
votes
1 answer

Grafana + Graphite: test connection fails

First, I'm new to Grafana, so excuse me if I don't use the right words to describe my issue. I installed a Grafana + Graphite solution in my Ambari cluster (I am using an Ambari cluster with the Graphite service). The Graphite service is working fine on the…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

What is affected when running hadoop namenode -format?

We have an Ambari cluster (version 2.6) with 24 worker machines. We want to run the following commands only on the worker23 machine (because of a problem on worker23). Do these commands affect the file systems of all the workers, or only worker23? $…
jango
  • 59
  • 2
  • 3
  • 12
0
votes
1 answer

Calculate MD5 checksums for all files in a Hadoop directory

I am using Apache Hadoop 2.7.1 on CentOS, and I am new to CentOS. If I want to calculate the MD5 checksum for a specific file in Hadoop, I can issue the following command: hdfs dfs -cat /hadoophome/myfile | md5sum But what if I want to calculate the MD5 checksum for all…
oula alshiekh
  • 103
  • 1
  • 2
  • 6
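
A sketch of one way to do this: list the plain files in an HDFS directory and pipe each through `md5sum`. The directory name is hypothetical, and this assumes the `hdfs` CLI and GNU `md5sum` are available:

```shell
# Hypothetical HDFS directory; requires the hdfs CLI on the PATH.
command -v hdfs >/dev/null 2>&1 || { echo "hdfs CLI not found; skipping"; exit 0; }

DIR=/hadoophome
# 'hdfs dfs -ls' prints one entry per line; field $1 starts with '-' for
# plain files (vs 'd' for directories) and $NF is the full path.
hdfs dfs -ls "$DIR" | awk '$1 ~ /^-/ {print $NF}' | while read -r f; do
  sum=$(hdfs dfs -cat "$f" | md5sum | awk '{print $1}')
  printf '%s  %s\n' "$sum" "$f"
done
```

Note that `hdfs dfs -checksum` also exists, but it reports Hadoop's internal block checksum rather than a plain MD5 of the file contents.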
0
votes
1 answer

How to reconfigure Ambari service values with a blueprint.json file

We have many Ambari lab clusters (Apache Ambari version 2.5.0.3), with the Ambari agent installed on Red Hat Linux machines. My target is to find a way to update service values on all the Ambari clusters by automating the process. What we do…
shalom
  • 461
  • 13
  • 29
0
votes
1 answer

Hadoop Cluster Hardware: Few large vs many small

We are looking for some help deciding what hardware to buy to support an internal Hadoop cluster. My company currently uses 1 dedicated server for Hadoop, which has 196GB RAM, 24 cores, and 6 1TB SATA hard drives. We want to scale up our…
TWith2Sugars
  • 113
  • 1
  • 5
0
votes
1 answer

Apache Ambari and how to check Ambari services status

How do I verify service statuses via Ambari? For example, I want to check in Ambari whether the HDFS service is stopped or has been STARTED. Until now I have used the following syntax to check the service…
jango
  • 59
  • 2
  • 3
  • 12
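
Ambari's REST API exposes each service's state directly. A sketch (the host, cluster name, and credentials below are hypothetical):

```shell
# Hypothetical Ambari host, cluster name, and credentials.
AMBARI=http://ambari-host:8080
CLUSTER=mycluster

# Skip when the Ambari server is not reachable from this machine.
curl -s --max-time 3 "$AMBARI" >/dev/null 2>&1 || { echo "Ambari not reachable; skipping"; exit 0; }

# ServiceInfo/state is "STARTED" for a running service, "INSTALLED" when stopped.
curl -s -u admin:admin \
  "$AMBARI/api/v1/clusters/$CLUSTER/services/HDFS?fields=ServiceInfo/state"
```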