For questions regarding the Hadoop Distributed File System (HDFS), which is part of the Apache Hadoop project.
Questions tagged [hdfs]
71 questions
1 vote, 1 answer
What version of HDFS is compatible with HBase stable?
HBase stable is currently hbase-0.90.4. What version(s) of HDFS is it compatible with?

Aleksandr Levchuk
1 vote, 1 answer
Processing pre-existing log files with Flume
I have a large set of log files that I need to extract data from. Is it possible to use Flume to read these files and dump them into HDFS (or Cassandra, or another data store), which I can then query?
The documentation seems to suggest it's all…

duckus
1 vote, 0 answers
HDP cluster + journal nodes get out of sync
We have an HDP cluster, version 2.6.5.
When we look at the NameNode logs, we can see the following warning:
2023-02-20 15:56:37,731 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file…

King David
1 vote, 0 answers
HDFS + how to disable the "du -sk" verification on data node disks
We are using an HDP cluster with 182 data node machines:
HDP version - 2.6.4
Ambari version 2.6.1
We note the following behavior on the data node machines (it happens on all data node machines and on all disks).
When we perform the command as above…

King David
0 votes, 1 answer
AWS FSx for Lustre with S3 vs EMR (with EMRFS) for Spark jobs
We are currently using EMR for easy job submission for our Spark jobs.
Recently I came across the "FSx Lustre + S3" solution, which is advertised as ideal for HPC situations.
EMRFS however is also said to be optimized for this particular…

dimisjim
0 votes, 1 answer
Is it possible to mix different RHEL OS versions in a Hadoop cluster?
We are using the following HDP cluster with Ambari.
List of nodes and their RHEL versions:
3 master machines (with NameNode and ResourceManager), installed on RHEL 7.2
312 DataNode machines, installed on RHEL 7.2
5 Kafka machines, installed…

shalom
0 votes, 1 answer
HDFS block deletion speed - cause, expectations, tuning?
I have a small (testing) HDFS cluster which I use as snapshot backup space for Flink. Flink creates and deletes roughly 1000 (small) files per second. The namenode seems to handle this without problems at first, but over time the Number of Blocks…

Caesar
0 votes, 0 answers
Any benefits of ZFS over EXT4 for data stream processing on top of HDFS?
I'm working on a data stream processing project in which I will be using Apache Flink and Apache Spark, and I want to use HDFS for storage. The development and testing will be done on a single-node cluster with multiple physical disks.
I have already…

HUSMEN
0 votes, 1 answer
HDFS balancing: how to balance HDFS data?
We have Hadoop version 2.6.4.
On the DataNode machine we can see that HDFS data isn’t balanced.
Some disks show different used sizes, such as sdb 11G and sdd 17G:
/dev/sdd 20G 3.0G 17G 15% /grid/sdd
/dev/sdb 20G 11G 9.3G 53% /grid/sdb <-- WHY…

shalom
0 votes, 0 answers
DataNode machine disk sizes
Is it important that the disks on the (worker) DataNode machines are all the same size?
For example:
We have an Ambari cluster with 3 worker machines (DataNode machines).
Each DataNode machine has 10 disks (7 disks with 50G and the 3 disks with 48G…

shalom
0 votes, 1 answer
What is affected when running hadoop namenode -format?
We have an Ambari cluster (version 2.6) with 24 worker machines.
We want to run the following commands only on the worker23 machine (because of a problem on worker23). Do these commands affect the file system of all the workers, or only worker23?
$…

jango
0 votes, 1 answer
Copying files in HDFS stalls
I have a 35-node cluster with a high number of blocks in it: ≈450K blocks per data node.
After a configuration change (which contained rack reassignments and a NameNode Xmx increase), HDFS became a problem. It's unable to perform copy operations on random…

inteloid
0 votes, 1 answer
How to install Hadoop 2.4.1 on Windows with Spark 2.0.0
I want to set up a cluster using Hadoop in YARN mode. I want to use the Spark API for map-reduce and will use spark-submit to deploy my applications. I want to work on a cluster. Can anyone help me install Hadoop in a cluster on Windows?
0 votes, 1 answer
Why does DFSZKFailoverController kill the NameNode process in Hadoop?
I am trying to configure a Hadoop high-availability cluster by following this tutorial:
http://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/
When I follow that article, I face two main problems:
1. hdfs namenode…

Oleksandr
0 votes, 1 answer
Flume: error log while using FileChannel
I am using Flume flume-ng-1.5.0 (with CDH 5.4) to collect logs from many servers and sink them to HDFS.
Here is my configuration:
#Define Source , Sinks, Channel
collector.sources = avro
collector.sinks = HadoopOut
collector.channels = fileChannel
#…

Summer Nguyen