Questions tagged [distributed-filesystem]

Any file system that allows files to be accessed from multiple hosts over a computer network, making it possible for multiple users on multiple machines to share files and storage resources.

56 questions
1
vote
1 answer

Is Dropbox considered a Distributed File System?

I was just reading this https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems The definition of a DFS seems to exactly describe Dropbox to me but it isn't in the list of examples, which of course it would be if it was one I…
user2802557
  • 747
  • 1
  • 9
  • 19
1
vote
0 answers

System calls involved in writing blocks to datanodes in HDFS

As per my understanding of HDFS, HDFS is a higher level file system that abstracts the local file system with a huge block size (64 MB). When the client wants to write a file to HDFS, depending on the replication factor a pipeline will be…
1
vote
0 answers

How do I tweak GlusterFS performance?

I have 2 dedicated servers with the following specs: - E3 1270V3 CPU - 32GB RAM - 960GB SSD - 1Gbps private ethernet network. Using local drives, dd tests are usually in the range of 600MB/s, which is very good. I recently set up a GlusterFS replicated…
1
vote
2 answers

Read a properties file from HDFS

I'm trying to read a Java properties file that is on HDFS like this: try { properties.load(new FileInputStream("hdfs://user/hdfs/my_props.properties")); } catch (IOException e) { throw new RuntimeException("Properties file not…
1
vote
2 answers

Hadoop mapper class not found

I have developed a map-reduce program using Apache Hadoop 1.2.1. I did the initial development using the Eclipse IDE to simulate the hadoop distributed computing environment with all the input and output files coming from my local file system. …
shutch
  • 197
  • 1
  • 1
  • 10
1
vote
1 answer

Distributed File System for storing and retrieving

I require a highly available distributed file system where documents of various types can be stored and retrieved, and it should be able to scale horizontally. What would be the ideal choice for this? What should be the data layers that should be…
ptntialunrlsd
  • 794
  • 8
  • 23
1
vote
2 answers

The splitting logic of HDFS?

What is the significance of the isSplittable() method of the FileInputFormat class? http://hadoop.apache.org/docs/r2.2.0/api/index.html
Sugandha
  • 51
  • 4
1
vote
0 answers

HBASE with Distributed File System?

Well, it's quite clear that HBASE is a database that saves its files in HDFS. Can HBASE also be integrated with other distributed file systems? If yes, then what should be the underlying approach? For example, if I am using Hadoop with Ceph, then can HBASE…
Satyajit
  • 31
  • 7
0
votes
0 answers

Get DFS Namespace Target Folders inside folders with no target recursively

DFS namespace (e.g. "\\my.domain.com\NS1") with folders that don't have a target: "\\my.domain.com\NS1\target1\" > Target: "\\myserver.my.domain.com\target1_share\" "\\my.domain.com\NS1\folder\" > No target "\\my.domain.com\NS1\folder\target2\" >…
0
votes
0 answers

Delete all objects in S3 bucket except one which comes last in lexicographical order group by a prefix

Consider an S3 bucket with objects with keys like: abc_1_epoch1.ext abc_1_epoch2.ext abc_2_epoch1.ext xyz_1_epcoh1.ext When I group by keys with prefix, then due to epoch it forms a lexicographical order. I want to delete all objects except the one…
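
The selection step described in this question can be sketched without touching S3 at all. The following is a minimal illustration, not the asker's code: the function name `keys_to_delete` is hypothetical, and it assumes the grouping prefix is everything before the last underscore in the key (so `abc_1_epoch1.ext` belongs to group `abc_1`). It only computes which keys would be removed; listing the bucket and issuing the deletes are left out.

```python
from collections import defaultdict


def keys_to_delete(keys):
    """Return every key except the lexicographically last one in each group.

    Assumption: the group prefix is the part of the key before the
    last underscore, e.g. "abc_1" for "abc_1_epoch1.ext".
    """
    groups = defaultdict(list)
    for key in keys:
        prefix, _, _ = key.rpartition("_")
        groups[prefix].append(key)

    doomed = []
    for members in groups.values():
        keeper = max(members)  # plain string comparison, i.e. lexicographic
        doomed.extend(k for k in members if k != keeper)
    return sorted(doomed)


if __name__ == "__main__":
    sample = [
        "abc_1_epoch1.ext",
        "abc_1_epoch2.ext",
        "abc_2_epoch1.ext",
        "xyz_1_epcoh1.ext",
    ]
    print(keys_to_delete(sample))
```

Note that plain string comparison puts `epoch10` before `epoch2`, so this only matches chronological order when the epoch values have the same number of digits.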
0
votes
1 answer

Getting NERR_DfsNoSuchVolume (Error code:2662) while checking DFS share path

I have an exe (C# application) which runs with a service account and tries to get the DFS link for a file share. The share exists and is accessible. I have another similar exe running with the same service account and running fine, whereas my exe is not…
0
votes
1 answer

Access random line in large file on Google Cloud Storage

I'm trying to read a random line out of a large file stored in a public cloud storage bucket. My understanding is that I can't do this with gsutil and have looked into FUSE but am not sure it will fill my use…
0
votes
1 answer

S3 vs EFS propagation delay for distributed file system?

I'm working on a project that utilizes multiple docker containers which all need to have access to the same files for comparison purposes. What's important is that if a file appears visible to one container, then there is minimal time between when…
0
votes
1 answer

How to obtain an InputStream when opening an IgnitePath (returns HadoopIgfsSecondaryFileSystemPositionedReadable)?

Usually, when working with Hadoop and Flink, opening/reading a file from a distributed file system will return a Source (counterpart of Sink) object extending the java.io.InputStream. However, in Apache Ignite, the IgfsSecondaryFileSystem, and more…
blizzfire
  • 1
  • 2
0
votes
1 answer

What does the GlusterFS server option cluster.readdir-optimize control?

I have been trying to optimise the small file performance of my GlusterFS storage cluster. A number of forum threads and blog posts seem to suggest setting the cluster.readdir-optimize property on the volume, like: $ gluster volume get test-share…
PicoutputCls
  • 1,392
  • 1
  • 12
  • 24