Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the University of California, Berkeley AMPLab.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks, so that compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, to provide in-memory I/O throughput.
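
As a rough illustration of the caching idea only (this is not Alluxio's API or implementation), the sketch below puts a small in-memory read-through cache in front of a slow store. The class `ReadThroughCache`, the function `fake_remote_store`, and the `s3://bucket/events.parquet` path are all hypothetical names invented for this example.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy in-memory cache sitting between a reader and a slower store."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        # fetch_remote is a hypothetical stand-in for a remote storage read.
        self._fetch_remote = fetch_remote
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often the slow store was hit

    def read(self, path: str) -> bytes:
        # On a miss, fetch from the slow store once and keep the bytes in memory.
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch_remote(path)
        # On a hit (or after the fill), serve directly from memory.
        return self._cache[path]


def fake_remote_store(path: str) -> bytes:
    # Hypothetical remote store: returns deterministic bytes for any path.
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_remote_store)
first = cache.read("s3://bucket/events.parquet")   # miss: goes to the store
second = cache.read("s3://bucket/events.parquet")  # hit: served from memory
```

The second read never reaches the slow store, which is the effect the paragraph above describes for frequently accessed remote data.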

Alluxio can also simplify cloud and object storage adoption: Cloud and object storage systems use different semantics that have performance implications compared to traditional file systems. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. There are also different performance characteristics in common file system operations like directory listing (‘ls’) and ‘rename’, which often add significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.

Alluxio is written in Java and hosted on GitHub.

The latest stable version:

Recommended reference sources:

90 questions
0
votes
1 answer

Is it normal for Alluxio master to have verbose output dynamically

I'm using Alluxio 2.0 to accelerate the compute layer's performance. Even when no query is running, I see verbose Netty output being appended to $Alluxio_home/logs/master.log. 2019-11-25 10:26:32,141 DEBUG NettyServerHandler - {} {}…
Eugene
  • 10,627
  • 5
  • 49
  • 67
0
votes
2 answers

Hive metastore with alluxio storage in parquet data type problem

I am using PrestoDB with the Hive metastore for schema storage and an Alluxio cache as external storage for data. The storage format used in Alluxio and the Hive schema is PARQUET. While retrieving a timestamp field from Presto using the Hive catalog, I get follow…
0
votes
1 answer

Master not able to start every time I restart my Alluxio machine

Hi, I have deployed a single-node Alluxio cluster, and it's working really well and fast, but the problem I am facing is that the master node fails to start every time I restart my Alluxio machine. I receive the error below: 2019-08-02 05:37:30,942…
Anurag Rawat
  • 445
  • 1
  • 4
  • 13
0
votes
1 answer

How to set master address and 19998 port in Alluxio 2.0 java api?

I want to know how to set the hostname and rpc_port of master in alluxio 2.0 java api. When I use the code that works in alluxio 1.8, I find that it doesn't work in alluxio 2.0. Here is my code, it doesn't work. I don't know how to write correct…
0
votes
1 answer

Why do files need time to synchronize after writing in writeType THROUGH in Alluxio?

When I write a file in a directory mounted by alluxio-fuse using writeType THROUGH, I find that it takes 2-3 minutes to synchronize files. Why do files need time to synchronize? Following is the mount directory. write time : 15:40. after sync: 15:43
0
votes
1 answer

Question about config of level0.dirs.quota and alluxio.user.file.write.tier.default in Alluxio

I set level0.dirs.quota=1GB, level1.dirs.quota=10GB and alluxio.user.file.write.tier.default=1. Then when I use alluxio-fuse to write files over 1G, it will fail. But if I use ./bin/alluxio fs copyFromLocal to write files over 1G, it will…
0
votes
1 answer

Why does UfsSyncPathCache.java:68 parameter not work in Alluxio?

I found that the UfsSyncPathCache.java:68 parameter had no effect. When I debugged after setting this parameter, I found that the lastSync of the path that I got from the cache was always null. It seems that the pathsToLoad of DefaultFileSystemMaster.java:3345 was…
0
votes
1 answer

What's the difference between using HDFS RAMDisk and Alluxio?

Since HDFS supports RAMDisk, what's the advantage of using Alluxio? In our case we are not going to integrate any type of under storage besides HDFS.
Jerome tan
  • 155
  • 1
  • 1
  • 10
0
votes
1 answer

Can Impala run on top of Alluxio?

I have tried to configure Impala to run on top of Alluxio, but failed. Here are the Impala configurations: /etc/impala/conf/core-site.xml(http://www.alluxio.org/docs/1.6/en/Running-Hadoop-MapReduce-on-Alluxio.html)
Allen Xu
  • 133
  • 1
  • 10
0
votes
1 answer

Indexing with Solr-spark and Alluxio: cannot access files in Alluxio

I am indexing documents to Solr using Java. My code works perfectly when I index files that are on my computer. But when I try to index files that are located in Alluxio, I get an exception "No fileSystem for scheme: alluxio". I have added alluxio…
Dilak
  • 105
  • 2
  • 2
  • 13
0
votes
1 answer

Can Apache Alluxio use Azure Data Lake as under store?

I have created an HDInsight cluster with Spark 2.2 & HDI 3.6 that reads data from Azure Data Lake. Users will execute Spark SQL on it, and I want to use Alluxio as a cache to speed up queries. After some research, I found Azure Blob Storage is supported:…
Lucas Yang
  • 63
  • 8
0
votes
1 answer

Alluxio data is not evenly distributed

I have an EMR setup with 4 r3.4Xlarge machines (a total of 128GB (32G/Node) and 1000GB (250GB) SSD is allocated to Alluxio). I have loaded around 650GB of ORC data, but I can see that 3 workers have used 80%+ of their allocated space while one of the workers has…
Rijo Joseph
  • 1,375
  • 3
  • 17
  • 33
0
votes
1 answer

Configure OFF-HEAP for Apache Spark 2.x

Please help me understand the following: What are the steps to configure OFF-HEAP storage for Apache Spark 2.x? Is it possible to configure Alluxio as the OFF-HEAP storage in 2.0? Was it removed in 2.x? How does OFF-HEAP work with Dynamic…
0
votes
1 answer

writing data to alluxio with CACHE_THROUGH is failing

I am trying to write data to Alluxio using MapReduce. I have around 11 GB of data on HDFS which I am writing to Alluxio. It is working fine with the MUST_CACHE write type (the default value of alluxio.user.file.writetype.default). But when I am trying…
0
votes
1 answer

alluxio not distributing files across the cluster

I'm using a 6-node cluster for Alluxio (version 1.4), but it doesn't distribute files across the cluster: one worker is at 98% usage, the other workers are at 50%-55%, and the master node is at only 18%. And I'm using…
Mukundan
  • 21
  • 5