Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the University of California, Berkeley AMPLab.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks, so that compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, and read it at in-memory I/O throughput.
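As a rough illustration of how a compute application addresses data through Alluxio, the sketch below builds an alluxio:// URI of the form that appears in the questions on this page (for example alluxio://master_host:19998/my_table). The helper name alluxio_uri and the constant DEFAULT_MASTER_RPC_PORT are hypothetical and not part of any Alluxio API; the port 19998 is taken from the examples on this page and may differ in a given deployment.

```python
# Hypothetical helper: builds an alluxio:// URI string for a path.
# Not part of any Alluxio client library; for illustration only.

# Assumption: 19998 is the master RPC port, as in the URIs quoted on this page.
DEFAULT_MASTER_RPC_PORT = 19998


def alluxio_uri(master_host: str, path: str, port: int = DEFAULT_MASTER_RPC_PORT) -> str:
    """Return 'alluxio://<master_host>:<port>/<path>' with exactly one leading slash on the path."""
    if not master_host:
        raise ValueError("master_host must be non-empty")
    # Normalize the path so both 'my_table' and '/my_table' give the same URI.
    normalized = "/" + path.lstrip("/")
    return f"alluxio://{master_host}:{port}{normalized}"


if __name__ == "__main__":
    print(alluxio_uri("master_host", "/my_table"))
```

A framework that accepts Hadoop-compatible file system URIs could then be pointed at the resulting string instead of a direct path into the underlying storage, provided the Alluxio client is on its classpath.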

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics that have performance implications compared to traditional file systems. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations such as directory listing (‘ls’) and ‘rename’ also have different performance characteristics, which often adds significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.

Alluxio is written in Java and hosted on GitHub.

90 questions
2
votes
1 answer

Can Spark read Alluxio's metadata just like Hive?

I'm trying to decrease the time Spark takes to read and write data by using Alluxio. But I found that I have to specify the path to read data. I've found that I can use the metatool of Hive to change Hive's warehouse from HDFS to Alluxio, so I can…
lulijun
  • 415
  • 3
  • 22
2
votes
1 answer

can't add alluxio.security.login.username to spark-submit

I have a spark driver program which I'm trying to set the alluxio user for. I read this post: How to pass -D parameter or environment variable to Spark job? and although helpful, none of the methods in there seem to do the trick. My environment: -…
jb44
  • 393
  • 1
  • 6
  • 23
2
votes
1 answer

Test Spark with Tachyon

I have installed Tachyon and Spark according to instructions: http://tachyon-project.org/documentation/Running-Spark-on-Tachyon.html However, as a newbie I have no idea how to put file "X" into Tachyon File System as they said: $ ./spark-shell $ val…
HP.
  • 19,226
  • 53
  • 154
  • 253
2
votes
1 answer

OFF_HEAP rdd was removed automatically by Tachyon, after the spark job done

I run a Spark application that uses StorageLevel.OFF_HEAP to persist an RDD (my Tachyon and Spark are both in local mode), like this: val lines = sc.textFile("FILE_PATH/test-lines-1") val words = lines.flatMap(_.split(" ")).map(word => (word,…
zeromem
  • 381
  • 1
  • 3
  • 12
2
votes
1 answer

Tachyon: Failed to rename during copyFromLocal command

I'm using Apache Spark to build an application. To make the RDDs available from other applications I'm trying two approaches: using Tachyon, and using a spark-jobserver. I'm new to Tachyon. I completed the following tasks given in the Running Tachyon…
Anju
  • 631
  • 2
  • 9
  • 25
1
vote
0 answers

Trino Hive connector can't synchronize the partition metadata automatically

Stack: Trino version: 395; Storage: Alluxio with AWS S3; Metadata store: AWS Glue. I have a daily Spark job that saves Parquet files with 3 partition keys (year, month, day) to S3, then all the data is synchronized to Alluxio. However, although I…
Jonathan Lam
  • 1,761
  • 2
  • 8
  • 17
1
vote
1 answer

difference between WORKER_EVICTOR and WORKER_BLOCK_ANNOTATOR

Can you explain the difference between WORKER_EVICTOR and WORKER_BLOCK_ANNOTATOR, and why Alluxio abandoned WORKER_EVICTOR?
ChanChan Mao
  • 157
  • 8
1
vote
1 answer

Manage file size for S3 using Spark and Alluxio

I am using Spark to write data to Alluxio, with S3 as the UFS, using a partitioned Hive Parquet table. I am using the repartition function on the Hive partition fields to make write operations to Alluxio efficient. This results in the creation of a single file in…
1
vote
1 answer

How to monitor the status of standby masters in Alluxio?

In Alluxio, I can monitor the leading master through port 19998. But I also want to monitor the standby master. However, the standby master does not have RPC port 19998. Is there any way to monitor the standby master? I want to monitor the status of…
1
vote
1 answer

Unable to access Alluxio File System API in IDE

I am trying to access a file in Alluxio from Scala code in the IDE and I am getting this error: Exception in thread "main" java.io.IOException: No FileSystem for scheme: alluxio. My code is as follows: package com.example.sparkalliuxiodemo import…
Sasi
  • 140
  • 1
  • 8
1
vote
1 answer

Alluxio + Hive on EMR

I have Alluxio 1.8 installed on an EMR 5.19.0 cluster, and can see my S3 tables using /usr/local/alluxio/bin/alluxio fs ls /. However, when I start up hive and issue hive> [[DDL w/ LOCATION = alluxio://master_host:19998/my_table ]], I get the…
rongenre
  • 1,334
  • 11
  • 21
1
vote
1 answer

Timeout to read from Alluxio

I encountered this error while performing a Presto query on Alluxio. What does this timeout mean, and how can I fix it? com.facebook.presto.spi.PrestoException: Error opening Hive split alluxio://xxxxx:19998/s3/data/m-00020 (offset=134217728, …
AAudibert
  • 1,223
  • 11
  • 23
1
vote
1 answer

Channel is closed while reading from Alluxio using Presto

I encountered this stack trace while running a Presto query on top of Alluxio. Sometimes my query is able to succeed, but sometimes it fails with this error. What does it mean, and how can I fix it? com.facebook.presto.spi.PrestoException: Error…
AAudibert
  • 1,223
  • 11
  • 23
1
vote
1 answer

Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root in alluxio mapreduce

Caused by: org.apache.thrift.transport.TTransportException: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root It works fine when I run the wordcount program locally with Alluxio. I also passed the…
UDIT JOSHI
  • 1,298
  • 12
  • 26
1
vote
3 answers

Difference between Alluxio(Tachyon) and Tungsten in Spark?

Tachyon is a distributed, in-memory storage system, developed separately from Spark, which can be used as off-heap persistent storage during a Spark application. Tungsten is a new Spark SQL component that provides more efficient Spark…
Michael
  • 402
  • 1
  • 3
  • 16