Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the AMPLab at the University of California, Berkeley.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks such as MapReduce and Apache Spark, so that compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, and read it at in-memory I/O throughput.
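
The caching idea can be illustrated with a toy read-through cache. This is only a sketch of the general technique, not Alluxio's API or implementation: `ReadThroughCache`, `slow_remote_read`, the paths, and the least-recently-used eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy in-memory cache sitting between an application and slow storage."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch = fetch        # hypothetical loader for the underlying storage
        self._capacity = capacity  # maximum number of cached entries
        self._data: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._data:
            self.hits += 1
            self._data.move_to_end(path)  # mark as most recently used
            return self._data[path]
        self.misses += 1
        value = self._fetch(path)         # fall through to the slow storage
        self._data[path] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


def slow_remote_read(path: str) -> bytes:
    # Stand-in for a read from a remote object store or HDFS (assumption).
    return f"contents of {path}".encode()


cache = ReadThroughCache(slow_remote_read, capacity=2)
cache.read("/data/a")  # first read goes to the underlying storage
cache.read("/data/a")  # second read is served from memory
```

The application only ever calls `read`; whether the bytes come from memory or from the underlying storage is hidden from it, which is the sense in which the caching is transparent.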

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics from traditional file systems, and those differences have performance implications. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations such as directory listing (‘ls’) and ‘rename’ also have different performance characteristics and often add significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.

Alluxio is written in Java and hosted on GitHub.

The latest stable version:

Alluxio 1.8.1 - Sept 27, 2018

90 questions

1 answer

java.io.IOException: Frame size [...] larger than max length [...]!

I am running Spark in Standalone mode + Alluxio for data access. More specifically, I have 1 spark master & 1 spark worker. When running my job I am getting the following error: 17/03/22 14:35:43 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID…

apache-spark alluxio

asked Mar 23 '17 at 13:43

Elouan Keryell-Even

2 answers

Need help on setup alluxio in single node

I am trying to set up Alluxio on my local machine. I followed the Alluxio doc http://www.alluxio.org/docs/master/en/Running-Alluxio-Locally.html and am able to see the service, but I get an error when checking localhost:19999: HTTP ERROR 500 Problem…

apache-spark apache-spark-sql alluxio

asked Oct 30 '16 at 17:14

senthil kumar p

2 answers

Read multiple files with Spark java from Alluxio is slow

I have installed Alluxio locally with Spark and inserted 1000 files into Alluxio's memory. Nevertheless, reading the files is very slow: the file-reading time from Alluxio memory equals the file-reading time from disk. I don't understand why. File Name …

java apache-spark alluxio

asked Aug 16 '16 at 12:12

TiGi

1 answer

How install alluxio1.2 on openstack

I am trying to install alluxio1.2 on a CentOS VM on OpenStack with Spark and HDFS, but the installation doesn't work. Spark and HDFS are already installed and working. ERROR logger.type (AlluxioMaster.java:main) - Uncaught exception while running Alluxio…

installation openstack alluxio

asked Aug 04 '16 at 08:17

TiGi

1 answer

Memory usage for transformation on RDD's in alluxio/tachyon for spark

Let's say we create an RDD from Alluxio memory: rdd1 = sc.textFile("alluxio://.../file1.txt") rdd2 = rdd1.map(...) Does rdd2 reside on Alluxio or on Spark's heap? Also, would an operation like (both pairRDDs on Alluxio) pairRDD1.join(pairRDD2)…

python apache-spark pyspark alluxio

asked Jun 09 '16 at 07:54

PAN

1 answer

Deploy tachyon with Ansible without ssh connexion between servers i.e. how to format master

For the moment Tachyon is deployed in local mode, i.e. http://tachyon-project.org/documentation/v0.7.1/Running-Tachyon-Locally.html My main issue here is the SSH connection. The classic way is to do: ssh-keygen -t rsa cat id_rsa.pub >>…

ssh ansible alluxio

asked Jan 13 '16 at 11:28

jnaour

1 answer

tachyon0.8.2 deployed with hadoop2.6.0,but the IPC version are not matched

Now, I want to deploy tachyon0.8.2 on my ubuntu14.04. I already have Hadoop and Spark: on the master bd@master$ jps 11871 Jps 3388 Master 2919 NameNode 3266 ResourceManager 3123 SecondaryNameNode on the slave bd@slave$ jps 4350 Jps 2778…

java hadoop alluxio

asked Dec 28 '15 at 02:11

Inner Ac

1 answer

How to enable lineage-based fault tolerance for Spark-Tachyon integration?

I am trying to implement RDD/Dataframe sharing using Tachyon. It is my understanding that with HDFS underFS, writes are asynchronous (with replication to HDFS happening behind the scene) and therefore should be faster but in my testing I see that…

apache-spark alluxio

asked Dec 11 '15 at 12:17

Shane Kinsella

1 answer

Is it possible to prevent Tachyon from writing to underFS?

Is it possible to prevent Tachyon from writing to underFS ? I would like it to store data just on memory drive and omit writing them to underFS. Is it possible or supported ? Regards, Mike

apache-spark apache-spark-sql alluxio

asked Nov 11 '15 at 16:22

qwertz1123

2 answers

How to set TTL of file in Tachyon

I see that in Tachyon configuration there is a key tachyon.master.ttlchecker.interval.ms ("Time interval (in milliseconds) to periodically delete the files with expired ttl value.") but I have looked all over and cannot find a way of setting the TTL…

java scala alluxio

asked Oct 27 '15 at 11:32

Shane Kinsella

1 answer

Simple Tachyon example fails with "failed to rename" within underFSStorage in GCE

When running a simple example I get this error. I tried changing permissions and used different directories. Caused by: java.io.IOException: FailedToCheckpointException(message:Failed to rename…

scala apache-spark google-compute-engine alluxio

asked Oct 17 '15 at 04:01

BAR

1 answer

apache-spark deployment: stand alone VS multiple VM's

I have one machine on which to deploy Spark, Hadoop, and Tachyon. Are spark operations from hdfs/tachyon going to be faster on one node with all cores/RAM or a number of VM nodes evenly dividing the resources? Ram is < 200GB. Performance and…

apache-spark hadoop hdfs alluxio

asked May 21 '15 at 17:12

SpmP

1 answer

Tachyon configuration for s3 under filesystem

I am trying to set up Tachyon on S3 filesystem. For HDFS, tachyon has a parameter called TACHYON_UNDERFS_HDFS_IMPL which is set to "org.apache.hadoop.hdfs.DistributedFileSystem". Does anyone know if such a parameter exists for S3? If so, what is its…

alluxio

asked Oct 29 '14 at 22:49

user3033194

1 answer

Error in setting up Tachyon on S3 under filesystem

I am trying to set up Tachyon on the S3 filesystem. I am completely new to Tachyon and am still really reading what I can find on it. My tachyon-env.sh is given below: !/usr/bin/env bash # This file contains environment variables required to run…

amazon-s3 alluxio

asked Oct 29 '14 at 17:52

user3033194

-1 votes

1 answer

How can I have Alluxio show all the not-yet-accessed files in the directory?

When I mount an S3 bucket under alluxio://s3/, the bucket already has objects. However, when I list the directory (either with alluxio fs ls, with ls on the FUSE-mounted directory, or in the web UI) I see no files. When I write a new file or read an…

amazon-s3 alluxio

asked Jun 28 '22 at 09:20

ChanChan Mao
