Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the AMPLab of the University of California, Berkeley.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks such as MapReduce and Apache Spark. Compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, and read it back at in-memory I/O throughput.
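The caching idea can be illustrated with a toy read-through cache placed in front of a slow store. This is only a sketch of the general technique, not Alluxio's API or implementation; `SlowStore`, `ReadThroughCache`, and the `capacity` parameter are hypothetical names invented for this example.

```python
from collections import OrderedDict


class SlowStore:
    """Stand-in for remote storage; counts how often it is actually read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self._data[path]


class ReadThroughCache:
    """Tiny LRU read-through cache (hypothetical, for illustration only)."""

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._cache = OrderedDict()

    def read(self, path):
        if path in self._cache:
            # Cache hit: serve from memory and mark as most recently used.
            self._cache.move_to_end(path)
            return self._cache[path]
        # Cache miss: fetch from the underlying store, then cache the result.
        value = self._store.read(path)
        self._cache[path] = value
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return value


store = SlowStore({"/a": b"alpha", "/b": b"beta", "/c": b"gamma"})
cache = ReadThroughCache(store, capacity=2)

for path in ["/a", "/a", "/b", "/a", "/c", "/b"]:
    cache.read(path)

# Number of reads that actually reached the slow store.
print(store.reads)
```

Repeated reads of the same path are served from memory, so the slow store is touched only on misses. A real deployment differs in many ways (distributed workers, tiered storage, consistency with the under storage), which this sketch deliberately ignores.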

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics from traditional file systems, and those differences have performance implications. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations such as directory listing (‘ls’) and ‘rename’ also have different performance characteristics, which often add significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.
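To make the ‘rename’ point concrete, here is a toy model of a directory rename on an object store that has a flat key namespace and no native rename. Each object under the prefix has to be copied and deleted individually. The dict-based `bucket` and the `rename_prefix` helper are assumptions of this sketch; they do not represent any real object-store client or the way Alluxio handles renames.

```python
def rename_prefix(objects, src, dst):
    """Rename every key under `src` to live under `dst` in a flat key space.

    Returns the number of per-object copy+delete operations performed.
    `objects` is a plain dict standing in for a bucket (an assumption of
    this sketch, not a real object-store client).
    """
    moved = 0
    # Snapshot the matching keys first so the dict is not mutated mid-iteration.
    for key in [k for k in objects if k.startswith(src)]:
        new_key = dst + key[len(src):]
        objects[new_key] = objects[key]  # copy
        del objects[key]                 # delete
        moved += 1
    return moved


bucket = {f"logs/2018/part-{i:05d}": b"..." for i in range(1000)}
bucket["other/file"] = b"..."

ops = rename_prefix(bucket, "logs/2018/", "archive/2018/")

# One copy+delete per object, so the cost grows with the directory size.
print(ops)
```

On a traditional file system the same rename is typically a single metadata update, which is why jobs that commit output by renaming directories can slow down noticeably on object storage.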

Alluxio is written in Java and hosted on GitHub.

The latest stable version:

Alluxio 1.8.1 - Sept 27, 2018

Recommended reference sources:

90 questions

2 answers

Unable to launch Alluxio on Kubernetes

I am trying Alluxio 1.7.1 with Docker 1.13.1 and Kubernetes 1.9.6 and 1.10.1. I created the Alluxio Docker image as per the instructions on https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html Then I followed the…

sockets docker unix kubernetes alluxio

asked Jun 08 '18 at 11:22

Sushil Kumar Sah

1,042
10
13

1 answer

spark LOCAL and alluxio client

I'm running Spark in LOCAL mode and trying to get it to talk to Alluxio. I'm getting the error: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. I have looked at the page…

apache-spark classpath local alluxio

asked Apr 12 '18 at 21:07

jb44

1 answer

Alluxio - access existing files in underfs

I am running a small Alluxio (1.7.0) cluster using Swift as the underfs. I've confirmed Alluxio works great writing and reading files and persisting them to the Swift store. I would like to access files on the Swift store via Alluxio that are saved…

filesystems hdfs bigdata alluxio

asked Jan 25 '18 at 16:47

Kimi Merroll

1 answer

Why do Alluxio files stay in the TO_BE_PERSISTED state all the time?

I have deployed an Alluxio cluster on top of an HDFS cluster. When I use the Alluxio native Java API to write files to Alluxio with the write type set to ASYNC_THROUGH, the files (even ones of just 1G) do not seem to be written to HDFS and keep a state of…

hadoop hdfs alluxio

asked Nov 28 '17 at 11:40

Long.zhao

1,085
2
11
16

1 answer

Save a RDD by saveAsObject, Exception "had a not serializable result: org.apache.hadoop.hbase.io.ImmutableBytesWritable"

I need to serialize an RDD read from HBase into the Alluxio in-memory file system as a way to cache it and update it periodically, for use in incremental Spark computation. The code looks like this, but it runs into the exception in the title: val inputTableNameEvent =…

apache-spark serialization hbase deserialization alluxio

asked Feb 23 '17 at 13:13

bronzels

1,283
2
10
16

0 answers

Spark job with oozie TFS FileSystem implementation error

I am new to Spark. I need to run a Spark job within Oozie. On its own the Spark job runs fine, but with Oozie, after the job is launched, I get the following error: 017-01-12 13:51:57,696 INFO [main]…

apache-spark hadoop2 oozie alluxio

asked Jan 13 '17 at 06:43

Rohit Mishra

1 answer

Alluxio frame size() larger than max() on Spark

I have a strange error on Alluxio with Spark. Reading 20.000 files with Spark from Alluxio works, but reading 40.000 files does not. I use Alluxio 1.2 and Spark 1.6.0, and I read data with the file API: FileSystem fs =…

java apache-spark thrift alluxio

asked Aug 19 '16 at 14:34

TiGi

1 answer

Spark on Tachyon (Alluxio): Frame size (273247862) larger than max length (16777216)

I followed the guide to deploy Spark on Alluxio. When I try to load data from Alluxio to run an RDD operation, val ccc = sc.textFile("alluxio://localhost:19998/findbugs.xml") ccc.count the following error shows up: 16/07/24 23:27:16 INFO…

apache-spark thrift alluxio

asked Jul 24 '16 at 15:35

Carl H

0 answers

Running HBase on top of Alluxio

Has anyone succeeded in running HBase on top of Alluxio? There is no wiki page on Alluxio's website about this, and no luck with Google results either. My environment is: Hadoop 2.6, HBase 0.98.20, Alluxio 1.1.0. Edit: java.io.IOException:…

hbase alluxio

asked Jun 28 '16 at 09:17

spark1631

0 answers

Instructions on installing Tachyon in DCOS (Mesosphere)?

I have spark-notebook set up in DCOS. Tachyon is part of the ecosystem, but I couldn't find any DCOS-specific instructions for installing Tachyon. I could install it from scratch, but there seems to be some DCOS-compliant way to get a service…

dcos spark-notebook alluxio

asked May 20 '16 at 00:32

1001b

0 answers

Wordcount run on Tachyon throws ClassNotFoundException

I am trying to run Hadoop Wordcount on Tachyon. I followed this link. But once I run the wordcount JAR with the command below: hadoop jar HadoopWordCount-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.WordCount -libjars…

hadoop mapreduce alluxio

asked Jan 08 '16 at 07:46

USB

6,019
15
62
93

1 answer

Tachyon Doesn't Seem to be Aware of Available Memory

Just to see if Tachyon would give me an error about configured memory being more than available I set: # Some value over combined available mem and disk space. export TACHYON_WORKER_MEMORY_SIZE=1000GB And observed the allocation in the web UI…

alluxio

asked Oct 18 '15 at 01:57

BAR

15,909
27
97
185

2 answers

Tachyon on Dataproc Master Replication Error

I have a simple example running on a Dataproc master node where Tachyon, Spark, and Hadoop are installed. I have a replication error writing to Tachyon from Spark. Is there any way to specify it needs no replication? 15/10/17 08:45:21 WARN…

scala apache-spark hadoop google-cloud-dataproc alluxio

asked Oct 17 '15 at 22:12

BAR

15,909
27
97
185

0 answers

Most efficient way to store spark streaming window in table incrementally with Spark

I would like to use spark-streaming to insert windows of events into a daily table, while keeping that table up to date to the last second. Essentially I have this with Spark 1.4.1: val lines = KafkaUtils.createStream(ssc, zkQuorum, group,…

apache-spark spark-streaming parquet alluxio

asked Aug 23 '15 at 15:34

Pierre Lacave

2,608
2
19
28

1 answer

Spark persist MEMORY_AND_DISK vs. Tachyon

I want to make sure I understand Tachyon. Is using Tachyon with HDFS under it more or less equivalent to persisting an RDD using MEMORY_AND_DISK? In both cases, when the amount of data overruns the memory, it gets bumped off to the hard drive.…

apache-spark in-memory alluxio

asked Jun 27 '15 at 01:24

bhomass

3,414
8
45
75
