Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the AMPLab at the University of California, Berkeley.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks, so that compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, and read it at in-memory I/O throughput.

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics from traditional file systems, and these differences have performance implications. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations such as directory listing (‘ls’) and ‘rename’ also have different performance characteristics, which often add significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.

Alluxio is written in Java and hosted on GitHub.

90 questions
1
vote
2 answers

Unable to launch Alluxio on Kubernetes

I am trying Alluxio 1.7.1 with Docker 1.13.1 and Kubernetes 1.9.6, 1.10.1. I created the Alluxio Docker image as per the instructions on https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html Then I followed the…
Sushil Kumar Sah
  • 1,042
  • 10
  • 13
1
vote
1 answer

spark LOCAL and alluxio client

I'm running Spark in LOCAL mode and trying to get it to talk to Alluxio. I'm getting the error: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. I have looked at the page…
jb44
  • 393
  • 1
  • 6
  • 23
1
vote
1 answer

Alluxio - access existing files in underfs

I am running a small Alluxio (1.7.0) cluster using Swift as the underfs. I've confirmed Alluxio works great writing and reading files and persisting them to the Swift store. I would like to access files on the Swift store via Alluxio that are saved…
Kimi Merroll
  • 311
  • 1
  • 4
  • 8
1
vote
1 answer

why alluxio files keep a state of TO_BE_PERSISTED all the time

I have deployed an Alluxio cluster on top of an HDFS cluster. When I use the Alluxio native Java API to write files to Alluxio with the write type set to ASYNC_THROUGH, the files (even ones of just 1 GB) do not seem to be written to HDFS and keep a state of…
Long.zhao
  • 1,085
  • 2
  • 11
  • 16
1
vote
1 answer

Save a RDD by saveAsObject, Exception "had a not serializable result: org.apache.hadoop.hbase.io.ImmutableBytesWritable"

I need to serialize an RDD read from HBase into the Alluxio in-memory file system as a way to cache it and update it periodically for use in incremental Spark computation. The code looks like this, but it runs into the exception in the title: val inputTableNameEvent =…
bronzels
  • 1,283
  • 2
  • 10
  • 16
1
vote
0 answers

Spark job with oozie TFS FileSystem implementation error

I am new to Spark. I need to run a Spark job within Oozie. On its own the Spark job runs fine, but with Oozie, after the job is launched, I get the following error: 017-01-12 13:51:57,696 INFO [main]…
Rohit Mishra
  • 53
  • 1
  • 13
1
vote
1 answer

Alluxio frame size() larger than max() on Spark

I have a strange error on Alluxio with Spark. Reading 20.000 files with Spark from Alluxio works, but reading 40.000 files with Spark from Alluxio doesn't. I use Alluxio 1.2, Spark 1.6.0 and I read data with file API: FileSystem fs =…
TiGi
  • 37
  • 8
1
vote
1 answer

Spark on Tachyon(alluxio). Frame size (273247862) larger than max length (16777216)

I followed the guide to deploy Spark on Alluxio. When I try to load data from Alluxio to run an RDD operation, val ccc = sc.textFile("alluxio://localhost:19998/findbugs.xml") ccc.count an error like the following shows up: 16/07/24 23:27:16 INFO…
Carl H
  • 405
  • 1
  • 8
  • 20
1
vote
0 answers

Running HBase on top of Alluxio

Has anyone succeeded in running HBase on top of Alluxio? There is no wiki on Alluxio's webpage related to this matter, and no luck with the Google results either. My environment is: Hadoop 2.6 HBase 0.98.20 Alluxio 1.1.0 Edit java.io.IOException:…
spark1631
  • 83
  • 3
1
vote
0 answers

Instructions on installing Tachyon in DCOS (Mesosphere)?

I have spark-notebook set up in DCOS. Tachyon is part of the ecosystem, but I couldn't find any DCOS-specific instructions on getting Tachyon installed. I could install it from scratch, but there seems to be some DCOS-compliant way to get a service…
1001b
  • 265
  • 3
  • 13
1
vote
0 answers

Wordcount ran on Tachyon showing ClassNotFoundException Exception

I am trying to run Hadoop WordCount on Tachyon. I followed this link. But when I run the WordCount jar with the command below: hadoop jar HadoopWordCount-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.WordCount -libjars…
USB
  • 6,019
  • 15
  • 62
  • 93
1
vote
1 answer

Tachyon Doesn't Seem to be Aware of Available Memory

Just to see if Tachyon would give me an error about configured memory being more than available I set: # Some value over combined available mem and disk space. export TACHYON_WORKER_MEMORY_SIZE=1000GB And observed the allocation in the web UI…
BAR
  • 15,909
  • 27
  • 97
  • 185
1
vote
2 answers

Tachyon on Dataproc Master Replication Error

I have a simple example running on a Dataproc master node where Tachyon, Spark, and Hadoop are installed. I have a replication error writing to Tachyon from Spark. Is there any way to specify it needs no replication? 15/10/17 08:45:21 WARN…
BAR
  • 15,909
  • 27
  • 97
  • 185
1
vote
0 answers

Most efficient way to store spark streaming window in table incrementally with Spark

I would like to use spark-streaming to insert windows of events to daily table, while making that table always up to date to the last second. Essentially I have this with spark 1.4.1: val lines = KafkaUtils.createStream(ssc, zkQuorum, group,…
Pierre Lacave
  • 2,608
  • 2
  • 19
  • 28
1
vote
1 answer

spark persist MEMOERY_AND_DISK vs. Tachyon

I want to make sure I understand Tachyon. Is using Tachyon with HDFS under it more or less equivalent to persisting an RDD using MEMORY_AND_DISK? In both cases, when the amount of data overruns the memory, it gets bumped off to the hard drive.…
bhomass
  • 3,414
  • 8
  • 45
  • 75