Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the AMPLab at the University of California, Berkeley.

Alluxio can be used as a distributed shared caching service for big data analytics frameworks such as MapReduce and Apache Spark, so that compute applications talking to Alluxio can transparently cache frequently accessed data, especially data from remote locations, and read it at in-memory I/O throughput.

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics from traditional file systems, and these differences have performance implications. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations such as directory listing (‘ls’) and ‘rename’ also have different performance characteristics, which often adds significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.

Alluxio is written in Java and hosted on GitHub.

The latest stable version:

Alluxio 1.8.1 - Sept 27, 2018

90 questions

votes

1 answer

Can Spark read Alluxio's metadata just like Hive?

I'm trying to decrease the time Spark takes to read and write data by using Alluxio. But I found that I have to specify the path to read data. I've found that I can use Hive's metatool to change Hive's warehouse from HDFS to Alluxio, so I can…

apache-spark hadoop alluxio

asked Dec 14 '17 at 18:13

lulijun

votes

1 answer

can't add alluxio.security.login.username to spark-submit

I have a Spark driver program for which I'm trying to set the Alluxio user. I read this post: How to pass -D parameter or environment variable to Spark job? and although helpful, none of the methods in there seem to do the trick. My environment: -…

apache-spark spark-submit alluxio

asked Apr 23 '17 at 13:46

jb44

votes

1 answer

Test Spark with Tachyon

I have installed Tachyon and Spark according to the instructions: http://tachyon-project.org/documentation/Running-Spark-on-Tachyon.html However, as a newbie I have no idea how to put file "X" into the Tachyon file system as they describe: $ ./spark-shell $ val…

scala apache-spark alluxio

asked Oct 08 '15 at 23:09

HP.

votes

1 answer

OFF_HEAP rdd was removed automatically by Tachyon, after the spark job done

I run a Spark application that uses StorageLevel.OFF_HEAP to persist an RDD (my Tachyon and Spark are both in local mode), like this: val lines = sc.textFile("FILE_PATH/test-lines-1") val words = lines.flatMap(_.split(" ")).map(word => (word,…

apache-spark rdd alluxio

asked Mar 14 '15 at 05:07

zeromem

votes

1 answer

Tachyon: Failed to rename during copyFromLocal command

I'm using Apache Spark to build an application. To make the RDDs available from other applications I'm trying two approaches: using Tachyon, and using a spark-jobserver. I'm new to Tachyon. I completed the following tasks given in the Running Tachyon…

apache-spark alluxio

asked Jan 21 '15 at 12:17

Anju

vote

0 answers

Trino Hive connector can't synchronize the partition metadata automatically

Stack: Trino version: 395; Storage: Alluxio with AWS S3; Metadata store: AWS Glue. I have a daily Spark job that saves Parquet files with 3 partition keys (year, month, day) in S3, then all the data is synchronized to Alluxio. However, although I…

amazon-web-services apache-spark presto trino alluxio

asked Nov 04 '22 at 01:39

Jonathan Lam

vote

1 answer

difference between WORKER_EVICTOR and WORKER_BLOCK_ANNOTATOR

Can you explain the difference between WORKER_EVICTOR and WORKER_BLOCK_ANNOTATOR, and why Alluxio abandoned WORKER_EVICTOR?

alluxio

asked Jun 27 '22 at 08:00

ChanChan Mao

vote

1 answer

Manage file size for S3 using Spark and Alluxio

I am using Spark to write data to Alluxio, with S3 as the UFS, using a Hive Parquet partitioned table. I am using the repartition function on the Hive partition fields to make the write operation efficient in Alluxio. This results in the creation of a single file in…

apache-spark amazon-s3 hive alluxio

asked Jul 02 '19 at 15:26

Nupur Bharati

vote

1 answer

How to monitor the status of standby masters in Alluxio?

In Alluxio, I can monitor the leading master through port 19998. But I also want to monitor the standby master. However, the standby master does not have RPC port 19998. Is there any way to monitor the standby master? I want to monitor the status of…

alluxio

asked Mar 12 '19 at 13:55

Shuocheng Wang

vote

1 answer

Unable to access Alluxio File System API in IDE

I am trying to access a file in Alluxio from Scala code in the IDE and I am getting this error: Exception in thread "main" java.io.IOException: No FileSystem for scheme: alluxio. My code is as follows: package com.example.sparkalliuxiodemo import…

scala maven apache-spark alluxio

asked Mar 07 '19 at 05:03

Sasi

vote

1 answer

Alluxio + Hive on EMR

I have Alluxio 1.8 installed on an EMR 5.19.0 cluster, and can see my S3 tables using /usr/local/alluxio/bin/alluxio fs ls /. However, when I start up hive and issue hive> [[DDL w/ LOCATION = alluxio://master_host:19998/my_table ]]], I get the…

hive amazon-emr alluxio

asked Dec 03 '18 at 23:54

rongenre

vote

1 answer

Timeout to read from Alluxio

I encountered this error while performing a Presto query on Alluxio. What does this timeout mean, and how can I fix it? com.facebook.presto.spi.PrestoException: Error opening Hive split alluxio://xxxxx:19998/s3/data/m-00020 (offset=134217728, …

presto alluxio

asked Nov 12 '18 at 20:59

AAudibert

vote

1 answer

Channel is closed while reading from Alluxio using Presto

I encountered this stack trace while running a Presto query on top of Alluxio. Sometimes my query is able to succeed, but sometimes it fails with this error. What does it mean, and how can I fix it? com.facebook.presto.spi.PrestoException: Error…

presto alluxio

asked Nov 12 '18 at 20:44

AAudibert

vote

1 answer

Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root in alluxio mapreduce

Caused by: org.apache.thrift.transport.TTransportException: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: root It works fine when I run the wordcount program locally with Alluxio. I also passed the…

hadoop mapreduce hadoop-yarn alluxio

asked Oct 15 '18 at 06:38

UDIT JOSHI

vote

3 answers

Difference between Alluxio(Tachyon) and Tungsten in Spark?

Tachyon is a distributed, in-memory storage system, developed separately from Spark, which can be used as off-heap persistent storage during a Spark application. Tungsten is a new Spark SQL component that provides more efficient Spark…

apache-spark apache-spark-sql rdd alluxio

asked Oct 04 '18 at 11:39

Michael
