Questions tagged [carbon-data]

Apache CarbonData is a big data file format for fast interactive queries. It uses columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which can speed up queries by an order of magnitude over petabytes of data.

Apache CarbonData is an indexed columnar data format for fast analytics on big data platforms such as Apache Hadoop and Apache Spark.


A CarbonData file contains groups of data called blocklets, along with the required metadata (schema, offsets, indices, and so on) in a file header and footer, all co-located in HDFS.

The file footer can be read once to build the indices in memory, which can then be used to optimize scans and processing for all subsequent queries.
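The idea of reading a footer once and reusing its index can be illustrated with a small Python sketch. This is not CarbonData's on-disk format or API: `BlockletMeta`, `FooterIndexCache`, `blocklets_to_scan`, the file name, and the toy footer entries are all hypothetical, and the index is assumed to be a simple min/max range per blocklet for one column.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BlockletMeta:
    """Hypothetical footer entry: where a blocklet lives and its value range."""
    offset: int      # byte offset of the blocklet inside the file
    min_value: int   # smallest value of the indexed column in this blocklet
    max_value: int   # largest value of the indexed column in this blocklet


class FooterIndexCache:
    """Keeps the parsed index in memory so each footer is parsed only once."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[BlockletMeta]] = {}
        self.footer_reads = 0  # counts how often a footer was actually parsed

    def load(self, path: str, footer: List[Tuple[int, int, int]]) -> List[BlockletMeta]:
        if path not in self._cache:
            self.footer_reads += 1
            self._cache[path] = [BlockletMeta(*entry) for entry in footer]
        return self._cache[path]


def blocklets_to_scan(index: List[BlockletMeta], low: int, high: int) -> List[int]:
    """Return offsets of blocklets whose [min, max] range overlaps [low, high]."""
    return [m.offset for m in index if m.max_value >= low and m.min_value <= high]


# Toy footer with made-up entries: (offset, min, max)
FOOTER = [(0, 1, 100), (4096, 101, 200), (8192, 201, 300)]

cache = FooterIndexCache()
# Two queries against the same file: the footer is parsed for the first only.
first = blocklets_to_scan(cache.load("part-0.carbondata", FOOTER), 150, 160)
second = blocklets_to_scan(cache.load("part-0.carbondata", FOOTER), 90, 120)
print(first, second, cache.footer_reads)
```

In this sketch the second query reuses the cached index, and each query skips every blocklet whose value range cannot contain a match.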

3 questions
1 vote, 1 answer

Carbondata Class not found CarbonSessionStateBuilder Error in Spark

I'm using Spark version 2.2.1 and CarbonData version 1.5.3. Following the instructions in the official CarbonData guide, I can run the import statements import org.apache.spark.sql.SparkSession; import org.apache.spark.sql.CarbonSession._ But failing on…
appleboy
  • 661
  • 1
  • 9
  • 15
1 vote, 2 answers

Lost carbon data - graphite

I am currently using Graphite to monitor metrics from an API server. I use StatsD/Carbon to retrieve metrics via Graphite (I use the Docker image from https://github.com/hopsoft/docker-graphite-statsd). The thing is, I cannot keep the carbon data…
user3722267
-1 votes, 2 answers

spark dealing with carbondata

Below is the code snippet I'm trying to use to create a CarbonData table in S3. However, in spite of setting the AWS credentials in the Hadoop configuration, it still complains about the secret key and access key not being set. What is the issue here? import…
Vikas J
  • 358
  • 1
  • 5
  • 17