Questions tagged [sequencefile]

A SequenceFile is a Hadoop binary file containing key/value pairs.

A SequenceFile is a flat binary file format used by Hadoop for the efficient storage and retrieval of key/value pairs. It supports optional compression, applied either per record or per block, for more compact storage.

For more information, see the API documentation or the Wiki page.
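To make the description above more concrete, here is a minimal Python sketch that reads the header of a SequenceFile and reports the key class, the value class, and the compression flags. It is an illustration rather than a reference implementation. It assumes the version 6 header layout (the `SEQ` magic, class names, two compression booleans, an optional codec name, metadata, and a 16-byte sync marker) and only handles strings shorter than 128 bytes. The names `read_text` and `read_sequencefile_header` are hypothetical helpers, not part of any Hadoop API.

```python
import io
import struct


def read_text(stream):
    """Read a length-prefixed UTF-8 string.

    Assumption: the length fits in Hadoop's single-byte vint form (0..127),
    which covers typical class names. Longer strings are not handled here.
    """
    (length,) = struct.unpack(">b", stream.read(1))
    if length < 0:
        raise ValueError("multi-byte vint lengths are not handled in this sketch")
    return stream.read(length).decode("utf-8")


def read_sequencefile_header(stream):
    """Parse a SequenceFile header from a binary stream into a dict."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version = stream.read(1)[0]
    if version != 6:
        raise ValueError("this sketch only understands version 6 headers")

    key_class = read_text(stream)
    value_class = read_text(stream)

    # Two one-byte booleans: is compression on, and is it block compression?
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"

    # The codec class name is present only when compression is enabled.
    codec = read_text(stream) if compressed else None

    # Metadata: a 4-byte big-endian count followed by key/value string pairs.
    (meta_count,) = struct.unpack(">i", stream.read(4))
    metadata = {}
    for _ in range(meta_count):
        meta_key = read_text(stream)
        metadata[meta_key] = read_text(stream)

    # A 16-byte sync marker closes the header.
    sync = stream.read(16)

    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
        "metadata": metadata,
        "sync": sync,
    }
```

One way to try it is to copy a file out of HDFS to local disk and call `read_sequencefile_header(open(path, "rb"))`. The records that follow the header are not decoded by this sketch.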

157 questions
1
vote
2 answers

unable to create dataframe from sequence file in Spark created by Sqoop

I want to read orders data, which is stored as a sequence file in the Hadoop file system on the Cloudera VM, and create an RDD from it. Below are my steps: 1) Import the orders data as a sequence file: sqoop import --connect jdbc:mysql://localhost/retail_db --username…
RushHour
  • 494
  • 6
  • 25
1
vote
1 answer

How to make my java class writable by extending it with scala class?

I need to save objects of my Java class with saveAsSequenceFile (I cannot modify the class itself), so I have to make it Writable. I'm trying to extend my Java class with a Scala class (I can only use Scala) by implementing write and readFields…
1
vote
1 answer

How to read and write compressed sequence file in spark using Python with any supported compression codec

How can I read and write a compressed SequenceFile in Spark using Python? I am using Spark 1.6 on the CDH 5.12 Quickstart VM with Python 2.7. I found the example below, but it is not working: rdd.saveAsSequenceFile(,…
singhak.bhu
  • 33
  • 1
  • 9
1
vote
0 answers

How to read snappy compressed sequence File in spark

We have huge legacy files sitting in our Hadoop cluster in compressed sequence file format. The sequence files were created using Hive ETL. Let's say I had a table in Hive created using the following DDL: CREATE TABLE sequence_table( col1…
1
vote
1 answer

Spark DataFrame from SequenceFile

sqlContext.read.format('orc').load(hdfspath) and sqlContext.read.format('parquet').load(hdfspath) work fine, but sqlContext.read.format('sequencefile').load(hdfspath) does not: the sequencefile format does not work like that. How can I read a sequence file as…
Tronald Dump
  • 1,300
  • 3
  • 16
  • 27
1
vote
2 answers

How to split a big Sequence file into multiple sequence files?

I have a large sequence file with around 60 million entries (almost 4.5GB). I want to split it. For example, I want to split it into three parts, each having 20 million entries. So far my code is like this: //Read from sequence file …
user3086871
  • 671
  • 3
  • 7
  • 25
1
vote
0 answers

Storing an RDD as sequence file with partitions?

I want to store a Java RDD as a sequence file with hourly partitioning. Is there any way to achieve this? For example, I have records of the form time,a1,a2,a3,a4,a5,a6,a7,a8. I want to have the key as a2,a3,a4 and the values as all the values in this key and the…
mahan07
  • 887
  • 4
  • 14
  • 32
1
vote
1 answer

Can I create sequence file in Spark?

Currently we have an implementation in Pig that generates sequence files from records, where some of the attributes of a record are treated as the key of the sequence file and all the records corresponding to that key are stored in one sequence file. As we are…
rk.the1
  • 89
  • 1
  • 10
1
vote
1 answer

Appending to existing sequence file is overwriting the content

I am using the code snippet below for a sequence file writer. It works fine if the sequence file doesn't exist, but if it does, it overwrites the content rather than appending to it. SequenceFile.Writer writer =…
user3400887
  • 409
  • 1
  • 4
  • 18
1
vote
1 answer

Are Hadoop Sequence Files Supported by Filesystems other than HDFS

Is the sequence file format supported by any file system other than HDFS? I am specifically interested in whether the sequence file format can be used for merging and storing small files on filesystems such as HFS+ or NTFS. Any help is…
user3400887
  • 409
  • 1
  • 4
  • 18
1
vote
1 answer

Data storage format for unstructured data rows on HDFS

We are consuming a very large volume of data that needs to be written as fast as we receive it. We are already using HDFS, so we prefer to keep using it. The data is almost unstructured, and we will rarely run basic queries on it. The data is flat with some fields, each…
Mustafa
  • 10,013
  • 10
  • 70
  • 116
1
vote
2 answers

SequenceFile as text CLI with custom class

I have an HDFS file in SequenceFile format. The key is Text and the value is a custom serializable class (say, MyCustomClass). I want to read this file via the hadoop fs -text command, but it fails as Hadoop does not know what MyCustomClass definition…
Nik
  • 5,515
  • 14
  • 49
  • 75
1
vote
1 answer

How to extract key,value pairs from hbase SequenceFile using mapreduce?

I used the HBase Export utility to export an HBase table into HDFS as a SequenceFile. Now I want to use a MapReduce job to process this file: public class MapSequencefile { public static class MyMapper extends Mapper
Guo
  • 1,761
  • 2
  • 22
  • 45
1
vote
1 answer

How to use Hadoop's MapFileOutputFormat in Flink?

I got stuck while writing a program using Apache Flink. I'm trying to generate a Hadoop MapFile as the result of a computation, but the Scala compiler complains about a type mismatch. To illustrate the problem, let me show you the…
eastcirclek
  • 107
  • 10
1
vote
0 answers

python hadoop : mapreduce job is not working

My MapReduce program processes 20 videos, so I have uploaded 20 videos to HDFS. When I start executing the MapReduce code in the terminal, it does not proceed. When I run this command: pydoop submit --upload-file-to-cache stage1.py stage1…
uday franklin
  • 71
  • 1
  • 3