Questions tagged [sequencefile]

A SequenceFile is a Hadoop binary file containing key/value pairs.

A SequenceFile is a file format used by Hadoop for the efficient storage and retrieval of key/value pairs. The contents can also be compressed, per record or per block, for more efficient storage.

For more information, see the API documentation or the wiki page.

157 questions
0
votes
1 answer

Why does SequenceFile writer's append operation overwrite all values with the last value?

First, consider this CustomWriter class: public final class CustomWriter { private final SequenceFile.Writer writer; CustomWriter(Configuration configuration, Path outputPath) throws IOException { FileSystem fileSystem =…
nuaavee
0
votes
1 answer

Hadoop SequenceFile - auto increment key for records

I'm thinking of using a SequenceFile as "a little database" to store small files. I need concurrent clients to be able to store a small file in this SequenceFile and retrieve a unique id (the key of the record). Is this possible? I'm new to Hadoop and I'm…
Simone Pessotto
0
votes
3 answers

Export file from Hive to HDFS in SequenceFile format

I am trying to execute a Hive query and export its output to HDFS in SEQUENCE FILE format. beeline> show create table test_table; +--------------------------------------------------------------------------------------+ | …
Nageswaran
0
votes
1 answer

Each run of the same Hadoop SequenceFile creation routine creates a file with a different CRC. Is that OK?

I have some simple code that creates a Hadoop SequenceFile. Each time the code is run, it leaves two files in the working directory: mySequenceFile.txt and .mySequenceFile.txt.crc. After each run the sizes of both files remain the same, but the CRC file contents…
MiamiBeach
0
votes
0 answers

Sequence file created gives strange output in Hadoop

I want to combine several small bzip2 files into a sequence file. I found some code to create a sequence file and tried it, but it gives strange output, as shown below. Is this because it is unable to read bzip2 files?…
0
votes
1 answer

Will a sequence file help improve read performance in HDFS compared to the local file system?

I want to compare the performance of HDFS and the local file system for 1000 small files (1-2 MB). Without using sequence files, HDFS takes almost double the time to read 1000 files compared to the local file system. I heard of sequence files…
arg21
0
votes
2 answers

SequenceFile compactor: several small files into a single file.seq

I am new to HDFS and Hadoop. I am developing a program that should get all the files in a specific directory, which contains several small files of any type, and append each file to a compressed SequenceFile, where the key must be the…
charles
0
votes
1 answer

Save and read key-value pairs in Spark

I have a JavaPairRDD in the following format: JavaPairRDD< String, Tuple2< String, List< String>>> myData; I want to save it in a key-value format (String, Tuple2< String, List< String>>). myData.saveAsXXXFile("output-path"); So my next job could…
Edamame
0
votes
1 answer

FileNotFoundException with sequence files in Mahout

I'm reading the Apache Mahout Cookbook, but I have a problem in chapter 2, creating a sequence file. I'm using Mahout 0.9. The command I'm executing is as follows: $MAHOUT_HOME/bin/mahout seqdirectory -i /home/haritz/Escritorio/work_dir/original -o…
Naster
0
votes
1 answer

Hadoop input format

While preparing for a Hadoop exam, I came across the question below. I could not understand the correct answer, and I'm not sure the question itself is correct. Given a directory of files with the following structure: line number, tab character,…
Jigar Parekh
0
votes
1 answer

Mahout: Missing class to create sequence files

I'm following the instructions on the Mahout site for converting an existing file to a sequence file: VectorWriter vectorWriter = SequenceFile.createWriter(filesystem, configuration, …
Denise
0
votes
0 answers

Image not added to Hadoop sequence file

I am trying to run a Java program on my Hadoop system to store an image in a sequence file and then read that sequence file back. The sequence file is created, but the image data is not appended to it. I am trying to run the code below…
user1817490
0
votes
2 answers

Converting text to a sequence file using MapReduce creates junk characters

I am converting a text file to a sequence file using MapReduce and then back to text. I am getting some numbers at the start of each line. How can I remove them or stop them from appearing in my output? E.g. Text: d001 Marketing d002 Finance d003 …
0
votes
1 answer

How to read file names and word counts for the respective files in Hadoop?

I am trying to fetch file names from a sequence file in Hadoop with the help of the dumbo package for Python, but it gives me some kind of identifier. How can I map this to a file name? Below are my steps on the Hadoop system for getting file names: Steps 1)…
Sanjay Bhosale
0
votes
1 answer

Sequence file formats in Hadoop

Is there any option to write Hadoop Distributed File System files as sequence files using C# code? If so, can you suggest a link or other details?
user3797438