Questions tagged [sequencefile]

A SequenceFile is a Hadoop binary file containing key/value pairs.

SequenceFile is a flat binary file format used by Hadoop for the efficient storage and retrieval of key/value pairs. Records can optionally be compressed, either individually or in blocks, for more compact storage.

For more information view the API documentation or the Wiki page.
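
As a rough illustration of the binary layout: every SequenceFile begins with the three ASCII bytes SEQ followed by a one-byte format version. The sketch below is a hypothetical helper (the class SequenceFileMagic and its methods are illustrative names, not part of Hadoop) that uses only the Java standard library to peek at those bytes. It assumes a local file readable through java.nio; a file on HDFS would have to be opened through Hadoop's FileSystem API instead, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Hadoop: checks for the SequenceFile magic bytes. */
final class SequenceFileMagic {
    // A SequenceFile header starts with 'S', 'E', 'Q', then one version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    private SequenceFileMagic() {}

    /** Returns true if the buffer holds the 3-byte magic plus a version byte. */
    static boolean hasMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length + 1) {
            return false; // empty or too short to be a SequenceFile header
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads at most the first four bytes of a local file and checks them. */
    static boolean fileHasMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasMagic(in.readNBytes(MAGIC.length + 1));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + fileHasMagic(Path.of(arg)));
        }
    }
}
```

An empty or too-short file yields false rather than an exception. A matching header only suggests that the file is a SequenceFile; it does not validate the rest of the contents.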

157 questions
2
votes
1 answer

Hadoop append to Sequencefile

Currently I use the following code to append to an existing SequenceFile: // initialize sequence writer Writer writer = SequenceFile.createWriter( FileContext.getFileContext(this.conf), this.conf, new…
Christian D.
2
votes
3 answers

Why is the SequenceFile truncated?

I am learning Hadoop and this problem has baffled me for a while. Basically I am writing a SequenceFile to disk and then reading it back. However, every time I get an EOFException when reading. A deeper look reveals that when writing the sequence file,…
Andy
2
votes
1 answer

Convert a text file to sequence format in Spark Java

In Spark Java, how do I convert a text file to a sequence file? The following is my code: SparkConf sparkConf = new SparkConf().setAppName("txt2seq"); sparkConf.setMaster("local").set("spark.executor.memory", "1g"); …
Edamame
2
votes
1 answer

Is it possible to check if a file on HDFS is a SequenceFile without (mis)using exceptions?

I want to read a specific SequenceFile from HDFS from a client application. I can do this by using the SequenceFile.Reader and it works fine. But is it also possible to check whether a file is a SequenceFile other than by analyzing the thrown…
rabejens
2
votes
1 answer

Trouble with K-Means using MapReduce (modified)

I think my code is correct, but it doesn't work properly. This is K-means clustering using MapReduce. (https://github.com/30stm/K-Means-using-mapreduce/tree/master) Make a dataset using DatasetWriter.java, and make centroids using…
2
votes
4 answers

How to copy the output of -text HDFS command into another file?

Is there any way to copy the text content of an HDFS file into another file system using an HDFS command: hadoop fs -text /user/dir1/abc.txt Can I print the output of -text into another file by using -cat or any other method?: hadoop fs -cat…
dewet
2
votes
0 answers

Reading SequenceFile written by Spark

I have a bunch of sequence files that I want to read using Scalding and I am having some trouble. This is my code: class ReadSequenceFileApp(args:Args) extends ConfiguredJob(args) { SequenceFile(args("in"), ('_, 'wbytes)) .read …
Rob Schneider
2
votes
1 answer

How to extract data from Hadoop sequence file?

The Hadoop sequence file is really strange. I packed images into a sequence file and can't recover them. I did a simple test and found that the size in bytes is not even the same before and after using the sequence file. Configuration confHadoop = new…
hakunami
2
votes
1 answer

Writing/reading key/value pairs in sequence file format in Hadoop

I have a MapReduce program whose output is all in text files right now. A sample of the program is below. What I do not understand is how to output the key/value pairs from the reducer in sequence file format. No, I can't use SequeceFileFormat…
user2654569
2
votes
2 answers

Is there a simple way to migrate from SequenceFiles to Avro?

I'm currently using Hadoop MapReduce jobs with SequenceFiles of Writables. The same Writable types are also used for serialization in the non-Hadoop parts of the system. This method is hard to maintain - mainly because of the lack of schema…
Ophir Yoktan
2
votes
2 answers

Hadoop Serializer Not Found Exception

I have a job whose output format is SequenceFileOutputFormat. I set the output key and value class like this: conf.setOutputKeyClass(IntWritable.class); conf.setOutputValueClass(SplitInfo.class); The SplitInfo class implements…
Razvan
2
votes
0 answers

Converting existing vectors to Mahout Vectors

I'm trying to convert term-frequency values into Mahout vector representation, so that I can use LDA on the given vectors. I'm following the Mahout wiki, where the code snippet suggests how to convert existing vectors to Mahout Vectors.…
OpenMaze
2
votes
1 answer

How to convert the text below to a sequence file, which will in turn be converted to vectors for Mahout k-means?

Good afternoon to you all, My data is in the format below: ID : VALUE (tags assigned by users) 0001: "PC, THINKPAD, T500" 0002: "PHONE, CELLPHONE, IPHONE, APPLE, IPHONE5" .......and so on. How can I write code…
phoenixbai
1
vote
1 answer

Deserialize an in-memory Hadoop sequence file object

PySpark has a function sequenceFile that allows us to read a sequence file which is stored in HDFS or some local path available to all nodes. However, what if I already have a bytes object in the driver memory that I need to deserialize and write as…
Liam385
1
vote
1 answer

java.io.EOFException not a SequenceFile on empty file

I'm trying to read a table using spark. spark.table("table_name") sc.sequenceFile(path, classOf[Text], classOf[Text], 1000). map(x => x._2.toString.split(delimiter, -1)) Both work if there are no empty files and both fail with…
gjin