Questions tagged [sequencefile]

A SequenceFile is a Hadoop binary file containing key/value pairs.

A SequenceFile is a file format used by Hadoop for the efficient storage and retrieval of key/value pairs. It is also possible to use compression techniques for more efficient storage.

For more information, view the API documentation or the wiki page.
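The description above can be made concrete with a minimal sketch of writing and reading a block-compressed SequenceFile through the Hadoop API. The path and key/value types are illustrative, and hadoop-common must be on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("demo.seq"); // hypothetical local path

        // Write key/value pairs, block-compressed for more efficient storage.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            writer.append(new IntWritable(1), new Text("first record"));
            writer.append(new IntWritable(2), new Text("second record"));
        }

        // Read the pairs back in order.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```

The compression type can also be `RECORD` (compress values only) or `NONE`; block compression generally yields the best ratio for many small records.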

157 questions
2
votes
1 answer

Migrating a huge Bigtable database in GCP from one account to another using DataFlow

I have a huge database stored in Bigtable in GCP. I am migrating the Bigtable data from one GCP account to another using Dataflow. But when I created a job to create sequence files from the Bigtable, it created 3000 sequence files on…
2
votes
1 answer

Java MapReduce: use SequenceFile as reducer output

I have a working Java MapReduce program with 2 jobs. The output of the first reducer is written to a file and read by the second mapper. I would like to change the first reducer's output to be a SequenceFile. How can I do this? This is the main of my…
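A hedged sketch of the usual approach to the question above: set `SequenceFileOutputFormat` on the first job and `SequenceFileInputFormat` on the second, so the intermediate pairs stay typed between jobs. Job names, the intermediate path, and the key/value classes are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path intermediate = new Path("/tmp/intermediate"); // hypothetical path

        // Job 1: the reducer output becomes a SequenceFile of (Text, IntWritable).
        Job first = Job.getInstance(conf, "first-job");
        first.setOutputKeyClass(Text.class);
        first.setOutputValueClass(IntWritable.class);
        first.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(first, intermediate);
        // ... set mapper/reducer classes and the input path here ...
        first.waitForCompletion(true);

        // Job 2: the mapper receives the same (Text, IntWritable) pairs directly,
        // with no text parsing step.
        Job second = Job.getInstance(conf, "second-job");
        second.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(second, intermediate);
        // ... set mapper/reducer classes and the final output path here ...
        second.waitForCompletion(true);
    }
}
```

Note that the second job's mapper must declare input key/value types matching the first job's output key/value types.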
2
votes
0 answers

How can we read multiple sequence files in parallel in Apache Flink as a batch job

I have a use case of reading sequence files as a batch job in a Flink DataSet. The files are stored in an S3 bucket which I have to consume in a Flink DataSet. I am not able to read the files by providing comma-separated file paths to read in the…
2
votes
0 answers

SequenceFile.Writer leads to NullPointerException

Hi, I am trying to create a simple SequenceFile with Mahout libraries using the below code. While running the code I am getting a NullPointerException after it creates an empty file. public class SequenceFileWriter { public static void main(String[]…
2
votes
0 answers

One field in Protocol Buffers is always missing when reading from SequenceFile

Something mysterious is happening to me. What I wanted to do: 1. Save a Protocol Buffers object in SequenceFile format. 2. Read this SequenceFile back and extract the field that I need. The mystery part is: one field that I wanted to retrieve is…
2
votes
1 answer

Convert data from gzip to SequenceFile format using Hive on Spark

I'm trying to read a large gzip file into Hive through the Spark runtime to convert it into SequenceFile format, and I want to do this efficiently. As far as I know, Spark supports only one mapper per gzip file, same as it does for text files. Is there…
Marcel Mars
2
votes
1 answer

Sequence file reading issue using Spark Java

I am trying to read the sequence file generated by Hive using Spark. When I try to access the file, I am facing org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: I have…
vishal raj
2
votes
1 answer

How to create splits from a sequence file in Hadoop?

In Hadoop, I have a sequence file of 3GB size. I want to process it in parallel. Therefore, I am going to create 8 map tasks and hence 8 FileSplits. The FileSplit class has constructors that require: the path of the file, the start position, and the length. For…
Mosab Shaheen
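One way to build such splits by byte range can be sketched as below; the class and method names are mine, not from the question. Because SequenceFile record readers skip forward to the next sync marker, the split boundaries need not be record-aligned:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SequenceFileSplits {
    // Carve one file into n byte-range splits. SequenceFile readers resync at
    // the next sync marker past each start offset, so arbitrary byte
    // boundaries are safe.
    static List<FileSplit> makeSplits(Path file, int n, Configuration conf)
            throws Exception {
        long len = file.getFileSystem(conf).getFileStatus(file).getLen();
        long chunk = (len + n - 1) / n; // ceiling division
        List<FileSplit> splits = new ArrayList<>();
        for (long start = 0; start < len; start += chunk) {
            long length = Math.min(chunk, len - start);
            splits.add(new FileSplit(file, start, length, new String[0]));
        }
        return splits;
    }
}
```

In practice, letting `SequenceFileInputFormat` compute the splits (by tuning the maximum split size) achieves the same effect without hand-built FileSplits.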
2
votes
2 answers

Reading Sequence File in PySpark 2.0

I have a sequence file whose values look like (string_value, json_value). I don't care about the string value. In Scala I can read the file with val reader = sc.sequenceFile[String, String]("/path...") val data = reader.map{case (x, y) =>…
Max
2
votes
1 answer

Can I create a sequence file using Spark DataFrames?

I have a requirement in which I need to create a sequence file. Right now we have written a custom API on top of the Hadoop API, but since we are moving to Spark we have to achieve the same using Spark. Can this be achieved using Spark DataFrames?
mahan07
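DataFrames have no direct SequenceFile writer, so the common route is converting to a pair RDD of Writables first. A hedged sketch under that assumption; the column indices, input source, and paths are placeholders:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class DfToSequenceFile {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("df-to-seq").getOrCreate();
        Dataset<Row> df = spark.read().parquet("/data/input"); // hypothetical source

        // Map each Row to a (Text, Text) pair; here columns 0 and 1 are assumed
        // to be strings.
        JavaPairRDD<Text, Text> pairs = df.javaRDD()
            .mapToPair(row -> new Tuple2<>(new Text(row.getString(0)),
                                           new Text(row.getString(1))));

        // Write the pairs as a SequenceFile via the Hadoop output format.
        pairs.saveAsHadoopFile("/data/output", Text.class, Text.class,
                               SequenceFileOutputFormat.class);
        spark.stop();
    }
}
```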
2
votes
1 answer

Get HDFS file path in PySpark for files in sequence file format

My data on HDFS is in SequenceFile format. I am using PySpark (Spark 1.6) and trying to achieve 2 things: the data path contains a timestamp in yyyy/mm/dd/hh format that I would like to bring into the data itself. I tried SparkContext.wholeTextFiles…
Arnkrishn
2
votes
1 answer

Hadoop SequenceFile vs splittable LZO

We're choosing the file format to store our raw logs, major requirements are compressed and splittable. Block-compressed (whichever codec) SequenceFiles and Hadoop-LZO look the most suitable so far. Which one would be more efficient to be processed…
k0_
2
votes
0 answers

Flume - how to create a custom key for a HDFS SequenceFile?

I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm looking for a possibility to create "custom keys". By default, Flume uses the timestamp as the key within a SequenceFile. However, in my use case I would like to use a customized…
Thomas Beer
2
votes
1 answer

Spark: how to read CompactBuffer from an objectFile?

I am reading the following structure from an object file: (String, CompactBuffer(person1, person2, person3 ...) ) If I try to read it like this: val input = sc.objectFile[(String, ListBuffer[Person])]("inputFile.txt") val myData = input.map { t => …
Edamame
2
votes
1 answer

hsync() not working for SequenceFile Writer

I have a small program that writes 10 records to a block-compressed SequenceFile on HDFS every second, and then runs sync() every 5 minutes to ensure that everything older than 5 minutes is available for processing. As my code is quite a few lines,…
agnsaft
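A likely explanation for the behavior described above, offered as an assumption: with block compression, appended records sit in the writer's in-memory block until it fills, so hsync() alone may persist nothing new. Forcing the block out with sync() before hsync() is a common workaround, at some cost to compression ratio. A minimal sketch, with hypothetical helper and variable names:

```java
import java.io.IOException;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DurableWrites {
    // Append a record and force it to be durable, assuming `writer` is an
    // open, block-compressed SequenceFile.Writer on HDFS.
    static void appendDurably(SequenceFile.Writer writer, Text key, Text value)
            throws IOException {
        writer.append(key, value);
        writer.sync();   // flush the current compressed block, write a sync marker
        writer.hsync();  // then ask HDFS to persist the bytes on the datanodes
    }
}
```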