Questions tagged [outputformat]

For questions relating to the format of the output from a Hadoop reducer

Hadoop ships with several output formats (TextOutputFormat, FileOutputFormat, and SequenceFileOutputFormat, for example). The primary responsibility of each output format class is to construct a RecordWriter, to which the Hadoop Context delegates the actual storing of output data.

Other output formats are provided by third-party vendors, for example for specific NoSQL databases such as MongoDB or Cassandra.
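
The delegation described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the pattern, not Hadoop's API: `SketchOutputFormat`, `SketchRecordWriter`, `TabSeparatedOutputFormat`, and `OutputFormatSketch` are hypothetical names, and an in-memory `StringBuilder` stands in for the job's output file. In Hadoop itself, `getRecordWriter` takes a `TaskAttemptContext` rather than a destination object.

```java
// Plays the role of the framework: asks the format for a writer, then delegates to it.
class OutputFormatSketch {
    static String render(SketchOutputFormat<String, Integer> format) {
        StringBuilder sink = new StringBuilder();
        SketchRecordWriter<String, Integer> writer = format.getRecordWriter(sink);
        writer.write("apple", 3);
        writer.write("pear", 5);
        writer.close();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(new TabSeparatedOutputFormat<>()));
    }
}

// Hypothetical stand-in for a RecordWriter: stores one key/value pair at a time.
interface SketchRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

// Hypothetical stand-in for an OutputFormat: its main job is to build the writer.
interface SketchOutputFormat<K, V> {
    SketchRecordWriter<K, V> getRecordWriter(StringBuilder destination);
}

// Emits "key<TAB>value" lines, loosely mirroring the layout TextOutputFormat uses.
class TabSeparatedOutputFormat<K, V> implements SketchOutputFormat<K, V> {
    @Override
    public SketchRecordWriter<K, V> getRecordWriter(StringBuilder destination) {
        return new SketchRecordWriter<K, V>() {
            @Override
            public void write(K key, V value) {
                destination.append(key).append('\t').append(value).append('\n');
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory sink; a file-backed
                // writer would flush and close its stream here.
            }
        };
    }
}
```

Swapping in a different `SketchOutputFormat` implementation changes how records are stored without touching the code that produces them, which is the reason the format and the writer are kept as separate roles.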

27 questions
5
votes
1 answer

How to record raw AAC audio files in Android using MediaRecorder? AAC_ADTS doesn't work

I'm using the Android MediaRecorder to record AAC encoded audio files. Setting the output format to MPEG-4 worked pretty well. But as my audio player supports neither MPEG-4 nor 3GP I tried to get raw AAC files by using the output format AAC_ADTS,…
Kirby
  • 86
  • 1
  • 8
4
votes
1 answer

How to filter keys or values in Hadoop map/reduce job output file?

Normally, a Hadoop map/reduce job produces a list of key-value pairs that are written to the job's output file (using an OutputFormat class). Rarely are both keys and values useful; usually either the keys or the values contain the required information. Is there an…
Rasto
  • 17,204
  • 47
  • 154
  • 245
3
votes
1 answer

Output results in conll format (POS-tagging, stanford pos tagger)

I am trying to use the Stanford POS tagger. I want to ask whether it is possible to parse an English text (actually, POS tagging alone would be enough) and output the results in CoNLL format. Is there such an option? I am using the full 3.2.0 version of the…
2
votes
1 answer

Can I create sequence file using spark dataframes?

I have a requirement to create a sequence file. Right now we have a custom API written on top of the Hadoop API, but since we are moving to Spark we have to achieve the same using Spark. Can this be achieved using Spark DataFrames?
mahan07
  • 887
  • 4
  • 14
  • 32
2
votes
1 answer

Write Parquet Output in a Hadoop Streaming job

Is there a way to write text data into a Parquet file with hadoop-streaming using Python? Basically, I have a string being emitted from my IdentityMapper which I want to store as a Parquet file. Any input or examples would be really helpful.
ghosts
  • 177
  • 2
  • 15
2
votes
1 answer

XSLTCompiled transform not honoring XSLT Formatting for text files

When working with XSLT Compiled Transform I can't quite get the output formatted; it always strips all the spaces and is not in human-readable form. However, if I run the same transform through the Visual Studio XSLT Debugger the output is neatly…
2
votes
1 answer

Save h:outputFormat result to a variable

Currently I am successfully getting the labels from my resource bundle via But how do I save it in a variable so that I can…
Rey Libutan
  • 5,226
  • 9
  • 42
  • 73
1
vote
1 answer

Can't get Spark to use the magic output committer for s3 with EMR

I'm trying to use the magic output committer, but whatever I do I get the default output committer. INFO FileOutputCommitter: File Output Committer Algorithm version is 10 22/03/08 01:13:06 ERROR Application: Only 1 or 2 algorithm version is…
idan ahal
  • 707
  • 8
  • 21
1
vote
1 answer

XMLSerializer & OutputFormat Deprecated

I'm trying to get some help from Java experts around S.O. regarding this issue. I came across an old implementation for a XMLSerializer & OutputFormat in a long time project...I was wondering if someone could give a pointer on what to do, an opinion…
Henrique C.
  • 948
  • 1
  • 14
  • 36
1
vote
0 answers

Mapreduce Custom TextOutputFormat - Strange characters NUL, SOH, etc

I have implemented a custom output format for converting key value pairs to a Json format. public class JSONOutputFormat extends TextOutputFormat { @Override public RecordWriter
bhermont
  • 119
  • 1
  • 6
1
vote
1 answer

Using CqlOutputFormat for INSERT statement

I'm fairly new to Cassandra. I'm using Hadoop to bulk load data into a Cassandra cluster using CqlOutputFormat. I'm unable to find sufficient examples on the internet to tailor it to my use case. I'm specifically using it to insert data into the cluster…
Vishnu Prathish
  • 369
  • 4
  • 15
1
vote
1 answer

Can I use f:convertNumber with h:outputFormat

I have a composite component and this is a snippet from it.
Shady Hussein
  • 513
  • 8
  • 24
1
vote
1 answer

Hadoop Custom Output format, when do all reducers end?

I'm building a custom output format for Hadoop and was wondering if there is a way in the output format to know when all reducers (RecordWriters) are complete. In order to know that one RecordWriter completed, the close method of RecordWriter can…
nomier
  • 402
  • 1
  • 3
  • 12
1
vote
2 answers

How can I use MultipleOutputFormat in Hadoop 0.20?

I am working with Hadoop 0.20 and I want to have two reduce output files instead of one output. I know that MultipleOutputFormat doesn't work in Hadoop 0.20. I added the hadoop1.1.1-core jar file in the build path of my project in Eclipse. But it…
ali abdoli
  • 33
  • 6
0
votes
1 answer

How to use the DataSet API, like dataset.output(outputFormat), in the latest Flink 1.14 or Flink 1.15 Table/SQL API

I have a project that uses the DataSet API, like dataset.output(outputFormat). The OutputFormat is user-defined (it writes batch data to Neo4j), so I want to keep it, but I could not find any Table/SQL API in the latest version of Flink that uses an OutputFormat. Thanks for any…
liss bai
  • 13
  • 2