Questions tagged [morphline]

Kite Morphlines (previously known as Cloudera Morphlines) is an open source framework that supports Hadoop, Flume and Spark applications that extract, transform, and load data into Apache Solr, Apache HBase, HDFS, enterprise data warehouses, etc.

A “morphline” is a configuration file that defines a transformation chain that consumes any kind of data from any kind of data source, processes the data, and loads the results into a Hadoop component.

Morphlines is a library, embeddable in any JVM codebase. A morphline is an in-memory container of transformation commands. Commands are plugins to a morphline that perform tasks such as loading, parsing, transforming, or otherwise processing a single record. A record is an in-memory data structure of name-value pairs with optional blob attachments or POJO attachments. The framework is extensible via embedded Java fragments or via additional commands written as Java classes.
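As a rough illustration of the description above, a morphline configuration file might look like the following minimal sketch. The morphline id `exampleMorphline` is a hypothetical name chosen for this example, and the two commands shown (`readLine`, `logDebug`) are only one possible chain, not a recommended setup.

```
# Minimal sketch of a morphline configuration file (HOCON syntax).
# The id below is a hypothetical name used only for illustration.
morphlines : [
  {
    id : exampleMorphline
    importCommands : ["org.kitesdk.**"]

    # Commands run in order; each one receives a record and passes it on.
    commands : [
      # Turn each input line into a record with the text in the "message" field.
      { readLine { charset : UTF-8 } }

      # Log the whole record (all name-value pairs) at DEBUG level.
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

Further commands, for example one that loads the record into Solr, would be appended to the same `commands` list.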

17 questions
3
votes
1 answer

MapReduceIndexerTool doesn't reindex documents correctly

I am trying to batch index data that I currently have in a text file, using Cloudera Search batch indexing on the Cloudera QuickStart VM. I believe I have a problem with my schema and morphline, because it completes the job…
1
vote
2 answers

What should a morphline for MapReduceIndexerTool look like?

I want to search through a lot of logs (about 1 TB in size, placed on multiple machines) efficiently. For that purpose, I want to build an infrastructure composed of Flume, Hadoop and Solr. Flume will get the logs from a couple of machines and will…
Cosmin Ioniță
  • 3,598
  • 4
  • 23
  • 48
1
vote
1 answer

How to read a decimal value from a Parquet file using Morphline readAvroParquetFile and Solr

A table with two columns (name string, salary decimal(10,3)) is stored in Parquet format in Hive. While indexing with Morphline and Solr, I get the following exception: ERROR morphline.MorphlineMapRunner: Unable to process file
Anand
  • 11
  • 3
1
vote
1 answer

flume-kite-morphline: com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT

While working with Flume (1.6 & 1.7) I am getting the error below: 2016-12-02 00:57:11,634 (pool-3-thread-1) [WARN - org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:143)] Line length exceeds max (2048), truncating…
1
vote
1 answer

Flume morphline interceptor: For Data cleaning

I have simple structured input arriving in real time, but some values contain garbage such as '@' or hexadecimal characters. How can I use the morphline Flume interceptor to clean the data? My sink here will be HBase.
Gagan
  • 1,775
  • 5
  • 31
  • 59
1
vote
2 answers

Morphlines command extractHBaseCells doesn't support avro objects in hbase, is there a workaround?

I am using CDH4.4. I have an app currently running which serializes records into a single column in HBase via Avro. I am in the process of moving my current Solr index of this table into SolrCloud, so I'm testing the MapReduceIndexerTool to do bulk…
rlong
  • 187
  • 3
  • 10
0
votes
0 answers

How to read DECIMAL(38,10) using Morphlines conf file

I want to read Parquet files using Morphlines. Reference: https://medium.com/@bkvarda/index-parquet-with-morphlines-and-solr-20671cd93a41 This Parquet file has DECIMAL datatypes. I cannot find any documentation on how to deal with DECIMAL in…
Prince
  • 41
  • 5
0
votes
1 answer

Indexing PDF documents using Cloudera Search

I've been trying to index PDF documents using Cloudera Search, aka Apache Solr. First I was able to index Twitter tweets. Later I tried to index PDF files. I've created the corresponding collection using solrctl with the default schema. The morphline…
0
votes
1 answer

Flume morphline interceptor-split command

Hi, I'm trying to use the morphline interceptor to convert my syslog to JSON. To start, I tried to use the split command to split my string, but I'm getting the error below: "" Source r1 has been removed due to an error during…
ar.sh
  • 1
  • 1
0
votes
1 answer

Error flume MorphlineSolrSink readJson java.lang.NoSuchFieldError: USE_DEFAULTS

I am trying to read JSON from an Avro source and sink to Solr. When I tried readLine {} and stored the result as a string, it worked. But when trying readJson{}, it throws the following error. Version: CDH 5.9.0, Parcels. Error 2017-01-26 06:35:38,604 ERROR…
Mahebub A Sayyed
  • 325
  • 5
  • 14
0
votes
1 answer

Is it possible to add the values of two variables using Morphline's inbuilt set of commands?

I'm wondering if there is any way to add the values of two variables in morphlines, without having to write a custom command. For example, something like: addValues { answer : "@{value_one}" + 50 } Any help is appreciated,…
Douglas Stead
  • 150
  • 2
  • 13
0
votes
1 answer

Save entire JsonObject to a variable using the ReadJson command in Morphlines?

I've looked through the documentation for Morphlines (available at http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html), and by the looks of things there is no way to store an entire Json Object to a variable in…
Douglas Stead
  • 150
  • 2
  • 13
0
votes
1 answer

Flume setup on a local machine

Can I set up Flume on my local machine? I can only find setup guides for Flume in a cluster environment. I need to set up Flume and integrate it with morphline.
earl
  • 738
  • 1
  • 17
  • 38
0
votes
1 answer

Morphline config file not indexing Avro nested data

I am generating an index for my Avro data in Solr. Indexes are only generated for data elements at the root level, not for nested ones. Below is the sample schema (not including all of it). My Avro schema is as below. { "type" :…
0
votes
1 answer

Morphlines date format exception

I want to convert a field to a date format like this: { convertTimestamp { field : document_date inputFormats : ["yyyy-MM-dd"] inputTimezone : UTC outputFormat : "yyyy" outputTimezone : UTC } The input…
Ziemo
  • 941
  • 8
  • 27