Questions tagged [gora]

Apache Gora is an open source framework that provides an in-memory data model and persistence for big data.

The Apache Gora website notes that Gora supports persisting data to column stores, key value stores, document stores and RDBMSs, and also supports the analysis of that data with extensive Apache Hadoop MapReduce support.

Gora provides

  • An extension of the Avro compiler that generates a Java implementation of the data model as described by a JSON schema
  • Hadoop input and output format types to read these objects from and write these objects to one of several databases
  • Integrations with other Apache and open source projects. These integrations take the form of DataStore classes that provide a common interface to specific database implementations
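
The idea of one common store interface in front of several backends can be sketched as follows. This is a minimal illustration only: `SimpleStore`, `InMemoryStore`, and `StoreSketch` are hypothetical names invented for this sketch and are not Gora's actual classes or method signatures, and the in-memory map merely stands in for a real database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and are NOT Gora's API.
final class StoreSketch {

    /** A minimal common interface that every backend would implement. */
    interface SimpleStore<K, V> {
        void put(K key, V value);
        V get(K key);          // returns null when the key is absent
        boolean delete(K key); // returns true if the key was present
    }

    /** In-memory backend, standing in for a real database. */
    static final class InMemoryStore<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> rows = new HashMap<>();

        @Override
        public void put(K key, V value) {
            rows.put(key, value);
        }

        @Override
        public V get(K key) {
            return rows.get(key);
        }

        @Override
        public boolean delete(K key) {
            boolean present = rows.containsKey(key);
            rows.remove(key);
            return present;
        }
    }

    public static void main(String[] args) {
        // Calling code depends only on the interface, not on the backend.
        SimpleStore<String, Integer> store = new InMemoryStore<>();
        store.put("http://example.org/", 200);
        System.out.println(store.get("http://example.org/"));
        System.out.println(store.delete("http://example.org/"));
        System.out.println(store.get("http://example.org/"));
    }
}
```

Because callers hold only the interface type, swapping the backend would mean changing the single line that constructs the store.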
50 questions
4
votes
1 answer

Apache Nutch: FetcherJob throws NoSuchElementException deep in Gora

I'm running Apache Nutch 2.3.1 out of the box, which uses Gora 0.6.1. I've followed the instructions at http://wiki.apache.org/nutch/RunNutchInEclipse and it ran fine with the InjectorJob. Now I'm running the FetcherJob, and Gora uses MemStore as a…
Emmanuel
  • 16,791
  • 6
  • 48
  • 74
3
votes
1 answer

Combiner function in Apache Hadoop with Gora

I have a simple Hadoop, Nutch 2.x, and HBase cluster. I have to write an MR job that computes some statistics. It is a two-step job, i.e., I think I also need a combiner function. In simple Hadoop jobs, it's not a big problem, as many guides are available, e.g.,…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121
3
votes
1 answer

What is Gora and what are its features?

What is Gora? What does it do for us? How does it work with HBase? Which features does it have? Do you know a good essay or web page that can help me?
Lrrr
  • 4,755
  • 5
  • 41
  • 63
3
votes
0 answers

Can't run nutch2.3-snapshot on hadoop2.4.0 using gora0.5 and mongodb as backend datastore

I've been running into this problem for a few days. With hadoop1.2 it works all right, but when I switch to hadoop2.x (hadoop2.4 or hadoop2.5.2), I get this problem: java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface…
wilco sheh
  • 31
  • 3
2
votes
0 answers

Nutch 2.3.1 on OSX does not connect to MongoDB

I configured a local Nutch 2.3.1 instance on MacOS 10.11.5 (El Capitan) running in Eclipse as described at https://wiki.apache.org/nutch/RunNutchInEclipse. As the data store I configured MongoDB 2.6.12, which is also running on my local MacOS…
André
  • 477
  • 3
  • 11
2
votes
0 answers

Nutch 2.3.1 hangs during inject, parse, fetch, generate

I've read various SO threads on why it takes so long (or hangs) while generating/injecting/parsing/fetching, but with no luck. I've tried implementing the solutions in the following SO threads, also without success. 1) Nutch 2.1 urls injection takes forever 2)…
Praful Bagai
  • 16,684
  • 50
  • 136
  • 267
2
votes
1 answer

runtime exception during nutch generate

I'm trying to run nutch for the first time and while executing /bin/nutch generate -topN 5 I get the following exception: GeneratorJob: starting at 2016-02-13 21:01:42 GeneratorJob: Selecting best-scoring urls due for fetch. GeneratorJob:…
Binoy Dalal
  • 866
  • 10
  • 25
2
votes
1 answer

Configuring Nutch 2.3 with HSQL 2.3.3 - ClassNotFoundException: org/apache/avro/ipc/ByteBufferOutputStream

I'm getting ClassNotFoundException: org/apache/avro/ipc/ByteBufferOutputStream when I run Apache Nutch with HSQLDB, although I have all the Avro-related jar files under lib: avro-1.7.6.jar avro-compiler-1.7.6.jar avro-ipc-1.7.6.jar …
Sridhar Iyer
  • 189
  • 1
  • 7
2
votes
0 answers

How to fetch all the outlinks referenced on a particular page with page using Nutch's parser job

I am using Nutch 2.2, HBase 0.94, and Gora 0.4. When I execute the following steps: 1. nutch inject seed.txt 2. nutch generate -batchId 231 3. nutch fetch 231 4. nutch parse 231 5. nutch updatedb 231, I get the HTML content of a…
sachingupta
  • 709
  • 2
  • 9
  • 30
2
votes
1 answer

Accumulo Gora Mapping for Array/HashMap

I'm able to integrate Apache Gora as an ORM with Accumulo using the Avro JSON specification (which is bundled within Gora). It works fine when I use primitive data types such as String, Integer, etc., but I run into errors once I define the data type as…
Vijay
  • 41
  • 5
1
vote
0 answers

HBase MapReduce job using wrong table name in mapper

I have some crawled content in an HBase table (via Nutch). I have written a MapReduce job to process the table and output its stats into a new table. The following is the code snippet of the MR job: NutchJob job = NutchJob.getInstance(getConf(),…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121
1
vote
1 answer

Apache Gora: where to set the new table name in the reducer

I have an application that is basically an HBase MapReduce job with Apache Gora. I have a very simple case: I want to copy one HBase table's data to a new table. Where do I write the new table name? I have reviewed this Guide but could not find where to put…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121
1
vote
1 answer

The $bin/gora file is not running; cmd always says "it is not recognized as an internal or external command"

I am new to Apache Gora. I just installed it and built it with Maven (mvn clean install, as in the docs). After that I was trying to compile the gora-tutorial module (the example included with the downloaded project). But when I try to run …
1
vote
1 answer

Apache Nutch flushes gora record after limit

I have configured Nutch 2.3.1 with the Hadoop/HBase ecosystem. I have not changed gora.buffer.read.limit and gora.buffer.write.limit, i.e., I am using their default values, which are 10000 in both cases. At the generate phase, I set topN to 100,000. During generate…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121
1
vote
1 answer

How to compile Nutch 2.3.1 with HBase 1.2.6

I have to set up a Hadoop stack with Nutch 2.3.1. The supported version of HBase for Hadoop 2.7.4 is 1.2.6, which I have configured and tested successfully. But when I compile Nutch and crawl a sample page, I get this…
Hafiz Muhammad Shafiq
  • 8,168
  • 12
  • 63
  • 121