I have configured Nutch 2.3.1 with the Hadoop/HBase ecosystem. I have not changed gora.buffer.read.limit
or gora.buffer.write.limit,
i.e., both use their default value of 10000. In the generate phase, I set topN to 100,000. During the generate job I get the following log message:
org.apache.gora.mapreduce.GoraRecordWriter: Flushing the datastore after 60000 records
After the job completed, I found that 100,000 URLs were marked for fetching, which is what I wanted. But I am confused about what the above message means. What is the impact of gora.buffer.read.limit on my crawl? Can someone explain?
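
For reference, this is how I assume the two limits would look if I set them explicitly. I have not added this to my setup; the file location (conf/nutch-site.xml) is my assumption about where Gora's MapReduce classes pick up these properties, and the values are just the defaults mentioned above.

```xml
<!-- Sketch only: assumes these properties are read from the Hadoop job
     configuration, so conf/nutch-site.xml would be the place to override them.
     Both values below are the defaults (10000), not tuned settings. -->
<property>
  <name>gora.buffer.read.limit</name>
  <value>10000</value>
</property>
<property>
  <name>gora.buffer.write.limit</name>
  <value>10000</value>
</property>
```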