Ckan Harvester is not working

Question

I read this document to create a harvester: https://github.com/ckan/ckanext-harvest. I can reach http://localhost/harvest, and I have created a harvest source. But what do I do now? What I want to do is collect some datasets from other CKAN instances. Do I have to implement the harvesting interface?

score 1 · Accepted Answer · answered Aug 07 '18 at 12:21


To harvest from another CKAN instance you can use the ckan_harvester plugin provided with ckanext-harvest. You only need to implement the IHarvester interface if you want to harvest from a different data source for which a harvester isn't available (for example a proprietary database format).

To enable the ckan_harvester plugin, add it to the list of plugins in your CKAN INI file and restart CKAN. You then need to create and configure a new harvester in the CKAN web UI at http://your-ckan-instance/harvest. Finally, make sure to actually run the configured harvesters using the command line tools (or cron).
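A rough sketch of the command-line side, assuming the paster-based CLI that ckanext-harvest shipped at the time and a config file at `/etc/ckan/default/production.ini` (an assumed path; substitute your own). The `harvester_cmd` helper is a hypothetical name used only for illustration, and the script prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch only: prints the harvester commands, it does not execute them.
#
# In the CKAN INI file the plugins line would look something like this
# (the base `harvest` plugin is normally enabled next to ckan_harvester):
#   ckan.plugins = harvest ckan_harvester

# Assumed config location; override with CONFIG=/path/to/your.ini
CONFIG="${CONFIG:-/etc/ckan/default/production.ini}"

# Hypothetical helper: build the paster command line for one subcommand.
harvester_cmd() {
    printf 'paster --plugin=ckanext-harvest harvester %s --config=%s\n' "$1" "$CONFIG"
}

harvester_cmd gather_consumer   # long-running: keep it alive in its own process
harvester_cmd fetch_consumer    # long-running: keep it alive in its own process
harvester_cmd run               # short-lived: invoke by hand or from cron
```

Each printed line can be pasted into its own terminal, or handed to whatever process supervisor you already use for the two consumers.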

Refer to the documentation for details.


Florian Brucker


Do I have to run three command-line commands to start a harvest job from the UI? The first is `paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini`, the second is `paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini`, and the third is `paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini`. – Aug 17 '18 at 07:18
Yes. The `gather_consumer` and `fetch_consumer` need to run continuously (e.g. in separate terminal windows), and you need to execute `run` once to start the harvesting (once you have configured and scheduled a harvester in the UI). – Florian Brucker Aug 20 '18 at 13:42
I start the `run` command and then click the harvest button in the UI. If I later start another harvest job, do I have to execute `run` again? – Aug 22 '18 at 12:12
I ran the first and second commands and then clicked the harvest button in the UI. I could collect 4 datasets from another machine (which contains 5 datasets). When I refresh the page, the same 4 datasets are shown. Why do we need the third command, `run`? – Aug 22 '18 at 12:27
`run` should be executed automatically and regularly (say every 15 minutes), for example by cron. It checks which harvesters should be run (depending on their configured frequency and the time of their last run) and puts corresponding tasks on the gather queue. The `gather_consumer` takes these tasks, performs their gather stage and puts them on the fetch queue. The `fetch_consumer` takes the tasks from that queue and performs their fetch (and import) stages. Once this is done, the task is marked as `done` by the next invocation of `run`. – Florian Brucker Aug 23 '18 at 12:58
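For the cron part, a minimal sketch that prints a crontab entry invoking `run` every 15 minutes. The config path and the `cron_line` helper name are assumptions for illustration; on a real system `paster` usually has to be given by its full path inside the CKAN virtualenv, because cron runs with a minimal `PATH`:

```shell
#!/bin/sh
# Sketch only: prints a crontab entry, it does not install it.

# Assumed values; override via the environment.
CONFIG="${CONFIG:-/etc/ckan/default/production.ini}"
INTERVAL_MINUTES="${INTERVAL_MINUTES:-15}"

# Hypothetical helper: emit one crontab line that calls `harvester run`.
cron_line() {
    printf '*/%s * * * * paster --plugin=ckanext-harvest harvester run --config=%s\n' \
        "$INTERVAL_MINUTES" "$CONFIG"
}

cron_line
```

The printed line can be added with `crontab -e` for the user that runs CKAN. The two consumers still need to be running continuously for the queued jobs to be processed.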
