I read this document to create a harvester: https://github.com/ckan/ckanext-harvest. I can reach http://localhost/harvest, and I have created a harvest source. What should I do next? I want to collect some datasets from other CKAN instances. Do I have to implement the harvesting interface?
1 Answer
To harvest from another CKAN instance you can use the `ckan_harvester` plugin provided with ckanext-harvest. You only need to implement the `IHarvester` interface if you want to harvest from a different data source for which a harvester isn't available (for example a proprietary database format).
To enable the `ckan_harvester` plugin, add it to the list of plugins in your CKAN INI file and restart CKAN. You then need to create and configure a new harvester in the CKAN web UI at http://your-ckan-instance/harvest. Finally, make sure to actually run the configured harvesters using the command line tools (or cron).
Refer to the documentation for details.
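
As a rough sketch of the plugin step: the plugin list lives under the `ckan.plugins` key of the INI file, and the ckanext-harvest README lists the base `harvest` plugin alongside `ckan_harvester`. The helper name `has_ckan_harvester` below is hypothetical, and the check assumes the plugin list sits on a single line.

```shell
# The plugin line in the INI file would look roughly like this
# (keep whatever other plugins you already have enabled):
#   ckan.plugins = <your other plugins> harvest ckan_harvester

# Hypothetical helper: succeeds if the given INI file lists ckan_harvester
# on its ckan.plugins line. Assumes the list is written on one line.
has_ckan_harvester() {
    grep -E '^[[:space:]]*ckan\.plugins[[:space:]]*=' "$1" | grep -qw 'ckan_harvester'
}
```

For example, `has_ckan_harvester /etc/ckan/default/production.ini && echo enabled` would print `enabled` only if the plugin is listed; the path is the one used in the comments below and may differ on your system.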

Florian Brucker
-
Do I have to run three command line commands to start a harvest job from the UI? The first is `paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini`, the second is `paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini`, and the third is `paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini`. – Aug 17 '18 at 07:18
-
Yes. The `gather_consumer` and `fetch_consumer` need to run continuously (e.g. in separate terminal windows), and you need to execute `run` once to start the harvesting (once you have configured and scheduled a harvester in the UI). – Florian Brucker Aug 20 '18 at 13:42
-
I started the `run` command and then clicked the harvest button in the UI. If I start another harvest job, do I have to execute `run` again? – Aug 22 '18 at 12:12
-
I ran the first and second commands and then clicked the harvest button in the UI. That collected 4 datasets from another machine (which contains 5 datasets). When I refresh the page, the same 4 datasets are shown. Why do we need to run the third command, `run`? – Aug 22 '18 at 12:27
-
`run` should be executed automatically and regularly (say every 15 minutes), for example by cron. It checks which harvesters should be run (depending on their configured frequency and the time of their last run) and puts corresponding tasks on the gather queue. The `gather_consumer` takes these tasks, performs their gather stage and puts them on the fetch queue. The `fetch_consumer` takes the tasks from that queue and performs their fetch (and import) stages. Once this is done, the task is marked as `done` by the next invocation of `run`. – Florian Brucker Aug 23 '18 at 12:58
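
The three commands from this thread could be wrapped as follows. This is a minimal sketch: the `harvester` function name and the `CKAN_INI` variable are hypothetical, the config path is the one quoted in the comments above, and the crontab line assumes `paster` is on cron's `PATH`.

```shell
# Assumption: config path as quoted in the comments above; override via CKAN_INI.
CKAN_INI="${CKAN_INI:-/etc/ckan/default/production.ini}"

# Hypothetical wrapper: run one ckanext-harvest subcommand
# (gather_consumer, fetch_consumer, or run).
harvester() {
    paster --plugin=ckanext-harvest harvester "$1" --config="$CKAN_INI"
}

# Long-running consumers, each in its own terminal or under a process supervisor:
#   harvester gather_consumer
#   harvester fetch_consumer
#
# Periodic scheduling, e.g. a crontab entry that executes `run` every 15 minutes:
#   */15 * * * * paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini
```

With both consumers running, each scheduled `harvester run` queues any due harvest jobs and marks finished ones as done, so `run` does not have to be started by hand after clicking the button in the UI.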