We've been using ckanext-dcat to harvest from remote json sources, sometimes some harvest jobs didn't finish and had to be deleted along with all the datasets from that source, which is not very convinient but then all goes back to normal, I don't know if there is a way to delete just a single job.
But now I get this in gather consumer log:
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/paster", line 9, in <module>
load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
result = self.command()
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 129, in command
gather_callback(consumer, method, header, body)
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/queue.py", line 219, in gather_callback
harvest_object_ids = harvester.gather_stage(job)
File "/usr/lib/ckan/default/src/ckanext-dcat/ckanext/dcat/harvesters.py", line 186, in gather_stage
content = self._get_content(url, harvest_job, page)
File "/usr/lib/ckan/default/src/ckanext-dcat/ckanext/dcat/harvesters.py", line 66, in _get_content
cl = r.headers['content-length']
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/requests/structures.py", line 54, in __getitem__
return self._store[key.lower()][1]
KeyError: 'content-length
The job finishes but no datasets get created, if I delete the job and reharvest it keeps running but never ends and other harvest jobs don't update either.
How can I fix this?