I trained a multitask model using allennlp 2.0 and now want to predict on new examples using the allennlp predict command.

Problem/Error: I am using the following command:

allennlp predict results/model.tar.gz new_instances.jsonl --include-package mtl_sd --predictor mtlsd_predictor --use-dataset-reader --dataset-reader-choice validation

This gives me the following error:

Traceback (most recent call last):
File ".../mtl_sd_venv/bin/allennlp", line 10, in <module>
sys.exit(run())
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
main(prog="allennlp")
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 119, in main
args.func(args)
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 220, in _predict
manager.run()
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 186, in run
for batch in lazy_groups_of(self._get_instance_data(), self._batch_size):
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/common/util.py", line 139, in lazy_groups_of
s = list(islice(iterator, group_size))
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 180, in _get_instance_data
yield from self._dataset_reader.read(self._input_file)
File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/data/dataset_readers/multitask.py", line 31, in read
raise RuntimeError("This class is not designed to be called like this")
RuntimeError: This class is not designed to be called like this

As far as I understand, this is what's going on:

This RuntimeError is raised by the MultiTaskDatasetReader because its own read() method is not meant to be called directly. read() should only be called on the specific DatasetReaders stored in MultiTaskDatasetReader.readers.
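The behaviour described above can be sketched without AllenNLP installed. The classes below (ToyReader, ToyMultiTaskReader) are hypothetical stand-ins, not the library's implementation; the only detail taken from the traceback is the error message. The sketch assumes a tab-separated target/claim input purely for illustration.

```python
from typing import Dict, Iterator, List


class ToyReader:
    """Hypothetical stand-in for a task-specific DatasetReader."""

    def read(self, lines: List[str]) -> Iterator[Dict[str, str]]:
        # Assumed input format for this sketch: "<target>\t<claim>" per line.
        for line in lines:
            target, claim = line.split("\t")
            yield {"target": target, "claim": claim}


class ToyMultiTaskReader:
    """Hypothetical stand-in for the multitask wrapper: it only holds sub-readers."""

    def __init__(self, readers: Dict[str, ToyReader]) -> None:
        self.readers = readers

    def read(self, lines: List[str]) -> Iterator[Dict[str, str]]:
        # Same message as in the traceback above.
        raise RuntimeError("This class is not designed to be called like this")


wrapper = ToyMultiTaskReader({"SemEval2016": ToyReader()})
lines = ["Atheism\tSome claim text"]

# Calling read() on the wrapper fails ...
try:
    wrapper.read(lines)
    wrapper_error = None
except RuntimeError as err:
    wrapper_error = str(err)

# ... while calling read() on one named sub-reader works.
instances = list(wrapper.readers["SemEval2016"].read(lines))
```

In this toy setup, the caller has to pick a sub-reader by its task name, which is the choice that allennlp predict had no way to make in my case.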

The read() method of the MultiTaskDatasetReader is called because I have specified the DatasetReaders in the jsonnet config as follows:

"dataset_reader": {
    "type": "multitask",
    "readers": {
        "SemEval2016": {
            "type": "SemEval2016",
            "max_sequence_length": 509,
            "token_indexers": {
                "bert": {
                    "type": "pretrained_transformer",
                    "model_name": "bert-base-cased"
                }
            },
            "tokenizer": {
                "type": "pretrained_transformer",
                "model_name": "bert-base-cased"
            }
        }, ...
    }
}

Usually the type of dataset_reader indicates the dataset reader class to be instantiated for prediction. But in this case the type just points to MultiTaskDatasetReader, whose read() method only raises an error and which contains multiple DatasetReaders.

As far as I understand, when using allennlp predict I need to specify somehow which of the multiple DatasetReaders should be used.

The question is:

How can I specify which specific DatasetReader (of the multiple DatasetReaders in MultiTaskDatasetReader.readers) should be used when executing allennlp predict? Or more generally: How can I get allennlp predict to run with a MultiTaskDatasetReader?

For the sake of completeness, here is the predictor:

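from overrides import overrides

from allennlp.common.util import JsonDict
from allennlp.data import Instance
from allennlp.predictors.predictor import Predictor


@Predictor.register('mtlsd_predictor')
class MTLSDPredictor(Predictor):

    def predict(self, target: str, claim: str) -> JsonDict:
        # Use the same keys that _json_to_instance() reads below.
        return self.predict_json({'text1': target, 'text2': claim})

    @overrides
    def _json_to_instance(self, json_dict: JsonDict) -> Instance:
        target = json_dict['text1']
        claim = json_dict['text2']
        return self._dataset_reader.text_to_instance(target, claim)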
sinaj
  • That seems like a serious issue. I put a GitHub issue at https://github.com/allenai/allennlp/issues/4973 and will look into this. – Dirk Groeneveld Feb 12 '21 at 00:55
  • Thanks, appreciated. – sinaj Feb 12 '21 at 22:01
  • I put a fix here: https://github.com/allenai/allennlp/pull/4991 I'm not sure I have covered all the use cases though. Can you try it with that and let me know if that works? – Dirk Groeneveld Feb 18 '21 at 01:08
  • Thank you and sorry for the slow answer. I still get a NotImplementedError by _json_to_instance() at https://github.com/allenai/allennlp/blob/8fbc9728e99a85483b8a061b246c6547bab15e40/allennlp/predictors/predictor.py#L281. Tracing the error back: – sinaj Feb 25 '21 at 17:28
  • Since I have no default predictor specified for each model-head (https://github.com/allenai/allennlp/blob/8fbc9728e99a85483b8a061b246c6547bab15e40/allennlp/predictors/multitask.py#L51) I see that a "normal" Predictor is instantiated for each head at https://github.com/allenai/allennlp/blob/8fbc9728e99a85483b8a061b246c6547bab15e40/allennlp/predictors/multitask.py#L53. And then, where the error occurs, _json_to_instance() is called, although it is not implemented for a "normal" predictor. – sinaj Feb 25 '21 at 17:29
  • This raises the question for me: Can/Should I specify a default predictor for each model-head? If yes, how can I do that in the jsonnet-file? – sinaj Feb 25 '21 at 17:29
  • The error message (also split in parts due to the char-limit): File ".../mtl_sd_venv/bin/allennlp", line 8, in sys.exit(run()) File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/__main__.py", line 34, in run main(prog="allennlp") File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/commands/__init__.py", line 119, in main args.func(args) File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/commands/predict.py", line 239, in _predict manager.run() – sinaj Feb 25 '21 at 17:35
  • File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/commands/predict.py", line 211, in run for model_input_json, result in zip(batch_json, self._predict_json(batch_json)): File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/commands/predict.py", line 157, in _predict_json results = [self._predictor.predict_json(batch_data[0])] File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/predictors/predictor.py", line 54, in predict_json instance = self._json_to_instance(inputs) – sinaj Feb 25 '21 at 17:38
  • File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/predictors/multitask.py", line 74, in _json_to_instance instance = predictor._json_to_instance(json_dict) File ".../mtl_sd_venv/lib/python3.8/site-packages/allennlp/predictors/predictor.py", line 287, in _json_to_instance raise NotImplementedError NotImplementedError – sinaj Feb 25 '21 at 17:39
  • Nevermind! I just had to set `default_predictor` as a class-attribute of the model-heads to `"mtlsd_predictor"`. Now it all works! Thank you! – sinaj Feb 26 '21 at 11:23
  • Sweet! I will try to capture some of this in an official answer. – Dirk Groeneveld Feb 26 '21 at 21:02

1 Answer

There are two issues here. One is a bug in AllenNLP that is fixed in version 2.1.0. The other is that @sinaj's model heads did not set default_predictor. Setting it as a class attribute on each head (here to "mtlsd_predictor") resolved the NotImplementedError.
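To illustrate the second issue, here is a minimal sketch of the lookup described in the comments: each head's default_predictor name selects a predictor class, and a head without one falls back to the generic predictor, whose instance-building method is not implemented. All classes, the REGISTRY dict, and predictor_for() are hypothetical stand-ins written for this sketch, not AllenNLP's actual code.

```python
from typing import Dict, Optional, Type


class BasePredictor:
    """Stand-in for the generic predictor, which cannot build instances."""

    def json_to_instance(self, json_dict: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


class StancePredictor(BasePredictor):
    """Stand-in for a custom predictor such as mtlsd_predictor."""

    def json_to_instance(self, json_dict: Dict[str, str]) -> Dict[str, str]:
        return {"target": json_dict["text1"], "claim": json_dict["text2"]}


# Hypothetical name-to-class registry for this sketch.
REGISTRY: Dict[str, Type[BasePredictor]] = {"mtlsd_predictor": StancePredictor}


class HeadWithoutDefault:
    default_predictor: Optional[str] = None


class HeadWithDefault:
    # The fix: name the predictor as a class attribute on the head.
    default_predictor: Optional[str] = "mtlsd_predictor"


def predictor_for(head: object) -> BasePredictor:
    """Pick the predictor named by the head, else fall back to the generic one."""
    name = getattr(head, "default_predictor", None)
    predictor_class = REGISTRY[name] if name is not None else BasePredictor
    return predictor_class()


example = {"text1": "Atheism", "text2": "Some claim text"}
result = predictor_for(HeadWithDefault()).json_to_instance(example)

try:
    predictor_for(HeadWithoutDefault()).json_to_instance(example)
    fallback_failed = False
except NotImplementedError:
    fallback_failed = True
```

In a real project the attribute would go on the registered head class itself, so that every head used for prediction names a predictor that implements _json_to_instance().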

Dirk Groeneveld