Huggingface's use of a mixin keeps teasing me that this should be possible, but I can't find any clear documentation on exactly what the requirements are, or if the dependencies are just too much to make it worth it. The central module is literally thousands and thousands of lines, and I felt from studying it yesterday that I've learnt more about how to write beam search than I have about GenerationMixin. :-)
From reading the source I think the dependencies are `self.config`, then `prepare_inputs_for_generation()` and `_update_model_kwargs_for_generation()`; also, implicitly, `forward()`. But I'm not sure that is everything, nor what each should look like. And I think it may expect `forward()` to return data in a specific format.
To make the discussion specific, and generally useful: how could Huggingface's beam search be used with minGPT, which has a `forward()` function that returns `logits, loss`? (It actually has its own `generate()` function that does the equivalent of Huggingface's `sample()` and `greedy_search()`, but no beam search support.) Or nanoGPT if you prefer - they are identical in this area.
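For the return-format part, my untested guess is that minGPT's plain `(logits, loss)` tuple would need repackaging into an object exposing `.logits`, since that seems to be what the generation code reads from model outputs. A sketch of such an adapter (all names here are mine, not Huggingface's):

```python
# Untested idea: adapt minGPT's (logits, loss) tuple into an object with a
# .logits attribute, which I believe is the shape GenerationMixin reads.
class LogitsOutput:
    """Minimal stand-in for a Huggingface-style model output."""

    def __init__(self, logits, loss=None):
        self.logits = logits
        self.loss = loss

    def __getitem__(self, i):
        # allow tuple-style access as well, since some code paths index
        return (self.logits, self.loss)[i]


def adapt_forward(mingpt_forward):
    """Wrap minGPT's forward() so callers get an object with .logits."""

    def wrapped(idx, **kwargs):
        logits, loss = mingpt_forward(idx)
        return LogitsOutput(logits, loss)

    return wrapped
```

Whether this alone is enough, or whether `generate()` also needs `past_key_values`, an `attention_mask`, etc. in that output, is part of what I'm asking.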