Hugging Face's use of a mixin keeps teasing me that this should be possible, but I can't find any clear documentation on exactly what the requirements are, or whether the dependencies are simply too numerous to make it worthwhile. The central module is thousands of lines long, and after studying it yesterday I feel I've learnt more about how to write beam search than about GenerationMixin. :-)

From reading the source, I think the dependencies are self.config, prepare_inputs_for_generation() and _update_model_kwargs_for_generation(), plus, implicitly, forward(). But I'm not sure that is everything, nor what each should look like. I also think it may expect forward() to return data in a specific format.
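To show the shape I have in mind, here is a dependency-free sketch of an adapter around a model whose forward() returns a (logits, loss) tuple. ToyGPT and GenerationAdapter are names I made up, nothing here inherits from GenerationMixin, and the comments marked "Assumption" are my guesses from reading the source rather than documented requirements.

```python
from types import SimpleNamespace


class ToyGPT:
    """Made-up stand-in for minGPT: forward() returns a (logits, loss) tuple."""

    vocab_size = 4

    def forward(self, idx, targets=None):
        # Fake logits shaped [batch][position][vocab]: a one-hot of each input token.
        logits = [[[float(tok == v) for v in range(self.vocab_size)] for tok in seq]
                  for seq in idx]
        return logits, None


class GenerationAdapter:
    """Hypothetical wrapper exposing the pieces I think generation code touches."""

    def __init__(self, model):
        self.model = model
        # Assumption: generation settings are read from self.config.
        self.config = SimpleNamespace(vocab_size=model.vocab_size,
                                      is_encoder_decoder=False)

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Assumption: returns the keyword arguments for the next forward() call.
        return {"input_ids": input_ids}

    def forward(self, input_ids, **kwargs):
        logits, loss = self.model.forward(input_ids)
        # Assumption: the caller reads outputs.logits instead of unpacking a tuple.
        return SimpleNamespace(logits=logits, loss=loss)

    # Let the adapter be called like a module: adapter(input_ids=...).
    __call__ = forward
```

Whether this is the whole contract, and what _update_model_kwargs_for_generation() would need on top of it, is exactly what I am unsure about.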

To make the discussion specific and generally useful: how could Hugging Face's beam search be used with minGPT, which has a forward() function that returns logits, loss? (It actually has its own generate() function that does the equivalent of Hugging Face's sample() and greedy_search(), but no beam search support.) Or nanoGPT if you prefer; they are identical in this area.
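For comparison, this is roughly what a hand-rolled beam search over a (logits, loss) style forward() looks like. It is not Hugging Face's implementation: toy_forward is a made-up stand-in for the real model, it uses plain Python lists instead of tensors, and it has no EOS handling or length penalty.

```python
import math


def log_softmax(row):
    # Numerically stable log-softmax over one list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def toy_forward(idx):
    """Hypothetical stand-in for minGPT's forward(): returns (logits, loss).

    idx is a batch of token-id lists; logits are [batch][position][vocab].
    This toy always prefers the token after the current one, modulo 3.
    """
    vocab = 3
    logits = [[[2.0 if v == (tok + 1) % vocab else 0.0 for v in range(vocab)]
               for tok in seq] for seq in idx]
    return logits, None


def beam_search(forward, prompt, num_beams=2, max_new_tokens=3):
    # Each beam is (token list, summed log-probability).
    beams = [(list(prompt), 0.0)]
    for _ in range(max_new_tokens):
        logits, _ = forward([seq for seq, _ in beams])
        candidates = []
        for (seq, score), seq_logits in zip(beams, logits):
            logprobs = log_softmax(seq_logits[-1])  # last position only
            for tok, lp in enumerate(logprobs):
                candidates.append((seq + [tok], score + lp))
        # Keep the num_beams highest-scoring continuations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams
```

Calling beam_search(toy_forward, [0]) returns the surviving beams, best first. What I would like to avoid is maintaining this loop myself once sampling, penalties and stopping criteria are needed.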

Darren Cook
  • I am honestly not sure what an acceptable answer should look like. You can of course implement it, but it depends on what your underlying model looks like. From a software engineering perspective, I am also not sure it is worth the time, since external parties that implement mixins from transformers are AFAIK out of HF's scope (i.e. they may not keep it backward compatible and may change their interface at any time). TLDR: I am not sure this question is generally useful in its current state. I might still answer over the weekend, but I am not sure. – cronoik Mar 09 '23 at 20:43
  • @cronoik Thanks, that is partly what I'm trying to understand. It seems everyone's generate/beam search implementation is tied in closely with their transformer implementation... by making it a mixin, HF seem to have got closest to some independence. But as I said, maybe it is an illusion, maybe they are just teasing me :-) – Darren Cook Mar 10 '23 at 08:13

0 Answers