I'm trying to implement a very basic working template for RLHF with TRL. The notebook is here:
https://www.kaggle.com/code/mcantoni81/rlhf-with-trl-gpt2
My goal is to make GPT-2 answer "i'm the mailman", but maybe I'm not getting the mechanics of TRL right. It looks like the training doesn't influence the model at all.
How can I correct this template?
I expected the model's responses to the query to change somehow after training.
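
For context, this is roughly the kind of loop I have in mind (a simplified sketch, not the exact notebook code), assuming the classic `PPOTrainer` API from older TRL releases and a hand-written reward that just checks for the target phrase. The idea is that PPO should push the policy toward responses containing the phrase, while the reference model keeps it from drifting too far:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model
from trl.core import respond_to_batch

# Policy with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny batch just to exercise the mechanics.
ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, ref_model, tokenizer)

query_txt = "Who are you?"  # hypothetical prompt, not the one from the notebook
device = next(model.parameters()).device
query_tensor = tokenizer.encode(query_txt, return_tensors="pt").to(device)

for step in range(100):
    # Sample a continuation from the current policy (prompt not included).
    response_tensor = respond_to_batch(model, query_tensor, txt_len=16)
    response_txt = tokenizer.decode(response_tensor[0])

    # Hand-written reward: positive when the target phrase appears, negative otherwise.
    reward = torch.tensor(1.0 if "mailman" in response_txt.lower() else -1.0)

    # One PPO update on this single (query, response, reward) triple.
    stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], [reward])
    print(step, repr(response_txt), reward.item())
```

Even with a loop shaped like this, the sampled responses don't seem to move toward the target, which is why I suspect I'm misunderstanding how the reward is supposed to feed into `ppo_trainer.step`.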