Producing robust and hallucination-free generative models is difficult, time-consuming, and expensive. But it doesn't have to be. While most of the field keeps pushing parameter counts to their limits, my goal is the opposite. At MedIT Solutions (the company I own), we work on compact models that are still good enough and cheap. They are dedicated to the healthcare domain, so I need to be 💯 sure my micro models are well prepared for the tasks they are built for by evaluating their outputs on specific, robust data with commonly accepted techniques.
When evaluating generative models' performance on various tasks, I use EleutherAI's LM Evaluation Harness. I was curious whether I could use a model to score its own outputs to improve its reliability and coherence, and I found that adding a self-reward head is sufficient for this task. Now I'm sharing my code with everyone interested in leveraging this capability. Visit my repo at https://github.com/mkurman/self_reward_head_pytorch and bring it to your model. It doesn't have to be a generative one: you can use the self-reward head in any PyTorch model.
How does it work?
A few years ago, I was exploring the possibilities of GANs for producing synthetic data. I was impressed by how easily an actor-critic setup could be used to create fake data. The main goal of GAN training is to teach the generator (the actor) to convince the discriminator (the critic) that the produced output is real, not fake. When I started my journey with LLMs, I thought we could use the same scenario there, and that is why my self-reward head works in a comparable way.
It compares the output produced by the language head with the “gold standard”. In the example given in my repo, I label my model’s generated outputs as zero (I assume they won’t be ideal, so the model has to learn to produce only “gold”-like outputs that the self-reward head believes are authentic, not generated).
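To make this concrete, here is a minimal sketch of what such a head can look like. The class name `SelfRewardHead` and the single linear scorer are my own illustrative assumptions, not necessarily the exact implementation in the repo:

```python
import torch
import torch.nn as nn

class SelfRewardHead(nn.Module):
    """Minimal critic head (illustrative): maps a pooled sequence
    vector to a single 'authentic vs. generated' logit."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) -> (batch, 1) raw logit
        return self.scorer(pooled)
```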
In the given example, I take the argmax of the generated logits to select only the tokens with the highest probabilities. Then I pass those tokens (and the gold ones) through the model's embedding layer to turn them into vectors. Next, I apply the “summator” trick I’ve discovered: summing all vectors into the last token of the sequence, so that it contains the information from all previous tokens in that sequence. Finally, I take only those last tokens from the batch as the self-reward head input, compute the binary cross-entropy loss, and add it to the causal cross-entropy loss.
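The whole step can be sketched roughly like this. It is a simplified version under my own assumptions; the names `self_reward_loss` and `reward_head`, and the cumulative-sum pooling, are illustrative, so check the repo for the exact implementation:

```python
import torch
import torch.nn.functional as F

def self_reward_loss(logits, gold_ids, embedding, reward_head):
    """Score generated vs. gold sequences and return the BCE term.

    logits:   (batch, seq_len, vocab) output of the language head
    gold_ids: (batch, seq_len) reference token ids
    """
    # 1. argmax over the vocabulary -> generated token ids
    generated_ids = logits.argmax(dim=-1)

    # 2. embed both the generated and the gold token ids
    gen_emb = embedding(generated_ids)   # (batch, seq_len, hidden)
    gold_emb = embedding(gold_ids)

    # 3. "summator": cumulative sum along the sequence, so the last
    #    position aggregates information from all previous tokens
    gen_last = gen_emb.cumsum(dim=1)[:, -1]      # (batch, hidden)
    gold_last = gold_emb.cumsum(dim=1)[:, -1]

    # 4. score both pooled vectors with the self-reward head
    gen_logit = reward_head(gen_last).squeeze(-1)    # target 0 ("generated")
    gold_logit = reward_head(gold_last).squeeze(-1)  # target 1 ("authentic")

    scores = torch.cat([gen_logit, gold_logit], dim=0)
    targets = torch.cat([torch.zeros_like(gen_logit),
                         torch.ones_like(gold_logit)], dim=0)

    # 5. binary cross-entropy on the critic logits
    return F.binary_cross_entropy_with_logits(scores, targets)
```

The returned term is then simply added to the usual language-modeling objective, e.g. `total_loss = causal_ce_loss + self_reward_loss(...)`.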
I hope this explanation is clear and that you will benefit from using the self-reward head in your projects. I encourage you to visit my LinkedIn profile (https://linkedin.com/in/mariuszkurman), contact me, and subscribe to my newsletter for more valuable content.