Kent Gruber @KentGruber tweet - Spent some time learning about @OpenAI evals tonight, and submitted a Vigenère decryption evaluation. https://t.co/Xh0CkFzhfe

Kent Gruber

@KentGruber

3 years

Spent some time learning about @OpenAI evals tonight, and submitted a Vigenère decryption evaluation. https://t.co/Xh0CkFzhfe

Replies

Kent Gruber

@KentGruber

3 years

Currently, it has an accuracy of ~0.09 with GPT-3.5-Turbo. It can answer the example from the Wikipedia page, but struggles with the few others I generated. Super curious if GPT-4 will handle this better. https://t.co/r1mpxQd0lc

Kent Gruber

@KentGruber

3 years

I found the JSONL format for the eval data to be a little cumbersome. Like, no VSCode syntax highlighting by default, for example. Each line can be very long, so they can be annoying to read or edit. So for fun, I decided to experiment with using HCL: https://t.co/oLaBXlKhkN