Ashwani Kumar
@ash_at_tt
Followers: 377 · Following: 4K · Media: 19 · Statuses: 63
Deep Learning Engineer | Entrepreneur | PhD
London, United Kingdom
Joined September 2009
Covers:
• Noising for discrete tokens/characters (see the sketch after this list)
• Step-by-step implementation of a baby diffusion GPT
• Training using the Score Entropy objective
• Annotated training and inference code in PyTorch
• Inference using parallel denoising (no autoregressive bottleneck)
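A minimal, hypothetical sketch of the first bullet, assuming an absorbing (mask) noising process over integer character ids with a reserved mask_id; the function name and API are illustrative, not the notebook's exact code:

```python
import torch

# Hypothetical sketch: absorbing-state (mask) noising for discrete characters.
# Each token is independently replaced by the mask id with probability t.
def noise_tokens(tokens: torch.Tensor, t: float, mask_id: int) -> torch.Tensor:
    # t in [0, 1]: 0 = clean text, 1 = fully masked.
    corrupt = torch.rand(tokens.shape, device=tokens.device) < t
    return torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)
```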
Run it in Google Colab: https://t.co/FQEz7U1dFA
Link to the GitHub repo + Jupyter Notebook:
github.com
From babyGPT to diffusion GPT: An annotated implementation of a character-level discrete diffusion model (adapted from Karpathy’s baby GPT). - ash80/diffusion-gpt
I turned @karpathy's baby GPT into a character-level text diffusion model, using @aaron_lou et al.'s score entropy-based training objective.
1/ Today we announce Pleiades, a series of epigenetic foundation models (90M→7B params) trained on 1.9T tokens of human methylation & genomic data. Pleiades accurately models epigenetics for genomic track prediction, generation & neurodegenerative disease detection from cfDNA,
Defining a target for the Value head was also a bit confusing. It's simply Values + Advantages, where both the Values and the Advantages come from the old value head and the old policy, i.e. from before the start of the mini-batch training.
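A minimal sketch of that target, assuming the advantages were computed against the old value estimates (e.g. via GAE); names are illustrative, not the notebooks' exact code:

```python
import torch.nn.functional as F

# Minimal sketch: the value-head target is old values + advantages,
# both taken from the rollout (old value head / old policy) before
# the mini-batch updates start.
def value_loss(new_values, old_values, advantages):
    targets = old_values + advantages          # a.k.a. the returns
    return F.mse_loss(new_values, targets.detach())
```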
Advantages are whitened: they are first normalised via a Z-score normalisation and then shifted back to the original mean.
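A short sketch of that whitening step as described above (illustrative names, not the notebooks' exact code):

```python
import torch

# Z-score normalise the advantages, then add the mean back
# (whitening without shifting the mean).
def whiten(advantages: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    mean, std = advantages.mean(), advantages.std()
    whitened = (advantages - mean) / (std + eps)   # Z-score normalisation
    return whitened + mean                         # shift back to the mean
```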
The ratio of current and old (not SFT) policies in PPO's clip loss adds to the confusion. Clip loss is calculated during the mini-batch training step in PPO, where the old policy (π_θ_old) is the policy before we start our mini-batch training.
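A minimal sketch of that clipped surrogate loss, assuming per-token log-probabilities for the current policy and for the pre-mini-batch snapshot (illustrative, not the notebooks' exact code):

```python
import torch

def ppo_clip_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Ratio of the current policy to pi_theta_old, the snapshot taken
    # just before the mini-batch updates (not the SFT policy).
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()   # negate to maximise the surrogate
```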
The reward for PPO doesn't just come from the reward model (RM). It also includes a penalty term that penalizes the policy or model if it diverges too far from the SFT policy (π^SFT).
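A minimal sketch of how such a reward can be assembled, assuming per-token log-probabilities and one scalar reward-model score per sequence (illustrative names, not the notebooks' exact code):

```python
def compute_rewards(rm_scores, policy_logprobs, sft_logprobs, kl_coef=0.2):
    # Per-token penalty for drifting away from the SFT policy pi^SFT.
    rewards = -kl_coef * (policy_logprobs - sft_logprobs)
    # The scalar reward-model score is added at the final token of each sequence.
    rewards[:, -1] += rm_scores
    return rewards
```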
I implemented Reinforcement Learning from Human Feedback (RLHF) from scratch in Python Notebooks and recorded the step-by-step process in a 3+ hour YouTube video. GitHub repo, surprising details I learned, and YouTube video: 👇
The complete implementation in three Jupyter notebooks is available on GitHub:
I recently implemented Reinforcement Learning from Human Feedback (RLHF) step-by-step, including Supervised Fine-Tuning (SFT), Reward Modeling, and Proximal Policy Optimization (PPO). 🧵
BREAKING 🔥🤯 Google releases a model with the new Griffin architecture that outperforms transformers. Across multiple sizes, Griffin outperforms the benchmark scores of the transformer baseline in controlled tests, both in the MMLU score across different parameter sizes as well as the
Future work aims to extend this framework's capabilities, including building text interfaces (TIs) for interacting with different resource types, online and offline. Your feedback and contributions are welcome. The code repo is available at:
github.com
A GPT agent with a Text Interface tool. Contribute to ash80/backtracking_gpt development by creating an account on GitHub.
The main limitations of this approach, though, are that it only works with GPT-4 and that it requires building text interfaces for interacting with different types of resources.
In summary, the key features of the framework are dynamic actions, the ability to backtrack, and a human-like information retrieval process.
The framework maintains a state, consisting of notes taken by the LLM agent and its past actions, which allows the model to backtrack when it gets stuck on the current path.
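A hypothetical sketch of such a state; the repo's actual data structures may differ:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    notes: list = field(default_factory=list)         # notes taken by the LLM agent
    past_actions: list = field(default_factory=list)  # actions taken so far

    def backtrack(self):
        # Drop the latest action so the agent can try a different path
        # when the current one turns out to be a dead end.
        if self.past_actions:
            self.past_actions.pop()
```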