Ashwani Kumar Profile
Ashwani Kumar

@ash_at_tt

Followers
377
Following
4K
Media
19
Statuses
63

Deep Learning Engineer | Entrepreneur | PhD

London, United Kingdom
Joined September 2009
@ash_at_tt
Ashwani Kumar
17 days
Let me know what you think
1
0
6
@ash_at_tt
Ashwani Kumar
17 days
Covers:
• Noising for discrete tokens/characters
• Step-by-step implementation of baby diffusion GPT
• Training using the Score Entropy Objective
• Annotated training and inference code in PyTorch
• Inference using parallel denoising (no autoregressive bottleneck)
1
1
15
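The "noising for discrete tokens" step works differently from the Gaussian noise used on images; one common choice is an absorbing (mask) process that replaces characters with a reserved mask token at a rate that grows with the noise level. A minimal sketch of that idea in PyTorch (MASK_ID and the function name are illustrative, not taken from the repo):

```python
import torch

MASK_ID = 0  # hypothetical id reserved for an absorbing "mask" token

def noise_tokens(x: torch.Tensor, t: float) -> torch.Tensor:
    """Corrupt a batch of token ids by replacing each position with MASK_ID
    independently with probability t (absorbing-state noising)."""
    mask = torch.rand(x.shape, device=x.device) < t
    return torch.where(mask, torch.full_like(x, MASK_ID), x)

# Heavier corruption at larger t; at t = 1 every character is masked.
x = torch.randint(1, 65, (2, 16))   # toy character-level ids
x_noisy = noise_tokens(x, t=0.5)
```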
@ash_at_tt
Ashwani Kumar
17 days
I turned @karpathy's baby GPT into a character-level text diffusion model, using @aaron_lou et al.'s score entropy-based training objective.
17
53
977
@PrimaMente
Prima Mente
3 months
1/ Today we announce Pleiades, a series of epigenetic foundation models (90M→7B params) trained on 1.9T tokens of human methylation & genomic data. Pleiades accurately models epigenetics for genomic track prediction, generation & neurodegenerative disease detection from cfDNA,
10
42
145
@ash_at_tt
Ashwani Kumar
4 months
Video walkthrough:
0
0
1
@ash_at_tt
Ashwani Kumar
4 months
Defining a target for the Value head was also a bit confusing. It's simply Values + Advantages, where both Values and Advantages come from the old value head and old policy, before the start of the mini-batch training.
1
0
0
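In code, the value-head target described above is just the returns: old values plus advantages, held fixed across the mini-batch epochs. A minimal sketch (names are illustrative, not taken from the notebooks):

```python
import torch

def value_target(old_values: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Target for the value head: old values + advantages (i.e. the returns).
    Both inputs come from the rollout phase, before mini-batch training,
    so the target stays fixed across the PPO epochs."""
    return (old_values + advantages).detach()

# Usage inside the PPO update (new_values come from the current value head):
# value_loss = 0.5 * (new_values - value_target(old_values, advantages)).pow(2).mean()
```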
@ash_at_tt
Ashwani Kumar
4 months
Advantages are whitened: they are first normalised via a Z-score normalisation, then shifted back to the original mean.
1
0
0
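A minimal sketch of that whitening step as described here, i.e. Z-score normalisation with the mean added back so only the spread is rescaled (the function name and epsilon are illustrative):

```python
import torch

def whiten(advantages: torch.Tensor) -> torch.Tensor:
    """Z-score normalise the advantages, then add the original mean back,
    so only the spread is rescaled."""
    mean, std = advantages.mean(), advantages.std()
    return (advantages - mean) / (std + 1e-8) + mean
```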
@ash_at_tt
Ashwani Kumar
4 months
The ratio of current and old (not SFT) policies in PPO's clip loss adds to the confusion. Clip loss is calculated during the mini-batch training step in PPO, where the old policy (π_θ_old) is the policy before we start our mini-batch training.
1
0
0
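A minimal sketch of the clipped surrogate loss built from that ratio, where old_logprobs is the snapshot of π_θ_old taken before the mini-batch epochs (names and the 0.2 clip range are illustrative, not taken from the notebooks):

```python
import torch

def clipped_policy_loss(logprobs: torch.Tensor,
                        old_logprobs: torch.Tensor,
                        advantages: torch.Tensor,
                        clip_eps: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate loss.

    logprobs:     log π_θ(a|s) from the policy being updated in this mini-batch
    old_logprobs: log π_θ_old(a|s), frozen before mini-batch training starts
    advantages:   advantage estimates from the rollout
    """
    ratio = torch.exp(logprobs - old_logprobs)                  # π_θ / π_θ_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                # maximise surrogate ⇒ minimise negative
```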
@ash_at_tt
Ashwani Kumar
4 months
The reward for PPO doesn't just come from the reward model (RM). It also includes a penalty term that penalizes the policy or model if it diverges too far from the SFT policy (π^SFT).
1
0
0
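A minimal sketch of that combined reward, assuming a per-token penalty proportional to the log-prob gap from the SFT policy and the reward-model score added at the final token (names and the coefficient are illustrative, not taken from the notebooks):

```python
import torch

def per_token_rewards(rm_score: torch.Tensor,
                      policy_logprobs: torch.Tensor,
                      sft_logprobs: torch.Tensor,
                      kl_coef: float = 0.2) -> torch.Tensor:
    """Reward used by PPO: a KL-style penalty at every token for drifting
    away from the SFT policy, plus the reward-model score at the last token.

    rm_score:        (batch,) score per sequence from the reward model
    policy_logprobs: (batch, seq) log-probs of sampled tokens under the policy
    sft_logprobs:    (batch, seq) log-probs of the same tokens under π^SFT
    """
    rewards = -kl_coef * (policy_logprobs - sft_logprobs)   # penalty term
    rewards[:, -1] += rm_score                              # RM score only at the end
    return rewards
```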
@ash_at_tt
Ashwani Kumar
4 months
I implemented Reinforcement Learning from Human Feedback (RLHF) from scratch in Python Notebooks and recorded the step-by-step process in a 3+ hour YouTube video. GitHub repo, surprising details I learned, and YouTube video: 👇
2
3
14
@ash_at_tt
Ashwani Kumar
4 months
The video is also available on YouTube
0
0
1
@ash_at_tt
Ashwani Kumar
4 months
The complete implementation in three Jupyter notebooks is available on GitHub:
1
0
0
@ash_at_tt
Ashwani Kumar
4 months
I recently implemented Reinforcement Learning from Human Feedback (RLHF) step-by-step, including Supervised Fine-Tuning (SFT), Reward Modeling, and Proximal Policy Optimization (PPO). 🧵
1
1
1
@rohanpaul_ai
Rohan Paul
2 years
BREAKING 🔥🤯 Google releases a model with the new Griffin architecture that outperforms transformers. Across multiple sizes, Griffin outperforms the benchmark scores of the transformer baseline in controlled tests, in both the MMLU score across different parameter sizes as well as the
5
114
500
@ash_at_tt
Ashwani Kumar
2 years
Future work aims to extend this framework's capabilities, including building TIs (text interfaces) for interacting with different resource types, online and offline. Your feedback and contributions are welcome. The code repo is available at:
github.com/ash80/backtracking_gpt
A GPT agent with a Text Interface tool.
0
0
0
@ash_at_tt
Ashwani Kumar
2 years
The main limitations of this approach, though, are that it only works with GPT-4 and requires building text interfaces for interacting with different types of resources.
1
0
0
@ash_at_tt
Ashwani Kumar
2 years
In summary, the key features of the framework are dynamic actions, the ability to backtrack, and a human-like information retrieval process.
1
0
0
@ash_at_tt
Ashwani Kumar
2 years
The framework maintains a state, consisting of notes taken by the LLM agent and its past actions, which allows the model to backtrack when it gets stuck on the current path.
1
0
0
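A minimal sketch of what such a state could look like (class and method names are hypothetical, not taken from the ash80/backtracking_gpt repo):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentState:
    """State kept across steps: notes written by the LLM agent and the
    actions already taken, so the agent can back off a dead-end path."""
    notes: List[str] = field(default_factory=list)
    past_actions: List[str] = field(default_factory=list)

    def record(self, action: str, note: Optional[str] = None) -> None:
        self.past_actions.append(action)
        if note:
            self.notes.append(note)

    def backtrack(self) -> Optional[str]:
        """Undo the most recent action when the current path is stuck."""
        return self.past_actions.pop() if self.past_actions else None
```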