Explore tweets tagged as #hyperparameters
Gswarm all fam: In crypto, verifying computations is straightforward with a pure function and output, using zk proofs or replication. Model training, however, is more complex: sequential gradient updates are influenced by batch order, randomness, and hyperparameters.
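The point about batch order and randomness is easy to demonstrate. Below is a minimal NumPy sketch (synthetic data, hypothetical helper name) showing that two SGD runs on identical data, differing only in the order samples are visited, end at different weights, which is exactly why a training run cannot be checked like a pure function.

```python
import numpy as np

def sgd_linear_regression(X, y, order, lr=0.1, epochs=5):
    """Plain SGD on least squares, visiting samples in the given order."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order:
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5*(x.w - y)^2
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=64)

order_a = np.arange(64)            # one batch order
order_b = rng.permutation(64)      # a different batch order, same data

w_a = sgd_linear_regression(X, y, order_a)
w_b = sgd_linear_regression(X, y, order_b)
print(np.abs(w_a - w_b).max())     # nonzero: same data, different result
```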
Mastering Machine Learning Algorithms and Their Key Hyperparameters
AI training today is alchemy; tomorrow it must be chemistry. @nesaorg is developing "reaction vessels" for machine learning … verifiable, reproducible environments where training hyperparameters, data sequences, and architectural choices can be precisely replicated and …
Improving NanoChat by +60%, simply by swapping data to Nemotron open datasets. No changes to hyperparameters or training setup! We used: • Pretraining: Nemotron-CLIMBmix (built from Nemotron-CC), replaces FineWeb-Edu • Mid-training: Nemotron-Post-Training (replaces …
Ever wondered what actually makes deep learning models work better? Andrew Ng's "Improving Deep Neural Networks" course from DeepLearningAI is still one of the most practical free resources out there: > tuning hyperparameters like a pro > regularization tricks that really …
Why Experiment Tracking Matters in LLM Fine-Tuning: When you fine-tune models, you test multiple hyperparameters (different learning rates, LoRA ranks, and configurations). Without proper tracking, things quickly turn chaotic: endless folders, cryptic filenames, and lost …
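As a minimal illustration of the tracking idea (not any particular tool; the `log_run` helper and file layout are made up for this sketch), each fine-tuning trial can be written out with its hyperparameters and metrics so runs stay comparable:

```python
import json, time, uuid, pathlib

def log_run(config: dict, metrics: dict, root="runs"):
    """Write one fine-tuning run's config and final metrics to a timestamped JSON file."""
    run_id = time.strftime("%Y%m%d-%H%M%S") + "-" + uuid.uuid4().hex[:6]
    path = pathlib.Path(root)
    path.mkdir(exist_ok=True)
    record = {"run_id": run_id, "config": config, "metrics": metrics}
    (path / f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return run_id

# Example: two LoRA fine-tuning trials with different hyperparameters.
log_run({"learning_rate": 2e-4, "lora_rank": 8,  "epochs": 3}, {"eval_loss": 1.92})
log_run({"learning_rate": 1e-4, "lora_rank": 16, "epochs": 3}, {"eval_loss": 1.87})
```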
So you might have heard of many hyperparameter tuning libraries; let's understand one of the smart ones, Optuna. > Optuna works on the Bayesian optimization algorithm TPE. > TPE does not randomly guess hyperparameters from the params dict; it establishes a …
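A minimal Optuna sketch of that idea, assuming a toy objective in place of a real training loop; the TPE sampler proposes new hyperparameters from distributions fitted to previous trials rather than guessing uniformly at random:

```python
import optuna

def objective(trial):
    # TPE suggests values from these search spaces based on past trials.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Stand-in for a real validation metric; plug in your training loop here.
    return (lr - 1e-3) ** 2 + 0.01 * n_layers + 0.1 * dropout

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
print(study.best_params)
```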
Ablations show our hyperparameters, tuned on the 190M base model, were near-optimally transferred to larger models (we estimate a 0.1% noise floor in the loss). 9/n
Optimizers are only as good as our ability to predict good hyperparameters for them when used at scale. With robust hyperparameter transfer, we find Muon and Shampoo consistently beat AdamW by 1.4x and 1.3x in compute for training language models up to 1.4B parameters. 1/n
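The thread doesn't spell out the transfer recipe, but a common ingredient in hyperparameter transfer is a muP-style width scaling rule; the sketch below is an assumption for illustration, not the authors' exact method:

```python
def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """muP-style rule of thumb: the per-layer learning rate for hidden weight
    matrices scales inversely with width, so a value tuned on a small proxy
    model can be reused at larger scale."""
    return base_lr * base_width / target_width

# Tuned on a small proxy (e.g. width 256), reused at larger widths.
base_lr = 3e-3
for width in (256, 1024, 4096):
    print(width, transfer_lr(base_lr, base_width=256, target_width=width))
```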
Implemented the U-Net paper from scratch > trained on a breast cancer segmentation dataset > messed around with some hyperparameters. Will be looking to add attention next.
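For reference, the core building block of such a from-scratch U-Net is the repeated double convolution at every stage; the PyTorch sketch below is a common modern variant (padded convolutions plus batch norm), not the paper's exact unpadded setup:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """The (conv -> BN -> ReLU) x2 block repeated at every U-Net stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Sanity check: one encoder stage on a fake grayscale scan.
x = torch.randn(1, 1, 256, 256)
print(DoubleConv(1, 64)(x).shape)   # torch.Size([1, 64, 256, 256])
```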
Muon's speed arguably comes from approximate orthogonalization: a few fast Newton-Schulz iterations instead of an expensive full SVD. But this makes the update inexact. So how should hyperparameters change as approximation quality varies? 1/n
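For intuition, here is a minimal NumPy sketch of the idea: a few Newton-Schulz iterations approximately map a gradient matrix G to the U Vᵀ factor of its SVD without ever computing the SVD. Muon's actual implementation uses a tuned quintic polynomial; the cubic form below is the simplest variant of the same iteration:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=10):
    """Approximately map G to U @ V^T (its orthogonalized form) without an SVD.
    Cubic Newton-Schulz: X <- 1.5*X - 0.5*(X X^T) X, after scaling so the
    singular values land in the region of convergence."""
    X = G / (np.linalg.norm(G) + 1e-7)   # Frobenius norm bounds the spectral norm
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

rng = np.random.default_rng(0)
G = rng.normal(size=(32, 64))
U, _, Vt = np.linalg.svd(G, full_matrices=False)
approx = newton_schulz_orthogonalize(G, steps=10)
print(np.abs(approx - U @ Vt).max())     # shrinks as `steps` grows
```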
Qwen3 is Alibaba Cloud's latest generation of Qwen large language models. And in this course, you'll learn how to train the LLM from scratch. You'll learn about configuring the model, training hyperparameters, RoPE positional embeddings, self-attention code, and lots more.
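As a taste of the RoPE part, here is a minimal NumPy sketch (the `rope` helper and the channel-pairing convention are illustrative, not the course's code): each consecutive pair of channels is rotated by an angle that grows with position.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, d),
    rotating each consecutive channel pair by a position-dependent angle."""
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)      # theta_i = base^(-2i/d)
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                # even / odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).normal(size=(8, 64))  # 8 positions, head dim 64
print(rope(q).shape)                                # (8, 64)
```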
Overfitting is the silent assassin of quant models. Backtests can sparkle with perfect curves, yet the strategy collapses in live markets. The culprit: high variance, excessive hyperparameters, or pure data-snooping. What looks robust on paper may be fragile in reality. 1/3
[ml grind] - cs336: lecture 3, hyperparameters and architecture
ACORE AI Subnet Hyperparameters Update [NETUID: 405] We've completed a targeted hyperparameter tune on the ACOREAI test subnet to improve stability, fairness, and validation performance. Key highlights: • Activity cutoff: 5000 • Adjustment alpha (norm): ~0.97, tuned for …
A feature I find really helpful is video logging to wandb across all simulation backends. So much easier to sweep hyperparameters and browse through the results when you get to actually see the policy behavior on the screen. We also log ONNX files as we train, and our inference
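A minimal sketch of that logging pattern with the wandb API, using a random placeholder rollout and made-up project/config names; `wandb.Video` accepts a (time, channels, height, width) uint8 array:

```python
import numpy as np
import wandb

run = wandb.init(project="policy-sweeps",
                 config={"lr": 3e-4, "num_envs": 1024})   # hyperparameters under sweep

# Placeholder rollout frames standing in for a rendered simulation episode.
frames = np.random.randint(0, 255, size=(60, 3, 128, 128), dtype=np.uint8)

run.log({
    "rollout/video": wandb.Video(frames, fps=30, format="mp4"),
    "rollout/return": 123.4,
})
run.finish()
```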
Supplementary information for the new DeepSeek R1 Nature paper is very interesting! Details on training data, hyperparameters, base model importance, and more.
>15 October >L21-25 done >Started fine-tuning hyperparameters to improve the neural network >Like early stopping, data scaling, & dropout layers >HOML: 10 pages done >Made a multi-classification model for MNIST >TMRW: regularization
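A minimal Keras sketch of the three techniques mentioned (data scaling, a dropout layer, and early stopping) on MNIST; the architecture and hyperparameter values are illustrative, not the poster's:

```python
import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0            # data scaling to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),                           # dropout layer
    tf.keras.layers.Dense(10, activation="softmax"),        # 10-class MNIST head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(              # early stopping
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(x_train, y_train, epochs=30,
          validation_data=(x_val, y_val), callbacks=[early_stop])
```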
Mira supports model provenance: every model release, training run, and checkpoint has an immutable on-chain record. Want to know which dataset, hyperparameters, or seed produced a result? Mira makes it auditable.