Explore tweets tagged as #hyperparameters
@Peter_rock07
Peter
2 minutes
Gswarm fam: in crypto, verifying computations is straightforward when you have a pure function and its output, using zk proofs or replication. However, model training is more complex due to sequential gradient updates influenced by batch order, randomness, and hyperparameters. …
0 replies · 0 reposts · 0 likes
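The verification gap Peter describes can be seen in a few lines: hold the data, initialization, and hyperparameters fixed, vary only the order in which examples are visited, and SGD still lands on different weights. A toy illustration (made-up data, not any particular protocol):

```python
import numpy as np

def sgd_run(order_seed: int) -> np.ndarray:
    """Train a tiny linear model; everything fixed except batch order."""
    rng = np.random.default_rng(0)                 # same data and labels every run
    X = rng.normal(size=(256, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=256)
    w = np.zeros(8)
    order = np.random.default_rng(order_seed).permutation(256)  # only this differs
    for i in order:                                # one epoch of per-example SGD
        w -= 0.01 * (X[i] @ w - y[i]) * X[i]       # gradient of squared error
    return w

w_a, w_b = sgd_run(1), sgd_run(2)
print(np.abs(w_a - w_b).max())  # nonzero: same inputs, different final weights
```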
@PythonPr
Python Programming
2 months
Mastering Machine Learning Algorithms and Their Key Hyperparameters
81 replies · 319 reposts · 2K likes
@Farhad33279766
alone7🧙‍♂️,🧙‍♂️♦️🦙🔄🚠⛓(Ƙ,G)🦭/acc.ink
55 minutes
AI training today is alchemy; tomorrow it must be chemistry. @nesaorg is developing "reaction vessels" for machine learning … verifiable, reproducible environments where training hyperparameters, data sequences, and architectural choices can be precisely replicated and …
1 reply · 0 reposts · 1 like
@PavloMolchanov
Pavlo Molchanov
29 days
🚀 Improving NanoChat by +60%, simply by swapping data to Nemotron open datasets. No changes to hyperparameters or training setup! We used: • Pretraining: Nemotron-CLIMBmix (built from Nemotron-CC), replaces FineWeb-Edu • Mid-training: Nemotron-Post-Training (replaces …
15 replies · 37 reposts · 350 likes
@victor_explore
Victor
2 months
Ever wondered what actually makes deep learning models work better? Andrew Ng’s “Improving Deep Neural Networks” course from DeepLearningAI is still one of the most practical free resources out there: > tuning hyperparameters like a pro > regularization tricks that really …
1 reply · 7 reposts · 26 likes
@ready_tensor
Ready Tensor, Inc.
29 days
🎯 Why Experiment Tracking Matters in LLM Fine-Tuning When you fine-tune models, you test multiple hyperparameters — different learning rates, LoRA ranks, and configurations. Without proper tracking, things quickly turn chaotic: endless folders, cryptic filenames, and lost …
0 replies · 1 repost · 5 likes
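Not Ready Tensor's tooling, but a minimal sketch of the pattern the tweet argues for: key each run by a hash of its config so hyperparameters and metrics stay attached to results instead of living in cryptic filenames (all names and values here are illustrative):

```python
import hashlib, json, time
from pathlib import Path

def log_run(config: dict, metrics: dict, root: str = "runs") -> str:
    """Persist one fine-tuning run under an ID derived from its config."""
    run_id = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:12]
    run_dir = Path(root) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "run.json").write_text(json.dumps(
        {"config": config, "metrics": metrics, "logged_at": time.time()}, indent=2))
    return run_id

# e.g. one LoRA trial; the hyperparameter values are placeholders
print(log_run({"lr": 2e-4, "lora_rank": 16, "epochs": 3}, {"eval_loss": 1.73}))
```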
@X_Ibyte
Uzeb Khan
1 month
So You Might Have Heard About Many Hyperparameter Tuning Libraries, So Let's Understand One Of The Smart Ones: "OPTUNA". So Let's See >OPTUNA Works On The Bayesian Optimization Algorithm "TPE" >So TPE Does Not Randomly Guess Hyperparameters From The Params Dict, It Establishes A …
5 replies · 3 reposts · 30 likes
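The Optuna API the tweet refers to looks like this; the objective below is a stand-in for a real validation score, chosen only so the example runs:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # TPE proposes values from these distributions, modeling good vs. bad past trials
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    depth = trial.suggest_int("max_depth", 2, 10)
    return (lr - 1e-3) ** 2 + (depth - 6) ** 2     # pretend this is eval loss

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```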
@ShikaiQiu
Shikai Qiu
4 days
Ablations show our hyperparameters, tuned on the 190M base model, were near-optimally transferred to larger models (we estimate a 0.1% noise floor in the loss). 9/n
1 reply · 1 repost · 12 likes
@ShikaiQiu
Shikai Qiu
4 days
Optimizers are only as good as our ability to predict good hyperparameters for them when used at scale. With robust hyperparameter transfer, we find Muon and Shampoo consistently beat AdamW by 1.4x and 1.3x in compute for training language models up to 1.4B parameters. 1/n
4 replies · 26 reposts · 159 likes
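The thread itself carries the details; as background, "hyperparameter transfer" usually means something like the muP recipe sketched below, where you tune once on a small proxy and rescale per width instead of retuning. A simplified illustration, not the authors' exact method:

```python
# muP-style learning-rate transfer sketch: hidden-layer LR scales like 1/width,
# so a value tuned at base_width stays near-optimal as the model grows.
def transferred_lr(base_lr: float, base_width: int, width: int) -> float:
    return base_lr * base_width / width

base_lr = 3e-3  # tuned once on the small proxy model (illustrative value)
for width in (256, 1024, 4096):
    print(width, transferred_lr(base_lr, base_width=256, width=width))
```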
@ei_asamoah
ike
2 months
Implemented the UNET paper from scratch > trained on a breast cancer segmentation dataset > messed around with some hyperparameters. Will be looking to add attention next
0 replies · 1 repost · 7 likes
@egor_shulg
Egor Shulgin
2 months
Muon’s speed arguably comes from approximate orthogonalization: a few fast Newton–Schulz iterations instead of an expensive full SVD. But this makes the update inexact. So how should hyperparameters change as approximation quality varies?🧵 1/n
5 replies · 24 reposts · 288 likes
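For reference, the approximate orthogonalization in public Muon implementations (e.g. Keller Jordan's) is a quintic Newton–Schulz iteration on the normalized gradient; a close paraphrase in PyTorch:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize G, i.e. approximate the U V^T of its SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315     # coefficients of the quintic polynomial
    X = G / (G.norm() + eps)              # bring the spectral norm near [0, 1]
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T                           # iterate on the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

Fewer steps are cheaper but leave the update further from orthogonal, which is exactly the approximation-quality knob the thread studies.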
@freeCodeCamp
freeCodeCamp.org
5 days
Qwen3 is Alibaba Cloud’s latest generation of Qwen large language models. And in this course, you'll learn how to train the LLM from scratch. You'll learn about configuring the model, training hyperparameters, RoPE positional embeddings, self-attention code, and lots more.
5 replies · 139 reposts · 1K likes
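One of the listed topics, RoPE, fits in a few lines; a generic rotate-half sketch (Qwen3's actual implementation differs in layout and caching):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key feature pairs by position-dependent angles.
    x: (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype) / half)     # (half,)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * inv_freq  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
```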
@alqamadotml
Alqama
3 months
Overfitting is the silent assassin of quant models. Backtests can sparkle with perfect curves, yet the strategy collapses in live markets. The culprit is high variance, excessive hyperparameters, or pure data snooping. What looks robust on paper may be fragile in reality. 1/3
1 reply · 1 repost · 2 likes
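The data-snooping point is easy to reproduce: sweep enough hyperparameters over pure noise and the best backtest still looks like an edge. A toy simulation (synthetic returns, no real strategy):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)        # pure noise: no edge exists
best_sharpe = -np.inf
for lookback in range(2, 200):                    # the "hyperparameter" sweep
    momentum = np.convolve(returns, np.ones(lookback))[:len(returns)]  # trailing sum
    pnl = np.sign(momentum[:-1]) * returns[1:]    # trade yesterday's signal
    best_sharpe = max(best_sharpe, pnl.mean() / pnl.std() * np.sqrt(252))
print(best_sharpe)  # typically solidly positive in-sample, despite zero true signal
```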
@ominousEureka
add later
2 months
[ml grind] - 🤖 cs336: lecture 3 hyperparameters and architecture
0 replies · 0 reposts · 4 likes
@acore_ai
ACORE AI
2 months
ACORE AI Subnet Hyperparameters Update [NETUID: 405] ⚙️🔍 We’ve completed a targeted hyperparameter tune on the ACOREAI test subnet to improve stability, fairness, and validation performance. Key highlights: • Activity cutoff: 5000 • Adjustment alpha (norm): ~0.97, tuned for …
23 replies · 18 reposts · 42 likes
@carlo_sferrazza
Carlo Sferrazza
11 days
A feature I find really helpful is video logging to wandb across all simulation backends. So much easier to sweep hyperparameters and browse through the results when you get to actually see the policy behavior on the screen. We also log ONNX files as we train, and our inference …
1 reply · 2 reposts · 24 likes
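Not their repo's code, but the wandb pattern being described looks roughly like this (the project name and frames are stand-ins):

```python
import numpy as np
import wandb

run = wandb.init(project="policy-sweep", config={"lr": 3e-4})   # illustrative config
# frames: (time, channels, height, width) uint8 — here a blank stand-in rollout
frames = np.zeros((120, 3, 128, 128), dtype=np.uint8)
run.log({"rollout/video": wandb.Video(frames, fps=30, format="mp4")})
run.finish()
```

Sweeps then show each run's rollout video next to its hyperparameters in the wandb UI.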
@rosstaylor90
Ross Taylor
3 months
Supplementary information for the new DeepSeek R1 Nature paper is very interesting! Details on training data, hyperparameters, base model importance, and more.
10 replies · 153 reposts · 921 likes
@X_Ibyte
Uzeb Khan
2 months
>15 October >L21-25 Done >Started Fine-Tuning Hyperparameters To Improve The Neural Network >Like Early Stopping, Data Scaling, & Dropout Layers >HOML-10 Pages Done >Made A Multi-Classification Model For MNIST >TMRW Regularization
10 replies · 3 reposts · 65 likes
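Those three techniques, in the Keras style that HOML (Hands-On Machine Learning) uses; a generic sketch rather than the author's actual notebook:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),               # data scaling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),                        # dropout regularization
    tf.keras.layers.Dense(10, activation="softmax"),     # 10 MNIST classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
early_stop = tf.keras.callbacks.EarlyStopping(           # early stopping
    monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.1, epochs=100, callbacks=[early_stop])
```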
@Twiss_1
Travis Drey
1 month
Mira supports model provenance: every model release, training run, and checkpoint has an immutable on-chain record. Want to know which dataset, hyperparameters, or seed produced a result? Mira makes it auditable.
2 replies · 0 reposts · 5 likes
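Mira's actual on-chain format isn't shown in the tweet; as a sketch of the idea, an auditable record only needs the run's inputs content-addressed by a digest the chain can store:

```python
import hashlib, json

record = {
    "dataset_sha256": "<hash of training-data snapshot>",   # placeholder
    "hyperparameters": {"lr": 3e-4, "batch_size": 64},      # illustrative values
    "seed": 1234,
    "checkpoint_sha256": "<hash of resulting weights>",     # placeholder
}
# Canonicalize and hash; anyone can later recompute this digest to verify
# which dataset, hyperparameters, and seed produced a given checkpoint.
digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
print(digest)
```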