Frequently Asked Questions
General Questions
What is Agent-Tunix?
Agent-Tunix is a framework for training language models using GRPO (Group Relative Policy Optimization) with parameter-efficient fine-tuning via LoRA. It’s designed for reinforcement learning from reward feedback.
What models does Agent-Tunix support?
Currently supports Google Gemma3 family:
Gemma3 270M (lightweight)
Gemma3 1B (standard)
Gemma3 4B (large)
Can be extended to other models by adding configuration files.
Do I need multiple GPUs to use Agent-Tunix?
No. The framework works on a single GPU. For smaller models (270M), even 11GB GPUs (like RTX 2080 Ti) can work. Multi-GPU support is optional for faster training.
Can I run on CPU?
Not recommended. The framework is optimized for GPUs. You can set device=cpu but training will be very slow.
What’s the difference between GRPO and other RL methods?
GRPO (Group Relative Policy Optimization):
Generates K responses per prompt
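Normalizes each reward against the group's mean (and usually its standard deviation), so every response is scored relative to the other responses for the same prompt
Does not need a separate value (critic) model as PPO does, which reduces memory and compute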
Designed for discrete sequence generation
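The group normalization can be sketched in a few lines. This version subtracts the group mean and divides by the population standard deviation. Implementations differ in the details (sample vs. population standard deviation, or skipping the division entirely), so treat it as an illustration rather than agent-tunix's exact computation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of K responses to one prompt against their own group."""
    mean = statistics.fmean(rewards)
    # Population standard deviation of the group; eps avoids division by zero
    # when every response received the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# K = 4 responses to a single prompt, scored by a reward function
rewards = [1.0, 0.0, 0.5, 0.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Responses that beat their group's average get a positive advantage and are reinforced; below-average responses get a negative one. If all K responses score the same, every advantage is zero and that prompt contributes no learning signal.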
Do I need to understand GRPO to use Agent-Tunix?
No. You can use the framework with default settings without understanding GRPO details. The configuration handles most complexity.
Setup and Installation
I’m getting CUDA errors. How do I fix this?
First, verify CUDA is installed:
python -m agent_tunix.utils check-gpu
If not installed, follow the Installation guide.
How much VRAM do I need?
Minimum requirements by model:
Gemma3 270M: 11GB (batch size 1)
Gemma3 1B: 24GB (batch size 2)
Gemma3 4B: 48GB (batch size 4)
See Configuration Guide for memory optimization tips.
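Model weights are only part of the footprint. A back-of-the-envelope estimate of weight memory alone is parameter count times bytes per parameter. The minimums above are much larger because training also needs activations, caches for the generated responses, and optimizer state for the LoRA parameters. The parameter counts below are approximated from the model names, and the bf16 default is an assumption.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed just to hold the model weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Parameter counts approximated from the model names
for name, params in [("Gemma3 270M", 270e6), ("Gemma3 1B", 1e9), ("Gemma3 4B", 4e9)]:
    print(f"{name}: ~{weight_memory_gib(params):.1f} GiB of bf16 weights")
```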
How do I install with a different CUDA version?
See the Installation guide’s CUDA section for instructions.
Can I use WSL2 on Windows?
Yes. Install CUDA in WSL2 and follow normal installation steps. GPU passthrough works on WSL2.
Configuration
How do I override configuration values?
Use dot notation on command line:
python run_training.py optimizer.learning_rate=1e-5
python run_training.py model=gemma3_1b training.micro_batch_size=2
See Configuration Guide for more examples.
What’s the difference between parameters and overrides?
Parameters: Settings defined in YAML config files
Overrides: Command-line changes to parameters
Example:
# Parameter in conf/optimizer/adamw.yaml
learning_rate: 3e-6
# Override from command line
python run_training.py optimizer.learning_rate=1e-5
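The override syntax appears to follow Hydra's command-line grammar, where Hydra and OmegaConf do the actual parsing and merging. As a rough, standard-library-only illustration of the distinction (not the framework's code), the sketch below treats the YAML parameters as a nested dict and applies dot-notation overrides to a copy, so the file's value is replaced for this run only.

```python
import copy
from typing import Any


def parse_value(text: str) -> Any:
    """Interpret an override value as int, float, or bool; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `a.b.c=value` overrides applied; `config` is untouched."""
    merged = copy.deepcopy(config)
    for override in overrides:
        dotted_key, raw_value = override.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            node = node[key]  # raises KeyError for an unknown config group
        if leaf not in node:
            # Hydra also rejects unknown keys unless the override starts with "+".
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return merged


# Parameter, as it might be loaded from conf/optimizer/adamw.yaml
config = {"optimizer": {"learning_rate": 3e-6, "warmup_ratio": 0.1}}
# Override, as passed on the command line
merged = apply_overrides(config, ["optimizer.learning_rate=1e-5"])
print(merged["optimizer"])  # {'learning_rate': 1e-05, 'warmup_ratio': 0.1}
```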
Can I use experiment presets?
Yes. Use the +experiment= syntax:
python run_training.py +experiment=quick_test
Create custom experiments in conf/experiment/.
What’s the difference between +experiment and --multirun?
+experiment: Load preset configuration combining multiple settings
--multirun: Run multiple training jobs with different parameter combinations
Example:
# Single run with preset
python run_training.py +experiment=quick_test
# Multiple runs with parameter sweep
python run_training.py --multirun optimizer.learning_rate=1e-6,1e-5,1e-4
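With Hydra's default basic sweeper, comma-separated values in a --multirun override expand into the cartesian product of all swept keys, one job per combination. The sketch below reproduces that expansion in plain Python to show how many runs a sweep will launch. It ignores Hydra features such as range sweeps and quoted values that contain commas.

```python
import itertools


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """List the jobs a grid sweep over comma-separated overrides would launch."""
    choices = []
    for override in overrides:
        key, values = override.split("=", 1)
        choices.append([f"{key}={value}" for value in values.split(",")])
    return [list(job) for job in itertools.product(*choices)]


jobs = expand_sweep([
    "optimizer.learning_rate=1e-6,1e-5,1e-4",
    "training.micro_batch_size=1,2",
])
for job in jobs:
    print(" ".join(job))
print(len(jobs), "jobs")  # 6 jobs
```

Sweeps grow multiplicatively, so adding a third key with three values would turn these 6 runs into 18.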
Where are configuration files located?
In the conf/ directory:
conf/
├── config.yaml # Main config
├── model/ # Model configs
├── optimizer/ # Optimizer configs
├── scheduler/ # Scheduler configs
├── grpo/ # GRPO algorithm configs
├── generation/ # Generation configs
├── training/ # Training configs
├── evaluation/ # Evaluation configs
└── experiment/ # Experiment presets
Training
How long does training take?
Depends on configuration:
Quick test: ~5 minutes (10 steps)
Full training: 1-24 hours (depends on GPU and data size)
Check logs for training speed:
tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log
How often should I save checkpoints?
Default is every 100 steps. Adjust:
python run_training.py training.save_interval_steps=50
More frequent saves → more disk usage but better checkpoint coverage.
Can I resume training from a checkpoint?
Yes. Provide the checkpoint directory:
python run_training.py checkpoint_dir=./checkpoints/ckpts/
The latest checkpoint in that directory is loaded automatically.
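Assuming each checkpoint is a directory named after its step number under checkpoints/ckpts/actor/ (the directory listed in the Evaluation section), "latest" means the highest numeric step. A standalone sketch of that lookup, handy for checking which step a resumed run should pick up:

```python
from pathlib import Path
from typing import Optional, Union


def latest_step(actor_dir: Union[str, Path]) -> Optional[int]:
    """Return the highest step number among numerically named subdirectories."""
    steps = [
        int(entry.name)
        for entry in Path(actor_dir).iterdir()
        if entry.is_dir() and entry.name.isdigit()
    ]
    return max(steps, default=None)


# Hypothetical layout: checkpoints/ckpts/actor/<step>/...
if Path("checkpoints/ckpts/actor").is_dir():
    print("latest step:", latest_step("checkpoints/ckpts/actor"))
```

Comparing steps as integers matters: as strings, "1000" would sort before "200".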
How do I know if my learning rate is good?
Monitor training loss:
Decreasing smoothly: Good learning rate
Increasing (diverging): Learning rate too high
Decreasing very slowly: Learning rate too low
Noisy: Consider gradient clipping
Use Weights & Biases or TensorBoard to visualize:
make tensorboard
# Open http://localhost:6006
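A rough heuristic that turns the checklist above into code by comparing the average loss over the first and last windows of steps. The window size and the 5% and 25% thresholds are arbitrary assumptions; plotting the curve in TensorBoard or W&B remains the more reliable check.

```python
import math
import statistics


def diagnose_loss(losses: list[float], window: int = 10) -> str:
    """Heuristic label for a loss curve; thresholds are illustrative, not tuned."""
    if any(not math.isfinite(x) for x in losses):
        return "diverged (NaN/inf)"
    if len(losses) < 2 * window:
        return "not enough steps yet"
    start = statistics.fmean(losses[:window])
    end = statistics.fmean(losses[-window:])
    noise = statistics.pstdev(losses[-window:])
    # Relative change between the first and last windows
    change = (end - start) / abs(start) if start else end - start
    if change > 0.05:
        return "increasing: learning rate may be too high"
    if change > -0.05:
        return "barely decreasing: learning rate may be too low"
    if noise > 0.25 * abs(end):
        return "decreasing but noisy: consider gradient clipping"
    return "decreasing smoothly"
```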
What’s a good batch size?
Depends on GPU memory:
11GB GPU: 1
24GB GPU: 2-4
48GB GPU: 4-8
80GB+ GPU: 8-16
Larger batches = more stable gradients but slower per-step updates.
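When memory caps the micro-batch size, gradient accumulation (listed under Performance and Optimization) can raise the batch seen by each optimizer update. The usual relationship is sketched below. Apart from micro_batch_size, which appears in the override examples above, the parameter names are illustrative and may not match agent-tunix's config keys.

```python
def effective_batch_size(micro_batch_size: int, grad_accum_steps: int = 1,
                         num_data_parallel: int = 1) -> int:
    """Examples contributing to each optimizer update."""
    return micro_batch_size * grad_accum_steps * num_data_parallel


# 11GB GPU limited to micro_batch_size=1, accumulating over 8 micro-batches
print(effective_batch_size(1, grad_accum_steps=8))  # 8
```

Accumulation gives the gradient stability of a larger batch without the memory cost, but each update now takes several forward and backward passes.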
Should I use more generations per prompt?
More generations = better reward signal but slower training.
Default is 4. Try 2-8:
2: Fast training, sparse reward signal
4: Good balance (default)
8: Richer reward signal, but roughly 2x slower than the default of 4
How do I debug training issues?
Enable verbose logging:
python run_training.py training.log_level=DEBUG
Check logs:
tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log
Can I use my own data?
Yes. Create a custom data loading function in src/agent_tunix/data.py.
See Data API for guidelines.
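A hypothetical loader sketch for question/answer data stored as JSON Lines. The function name, file format, and question/answer field names are assumptions for illustration; the Data API page defines what the training pipeline actually expects.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def load_qa_jsonl(path: Union[str, Path]) -> Iterator[dict[str, str]]:
    """Yield {"question": ..., "answer": ...} records from a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = {"question", "answer"} - set(record)
            if missing:
                raise ValueError(f"{path}:{line_number}: missing fields {sorted(missing)}")
            # Answers are normalized to strings so numeric and text answers compare alike.
            yield {"question": record["question"], "answer": str(record["answer"])}
```

Failing fast with the file name and line number makes a malformed record easy to find before it reaches training.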
How do custom rewards work?
Reward functions evaluate responses and return scores (0.0-1.0):
1.0 = perfect response
0.5 = partial credit
0.0 = incorrect
See Custom Reward Functions for examples.
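A minimal reward function following the 1.0 / 0.5 / 0.0 scheme above, assuming a math-style task where the last number in the response is the answer. The batch signature (prompts, completions, answer) returning one float per completion is an assumption modeled on common GRPO reward interfaces; check Custom Reward Functions for the signature agent-tunix actually calls.

```python
import re
from typing import Optional


def _last_number(text: str) -> Optional[float]:
    """Return the last number in `text` (commas ignored), or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def correctness_reward(prompts: list[str], completions: list[str],
                       answer: list[str], **kwargs) -> list[float]:
    """Score each completion: 1.0 exact, 0.5 within 10% of the answer, else 0.0.

    `prompts` and `kwargs` are accepted for interface compatibility but unused here.
    """
    scores = []
    for completion, target in zip(completions, answer):
        predicted, expected = _last_number(completion), _last_number(target)
        if predicted is None or expected is None:
            scores.append(0.0)  # nothing to compare
        elif predicted == expected:
            scores.append(1.0)  # perfect response
        elif expected != 0 and abs(predicted - expected) / abs(expected) <= 0.10:
            scores.append(0.5)  # partial credit
        else:
            scores.append(0.0)  # incorrect
    return scores
```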
Can I use multiple GPUs?
Yes. Configure mesh shape:
python run_training.py 'model.mesh_shape=[[4,1],["fsdp","tp"]]'
See Distributed Training.
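The mesh_shape value pairs a list of axis sizes with a list of axis names: [[4,1],["fsdp","tp"]] places 4 devices on the fully sharded data-parallel axis and 1 on the tensor-parallel axis. A JAX device mesh arranges devices into an array of that shape, so the sizes should multiply to the number of devices in use. The check below is a small standalone sketch under that assumption; in a JAX program the device count would come from jax.device_count().

```python
import math


def check_mesh(mesh_shape: list, num_devices: int) -> dict[str, int]:
    """Validate a [[sizes...], [axis_names...]] mesh spec against the device count."""
    sizes, axis_names = mesh_shape
    if len(sizes) != len(axis_names):
        raise ValueError(f"{len(sizes)} sizes but {len(axis_names)} axis names")
    if math.prod(sizes) != num_devices:
        raise ValueError(f"mesh {sizes} needs {math.prod(sizes)} devices, found {num_devices}")
    return dict(zip(axis_names, sizes))


# 4 GPUs: all 4 on the "fsdp" axis, no tensor parallelism
print(check_mesh([[4, 1], ["fsdp", "tp"]], num_devices=4))  # {'fsdp': 4, 'tp': 1}
```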
Evaluation
How do I evaluate my trained model?
python evaluate.py
Uses the latest checkpoint by default.
Can I evaluate on a specific checkpoint?
Yes:
python evaluate.py step=1000
List available checkpoints:
ls checkpoints/ckpts/actor/
What metrics are computed?
Accuracy: Exact match percentage
Partial Accuracy: Within 10% of correct answer
Format Accuracy: Response matches expected format
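A sketch of how the three metrics could be computed over a batch of responses. It assumes, hypothetically, that the expected format wraps the final answer in <answer>...</answer> tags and that a response in the wrong format scores zero on every metric; agent-tunix's actual format and extraction rules may differ. "Within 10%" is read as a relative error of at most 10%.

```python
import re

# Hypothetical expected format: the final answer wrapped in <answer>...</answer>.
ANSWER_TAG = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def evaluate_responses(responses: list[str], answers: list[str]) -> dict[str, float]:
    """Return accuracy, partial accuracy, and format accuracy as percentages."""
    exact = partial = formatted = 0
    for response, answer in zip(responses, answers):
        match = ANSWER_TAG.search(response)
        if match is None:
            continue  # wrong format: counted as a miss for every metric
        formatted += 1
        predicted, expected = match.group(1), answer.strip()
        if predicted == expected:
            exact += 1
            partial += 1
            continue
        try:
            p, a = float(predicted), float(expected)
        except ValueError:
            continue  # non-numeric mismatch: no partial credit
        if a != 0 and abs(p - a) / abs(a) <= 0.10:
            partial += 1  # within 10% of the correct answer
    n = max(len(responses), 1)
    return {
        "accuracy": 100 * exact / n,
        "partial_accuracy": 100 * partial / n,
        "format_accuracy": 100 * formatted / n,
    }
```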
How do different inference strategies compare?
Three strategies available:
Greedy: Deterministic, fastest, reproducible
Standard: Balanced sampling, moderate diversity
Liberal: High diversity, creative outputs
Try all three:
for config in greedy standard liberal; do
python evaluate.py inference_config=$config
done
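The three names suggest presets that differ in sampling settings, and a common way such presets differ is in temperature, top-k, and top-p. The sketch below implements those three filters over a list of logits using only the standard library. The STRATEGIES values are illustrative assumptions, not the settings in agent-tunix's inference configs.

```python
import math
import random


def sample_token(logits: list[float], temperature: float, top_k: int, top_p: float,
                 rng: random.Random) -> int:
    """Choose a token index using temperature scaling plus top-k and top-p filtering."""
    if temperature <= 0 or top_k == 1:
        # Greedy decoding: always the highest-scoring token (deterministic).
        return max(range(len(logits)), key=logits.__getitem__)
    # Softmax over temperature-scaled logits; subtracting the max avoids overflow.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # top-k: keep the k most likely tokens (0 = no limit)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break
    # random.choices treats weights as relative, so they need not sum to 1.
    return rng.choices([i for _, i in kept], weights=[p for p, _ in kept])[0]


# Illustrative presets only; the actual agent-tunix presets may differ.
STRATEGIES = {
    "greedy": {"temperature": 0.0, "top_k": 1, "top_p": 1.0},
    "standard": {"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    "liberal": {"temperature": 1.0, "top_k": 0, "top_p": 1.0},
}

rng = random.Random(0)
logits = [0.1, 2.0, 0.5, -1.0]
for name, params in STRATEGIES.items():
    print(name, [sample_token(logits, rng=rng, **params) for _ in range(8)])
```

Greedy always prints the same index, which is why it is the reproducible option; the other presets spread their picks across more tokens as temperature and the filters loosen.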
Can I get confidence estimates?
Yes, use multiple passes:
python evaluate.py num_passes=5
Runs 5 generations per question and compares consistency.
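One common way to turn multiple passes into a confidence estimate is majority-vote agreement (self-consistency). The sketch below assumes that approach, which may differ from what evaluate.py computes: it takes the most frequent answer across passes and reports the fraction of passes that agree with it.

```python
from collections import Counter


def consistency(pass_answers: list[str]) -> tuple[str, float]:
    """Majority answer across passes and the fraction of passes that agree with it."""
    answer, votes = Counter(pass_answers).most_common(1)[0]
    return answer, votes / len(pass_answers)


print(consistency(["42", "42", "41", "42", "42"]))  # ('42', 0.8)
```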
What if evaluation takes too long?
Solutions:
Use greedy inference (fastest):
python evaluate.py inference_config=greedy
Evaluate fewer samples
Evaluate an intermediate checkpoint to get an early result while training continues:
python evaluate.py step=100
Hyperparameter Tuning
How do I find good hyperparameters?
Use the workflow in Hyperparameter Tuning:
Quick learning rate search (1 hour)
Batch size tuning (2 hours)
Model size testing (varies)
Final training with best parameters
Should I tune one parameter at a time?
Yes, generally. Tune in this order:
Learning rate
Batch size
LoRA rank
Number of generations
KL beta
Changing one parameter at a time makes the effect of each change easier to interpret.
What’s a good starting learning rate?
Default is 3e-6. Typical ranges:
Too high: 1e-4 (likely diverges)
Good: 1e-6 to 1e-5
Too low: 1e-7 (very slow)
Start with 1e-5 and adjust based on training loss.
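Because the suggested range spans an order of magnitude, learning-rate candidates are usually spaced evenly on a log scale. The helper below (a hypothetical utility, not part of agent-tunix) builds such a grid and formats it for the --multirun syntax described under Configuration.

```python
def log_spaced(low: float, high: float, num: int) -> list[float]:
    """Return `num` values evenly spaced on a log scale from low to high, inclusive."""
    if num < 2:
        raise ValueError("num must be at least 2")
    ratio = (high / low) ** (1 / (num - 1))
    return [low * ratio**i for i in range(num)]


rates = log_spaced(1e-6, 1e-5, 4)
sweep = ",".join(f"{lr:.1e}" for lr in rates)
print(f"python run_training.py --multirun optimizer.learning_rate={sweep}")
# python run_training.py --multirun optimizer.learning_rate=1.0e-06,2.2e-06,4.6e-06,1.0e-05
```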
When should I use warmup?
Almost always for stable training. Default warmup_ratio=0.1 (10% of training).
For very short runs, reduce or disable warmup:
python run_training.py +experiment=quick_test optimizer.warmup_ratio=0.0
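A sketch of what warmup_ratio does: the learning rate ramps linearly from 0 to its peak over the first warmup_ratio * total_steps steps. What happens afterwards depends on the configured scheduler; the cosine decay to zero below is an assumption for illustration.

```python
import math


def learning_rate_at(step: int, total_steps: int, peak_lr: float,
                     warmup_ratio: float = 0.1) -> float:
    """Linear warmup for warmup_ratio * total_steps, then (assumed) cosine decay to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))


# 1000 steps at 3e-6: the first 100 steps ramp up, then the rate decays.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{learning_rate_at(step, 1000, 3e-6):.2e}")
```

With warmup_ratio=0.0 the first step already uses the peak rate, which is what the quick_test override above relies on.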
Advanced Topics
Can I use other models besides Gemma3?
Yes. Create configuration in conf/model/. You’ll need to:
Configure model architecture
Set LoRA module paths
Update tokenizer path
Can I fine-tune a model I already trained?
Yes. Resume from checkpoint:
python run_training.py checkpoint_dir=./checkpoints/ckpts/
How do I implement distributed training?
See Distributed Training.
Requires setting up NCCL and configuring mesh shape.
Can I use quantization to reduce memory?
Not currently. LoRA already reduces the number of trainable parameters, and therefore optimizer-state memory, significantly.
Quantization could be added as a feature.
How do I profile training?
Enable profiling:
python run_training.py training.profile=true
Check output directory for profiling results.
Can I use Weights & Biases?
Yes, it is enabled by default. To disable it:
python run_training.py wandb_disabled=true
Set up credentials:
wandb login
Can I export the model for inference?
Yes. Save the LoRA weights from a checkpoint.
See evaluation guide for inference options.
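LoRA trains two small matrices per adapted layer and leaves the base weight frozen. In the standard formulation the adapter contributes (alpha / rank) * B @ A, so exporting a standalone model amounts to folding that product into the base weight. The pure-Python sketch below shows the arithmetic. The (d_out, d_in) layout is an assumption (some JAX/Flax code stores kernels as (d_in, d_out), which transposes the product), and how agent-tunix names these matrices in its checkpoints is not assumed here. With real arrays the same step is W + (alpha / rank) * (B @ A).

```python
def merge_lora(W: list[list[float]], A: list[list[float]], B: list[list[float]],
               alpha: float, rank: int) -> list[list[float]]:
    """Return W + (alpha / rank) * B @ A, folding a LoRA adapter into a base weight.

    Assumed shapes: W is (d_out, d_in), B is (d_out, rank), A is (rank, d_in).
    """
    scale = alpha / rank
    return [
        [
            W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, d_out = d_in = 2
B = [[1.0], [0.0]]            # (d_out, rank) with rank = 1
A = [[0.0, 2.0]]              # (rank, d_in)
print(merge_lora(W, A, B, alpha=2.0, rank=1))  # [[1.0, 4.0], [0.0, 1.0]]
```

A merged weight runs at the base model's inference cost with no adapter bookkeeping; keeping the adapter separate instead lets one base model serve several fine-tunes.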
Performance and Optimization
How do I speed up training?
Use larger batch size (if memory allows)
Reduce sequence length
Use fewer generations per prompt
Reduce gradient accumulation steps
How do I reduce memory usage?
See Training Guide Memory Optimization section:
Reduce batch size
Use smaller model
Reduce LoRA rank
Shorten sequences
Reduce generations
Can I use mixed precision (fp16, bf16)?
Mixed precision is not enabled by default. Support could be added.
Should I save every checkpoint?
To save disk space, save checkpoints less often:
python run_training.py training.save_interval_steps=1000
Or keep a limited number of checkpoint slots: save every N steps and delete the oldest checkpoint once the limit is reached.
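Keeping a limited number of slots amounts to deleting everything except the newest N step directories after each save. Checkpoint libraries such as Orbax expose this as a max_to_keep option; whether agent-tunix's training config surfaces an equivalent setting is an assumption to check. A standalone sketch of the rotation, assuming step-numbered directories (note that it deletes files):

```python
import shutil
from pathlib import Path
from typing import Union


def prune_checkpoints(ckpt_dir: Union[str, Path], keep_last: int = 3) -> list[int]:
    """Delete all but the newest `keep_last` step directories; return the kept steps."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    step_dirs = sorted(
        (p for p in Path(ckpt_dir).iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
    stale, kept = step_dirs[:-keep_last], step_dirs[-keep_last:]
    for old in stale:
        shutil.rmtree(old)  # free the disk space used by an older checkpoint
    return [int(p.name) for p in kept]
```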
Troubleshooting
Training stops with NaN loss
See Troubleshooting NaN Loss section.
Usually caused by:
Learning rate too high
Gradient overflow
Bad data example
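Two of the usual guards against these causes, sketched in plain Python over a flat list of gradient values: stop as soon as the loss is non-finite, and clip by global norm so a single bad batch cannot produce an oversized update. In JAX code the clipping rule is available as optax.clip_by_global_norm(max_norm); whether agent-tunix enables it by default is not assumed.

```python
import math


def clip_by_global_norm(grads: list[float], max_norm: float) -> list[float]:
    """Scale all gradients down together if their combined L2 norm exceeds max_norm."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(global_norm):
        raise FloatingPointError("non-finite gradients: skip this batch or lower the learning rate")
    if global_norm <= max_norm:
        return list(grads)
    scale = max_norm / global_norm  # preserves the gradient direction
    return [g * scale for g in grads]


def check_loss(step: int, loss: float) -> None:
    """Stop with a clear message instead of training on a NaN/inf loss."""
    if not math.isfinite(loss):
        raise FloatingPointError(f"loss became {loss} at step {step}")
```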
Training is very slow
Check GPU utilization:
watch -n 1 nvidia-smi
See the Troubleshooting Slow Training section for solutions.
I can’t find my checkpoint
Find all checkpoints:
find . -type d -path "*/checkpoints/ckpts/actor/*" -prune
Or use specific path:
python evaluate.py checkpoint_dir=/full/path/to/checkpoints/
My model accuracy is poor
See Troubleshooting Model Not Improving section.
Usually need:
More training data
Better reward function
Longer training
Tuned hyperparameters
I’m getting out of memory errors
See the VRAM requirements under "How much VRAM do I need?" and follow the incremental reduction guide in Troubleshooting.
Still Can’t Find Answer?
Check:
Troubleshooting - Detailed troubleshooting guide
Training Guide - Training guide
Configuration Guide - Configuration reference
API reference docs for specific modules
Next Steps
Training Guide - Training guide
Troubleshooting - Troubleshooting guide
Configuration Guide - Configuration reference