Frequently Asked Questions

General Questions

What is Agent-Tunix?

Agent-Tunix is a framework for training language models using GRPO (Group Relative Policy Optimization) with parameter-efficient fine-tuning via LoRA. It’s designed for reinforcement learning from reward feedback.

What models does Agent-Tunix support?

Agent-Tunix currently supports the Google Gemma3 family:

Gemma3 270M (lightweight)
Gemma3 1B (standard)
Gemma3 4B (large)

It can be extended to other models by adding configuration files.

Do I need multiple GPUs to use Agent-Tunix?

No. The framework works on a single GPU. For smaller models (270M), even 11GB GPUs (like RTX 2080 Ti) can work. Multi-GPU support is optional for faster training.

Can I run on CPU?

Not recommended. The framework is optimized for GPUs. You can set device=cpu but training will be very slow.

What’s the difference between GRPO and other RL methods?

GRPO (Group Relative Policy Optimization):

Generates K responses per prompt
Normalizes rewards relative to the group
More sample-efficient than standard PPO
Designed for discrete sequence generation
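
The group-relative normalization step can be sketched as follows. This is a minimal illustration of the idea, not Agent-Tunix's actual code: the function name, the use of population standard deviation, and the `eps` value are all assumptions made for the example.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's K rewards against the group's mean and spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the K responses (assumed)
    # eps keeps the division defined when every response earns the same reward
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses to one prompt (K = 4), scored by a reward function
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
print(advantages)
```

Responses that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to supply a baseline.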

Do I need to understand GRPO to use Agent-Tunix?

No. You can use the framework with default settings without understanding GRPO details. The configuration handles most complexity.

Setup and Installation

I’m getting CUDA errors. How do I fix this?

First, verify CUDA is installed:

python -m agent_tunix.utils check-gpu

If not installed, follow the Installation guide.

How much VRAM do I need?

Minimum requirements by model:

Gemma3 270M: 11GB (batch size 1)
Gemma3 1B: 24GB (batch size 2)
Gemma3 4B: 48GB (batch size 4)

See the Configuration Guide for memory optimization tips.

How do I install with a different CUDA version?

See the Installation guide’s CUDA section for instructions.

Can I use WSL2 on Windows?

Yes. Install CUDA in WSL2 and follow normal installation steps. GPU passthrough works on WSL2.

Configuration

How do I override configuration values?

Use dot notation on the command line:

python run_training.py optimizer.learning_rate=1e-5
python run_training.py model=gemma3_1b training.micro_batch_size=2

See Configuration Guide for more examples.

What’s the difference between parameters and overrides?

Parameters: Settings defined in YAML config files
Overrides: Command-line changes to parameters

Example:

# Parameter in conf/optimizer/adamw.yaml
learning_rate: 3e-6

# Override from command line
python run_training.py optimizer.learning_rate=1e-5
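
To make the relationship concrete, here is a rough sketch of how a dot-notation override could be merged into a nested parameter dictionary. The helpers `parse_scalar` and `apply_override` are hypothetical names written for this example; the real configuration loader does its own parsing and type handling.

```python
import copy

def parse_scalar(raw):
    """Best-effort conversion of an override value to int or float."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # anything else (model names, paths) stays a string

def apply_override(config, override):
    """Return a copy of config with one 'a.b.c=value' override applied."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    result = copy.deepcopy(config)  # leave the YAML defaults untouched
    node = result
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = parse_scalar(raw_value)
    return result

# Parameter as loaded from the YAML file, then the command-line override
defaults = {"optimizer": {"learning_rate": 3e-6}}
overridden = apply_override(defaults, "optimizer.learning_rate=1e-5")
print(overridden)
```

The parameter file keeps its default value; the override only affects the run it was passed to.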

Can I use multiple experiments?

Yes. Use the +experiment= syntax:

python run_training.py +experiment=quick_test

Create custom experiments in conf/experiment/.

What’s the difference between +experiment and --multirun?

+experiment: Load a preset configuration combining multiple settings
--multirun: Run multiple training jobs with different parameter combinations

Example:

# Single run with preset
python run_training.py +experiment=quick_test

# Multiple runs with parameter sweep
python run_training.py --multirun optimizer.learning_rate=1e-6,1e-5,1e-4
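
As a rough illustration of what a sweep produces, the sketch below expands comma-separated values into one override list per job. It assumes the sweep is a Cartesian product over all swept parameters; `expand_sweep` is a hypothetical helper, and the naive comma split would not handle list-valued overrides.

```python
from itertools import product

def expand_sweep(overrides):
    """Expand comma-separated override values into one override list per job."""
    keys, choices = [], []
    for item in overrides:
        key, values = item.split("=", 1)
        keys.append(key)
        choices.append(values.split(","))
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]

jobs = expand_sweep([
    "optimizer.learning_rate=1e-6,1e-5,1e-4",
    "training.micro_batch_size=1,2",
])
for job in jobs:
    print(" ".join(job))
```

Sweeping three learning rates and two batch sizes yields six jobs, so the job count grows quickly as more parameters are swept together.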

Where are configuration files located?

In the conf/ directory:

conf/
├── config.yaml              # Main config
├── model/                   # Model configs
├── optimizer/               # Optimizer configs
├── scheduler/               # Scheduler configs
├── grpo/                    # GRPO algorithm configs
├── generation/              # Generation configs
├── training/                # Training configs
├── evaluation/              # Evaluation configs
└── experiment/              # Experiment presets

Training

How long does training take?

Depends on configuration:

Quick test: ~5 minutes (10 steps)
Full training: 1-24 hours (depends on GPU and data size)

Check logs for training speed:

tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

How often should I save checkpoints?

The default is every 100 steps. To change it:

python run_training.py training.save_interval_steps=50

More frequent saves use more disk space but lose less progress if a run is interrupted.

Can I resume training from a checkpoint?

Yes. Provide the checkpoint directory:

python run_training.py checkpoint_dir=./checkpoints/ckpts/

Training automatically resumes from the latest checkpoint in that directory.

How do I know if my learning rate is good?

Monitor training loss:

Decreasing smoothly: Good learning rate
Increasing (diverging): Learning rate too high
Decreasing very slowly: Learning rate too low
Noisy: Consider gradient clipping

Use Weights & Biases or TensorBoard to visualize:

make tensorboard
# Open http://localhost:6006

What’s a good batch size?

Depends on GPU memory:

11GB GPU: 1
24GB GPU: 2-4
48GB GPU: 4-8
80GB+ GPU: 8-16

Larger batches give more stable gradients but make each step slower.

Should I use more generations per prompt?

More generations = better reward signal but slower training.

Default is 4. Try 2-8:

2: Fast training, sparse reward signal
4: Good balance (default)
8: Stronger reward signal, but roughly 2x slower than the default
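
The trade-off is mostly arithmetic. The sketch below assumes generation cost scales linearly with the number of sampled sequences, which ignores fixed per-step overhead; `rollout_cost` is a hypothetical helper, not part of the framework.

```python
def rollout_cost(prompts_per_step, num_generations, default_generations=4):
    """Return (sequences sampled per step, cost relative to the default)."""
    sequences = prompts_per_step * num_generations
    return sequences, num_generations / default_generations

for k in (2, 4, 8):
    sequences, relative = rollout_cost(prompts_per_step=2, num_generations=k)
    print(f"{k} generations: {sequences} sequences per step, {relative:.1f}x the default cost")
```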

How do I debug training issues?

Enable verbose logging:

python run_training.py training.log_level=DEBUG

Check logs:

tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

Can I use my own data?

Yes. Create a custom data loading function in src/agent_tunix/data.py.

See Data API for guidelines.

How do custom rewards work?

Reward functions evaluate responses and return scores (0.0-1.0):

1.0 = perfect response
0.5 = partial credit
0.0 = incorrect
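
A reward function following this scale might look like the sketch below. The signature, the numeric-answer task, and the 10% partial-credit rule are assumptions chosen for illustration; check the Custom Reward Functions page for the interface the framework actually expects.

```python
def numeric_reward(response: str, target: float, tolerance: float = 0.10) -> float:
    """Score a numeric answer: 1.0 exact, 0.5 within tolerance, 0.0 otherwise."""
    try:
        value = float(response.strip())
    except ValueError:
        return 0.0  # response is not a number at all
    if value == target:
        return 1.0
    # Partial credit for answers within a relative tolerance of the target
    if target != 0 and abs(value - target) <= tolerance * abs(target):
        return 0.5
    return 0.0

for answer in ("42", "40", "10", "forty-two"):
    print(answer, numeric_reward(answer, target=42.0))
```

Keeping every return value inside the 0.0-1.0 range makes rewards comparable across prompts within a group.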

See Custom Reward Functions for examples.

Can I use multiple GPUs?

Yes. Configure mesh shape:

python run_training.py 'model.mesh_shape=[[4,1],["fsdp","tp"]]'

See Distributed Training.
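
Before launching, it can help to sanity-check the mesh against the available hardware. The sketch below reads the value as a pair of [axis sizes, axis names], which is an interpretation based on the example above, and checks that the axis sizes multiply out to the device count. `check_mesh` is a hypothetical helper.

```python
import math

def check_mesh(mesh_shape, device_count):
    """Validate a [[sizes...], [names...]] mesh spec against the device count."""
    axis_sizes, axis_names = mesh_shape
    if len(axis_sizes) != len(axis_names):
        raise ValueError("each mesh axis needs exactly one name")
    needed = math.prod(axis_sizes)
    if needed != device_count:
        raise ValueError(f"mesh needs {needed} devices, found {device_count}")
    return dict(zip(axis_names, axis_sizes))

# Four GPUs, all assigned to the "fsdp" axis and none split along "tp"
layout = check_mesh([[4, 1], ["fsdp", "tp"]], device_count=4)
print(layout)
```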

Evaluation

How do I evaluate my trained model?

python evaluate.py

This uses the latest checkpoint by default.

Can I evaluate on a specific checkpoint?

Yes:

python evaluate.py step=1000

List available checkpoints:

ls checkpoints/ckpts/actor/
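
To pick the newest step programmatically, a small helper like the one below may be enough. It assumes each checkpoint is stored in a subdirectory named after its step number, which is inferred from the step=1000 example and not confirmed here; the demo builds a throwaway directory so it does not touch real checkpoints.

```python
import tempfile
from pathlib import Path

def latest_step(checkpoint_root):
    """Return the highest step number among step-named subdirectories, or None."""
    steps = [
        int(path.name)
        for path in Path(checkpoint_root).iterdir()
        if path.is_dir() and path.name.isdigit()
    ]
    # Compare as integers: as strings, "1000" would sort before "500"
    return max(steps, default=None)

with tempfile.TemporaryDirectory() as root:
    for step in (100, 500, 1000):
        (Path(root) / str(step)).mkdir()
    newest = latest_step(root)

print(newest)
```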

What metrics are computed?

Accuracy: Exact match percentage
Partial Accuracy: Within 10% of correct answer
Format Accuracy: Response matches expected format

How do different inference strategies compare?

Three strategies available:

Greedy: Deterministic, fastest, reproducible
Standard: Balanced sampling, moderate diversity
Liberal: High diversity, creative outputs

Try all three:

for config in greedy standard liberal; do
    python evaluate.py inference_config=$config
done

Can I get confidence estimates?

Yes, use multiple passes:

python evaluate.py num_passes=5

This runs 5 generations per question and compares them for consistency.
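
One simple way to turn repeated passes into a confidence estimate is majority agreement. The sketch below is a generic illustration and does not claim to match how evaluate.py reports consistency; `answer_consistency` is a hypothetical helper.

```python
from collections import Counter

def answer_consistency(answers):
    """Return the most common answer and the fraction of passes that agree with it."""
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / len(answers)

# Five passes over the same question
answer, agreement = answer_consistency(["42", "42", "41", "42", "42"])
print(answer, agreement)
```

High agreement suggests the model is stable on that question; low agreement flags answers that depend heavily on sampling noise.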

What if evaluation takes too long?

Solutions:

Use greedy inference (fastest):

python evaluate.py inference_config=greedy

Evaluate fewer samples
Use an earlier checkpoint:
```
python evaluate.py step=100
```

Hyperparameter Tuning

How do I find good hyperparameters?

Use the workflow in Hyperparameter Tuning:

Quick learning rate search (1 hour)
Batch size tuning (2 hours)
Model size testing (varies)
Final training with best parameters

Should I tune one parameter at a time?

Yes, generally. Tune in this order:

Learning rate
Batch size
LoRA rank
Number of generations
KL beta

Tuning one at a time is easier to interpret.

What’s a good starting learning rate?

Default is 3e-6. Good range to try:

Too high: 1e-4 (likely diverges)
Good: 1e-6 to 1e-5
Too low: 1e-7 (very slow)

Start with 1e-5 and adjust based on training loss.

When should I use warmup?

Almost always for stable training. Default warmup_ratio=0.1 (10% of training).

Reduce or disable warmup for very short training runs:

python run_training.py +experiment=quick_test optimizer.warmup_ratio=0.0
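
For intuition, the sketch below shows how a warmup ratio translates into a per-step learning rate. It assumes a linear ramp up to the peak rate and holds the rate constant afterwards; the configured scheduler may use a different shape or decay after warmup, and `warmup_lr` is a hypothetical name.

```python
def warmup_lr(step, total_steps, peak_lr=3e-6, warmup_ratio=0.1):
    """Linear warmup to peak_lr over the first warmup_ratio of training."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr  # post-warmup behavior depends on the scheduler; held constant here

# With 1000 total steps, the first 100 steps ramp up to the peak rate
for step in (0, 49, 99, 500):
    print(step, warmup_lr(step, total_steps=1000))
```

With warmup_ratio=0.0 the function returns the peak rate from the first step, which is the behavior the quick-test override asks for.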

Advanced Topics

Can I use other models besides Gemma3?

Yes. Create a configuration file in conf/model/. You’ll need to:

Configure model architecture
Set LoRA module paths
Update tokenizer path

Can I fine-tune a model I already trained?

Yes. Resume from checkpoint:

python run_training.py checkpoint_dir=./checkpoints/ckpts/

How do I implement distributed training?

See Distributed Training.

This requires setting up NCCL and configuring the mesh shape.

Can I use quantization to reduce memory?

Not currently. LoRA already reduces parameters significantly.

Quantization could be added as a future feature.
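
The size of the LoRA reduction follows from the low-rank factorization: a d_out x d_in weight matrix is adapted with two factors of shapes d_out x r and r x d_in. The sketch below counts trainable parameters for a single hypothetical 2048 x 2048 projection at rank 16; the layer dimensions are illustrative and not taken from any Gemma3 configuration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors: (d_out x rank) + (rank x d_in)."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Parameters in the full weight matrix."""
    return d_in * d_out

lora = lora_trainable_params(d_in=2048, d_out=2048, rank=16)
full = full_params(d_in=2048, d_out=2048)
print(f"LoRA trains {lora} of {full} parameters ({lora / full:.2%})")
```

Note that this reduces trainable parameters, gradients, and optimizer state; the frozen base weights still occupy memory.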

How do I profile training?

Enable profiling:

python run_training.py training.profile=true

Check output directory for profiling results.

Can I use Weights & Biases?

Yes, it is enabled by default. To disable it:

python run_training.py wandb_disabled=true

Set up credentials:

wandb login

Can I export the model for inference?

Yes. Save the LoRA weights from a checkpoint.

See the evaluation guide for inference options.

Performance and Optimization

How do I speed up training?

Use larger batch size (if memory allows)
Reduce sequence length
Use fewer generations per prompt
Reduce gradient accumulation steps

How do I reduce memory usage?

See the Memory Optimization section of the Training Guide:

Reduce batch size
Use smaller model
Reduce LoRA rank
Shorten sequences
Reduce generations

Can I use mixed precision (fp16, bf16)?

Mixed precision is not currently enabled by default. It could be added.

Should I save every checkpoint?

No. Keep only the important ones to conserve disk space, for example by saving less often:

python run_training.py training.save_interval_steps=1000

Alternatively, use rotating checkpoints, which save every N steps into a limited number of slots.
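
The limited-slot idea can be sketched as a keep-the-newest-N retention rule. `steps_to_delete` and `max_to_keep` are hypothetical names used for illustration; consult the Configuration Guide for the option the framework actually exposes.

```python
def steps_to_delete(saved_steps, max_to_keep):
    """Return the checkpoint steps that fall outside the newest max_to_keep slots."""
    if max_to_keep <= 0:
        raise ValueError("max_to_keep must be positive")
    ordered = sorted(saved_steps)
    return ordered[:-max_to_keep]

# Five checkpoints on disk, three slots available
stale = steps_to_delete([100, 200, 300, 400, 500], max_to_keep=3)
print(stale)
```

Disk usage then stays bounded by the slot count no matter how long training runs.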

Troubleshooting

Training stops with NaN loss

See the Troubleshooting NaN Loss section.

Usually caused by:

Learning rate too high
Gradient overflow
Bad data example
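
A common guard against the first two causes is clipping gradients by their global norm and failing fast when the norm is not finite. The sketch below shows the technique on a flat list of numbers; it is a generic illustration, not the framework's training loop, and `clip_by_global_norm` is a hypothetical helper.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(norm):
        # NaN or inf here usually means the step should be skipped and investigated
        raise FloatingPointError("non-finite gradient norm")
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
print(clipped)
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only intervenes on the occasional oversized update.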

Training is very slow

Check GPU utilization:

watch -n 1 nvidia-smi

Solutions are in the Troubleshooting Slow Training section.

I can’t find my checkpoint

Find all checkpoints:

find . -path "*/checkpoints/*" -name "*.pt"

Or use specific path:

python evaluate.py checkpoint_dir=/full/path/to/checkpoints/

My model accuracy is poor

See the Troubleshooting Model Not Improving section.

Poor accuracy usually calls for:

More training data
Better reward function
Longer training
Tuned hyperparameters

I’m getting out of memory errors

See the Memory requirements section and follow the incremental reduction guide in Troubleshooting.

Still Can’t Find an Answer?

Check:

Troubleshooting - Detailed troubleshooting guide
Training Guide - Training guide
Configuration Guide - Configuration reference
API reference docs for specific modules

Next Steps

Training Guide - Training guide
Troubleshooting - Troubleshooting guide
Configuration Guide - Configuration reference