Frequently Asked Questions

General Questions

What is Agent-Tunix?

Agent-Tunix is a framework for training language models using GRPO (Group Relative Policy Optimization) with parameter-efficient fine-tuning via LoRA. It’s designed for reinforcement learning from reward feedback.

What models does Agent-Tunix support?

Agent-Tunix currently supports the Google Gemma3 family:

  • Gemma3 270M (lightweight)

  • Gemma3 1B (standard)

  • Gemma3 4B (large)

Other models can be added by writing new configuration files.

Do I need multiple GPUs to use Agent-Tunix?

No. The framework works on a single GPU. For the smallest model (270M), even an 11GB GPU (such as an RTX 2080 Ti) can work. Multi-GPU support is optional and can speed up training.

Can I run on CPU?

Not recommended. The framework is optimized for GPUs. You can set device=cpu but training will be very slow.

What’s the difference between GRPO and other RL methods?

GRPO (Group Relative Policy Optimization):

  • Generates K responses per prompt

  • Normalizes rewards relative to the group

  • More sample-efficient than standard PPO

  • Designed for discrete sequence generation
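
The group normalization described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Agent-Tunix's actual implementation: the function name `group_relative_advantages` and the `eps` stabilizer are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the K rewards of one prompt against their own group.

    Hypothetical helper for illustration: each reward is centered on the
    group mean and scaled by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the K responses
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored by a reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

Responses that score above the group average get a positive advantage and those below get a negative one, so no separate value model is needed to supply a baseline.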

Do I need to understand GRPO to use Agent-Tunix?

No. You can use the framework with default settings without understanding GRPO details. The configuration handles most complexity.

Setup and Installation

I’m getting CUDA errors. How do I fix this?

First, verify CUDA is installed:

python -m agent_tunix.utils check-gpu

If not installed, follow the Installation guide.

How much VRAM do I need?

Minimum requirements by model:

  • Gemma3 270M: 11GB (batch size 1)

  • Gemma3 1B: 24GB (batch size 2)

  • Gemma3 4B: 48GB (batch size 4)

See Configuration Guide for memory optimization tips.
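
As a rough aid, the minimums listed above can be turned into a lookup that suggests the largest model for a given GPU. This is only a sketch: the function is hypothetical, and the names `gemma3_270m` and `gemma3_4b` are assumed by analogy with `gemma3_1b`, which is the only model config name that appears in this FAQ.

```python
# (model config name, minimum VRAM in GB, batch size), smallest first.
# Values are copied from the list above; two of the names are assumed.
REQUIREMENTS = [
    ("gemma3_270m", 11, 1),
    ("gemma3_1b", 24, 2),
    ("gemma3_4b", 48, 4),
]


def largest_model_that_fits(vram_gb):
    """Return the largest (name, min_vram, batch_size) row that fits, or None."""
    fitting = [row for row in REQUIREMENTS if row[1] <= vram_gb]
    return fitting[-1] if fitting else None


print(largest_model_that_fits(24))
```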

How do I install with a different CUDA version?

See the Installation guide’s CUDA section for instructions.

Can I use WSL2 on Windows?

Yes. Install CUDA in WSL2 and follow normal installation steps. GPU passthrough works on WSL2.

Configuration

How do I override configuration values?

Use dot notation on command line:

python run_training.py optimizer.learning_rate=1e-5
python run_training.py model=gemma3_1b training.micro_batch_size=2

See Configuration Guide for more examples.

What’s the difference between parameters and overrides?

  • Parameters: Settings defined in YAML config files

  • Overrides: Command-line changes to parameters

Example:

# Parameter in conf/optimizer/adamw.yaml
learning_rate: 3e-6

# Override from command line
python run_training.py optimizer.learning_rate=1e-5
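
Conceptually, an override replaces one leaf of the nested configuration that the YAML files define. The sketch below illustrates that merge with plain dictionaries. It is not the framework's own config loader, and `parse_value` and `apply_override` are hypothetical helpers written for this example.

```python
import copy


def parse_value(text):
    """Turn '2' into 2 and '1e-5' into a float; leave other text as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_override(config, override):
    """Return a copy of config with one 'a.b.c=value' override applied."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    merged = copy.deepcopy(config)  # leave the defaults untouched
    node = merged
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)
    return merged


# Parameter as defined in the YAML file, then the command-line override.
defaults = {"optimizer": {"learning_rate": 3e-6}}
merged = apply_override(defaults, "optimizer.learning_rate=1e-5")
print(merged)
```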

Can I use multiple experiments?

Yes. Use the +experiment= syntax:

python run_training.py +experiment=quick_test

Create custom experiments in conf/experiment/.

What’s the difference between +experiment and --multirun?

  • +experiment: Load preset configuration combining multiple settings

  • --multirun: Run multiple training jobs with different parameter combinations

Example:

# Single run with preset
python run_training.py +experiment=quick_test

# Multiple runs with parameter sweep
python run_training.py --multirun optimizer.learning_rate=1e-6,1e-5,1e-4
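
To see how many jobs a sweep launches, it helps to expand it by hand. The sketch below assumes that comma-separated values are expanded into every combination (a Cartesian product), which this FAQ implies but does not state outright. `expand_sweep` is a hypothetical helper, not part of Agent-Tunix.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand 'key=a,b,c' overrides into one override list per job."""
    keys, choices = [], []
    for item in overrides:
        key, values = item.split("=", 1)
        keys.append(key)
        choices.append(values.split(","))
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


jobs = expand_sweep([
    "optimizer.learning_rate=1e-6,1e-5,1e-4",
    "training.micro_batch_size=1,2",
])
for job in jobs:
    print(" ".join(job))
```

Under that assumption, three learning rates and two batch sizes give six jobs, so sweep sizes grow quickly as parameters are added.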

Where are configuration files located?

In conf/ directory structure:

conf/
├── config.yaml              # Main config
├── model/                   # Model configs
├── optimizer/               # Optimizer configs
├── scheduler/               # Scheduler configs
├── grpo/                    # GRPO algorithm configs
├── generation/              # Generation configs
├── training/                # Training configs
├── evaluation/              # Evaluation configs
└── experiment/              # Experiment presets

Training

How long does training take?

Depends on configuration:

  • Quick test: ~5 minutes (10 steps)

  • Full training: 1-24 hours (depends on GPU and data size)

Check logs for training speed:

tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

How often should I save checkpoints?

Default is every 100 steps. Adjust:

python run_training.py training.save_interval_steps=50

More frequent saves → more disk usage but better checkpoint coverage.

Can I resume training from a checkpoint?

Yes. Provide checkpoint directory:

python run_training.py checkpoint_dir=./checkpoints/ckpts/

The latest checkpoint in that directory is loaded automatically.

How do I know if my learning rate is good?

Monitor training loss:

  • Decreasing smoothly: Good learning rate

  • Increasing (diverging): Learning rate too high

  • Decreasing very slowly: Learning rate too low

  • Noisy: Consider gradient clipping
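
The first three cases above can be checked roughly from logged loss values. The sketch below is a crude heuristic, not a tool shipped with the framework: the function name and the 1% `flat_tolerance` are arbitrary assumptions, and it does not attempt to detect noisy curves.

```python
def describe_loss_trend(losses, flat_tolerance=0.01):
    """Compare the mean loss of the first and last quarter of a run."""
    if len(losses) < 2:
        raise ValueError("need at least two loss values")
    quarter = max(1, len(losses) // 4)
    start = sum(losses[:quarter]) / quarter
    end = sum(losses[-quarter:]) / quarter
    relative_change = (end - start) / max(abs(start), 1e-12)
    if relative_change > 0:
        return "increasing: learning rate may be too high"
    if relative_change > -flat_tolerance:
        return "nearly flat: learning rate may be too low"
    return "decreasing"


print(describe_loss_trend([1.0, 0.8, 0.6, 0.4]))
```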

Use Weights & Biases or TensorBoard to visualize:

make tensorboard
# Open http://localhost:6006

What’s a good batch size?

Depends on GPU memory:

  • 11GB GPU: 1

  • 24GB GPU: 2-4

  • 48GB GPU: 4-8

  • 80GB+ GPU: 8-16

Larger batches = more stable gradients but slower per-step updates.

Should I use more generations per prompt?

More generations = better reward signal but slower training.

Default is 4. Try 2-8:

  • 2: Fast training, sparse reward signal

  • 4: Good balance (default)

  • 8: Stronger reward signal, but roughly 2x slower than the default

How do I debug training issues?

Enable verbose logging:

python run_training.py training.log_level=DEBUG

Check logs:

tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

Can I use my own data?

Yes. Create a custom data loading function in src/agent_tunix/data.py.

See Data API for guidelines.

How do custom rewards work?

Reward functions evaluate responses and return scores (0.0-1.0):

  • 1.0 = perfect response

  • 0.5 = partial credit

  • 0.0 = incorrect
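
A reward function following this scoring scheme might look like the sketch below. The signature is an assumption (check the Custom Reward Functions page for the interface the framework actually expects), and the 10% partial-credit threshold is borrowed from the Partial Accuracy metric purely for illustration.

```python
import re


def numeric_answer_reward(response, expected):
    """Score a response by the last number it contains (hypothetical example)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0  # no numeric answer found
    answer = float(numbers[-1])
    if answer == expected:
        return 1.0  # perfect response
    if expected != 0 and abs(answer - expected) / abs(expected) <= 0.1:
        return 0.5  # partial credit for a near miss
    return 0.0  # incorrect


print(numeric_answer_reward("The answer is 42", 42))
```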

See Custom Reward Functions for examples.

Can I use multiple GPUs?

Yes. Configure mesh shape:

python run_training.py 'model.mesh_shape=[[4,1],["fsdp","tp"]]'

See Distributed Training.
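
A mesh shape only works if its axis sizes multiply to the number of devices in use. The sketch below checks that for the value shown above, assuming the first list holds axis sizes and the second holds axis names. `check_mesh` is a hypothetical helper, not a framework function.

```python
from math import prod


def check_mesh(mesh_shape, num_devices):
    """Validate [[sizes...], [names...]] against the available device count."""
    axis_sizes, axis_names = mesh_shape
    if len(axis_sizes) != len(axis_names):
        raise ValueError("each mesh axis needs exactly one name")
    needed = prod(axis_sizes)
    if needed != num_devices:
        raise ValueError(f"mesh needs {needed} devices, found {num_devices}")
    return dict(zip(axis_names, axis_sizes))


# 4 devices along the "fsdp" axis, 1 along "tp".
print(check_mesh([[4, 1], ["fsdp", "tp"]], num_devices=4))
```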

Evaluation

How do I evaluate my trained model?

python evaluate.py

Uses latest checkpoint by default.

Can I evaluate on a specific checkpoint?

Yes:

python evaluate.py step=1000

List available checkpoints:

ls checkpoints/ckpts/actor/
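
If the entries in that directory are named by step number (an assumption; inspect the listing to confirm), the most recent step can be picked programmatically. `latest_step` is a hypothetical helper for illustration.

```python
def latest_step(entry_names):
    """Return the highest numeric entry name as an int, or None if none exist."""
    steps = [int(name) for name in entry_names if name.isdecimal()]
    return max(steps, default=None)


# In practice the names would come from os.listdir("checkpoints/ckpts/actor").
print(latest_step(["100", "1000", "200", "tmp"]))
```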

What metrics are computed?

  • Accuracy: Exact match percentage

  • Partial Accuracy: Within 10% of correct answer

  • Format Accuracy: Response matches expected format
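
The three metrics can be sketched as follows for numeric answers. This is an illustration under stated assumptions rather than the code in evaluate.py: a prediction of `None` stands for a response that could not be parsed (treated here as a format failure), and exact matches also count toward partial accuracy.

```python
def compute_metrics(predicted, expected):
    """Return the three metrics as percentages.

    predicted: list of float or None (None = answer could not be parsed).
    expected:  list of float with the correct answers.
    """
    total = len(expected)
    pairs = list(zip(predicted, expected))
    exact = sum(p is not None and p == e for p, e in pairs)
    partial = sum(p is not None and abs(p - e) <= 0.10 * abs(e) for p, e in pairs)
    formatted = sum(p is not None for p in predicted)
    return {
        "accuracy": 100 * exact / total,
        "partial_accuracy": 100 * partial / total,
        "format_accuracy": 100 * formatted / total,
    }


metrics = compute_metrics([10.0, 9.5, None, 3.0], [10.0, 10.0, 5.0, 6.0])
print(metrics)
```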

How do different inference strategies compare?

Three strategies available:

  • Greedy: Deterministic, fastest, reproducible

  • Standard: Balanced sampling, moderate diversity

  • Liberal: High diversity, creative outputs

Try all three:

for config in greedy standard liberal; do
    python evaluate.py inference_config=$config
done

Can I get confidence estimates?

Yes, use multiple passes:

python evaluate.py num_passes=5

Runs 5 generations per question and compares consistency.
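
One simple way to turn several passes into a confidence estimate is the share of passes that agree with the most common answer. The sketch below shows that idea; it is an assumption about how consistency could be measured, not necessarily how evaluate.py computes it.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of passes that gave it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Five passes over the same question.
print(majority_answer(["12", "12", "15", "12", "12"]))
```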

What if evaluation takes too long?

Solutions:

  1. Use greedy inference (fastest):

    python evaluate.py inference_config=greedy
    
  2. Evaluate fewer samples

  3. Use earlier checkpoint:

    python evaluate.py step=100
    

Hyperparameter Tuning

How do I find good hyperparameters?

Use the workflow in Hyperparameter Tuning:

  1. Quick learning rate search (1 hour)

  2. Batch size tuning (2 hours)

  3. Model size testing (varies)

  4. Final training with best parameters

Should I tune one parameter at a time?

Yes, generally. Tune in this order:

  1. Learning rate

  2. Batch size

  3. LoRA rank

  4. Number of generations

  5. KL beta

Tuning one at a time is easier to interpret.

What’s a good starting learning rate?

Default is 3e-6. Good range to try:

  • Too high: 1e-4 (likely diverges)

  • Good: 1e-6 to 1e-5

  • Too low: 1e-7 (very slow)

Start with 1e-5 and adjust based on training loss.

When should I use warmup?

Almost always for stable training. Default warmup_ratio=0.1 (10% of training).

Reduce or disable warmup for very short training runs:

python run_training.py +experiment=quick_test optimizer.warmup_ratio=0.0
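
To make the effect of warmup_ratio concrete, the sketch below computes a learning-rate multiplier, assuming a linear ramp followed by a constant rate. The scheduler the framework actually uses (configured under conf/scheduler/) may ramp or decay differently, and `warmup_scale` is a hypothetical helper.

```python
def warmup_scale(step, total_steps, warmup_ratio=0.1):
    """Multiplier applied to the base learning rate at a given step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps == 0 or step >= warmup_steps:
        return 1.0  # warmup disabled or already finished
    return (step + 1) / warmup_steps  # linear ramp up to 1.0


base_lr = 3e-6
for step in (0, 49, 99, 500):
    print(step, base_lr * warmup_scale(step, total_steps=1000))
```

With warmup_ratio=0.0 the multiplier is 1.0 from the first step, which is why disabling warmup suits runs that last only a handful of steps.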

Advanced Topics

Can I use other models besides Gemma3?

Yes. Create configuration in conf/model/. You’ll need to:

  1. Configure model architecture

  2. Set LoRA module paths

  3. Update tokenizer path

Can I fine-tune a model I already trained?

Yes. Resume from checkpoint:

python run_training.py checkpoint_dir=./checkpoints/ckpts/

How do I implement distributed training?

See Distributed Training.

Requires setting up NCCL and configuring mesh shape.

Can I use quantization to reduce memory?

Not currently. LoRA already reduces the number of trainable parameters significantly.

Quantization could be added as a future feature.

How do I profile training?

Enable profiling:

python run_training.py training.profile=true

Check output directory for profiling results.

Can I use Weights & Biases?

Yes, it is enabled by default. To disable it:

python run_training.py wandb_disabled=true

Set up credentials:

wandb login

Can I export the model for inference?

Yes. Save LoRA weights from checkpoint.

See evaluation guide for inference options.

Performance and Optimization

How do I speed up training?

  1. Use larger batch size (if memory allows)

  2. Reduce sequence length

  3. Use fewer generations per prompt

  4. Reduce gradient accumulation steps

How do I reduce memory usage?

See Training Guide Memory Optimization section:

  1. Reduce batch size

  2. Use smaller model

  3. Reduce LoRA rank

  4. Shorten sequences

  5. Reduce generations

Can I use mixed precision (fp16, bf16)?

Mixed precision is not currently enabled by default. Support could be added.

Should I save every checkpoint?

Usually not. To reduce disk usage, save less often:

python run_training.py training.save_interval_steps=1000

Alternatively, use checkpoint rotation, which saves every N steps but keeps only a limited number of checkpoints on disk.

Troubleshooting

Training stops with NaN loss

See Troubleshooting NaN Loss section.

Usually caused by:

  • Learning rate too high

  • Gradient overflow

  • Bad data example

Training is very slow

Check GPU utilization:

watch -n 1 nvidia-smi

Solutions in Troubleshooting Slow Training section.

I can’t find my checkpoint

Find all checkpoints:

find . -path "*/checkpoints/*" -name "*.pt"

Or use specific path:

python evaluate.py checkpoint_dir=/full/path/to/checkpoints/

My model accuracy is poor

See Troubleshooting Model Not Improving section.

Usually need:

  • More training data

  • Better reward function

  • Longer training

  • Tuned hyperparameters

I’m getting out of memory errors

See the Memory requirements section and follow the incremental reduction guide in Troubleshooting.

Still Can’t Find an Answer?

Check:

  1. Troubleshooting - Detailed troubleshooting guide

  2. Training Guide - Training guide

  3. Configuration Guide - Configuration reference

  4. API reference docs for specific modules

Next Steps