Frequently Asked Questions
==========================

General Questions
-----------------

**What is Agent-Tunix?**

Agent-Tunix is a framework for training language models using GRPO (Group Relative Policy Optimization) with parameter-efficient fine-tuning via LoRA. It's designed for reinforcement learning from reward feedback.

**What models does Agent-Tunix support?**

Currently supports Google Gemma3 family:

- Gemma3 270M (lightweight)
- Gemma3 1B (standard)
- Gemma3 4B (large)

Can be extended to other models by adding configuration files.

**Do I need multiple GPUs to use Agent-Tunix?**

No. The framework works on a single GPU. For smaller models (270M), even 11GB GPUs (like RTX 2080 Ti) can work. Multi-GPU support is optional for faster training.

**Can I run on CPU?**

Not recommended. The framework is optimized for GPUs. You can set ``device=cpu`` but training will be very slow.

**What's the difference between GRPO and other RL methods?**

GRPO (Group Relative Policy Optimization):

- Generates K responses per prompt
- Normalizes rewards relative to the group
- More sample-efficient than standard PPO
- Designed for discrete sequence generation

**Do I need to understand GRPO to use Agent-Tunix?**

No. You can use the framework with default settings without understanding GRPO details. The configuration handles most complexity.

Setup and Installation
----------------------

**I'm getting CUDA errors. How do I fix this?**

First, verify CUDA is installed::

    python -m agent_tunix.utils check-gpu

If not installed, follow the :doc:`../getting_started/installation` guide.

**How much VRAM do I need?**

Minimum requirements by model:

- Gemma3 270M: 11GB (batch size 1)
- Gemma3 1B: 24GB (batch size 2)
- Gemma3 4B: 48GB (batch size 4)

See :doc:`../getting_started/configuration` for memory optimization tips.

**How do I install with a different CUDA version?**

See the :doc:`../getting_started/installation` guide's CUDA section for instructions.

**Can I use WSL2 on Windows?**

Yes. Install CUDA in WSL2 and follow normal installation steps. GPU passthrough works on WSL2.

Configuration
-------------

**How do I override configuration values?**

Use dot notation on command line::

    python run_training.py optimizer.learning_rate=1e-5
    python run_training.py model=gemma3_1b training.micro_batch_size=2

See :doc:`../getting_started/configuration` for more examples.

**What's the difference between parameters and overrides?**

- **Parameters**: Settings defined in YAML config files
- **Overrides**: Command-line changes to parameters

Example::

    # Parameter in conf/optimizer/adamw.yaml
    learning_rate: 3e-6

    # Override from command line
    python run_training.py optimizer.learning_rate=1e-5

**Can I use multiple experiments?**

Yes. Use the ``+experiment=`` syntax::

    python run_training.py +experiment=quick_test

Create custom experiments in ``conf/experiment/``.

**What's the difference between +experiment and --multirun?**

- **+experiment**: Load preset configuration combining multiple settings
- **--multirun**: Run multiple training jobs with different parameter combinations

Example::

    # Single run with preset
    python run_training.py +experiment=quick_test

    # Multiple runs with parameter sweep
    python run_training.py --multirun optimizer.learning_rate=1e-6,1e-5,1e-4

**Where are configuration files located?**

In ``conf/`` directory structure::

    conf/
    ├── config.yaml              # Main config
    ├── model/                   # Model configs
    ├── optimizer/               # Optimizer configs
    ├── scheduler/               # Scheduler configs
    ├── grpo/                    # GRPO algorithm configs
    ├── generation/              # Generation configs
    ├── training/                # Training configs
    ├── evaluation/              # Evaluation configs
    └── experiment/              # Experiment presets

Training
--------

**How long does training take?**

Depends on configuration:

- Quick test: ~5 minutes (10 steps)
- Full training: 1-24 hours (depends on GPU and data size)

Check logs for training speed::

    tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

**How often should I save checkpoints?**

Default is every 100 steps. Adjust::

    python run_training.py training.save_interval_steps=50

More frequent saves → more disk usage but better checkpoint coverage.

**Can I resume training from a checkpoint?**

Yes. Provide checkpoint directory::

    python run_training.py checkpoint_dir=./checkpoints/ckpts/

Automatically loads latest checkpoint.

**How do I know if my learning rate is good?**

Monitor training loss:

- **Decreasing smoothly**: Good learning rate
- **Increasing (diverging)**: Learning rate too high
- **Decreasing very slowly**: Learning rate too low
- **Noisy**: Consider gradient clipping

Use Weights & Biases or TensorBoard to visualize::

    make tensorboard
    # Open http://localhost:6006

**What's a good batch size?**

Depends on GPU memory:

- 11GB GPU: 1
- 24GB GPU: 2-4
- 48GB GPU: 4-8
- 80GB+ GPU: 8-16

Larger batches = more stable gradients but slower per-step updates.

**Should I use more generations per prompt?**

More generations = better reward signal but slower training.

Default is 4. Try 2-8:

- 2: Fast training, sparse reward signal
- 4: Good balance (default)
- 8: Better training but 2x slower

**How do I debug training issues?**

Enable verbose logging::

    python run_training.py training.log_level=DEBUG

Check logs::

    tail -f outputs/tunix-grpo/YYYY-MM-DD/HH-MM-SS/train.log

**Can I use my own data?**

Yes. Create a custom data loading function in ``src/agent_tunix/data.py``.

See :doc:`../api/data` for guidelines.

**How do custom rewards work?**

Reward functions evaluate responses and return scores (0.0-1.0):

- 1.0 = perfect response
- 0.5 = partial credit
- 0.0 = incorrect

See :doc:`../advanced/custom_rewards` for examples.

**Can I use multiple GPUs?**

Yes. Configure mesh shape::

    python run_training.py model.mesh_shape=[[4,1],["fsdp","tp"]]

See :doc:`../advanced/distributed_training`.

Evaluation
----------

**How do I evaluate my trained model?**

::

    python evaluate.py

Uses latest checkpoint by default.

**Can I evaluate on a specific checkpoint?**

Yes::

    python evaluate.py step=1000

List available checkpoints::

    ls checkpoints/ckpts/actor/

**What metrics are computed?**

- **Accuracy**: Exact match percentage
- **Partial Accuracy**: Within 10% of correct answer
- **Format Accuracy**: Response matches expected format

**How do different inference strategies compare?**

Three strategies available:

- **Greedy**: Deterministic, fastest, reproducible
- **Standard**: Balanced sampling, moderate diversity
- **Liberal**: High diversity, creative outputs

Try all three::

    for config in greedy standard liberal; do
        python evaluate.py inference_config=$config
    done

**Can I get confidence estimates?**

Yes, use multiple passes::

    python evaluate.py num_passes=5

Runs 5 generations per question and compares consistency.

**What if evaluation takes too long?**

Solutions:

1. Use greedy inference (fastest)::

       python evaluate.py inference_config=greedy

2. Evaluate fewer samples
3. Use earlier checkpoint::

       python evaluate.py step=100

Hyperparameter Tuning
---------------------

**How do I find good hyperparameters?**

Use the workflow in :doc:`../guide/hyperparameter_tuning`:

1. **Quick learning rate search** (1 hour)
2. **Batch size tuning** (2 hours)
3. **Model size testing** (varies)
4. **Final training** with best parameters

**Should I tune one parameter at a time?**

Yes, generally. Tune in this order:

1. Learning rate
2. Batch size
3. LoRA rank
4. Number of generations
5. KL beta

Tuning one at a time is easier to interpret.

**What's a good starting learning rate?**

Default is 3e-6. Good range to try:

- Too high: 1e-4 (likely diverges)
- Good: 1e-6 to 1e-5
- Too low: 1e-7 (very slow)

Start with 1e-5 and adjust based on training loss.

**When should I use warmup?**

Almost always for stable training. Default warmup_ratio=0.1 (10% of training).

Reduces warmup for very short training::

    python run_training.py +experiment=quick_test optimizer.warmup_ratio=0.0

Advanced Topics
---------------

**Can I use other models besides Gemma3?**

Yes. Create configuration in ``conf/model/``. You'll need to:

1. Configure model architecture
2. Set LoRA module paths
3. Update tokenizer path

**Can I fine-tune a model I already trained?**

Yes. Resume from checkpoint::

    python run_training.py checkpoint_dir=./checkpoints/ckpts/

**How do I implement distributed training?**

See :doc:`../advanced/distributed_training`.

Requires setting up NCCL and configuring mesh shape.

**Can I use quantization to reduce memory?**

Not currently. LoRA already reduces parameters significantly.

Could be added as feature.

**How do I profile training?**

Enable profiling::

    python run_training.py training.profile=true

Check output directory for profiling results.

**Can I use Weights & Biases?**

Yes, enabled by default. Disable::

    python run_training.py wandb_disabled=true

Set up credentials::

    wandb login

**Can I export the model for inference?**

Yes. Save LoRA weights from checkpoint.

See evaluation guide for inference options.

Performance and Optimization
----------------------------

**How do I speed up training?**

1. Use larger batch size (if memory allows)
2. Reduce sequence length
3. Use fewer generations per prompt
4. Reduce gradient accumulation steps

**How do I reduce memory usage?**

See :doc:`../guide/training` Memory Optimization section:

1. Reduce batch size
2. Use smaller model
3. Reduce LoRA rank
4. Shorten sequences
5. Reduce generations

**Can I use mixed precision (fp16, bf16)?**

Not currently enabled by default. Could be added.

**Should I save every checkpoint?**

Only save important ones to save disk space::

    python run_training.py training.save_interval_steps=1000

Or use checkpointing (saves every N steps to limited slots).

Troubleshooting
---------------

**Training stops with NaN loss**

See :doc:`../advanced/troubleshooting` NaN Loss section.

Usually caused by:

- Learning rate too high
- Gradient overflow
- Bad data example

**Training is very slow**

Check GPU utilization::

    watch -n 1 nvidia-smi

Solutions in :doc:`../advanced/troubleshooting` Slow Training section.

**I can't find my checkpoint**

Find all checkpoints::

    find . -path "*/checkpoints/*" -name "*.pt"

Or use specific path::

    python evaluate.py checkpoint_dir=/full/path/to/checkpoints/

**My model accuracy is poor**

See :doc:`../advanced/troubleshooting` Model Not Improving section.

Usually need:

- More training data
- Better reward function
- Longer training
- Tuned hyperparameters

**I'm getting out of memory errors**

See Memory requirements section and follow incremental reduction guide in troubleshooting.

Still Can't Find Answer?
------------------------

Check:

1. :doc:`../advanced/troubleshooting` - Detailed troubleshooting guide
2. :doc:`../guide/training` - Training guide
3. :doc:`../getting_started/configuration` - Configuration reference
4. API reference docs for specific modules

Next Steps
----------

- :doc:`../guide/training` - Training guide
- :doc:`../advanced/troubleshooting` - Troubleshooting guide
- :doc:`../getting_started/configuration` - Configuration reference