Model Fine-Tuning
3 Frameworks × 7 Training Stages × 3 Tuning Methods — Enterprise LLM Alignment & Customization
Product Overview
Core Capabilities
Three Fine-Tuning Frameworks
Built-in support for LlamaFactory, Unsloth, and Axolotl via a unified task submission interface. Frameworks are registered through a plugin mechanism that auto-generates training commands and configuration files — no need to handle framework differences manually.
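The plugin mechanism described above can be sketched as a registry that maps a framework name to a command builder. All names here (`TaskSpec`, `register`, `build_command`) are illustrative, not the platform's real API, and the CLI flags only mirror LlamaFactory's conventions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class TaskSpec:
    model_path: str
    stage: str    # "sft", "dpo", ...
    method: str   # "lora", "qlora", "full"
    dataset: str

# Registry: framework name -> function that builds its training command
_PLUGINS: Dict[str, Callable[[TaskSpec], List[str]]] = {}

def register(name: str):
    """Decorator that adds a command builder to the plugin registry."""
    def deco(fn):
        _PLUGINS[name] = fn
        return fn
    return deco

@register("llamafactory")
def _llamafactory(task: TaskSpec) -> List[str]:
    # Flag names are illustrative, modeled on LlamaFactory's CLI style.
    return ["llamafactory-cli", "train",
            f"--model_name_or_path={task.model_path}",
            f"--stage={task.stage}",
            f"--finetuning_type={task.method}",
            f"--dataset={task.dataset}"]

def build_command(framework: str, task: TaskSpec) -> List[str]:
    if framework not in _PLUGINS:
        raise ValueError(f"unregistered framework: {framework}")
    return _PLUGINS[framework](task)
```

New frameworks would register the same way, which is what lets the submission interface stay unified while commands differ per framework.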
Seven Training Stages
Covers SFT (supervised fine-tuning), DPO (direct preference optimization), KTO, RM (reward modeling), PPO (proximal policy optimization), GRPO, and PT (continued pre-training) — addressing the full spectrum from instruction following to human preference alignment.
Three Tuning Methods
Supports LoRA (low-rank adaptation), QLoRA (quantized low-rank adaptation), and Full (full-parameter fine-tuning). Default learning rates are auto-set per method — 1e-4 for LoRA, 2e-4 for QLoRA, 5e-5 for Full — reducing the hyperparameter tuning barrier.
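The per-method defaults quoted above can be expressed as a simple lookup; the function name is illustrative:

```python
# Documented defaults: LoRA 1e-4, QLoRA 2e-4, Full 5e-5
DEFAULT_LR = {"lora": 1e-4, "qlora": 2e-4, "full": 5e-5}

def default_learning_rate(method: str) -> float:
    """Return the recommended starting learning rate for a tuning method."""
    key = method.lower()
    if key not in DEFAULT_LR:
        raise ValueError(f"unknown tuning method: {method}")
    return DEFAULT_LR[key]
```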
Visual Hyperparameter Configuration
Graphical panel for 20+ hyperparameters including epochs, batchSize, gradientAccumulationSteps, cutoffLen, warmupRatio, LoRA rank/alpha/dropout, and more. Parameters are shown or hidden dynamically based on the selected tuning method and training stage.
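The show/hide behavior can be sketched as a visibility rule: base hyperparameters are always shown, while LoRA-specific ones appear only when LoRA or QLoRA is selected. This is a minimal sketch covering the method dimension only (stage-dependent visibility works the same way); the function is illustrative:

```python
def visible_params(method: str) -> set:
    """Return the hyperparameter names to display for a tuning method."""
    # Always-visible base parameters (names match those listed above)
    params = {"epochs", "batchSize", "gradientAccumulationSteps",
              "cutoffLen", "warmupRatio", "learningRate"}
    # LoRA-specific parameters only make sense for adapter-based methods
    if method.lower() in ("lora", "qlora"):
        params |= {"loraRank", "loraAlpha", "loraDropout"}
    return params
```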
Automatic Evaluation Trigger
Enable the autoEval toggle and specify an evaluation dataset at creation time; the platform then triggers model evaluation automatically when training completes, with no manual intervention. To guard against data leakage, a front-end warning fires whenever the evaluation and training datasets overlap.
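The overlap check behind that warning amounts to a set intersection; the function name and message wording below are illustrative:

```python
from typing import Iterable, Optional

def overlap_warning(train_datasets: Iterable[str],
                    eval_datasets: Iterable[str]) -> Optional[str]:
    """Return a warning message if any dataset appears in both selections."""
    shared = sorted(set(train_datasets) & set(eval_datasets))
    if shared:
        return f"Warning: dataset(s) used for both training and evaluation: {shared}"
    return None  # no overlap, no warning
```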
Model Comparison & Merge Export
Launch temporary inference services after training to load the base model and the fine-tuned model side by side for streaming A/B chat comparison. Export via LoRA merge with None / INT8 / INT4 quantization options; inference resources are automatically reclaimed when their TTL expires.
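The TTL-based reclamation of temporary inference services can be sketched as below. The class, clock injection, and reclaim helper are all illustrative, not the platform's code:

```python
import time

class InferenceSession:
    """A temporary inference service with a time-to-live."""
    def __init__(self, name: str, ttl_seconds: float, clock=time.monotonic):
        self.name = name
        self._clock = clock  # injectable for testing
        self.expires_at = clock() + ttl_seconds

    def expired(self) -> bool:
        return self._clock() >= self.expires_at

def reclaim_expired(sessions):
    """Return (surviving sessions, number of sessions reclaimed)."""
    live = [s for s in sessions if not s.expired()]
    return live, len(sessions) - len(live)
```

A periodic sweeper calling `reclaim_expired` is one simple way to guarantee comparison sessions never hold GPU resources indefinitely.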
Tuning Method × Training Stage Support Matrix
| Training Stage | LoRA | QLoRA | Full |
|---|---|---|---|
| SFT | ✓ | ✓ | ✓ |
| DPO | ✓ | ✓ | ✓ |
| KTO | ✓ | ✓ | ✓ |
| RM | ✓ | ✓ | ✓ |
| PPO | ✓ | ✓ | ✓ |
| GRPO | ✓ | ✓ | ✓ |
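The matrix above can be encoded as data and checked before a job is dispatched; this guard function is a sketch, not the platform's validator:

```python
# Every stage in the matrix supports all three tuning methods.
SUPPORTED = {
    stage: {"lora", "qlora", "full"}
    for stage in ("sft", "dpo", "kto", "rm", "ppo", "grpo")
}

def check_combination(stage: str, method: str) -> bool:
    """True if the (stage, method) pair appears in the support matrix."""
    return method.lower() in SUPPORTED.get(stage.lower(), set())
```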
Fine-Tuning Workflow
Select Base Model
Choose the base model from the model repository and specify its storage path
Choose Framework & Stage
Select the fine-tuning framework (LlamaFactory/Unsloth/Axolotl), training stage (SFT/DPO/KTO, etc.), and tuning method (LoRA/QLoRA/Full)
Configure Hyperparameters
Visually configure 20+ training hyperparameters; the platform auto-fills recommended defaults based on the selected method
Train & Monitor
Submit the job and track loss/learningRate/gradNorm curves in real time, with automatic detection of startup phases such as dataset download and model loading
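Monitoring of this kind typically works by parsing training logs. A minimal sketch follows; real log formats differ per framework, so the regex and phase markers here are assumptions, not the platform's actual patterns:

```python
import re
from typing import Optional

# Matches lines like "step 10: loss=0.52, lr=1e-4" (format is assumed)
METRIC_RE = re.compile(
    r"loss[=:]\s*([\d.]+).*?(?:learning_rate|lr)[=:]\s*([\d.eE+-]+)")

def parse_metrics(line: str) -> Optional[dict]:
    """Extract loss/learningRate from a training log line, else None."""
    m = METRIC_RE.search(line)
    if not m:
        return None
    return {"loss": float(m.group(1)), "learningRate": float(m.group(2))}

# Startup-phase markers seen before step metrics appear (illustrative)
STARTUP_MARKERS = {
    "downloading dataset": "dataset_download",
    "loading checkpoint": "model_loading",
}

def detect_phase(line: str) -> Optional[str]:
    low = line.lower()
    for marker, phase in STARTUP_MARKERS.items():
        if marker in low:
            return phase
    return None
```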
Compare & Export
A/B compare pre- and post-tuning results, then one-click LoRA merge export with optional INT8/INT4 quantization