LLM Fine-Tuning Taxonomy | Arastun Mammadli

Conceptual Table

Concept	Axis	Main Question
Offline vs. Online (RL)	Planning (MDP) vs. Learning (RL)	Do we already know the environment?
Model-based vs. Model-free	Environment model	Do we know or approximate the transition and reward models (T̂, R̂), or learn directly from interactions?
Value-based vs. Policy-based	Representation	Do we learn a value function or a policy directly?
Passive vs. Active	Exploration	Are we evaluating a fixed policy or improving it by choosing our own actions?
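
The model-based vs. model-free row can be made concrete with a small sketch. It is illustrative only: the logged transitions, the state and action names, and the function names `estimate_model` and `q_learning` are all hypothetical and do not come from the table. The first function estimates T̂ and R̂ from counts (model-based); the second updates Q-values directly from each sample without building a model (model-free, value-based).

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, next_state, reward).
transitions = [
    ("A", "go", "B", 0.0),
    ("A", "go", "B", 0.0),
    ("A", "go", "A", 0.0),
    ("B", "go", "END", 1.0),
]

def estimate_model(transitions):
    """Model-based: estimate T_hat(s, a, s') and R_hat(s, a, s') from counts."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, s2, r in transitions:
        counts[(s, a)][s2] += 1
        reward_sums[(s, a, s2)] += r
    t_hat, r_hat = {}, {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        for s2, n in nexts.items():
            t_hat[(s, a, s2)] = n / total                   # empirical transition probability
            r_hat[(s, a, s2)] = reward_sums[(s, a, s2)] / n  # mean observed reward
    return t_hat, r_hat

def q_learning(transitions, alpha=0.5, gamma=1.0):
    """Model-free: update Q(s, a) directly from each sample, with no T_hat or R_hat."""
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    actions = {a for _, a, _, _ in transitions}
    for s, a, s2, r in transitions:
        best_next = max(q[(s2, a2)] for a2 in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q

t_hat, r_hat = estimate_model(transitions)
q = q_learning(transitions)
```

With these four samples, the model-based estimate assigns probability 2/3 to moving from A to B, while a single Q-learning pass has only moved the value of ("B", "go") toward its reward. Both functions read a fixed log here, so neither does any exploration of its own.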