LLM Fine-Tuning Taxonomy

Conceptual Table

Concept | Axis | Main Question
Offline vs. Online (RL) | Planning (MDP) vs. Learning (RL) | Do we already know the environment?
Model-based vs. Model-free | Environment model | Do we know (or approximate) the transition and reward models T̂, R̂, or do we learn directly from interactions?
Value-based vs. Policy-based | Representation | Do we learn a value function, or do we learn a policy directly?
Passive vs. Active | Exploration | Are we evaluating a fixed policy, or improving it?
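To make the axes concrete, here is a minimal sketch on a hypothetical 4-state chain MDP (an assumption for illustration, not from the original post). `value_iteration` is planning/model-based: it reads the known dynamics and never interacts. `q_learning` is model-free, value-based, and active: it only sees sampled transitions and improves its policy with epsilon-greedy exploration. `td0_evaluate` is passive: it estimates the value of a fixed policy without changing it. All function names and hyperparameters are illustrative choices; a policy-based method (which would parametrize the policy directly) is not shown.

```python
import random

# Hypothetical toy MDP (illustrative assumption): states 0..3 in a chain.
# Action 0 moves left, action 1 moves right; entering state 3 gives reward 1 and ends the episode.
N_STATES, ACTIONS, GAMMA, GOAL = 4, (0, 1), 0.9, 3

def step(s, a):
    """Deterministic dynamics T and reward R of the toy chain."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

def value_iteration(tol=1e-8):
    """Planning (model-based): T and R are known, so we apply Bellman optimality backups directly."""
    V = [0.0] * N_STATES  # V[GOAL] stays 0 because the goal is terminal
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(r + GAMMA * V[s2] for s2, r, _ in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def q_learning(episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    """Learning (model-free, active): only sampled transitions are used to update Q."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(Q[s])  # exploit, breaking ties at random
                a = rng.choice([b for b in ACTIONS if Q[s][b] == best])
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

def td0_evaluate(policy, episodes=200, alpha=0.5):
    """Passive: TD(0) estimate of V^pi for a fixed policy; the policy is never changed."""
    V = [0.0] * N_STATES
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            s2, r, done = step(s, policy(s))
            target = r if done else r + GAMMA * V[s2]
            V[s] += alpha * (target - V[s])
            s = s2
    return V
```

On this chain, `value_iteration()` should return approximately `[0.81, 0.9, 1.0, 0.0]`, the greedy policy extracted from `q_learning()` should move right in every non-terminal state, and `td0_evaluate(lambda s: 1)` should approach the same values as planning, since "always move right" is optimal here. The contrast is in what each method is allowed to use: the known model, sampled experience while acting, or sampled experience under a policy it does not control.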

Taxonomy



