Welcome to Ramith's Website!

Notes on Active Learning

Why? Rather than training on all the available knowledge, can we 1) just pick the best books, and 2) ask the right questions, i.e. ask an oracle to label unlabeled data points?

Active Learning Scenarios

Membership query synthesis

The learner may request the label for any unlabeled example in the input space, or even generate a new data point and ask for its label.
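
A classic toy case where synthesizing queries pays off is learning a 1-D threshold: instead of labeling existing points, the learner invents the midpoint of its current uncertainty interval and asks about that. This is only a sketch under assumptions: the oracle is noiseless and returns 0 below the threshold and 1 at or above it, and `learn_threshold`, `oracle`, and the hidden value 0.37 are made up for illustration.

```python
def learn_threshold(oracle, lo=0.0, hi=1.0, n_queries=20):
    """Locate a 1-D decision threshold by synthesizing query points.

    Assumes oracle(x) returns 0 below the unknown threshold and 1 at or
    above it, and that the threshold lies somewhere in (lo, hi].
    """
    for _ in range(n_queries):
        x = (lo + hi) / 2.0  # a synthesized point; it need not exist in any dataset
        if oracle(x) == 1:
            hi = x           # threshold is at or below x
        else:
            lo = x           # threshold is above x
    return (lo + hi) / 2.0


# Hypothetical oracle with a hidden threshold at 0.37.
estimate = learn_threshold(lambda x: int(x >= 0.37))
```

Each query halves the interval, so 20 labels pin the threshold down to roughly one part in a million, far fewer than labeling a dense grid of points would need.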

Stream-based selective sampling

The algorithm doesn’t have access to all the data at once. Instead, as each data point arrives, we have to decide whether we want to label it or not.
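
One simple way to make that per-point decision is to query only when the current model is uncertain enough. The sketch below assumes a binary classifier exposed as a function returning P(y = 1), and uses an entropy threshold of 0.9 bits; `select_from_stream`, `toy_model`, and the threshold value are illustrative choices, not something prescribed by these notes.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli distribution with P(y=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_from_stream(stream, predict_proba, threshold=0.9):
    """Yield the points worth sending to the oracle.

    Each point is seen once and the keep/discard decision is made on the
    spot, without looking at the rest of the stream.
    """
    for x in stream:
        if binary_entropy(predict_proba(x)) > threshold:
            yield x


def toy_model(x):
    """Hypothetical current model: a logistic curve centered at 0."""
    return 1.0 / (1.0 + math.exp(-x))


queried = list(select_from_stream([-4.0, -0.2, 0.1, 3.0, 1.5], toy_model))
```

Only the two points near the decision boundary (-0.2 and 0.1) pass the threshold here; the confidently classified ones are discarded without spending a label.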

Pool-based sampling

Assumes we have a set of labeled data and a vast amount of unlabeled data. How is this different from the previous one?
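- Here we have access to a whole pool of unlabeled data at once, rather than seeing one point at a time, so we can compare candidates before choosing which one to label.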
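
A minimal sketch of the pool-based loop, under toy assumptions: the data is 1-D, the "model" is just a boundary placed midway between the largest 0-labeled and smallest 1-labeled point, and the most informative candidate is taken to be the pool point closest to that boundary. The names `fit_threshold`, `pool_based_loop`, and the example oracle are hypothetical.

```python
def fit_threshold(labeled):
    """Toy 1-D model: boundary midway between the classes seen so far."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2.0


def pool_based_loop(labeled, pool, oracle, budget):
    """Repeatedly refit, rank the whole pool, and label the best candidate."""
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        boundary = fit_threshold(labeled)
        # The whole pool is visible, so every candidate can be scored.
        query = min(pool, key=lambda x: abs(x - boundary))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled


def oracle(x):
    """Hypothetical labeler with a hidden threshold at 0.6."""
    return int(x >= 0.6)


seed = [(0.0, 0), (1.0, 1)]  # small labeled set
pool = [0.05, 0.12, 0.2, 0.31, 0.38, 0.44, 0.53, 0.58, 0.66, 0.71, 0.8, 0.93]
boundary, labeled = pool_based_loop(seed, pool, oracle, budget=4)
```

With a budget of four labels the loop queries 0.53, 0.8, 0.66, and 0.58, and the boundary ends up between 0.58 and 0.66, which is as tight as this pool allows around the hidden threshold.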

Main question: Asking the right question

In the literature, “asking the right question” is technically referred to as a query strategy. When I say asking the right question, it’s essentially about deciding which data points are worth labeling (for both classification and regression). The following measures can be used to determine which data points to label:

Uncertainty Sampling: Selects the instances the current model is least certain about, e.g. those whose predicted label distribution has the highest entropy.
Query-By-Committee: Maintains a group of models and queries the instance they disagree on the most.
Expected Model Change: A decision-theoretic approach that selects the instance that would most significantly impact the current model’s parameters (e.g., Expected Gradient Length).
Expected Error Reduction: Aims to query the instance that will most reduce the model’s future generalization error (or “risk”).
Variance Reduction: Minimizes future error indirectly by minimizing the model’s output variance, often using Fisher information.
Density Weighted Methods: Ensures the learner doesn’t just pick “confusing” outliers by also considering how representative an instance is of the overall data distribution.
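
The first two strategies in the list above are easy to sketch in a few lines. The code assumes the model's predicted class probabilities and the committee's hard votes are already available for each pool instance; vote entropy is used here as one common way to measure committee disagreement, and all function names and the example inputs are illustrative.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(pool_probs):
    """Index of the instance whose predicted distribution has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


def vote_entropy(votes):
    """Committee disagreement, measured as the entropy of the members' votes."""
    counts = Counter(votes)
    return entropy([c / len(votes) for c in counts.values()])


def query_by_committee(pool_votes):
    """Index of the instance the committee disagrees on the most."""
    return max(range(len(pool_votes)), key=lambda i: vote_entropy(pool_votes[i]))


# One row per pool instance: predicted class probabilities from a single model.
pool_probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
# One row per pool instance: the label each of three committee members voted for.
pool_votes = [["a", "a", "a"], ["a", "b", "a"], ["a", "b", "c"]]

most_uncertain = uncertainty_sampling(pool_probs)
most_disputed = query_by_committee(pool_votes)
```

Here uncertainty sampling picks the second instance (the flattest distribution), while the committee picks the third, where all three members vote differently.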
