libact.base package


libact.base.dataset module

The dataset class used in this package. A dataset consists of the data used for training, represented as a list of (feature, label) tuples. It may be exported in different formats for use with other libraries.

class libact.base.dataset.Dataset(X=None, y=None)

Bases: object

libact dataset object

  • X ({array-like}, shape = (n_samples, n_features)) – Features of the sample set.
  • y (list of {int, None}, shape = (n_samples)) – The ground truth (label) for the corresponding sample. Unlabeled samples should be given the label None.

list, shape = (n_samples) – List of all (feature, label) tuples.

append(feature, label=None)

Add a (feature, label) entry into the dataset. A None label indicates an unlabeled entry.

  • feature ({array-like}, shape = (n_features)) – Feature of the sample to append to the dataset.
  • label ({int, None}) – Label of the sample to append to the dataset. None if unlabeled.

entry_id – entry_id of the appended sample.

Return type:int



Returns dataset in (X, y) format for use in scikit-learn. Unlabeled entries are ignored.

  • X (numpy array, shape = (n_samples, n_features)) – Sample feature set.
  • y (numpy array, shape = (n_samples)) – Sample labels.
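The conversion described above, dropping unlabeled entries and returning features and labels side by side, can be sketched with plain Python lists. This is an illustrative stand-in rather than libact's implementation: the helper name `to_xy` and the sample data are hypothetical, and plain lists are used where the real method returns numpy arrays.

```python
# Hypothetical stand-in for a dataset: a list of (feature, label) tuples,
# where a label of None marks an unlabeled entry.
data = [
    ([0.1, 0.2], 0),
    ([0.3, 0.4], None),
    ([0.5, 0.6], 1),
]


def to_xy(entries):
    """Split the labeled entries into parallel feature and label lists."""
    labeled = [(x, y) for x, y in entries if y is not None]
    X = [x for x, _ in labeled]
    y = [label for _, label in labeled]
    return X, y


X, y = to_xy(data)  # the unlabeled second entry is ignored
```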

Returns the list of all (feature, ground-truth label) tuples.

Returns:data – List of all (feature, label) tuples.
Return type:list, shape = (n_samples)

Returns the list of labeled features and their labels.

Returns:labeled_entries – Labeled entries
Return type:list of (feature, label) tuples

Number of distinct labels in this object.

Return type:int

Returns the list of unlabeled features, along with their entry_ids.

Returns:unlabeled_entries – Unlabeled entries
Return type:list of (entry_id, feature) tuples
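As a rough illustration of the (entry_id, feature) pairing, the sketch below treats an entry_id as the position of the entry in the underlying list. That is an assumption made for this example, and the helper name `unlabeled_with_ids` is hypothetical.

```python
# Hypothetical stand-in data: (feature, label) tuples, None means unlabeled.
data = [
    ([0.1, 0.2], 0),
    ([0.3, 0.4], None),
    ([0.5, 0.6], 1),
    ([0.7, 0.8], None),
]


def unlabeled_with_ids(entries):
    """Return (entry_id, feature) for every entry whose label is None.

    Assumes the entry_id is the index of the entry in the list.
    """
    return [(i, x) for i, (x, y) in enumerate(entries) if y is None]


unlabeled = unlabeled_with_ids(data)
```

Keeping the id next to the feature lets a caller later update exactly the entry that was queried.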
labeled_uniform_sample(sample_size, replace=True)

Returns a Dataset object containing labeled data only, resampled uniformly to the given sample size. The replace parameter determines whether sampling is done with replacement.
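The difference between the two sampling modes can be sketched with the standard library. This is a minimal sketch under stated assumptions, not libact's code: the function name `uniform_sample_labeled`, the `seed` argument, and the list-of-tuples data are all hypothetical, and a plain list is returned instead of a Dataset object.

```python
import random

# Hypothetical stand-in data: (feature, label) tuples, None means unlabeled.
data = [
    ([0.1, 0.2], 0),
    ([0.3, 0.4], None),
    ([0.5, 0.6], 1),
]


def uniform_sample_labeled(entries, sample_size, replace=True, seed=None):
    """Uniformly resample the labeled entries only."""
    rng = random.Random(seed)
    labeled = [(x, y) for x, y in entries if y is not None]
    if replace:
        # With replacement: the same entry may be drawn more than once,
        # so sample_size may exceed the number of labeled entries.
        return rng.choices(labeled, k=sample_size)
    # Without replacement: each entry is drawn at most once.
    return rng.sample(labeled, sample_size)


with_repl = uniform_sample_labeled(data, 5, replace=True, seed=0)
without_repl = uniform_sample_labeled(data, 2, replace=False, seed=0)
```

In this sketch, sampling without replacement raises ValueError when sample_size exceeds the number of labeled entries, because `random.sample` cannot draw more items than exist.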


Number of labeled data entries in this object.

Return type:int

Number of unlabeled data entries in this object.

Return type:int

Adds a callback function to be called when the dataset is updated.

Parameters:callback (callable) – The function to be called when dataset is updated.
update(entry_id, new_label)

Updates the entry identified by entry_id with the given label.

  • entry_id (int) – entry id of the sample to update.
  • new_label ({int, None}) – New label of the sample to be updated.
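How update callbacks and label updates might interact can be sketched with a small stand-in class. Everything here is hypothetical: the class `TinyDataset`, the method name `on_update`, and the choice to pass (entry_id, new_label) to each callback are assumptions made for illustration, not a description of libact's internals.

```python
class TinyDataset:
    """Toy dataset that notifies registered callbacks on every update."""

    def __init__(self):
        self.data = []        # list of (feature, label) tuples
        self._callbacks = []  # callables invoked after each update

    def append(self, feature, label=None):
        self.data.append((feature, label))
        return len(self.data) - 1  # assumed: entry_id is the list index

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, new_label):
        feature, _ = self.data[entry_id]
        self.data[entry_id] = (feature, new_label)
        for callback in self._callbacks:
            callback(entry_id, new_label)


seen = []
ds = TinyDataset()
ds.on_update(lambda entry_id, label: seen.append((entry_id, label)))
eid = ds.append([1.0, 2.0])  # starts out unlabeled
ds.update(eid, 1)            # triggers the callback
```

A design like this lets dependent objects, such as a query strategy, stay in sync with the dataset without polling it.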

Imports a dataset file in libsvm sparse format.


libact.base.interfaces module

Base interfaces for use in the package. The package works according to the interfaces defined below.

class libact.base.interfaces.ContinuousModel

Bases: libact.base.interfaces.Model

Classification Model with intermediate continuous output

A continuous classification model is able to output a real-valued vector for each feature vector provided.

predict_real(feature, *args, **kwargs)

Predict confidence scores for samples.

Returns the confidence score for each (sample, class) combination.

The larger the value for entry (sample=x, class=k) is, the more confident the model is about the sample x belonging to the class k.

Taking logistic regression as an example, the return value is the signed distance of the sample to the hyperplane.

Parameters:feature (array-like, shape (n_samples, n_features)) – The samples whose confidence scores are to be predicted.
Returns:X – Each entry is the confidence score for one (sample, class) combination.
Return type:array-like, shape (n_samples, n_classes)
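To make the (n_samples, n_classes) layout concrete, the sketch below takes a hand-written score matrix and picks, for each sample, the class with the largest score. The scores and the helper name `most_confident_class` are hypothetical; no real model is involved.

```python
# Hypothetical confidence scores: one row per sample, one column per class.
scores = [
    [-1.2, 0.3, 2.5],
    [0.9, 0.1, -0.4],
]


def most_confident_class(score_rows):
    """Return, for each row, the index of the class with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in score_rows]


predicted = most_confident_class(scores)
```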
class libact.base.interfaces.Labeler

Bases: object

Label the queries made by QueryStrategies

Assign labels to the samples queried by QueryStrategies.


Return the class labels for the input feature array.

Parameters:feature (array-like, shape (n_features,)) – The feature vector whose label is to be queried.
Returns:label – The class label of the queried feature.
Return type:int
class libact.base.interfaces.Model

Bases: object

Classification Model

A Model returns a class-predicting function for future samples after being trained on a training dataset.

predict(feature, *args, **kwargs)

Predict the class labels for the input samples

Parameters:feature (array-like, shape (n_samples, n_features)) – The unlabeled samples whose labels are to be predicted.
Returns:y_pred – The class labels for samples in the feature array.
Return type:array-like, shape (n_samples,)
score(testing_dataset, *args, **kwargs)

Return the mean accuracy on the test dataset

Parameters:testing_dataset (Dataset object) – The testing dataset used to measure the performance of the trained model.
Returns:score – Mean accuracy of self.predict(X) with respect to y.
Return type:float
train(dataset, *args, **kwargs)

Train a model according to the given training dataset.

Parameters:dataset (Dataset object) – The training dataset the model is to be trained on.
Returns:self – Returns self.
Return type:object
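The train, predict, and score methods above can be illustrated with a toy model that always predicts the majority class. This is a minimal sketch that does not subclass the real interface: the class `MajorityModel` is hypothetical, and datasets are represented as plain lists of (feature, label) tuples instead of Dataset objects.

```python
from collections import Counter


class MajorityModel:
    """Toy model mirroring the train/predict/score shape described above."""

    def __init__(self):
        self.majority_ = None

    def train(self, dataset):
        # Learn the most frequent label among the labeled entries.
        labels = [y for _, y in dataset if y is not None]
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, feature):
        # One prediction per input sample.
        return [self.majority_ for _ in feature]

    def score(self, testing_dataset):
        # Mean accuracy over the labeled entries of the test data.
        labeled = [(x, y) for x, y in testing_dataset if y is not None]
        predictions = self.predict([x for x, _ in labeled])
        hits = sum(p == y for p, (_, y) in zip(predictions, labeled))
        return hits / len(labeled)


train_data = [([0.0], 1), ([1.0], 1), ([2.0], 0), ([3.0], None)]
test_data = [([0.5], 1), ([1.5], 0)]

model = MajorityModel().train(train_data)
accuracy = model.score(test_data)
```

Returning self from train allows the chained construction and training shown in the usage lines.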
class libact.base.interfaces.MultilabelModel

Bases: libact.base.interfaces.Model

Multilabel Classification Model

A Model returns a multilabel-predicting function for future samples after being trained on a training dataset.

class libact.base.interfaces.ProbabilisticModel

Bases: libact.base.interfaces.ContinuousModel

Classification Model with probability output

A probabilistic classification model is able to output a real-valued vector for each feature vector provided.

predict_proba(feature, *args, **kwargs)

Predict probability estimate for samples.

Parameters:feature (array-like, shape (n_samples, n_features)) – The samples whose probability estimates are to be predicted.
Returns:X – Each entry is the probability estimate for each class.
Return type:array-like, shape (n_samples, n_classes)
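One common way to obtain per-class probability estimates from real-valued scores is a softmax, shown below purely as an illustration of the output shape and of rows that sum to one. Whether any particular model computes its probabilities this way is not stated here; the scores and the helper name `softmax` are hypothetical.

```python
import math


def softmax(row):
    """Map a row of real-valued scores to probabilities that sum to 1."""
    shift = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores: one row per sample, one column per class.
scores = [
    [-1.2, 0.3, 2.5],
    [0.9, 0.1, -0.4],
]
proba = [softmax(row) for row in scores]
```

Because softmax is monotonic, the class with the largest score in a row also receives the largest probability.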
predict_real(feature, *args, **kwargs)
class libact.base.interfaces.QueryStrategy(dataset, **kwargs)

Bases: object

Pool-based query strategy

A QueryStrategy advises which unlabeled sample should be queried next, given a pool of labeled and unlabeled data.


The Dataset object that is associated with this QueryStrategy.


Return the index of the sample to be queried and labeled. Read-only.

This call does not modify the internal state.

Returns:ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:int
update(entry_id, label)

Update the internal state of the QueryStrategy after each queried sample is labeled.

  • entry_id (int) – The index of the newly labeled sample.
  • label (float) – The label of the queried sample.
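The query-then-update cycle described above can be sketched as a small loop. All names here are hypothetical stand-ins: `FirstUnlabeledQuery` and its `make_query` method are not taken from libact, the pool is a plain list of (feature, label) tuples, and a dictionary of known answers plays the role of a labeler.

```python
class FirstUnlabeledQuery:
    """Toy pool-based strategy: always ask for the first unlabeled entry."""

    def __init__(self, entries):
        self.entries = entries  # shared list of (feature, label) tuples
        self.history = []       # (entry_id, label) pairs seen so far

    def make_query(self):
        # Read-only: inspects the pool without changing any state.
        for entry_id, (_, label) in enumerate(self.entries):
            if label is None:
                return entry_id
        raise ValueError("no unlabeled entries left")

    def update(self, entry_id, label):
        # Record the newly labeled sample in the strategy's own state.
        self.history.append((entry_id, label))


entries = [([0.0], 1), ([1.0], None), ([2.0], 0), ([3.0], None)]
true_labels = {1: 0, 3: 1}  # stand-in for a labeler

strategy = FirstUnlabeledQuery(entries)
while any(label is None for _, label in entries):
    ask_id = strategy.make_query()
    label = true_labels[ask_id]
    feature, _ = entries[ask_id]
    entries[ask_id] = (feature, label)  # store the new label in the pool
    strategy.update(ask_id, label)      # let the strategy react to it
```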

Module contents