API: pytsk.gradient_descent

This package contains all the APIs you need to build and train fuzzy neural networks.

pytsk.gradient_descent.antecedent

class AntecedentGMF(in_dim, n_rule, high_dim=False, init_center=None, init_sigma=1.0, eps=1e-08)

Parent: torch.nn.Module

The antecedent part with Gaussian membership functions. It takes data as input and outputs the corresponding firing level of each rule. The firing level \(f_r(\mathbf{x})\) of the \(r\)-th rule is computed by:

\[\begin{split}&\mu_{r,d}(x_d) = \exp(-\frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}),\\ &f_{r}(\mathbf{x})=\prod_{d=1}^{D}\mu_{r,d}(x_d),\\ &\overline{f}_r(\mathbf{x}) = \frac{f_{r}(\mathbf{x})}{\sum_{i=1}^R f_{i}(\mathbf{x})}.\end{split}\]
Parameters
  • in_dim (int) – Number of features \(D\) of the input.

  • n_rule (int) – Number of rules \(R\) of the TSK model.

  • high_dim (bool) – Whether to use the HTSK defuzzification. If high_dim=True, HTSK is used; otherwise the original defuzzification is used. More details can be found in [1]. Since TSK models tend to fail on high-dimensional problems, setting high_dim=True is highly recommended for problems of any dimensionality.

  • init_center (numpy.array) – Initial centers of the Gaussian membership functions with the size of \([D,R]\). A common way is to run KMeans clustering and set init_center to the obtained cluster centers. You can simply run pytsk.gradient_descent.antecedent.antecedent_init_center to obtain the centers.

  • init_sigma (float) – Initial \(\sigma\) of the Gaussian membership function.

  • eps (float) – A small constant to avoid division-by-zero errors.

init(self, center, sigma)

Change the values of init_center and init_sigma.

Parameters
  • center (numpy.array) – Initial centers of the Gaussian membership functions with the size of \([D,R]\). A common way is to run KMeans clustering and set center to the obtained cluster centers. You can simply run pytsk.gradient_descent.antecedent.antecedent_init_center to obtain the centers.

  • sigma (float) – Initial \(\sigma\) of the Gaussian membership function.

reset_parameters(self)

Re-initialize all parameters.

forward(self, X)

Forward method of the PyTorch module.

Parameters

X (torch.tensor) – PyTorch tensor with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the input dimension.

Returns

Firing level matrix \(U\) with the size of \([N, R]\).
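
A minimal usage sketch (the random data and shapes below are purely illustrative):

>>> import numpy as np
>>> import torch
>>> from pytsk.gradient_descent import AntecedentGMF, antecedent_init_center
>>> X = np.random.randn(100, 5).astype("float32")  # N=100 samples, D=5 features
>>> init_center = antecedent_init_center(X, n_rule=4)  # KMeans centers, size [D, R]
>>> gmf = AntecedentGMF(in_dim=5, n_rule=4, high_dim=True, init_center=init_center)
>>> U = gmf(torch.as_tensor(X))  # firing level matrix, size [100, 4]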

class AntecedentShareGMF(in_dim, n_mf=2, high_dim=False, init_center=None, init_sigma=1.0, eps=1e-08)

Parent: torch.nn.Module

The antecedent part with Gaussian membership functions, where rules share the membership functions on each feature [2]. The number of rules is \(M^D\), where \(M\) is n_mf and \(D\) is the number of features (in_dim).

Parameters
  • in_dim (int) – Number of features \(D\) of the input.

  • n_mf (int) – Number of membership functions \(M\) of each feature.

  • high_dim (bool) – Whether to use the HTSK defuzzification. If high_dim=True, HTSK is used; otherwise the original defuzzification is used. More details can be found in [1]. Since TSK models tend to fail on high-dimensional problems, setting high_dim=True is highly recommended for problems of any dimensionality.

  • init_center (numpy.array) – Initial center of the Gaussian membership function with the size of \([D,M]\).

  • init_sigma (float) – Initial \(\sigma\) of the Gaussian membership function.

  • eps (float) – A small constant to avoid division-by-zero errors.

init(self, center, sigma)

Change the values of init_center and init_sigma.

Parameters
  • center (numpy.array) – Initial center of the Gaussian membership function with the size of \([D,M]\).

  • sigma (float) – Initial \(\sigma\) of the Gaussian membership function.

reset_parameters(self)

Re-initialize all parameters.

forward(self, X)

Forward method of the PyTorch module.

Parameters

X (torch.tensor) – PyTorch tensor with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the input dimension.

Returns

Firing level matrix \(U\) with the size of \([N, R]\), where \(R=M^D\).
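
For example, with \(D=3\) features and \(M=2\) membership functions per feature, the antecedent enumerates all \(2^3=8\) rule combinations. A minimal shape-check sketch (assuming AntecedentShareGMF is exported at the package level like AntecedentGMF, and that the default random initialization is acceptable):

>>> import torch
>>> from pytsk.gradient_descent import AntecedentShareGMF
>>> ante = AntecedentShareGMF(in_dim=3, n_mf=2)
>>> U = ante(torch.randn(16, 3))  # firing level matrix, size [16, 8], since R = 2**3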

class AntecedentTriMF(in_dim, n_rule, init_center=None, init_left_dist=3.0, init_right_dist=3.0, eps=1e-08)

Parent: torch.nn.Module

The antecedent part with triangle membership functions. It takes data as input and outputs the corresponding firing level of each rule. The firing level \(f_r(\mathbf{x})\) of the \(r\)-th rule is computed by:

\[\begin{split}&\mu_{r,d}(x_d) = \max(0, \min(\frac{1}{d^{(l)}_{r,d}}x_d + 1 - \frac{m_{r,d}}{d^{(l)}_{r,d}}, -\frac{1}{d^{(r)}_{r,d}}x_d + 1 + \frac{m_{r,d}}{d^{(r)}_{r,d}})),\\ &f_{r}(\mathbf{x})=\prod_{d=1}^{D}\mu_{r,d}(x_d),\\ &\overline{f}_r(\mathbf{x}) = \frac{f_{r}(\mathbf{x})}{\sum_{i=1}^R f_{i}(\mathbf{x})},\end{split}\]

where \(d^{(l)}_{r,d}\)/\(d^{(r)}_{r,d}\) is the distance between the center and the left/right vertex of the triangle membership function, respectively. A simple schematic is shown below:

[Figure: schematic of a triangle membership function, showing the center \(m_{r,d}\) and the left/right vertex distances \(d^{(l)}_{r,d}\) and \(d^{(r)}_{r,d}\).]
Parameters
  • in_dim (int) – Number of features \(D\) of the input.

  • n_rule (int) – Number of rules \(R\) of the TSK model.

  • init_center (numpy.array) – Initial centers of the triangle membership functions with the size of \([D,R]\).

  • init_left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.

  • init_right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.

  • eps (float) – A small constant to avoid division-by-zero errors.

init(self, center, left_dist, right_dist)

Change the values of init_center, init_left_dist, and init_right_dist.

Parameters
  • center (numpy.array) – Initial centers of the triangle membership functions with the size of \([D,R]\).

  • left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.

  • right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.

reset_parameters(self)

Re-initialize all parameters.

forward(self, X)

Forward method of the PyTorch module.

Parameters

X (torch.tensor) – PyTorch tensor with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the input dimension.

Returns

Firing level matrix \(U\) with the size of \([N, R]\).
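
A standalone re-implementation of the scalar membership formula, for illustration only (not the library code): the membership value is 1 at the center \(x=m\) and decreases linearly to 0 at \(m-d^{(l)}\) and \(m+d^{(r)}\).

>>> def tri_mf(x, m, d_l, d_r):
>>>     # min of the rising and falling line segments, clipped at zero outside the triangle
>>>     return max(0.0, min((x - m + d_l) / d_l, (m + d_r - x) / d_r))
>>> tri_mf(0.0, 0.0, 3.0, 3.0)  # at the center: 1.0
>>> tri_mf(3.0, 0.0, 3.0, 3.0)  # at the right vertex: 0.0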

class AntecedentShareTriMF(in_dim, n_mf=2, init_center=None, init_left_dist=3.0, init_right_dist=3.0, eps=1e-08)

Parent: torch.nn.Module

The antecedent part with triangle membership functions, where rules share the membership functions on each feature [2]. The number of rules is \(M^D\), where \(M\) is n_mf and \(D\) is the number of features (in_dim).

Parameters
  • in_dim (int) – Number of features \(D\) of the input.

  • n_mf (int) – Number of membership functions \(M\) of each feature.

  • init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,M]\).

  • init_left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.

  • init_right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.

  • eps (float) – A small constant to avoid division-by-zero errors.

init(self, center, left_dist, right_dist)

Change the values of init_center, init_left_dist, and init_right_dist.

Parameters
  • init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,M]\).

  • left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.

  • right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.

reset_parameters(self)

Re-initialize all parameters.

forward(self, X)

Forward method of the PyTorch module.

Parameters

X (torch.tensor) – PyTorch tensor with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the input dimension.

Returns

Firing level matrix \(U\) with the size of \([N, R]\), where \(R=M^D\).

antecedent_init_center(X, y=None, n_rule=2, method='kmean', engine='sklearn', n_init=20)

This function runs KMeans clustering to obtain the init_center for AntecedentGMF(); a simplified sketch of the “sklearn” path is given after the parameter list.

>>> init_center = antecedent_init_center(X, n_rule=10, method="kmean", n_init=20)
>>> antecedent = AntecedentGMF(X.shape[1], n_rule=10, init_center=init_center)
Parameters
  • X (numpy.array) – Feature matrix with the size of \([N,D]\), where \(N\) is the number of samples, \(D\) is the number of features.

  • y (numpy.array) – Not used.

  • n_rule (int) – Number of rules \(R\). This function will run a KMeans clustering to obtain \(R\) cluster centers as the initial antecedent center for TSK modeling.

  • method (str) – The clustering method; the current version only supports “kmean”.

  • engine (str) – “sklearn” or “faiss”. If “sklearn”, the sklearn.cluster.KMeans() function will be used; otherwise faiss.Kmeans() will be used. Faiss provides a faster KMeans implementation, so “faiss” is recommended for large datasets.

  • n_init (int) – Number of initializations of the KMeans algorithm. Same as the parameter n_init in sklearn.cluster.KMeans() and the parameter nredo in faiss.Kmeans().
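
With engine="sklearn", the returned centers are equivalent to the following simplified sketch (a re-implementation for illustration, not the library's exact code):

>>> from sklearn.cluster import KMeans
>>> def kmean_init_center(X, n_rule, n_init=20):
>>>     km = KMeans(n_clusters=n_rule, n_init=n_init).fit(X)
>>>     return km.cluster_centers_.T  # transpose [R, D] -> [D, R], as AntecedentGMF expects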

[1] Cui Y, Wu D, Xu Y. Curse of dimensionality for TSK fuzzy neural networks: explanation and solutions[C]//2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8.

[2] Shi Y, Mizumoto M. A new approach of neuro-fuzzy learning algorithm for tuning fuzzy rules[J]. Fuzzy sets and systems, 2000, 112(1): 99-116.

pytsk.gradient_descent.tsk

class TSK(in_dim, out_dim, n_rule, antecedent, order=1, eps=1e-08, precons=None)

Parent: torch.nn.Module

This module defines the consequent part of the TSK model and combines it with a pre-defined antecedent module. The input of this module is the raw feature matrix, and the output is the final prediction of the TSK model.

Parameters
  • in_dim (int) – Number of features \(D\).

  • out_dim (int) – Number of output dimension \(C\).

  • n_rule (int) – Number of rules \(R\); must be equal to the n_rule of the antecedent module.

  • antecedent (torch.nn.Module) – An antecedent module, whose output dimension should be equal to the number of rules \(R\).

  • order (int) – 0 or 1. The order of the TSK model. If 0, a zero-order TSK is used; otherwise, a first-order TSK.

  • eps (float) – A small constant to avoid division-by-zero errors.

  • precons (torch.nn.Module) – If None, the raw features will be used as the consequent input; if a PyTorch module is given, the consequent input will be the output of that module. If you wish to use the BN technique mentioned in Models & Technique, set precons=nn.BatchNorm1d(in_dim).

reset_parameters(self)

Re-initialize all parameters, including both consequent and antecedent parts.

forward(self, X, get_frs=False)
Parameters
  • X (torch.tensor) – Input matrix with the size of \([N, D]\), where \(N\) is the number of samples.

  • get_frs (bool) – If True, the firing levels (the output of the antecedent) will also be returned.

Returns

If get_frs=True, return the TSK output \(Y\in \mathbb{R}^{N,C}\) and the antecedent output \(U\in \mathbb{R}^{N,R}\). If get_frs=False, only return the TSK output \(Y\).
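
A minimal forward-pass sketch (random inputs for illustration):

>>> import torch
>>> from pytsk.gradient_descent import AntecedentGMF, TSK
>>> gmf = AntecedentGMF(in_dim=5, n_rule=4, high_dim=True)
>>> model = TSK(in_dim=5, out_dim=3, n_rule=4, antecedent=gmf, order=1)
>>> Y, U = model(torch.randn(32, 5), get_frs=True)  # Y: size [32, 3], U: size [32, 4]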

pytsk.gradient_descent.training

ur_loss(frs, tau=0.5)

The uniform regularization (UR) proposed by Cui et al. [3]. UR loss is computed as \(\ell_{UR} = \sum_{r=1}^R (\frac{1}{N}\sum_{n=1}^N f_{n,r} - \tau)^2\), where \(f_{n,r}\) represents the firing level of the \(n\)-th sample on the \(r\)-th rule.

Parameters
  • frs (torch.tensor) – The firing levels (output of the antecedent) with the size of \([N, R]\), where \(N\) is the number of samples and \(R\) is the number of rules.

  • tau (float) – The expectation \(\tau\) of the average firing level for each rule. For a \(C\)-class classification problem, we recommend setting \(\tau\) to \(1/C\); for a regression problem, \(\tau\) can be set to \(0.5\).

Returns

A scalar value representing the UR loss.
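
The formula maps directly onto a few lines of PyTorch. A minimal sketch of an equivalent computation (the library's own implementation may differ in details):

>>> import torch
>>> def ur_loss_sketch(frs, tau=0.5):
>>>     # mean firing level of each rule over the batch, squared deviation from tau, summed over rules
>>>     return torch.sum((frs.mean(dim=0) - tau) ** 2)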

class Wrapper(model, optimizer, criterion, batch_size=512, epochs=1, callbacks=None, label_type='c', device='cpu', reset_param=True, ur=0, ur_tau=0.5, **kwargs)

This class provides a training framework for beginners to train their fuzzy neural networks.

Parameters
  • model (torch.nn.Module) – The pre-defined TSK model.

  • optimizer (torch.optim.Optimizer) – PyTorch optimizer.

  • criterion (torch.nn._Loss) – PyTorch loss function. For example, torch.nn.CrossEntropyLoss() for classification tasks, and torch.nn.MSELoss() for regression tasks.

  • batch_size (int) – Batch size during training & prediction.

  • epochs (int) – Training epochs.

  • callbacks ([Callback]) – List of callbacks.

  • label_type (str) – Label type, “c” or “r”. When label_type="c", the label’s dtype will be changed to “int64”; when label_type="r", the label’s dtype will be changed to “float32”.

  • device (str) – Device on which the model is trained, e.g., “cpu” or “cuda”.

  • reset_param (bool) – Whether to re-initialize all model parameters before training.

  • ur (float) – Weight of the uniform regularization loss ur_loss. If 0, UR is not used.

  • ur_tau (float) – The \(\tau\) parameter passed to ur_loss.

>>> from pytsk.gradient_descent import antecedent_init_center, AntecedentGMF, TSK, EarlyStoppingACC, EvaluateAcc, Wrapper
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.datasets import make_classification
>>> from sklearn.preprocessing import StandardScaler
>>> from torch.optim import AdamW
>>> import torch.nn as nn
>>> # ----------------- define data -----------------
>>> X, y = make_classification(random_state=0)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> ss = StandardScaler()
>>> x_train = ss.fit_transform(x_train)
>>> x_test = ss.transform(x_test)
>>> # ----------------- define TSK model -----------------
>>> n_rule = 10  # define number of rules
>>> n_class = 2  # define output dimension
>>> order = 1  # first-order TSK is used
>>> precons = nn.BatchNorm1d(x_train.shape[1])  # BN before the consequent (the precons technique)
>>> weight_decay = 1e-8  # weight decay for pytorch optimizer
>>> lr = 0.01  # learning rate for pytorch optimizer
>>> init_center = antecedent_init_center(x_train, y_train, n_rule=n_rule)  # obtain the initial antecedent center
>>> gmf = AntecedentGMF(in_dim=x_train.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center)  # define antecedent
>>> model = TSK(in_dim=x_train.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order, precons=precons)  # define TSK
>>> # ----------------- define optimizers -----------------
>>> ante_param, other_param = [], []
>>> for n, p in model.named_parameters():
>>>     if "center" in n or "sigma" in n:
>>>         ante_param.append(p)
>>>     else:
>>>         other_param.append(p)
>>> optimizer = AdamW(
>>>     [{'params': ante_param, "weight_decay": 0},  # antecedent parameters usually don't need weight_decay
>>>     {'params': other_param, "weight_decay": weight_decay},],
>>>     lr=lr
>>> )
>>> # ----------------- split 20% data for earlystopping -----------------
>>> x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)
>>> # ----------------- define the earlystopping callback -----------------
>>> EACC = EarlyStoppingACC(x_val, y_val, verbose=1, patience=40, save_path="tmp.pkl")  # Earlystopping
>>> TACC = EvaluateAcc(x_test, y_test, verbose=1)  # Check test acc during training
>>> # ----------------- train model -----------------
>>> wrapper = Wrapper(model, optimizer=optimizer, criterion=nn.CrossEntropyLoss(),
>>>               epochs=300, callbacks=[EACC, TACC], ur=0, ur_tau=1/n_class)  # define training wrapper, ur weight is set to 0
>>> wrapper.fit(x_train, y_train)  # fit
>>> wrapper.load("tmp.pkl")  # load best model saved by EarlyStoppingACC callback
>>> y_pred = wrapper.predict(x_test).argmax(axis=1)  # predict, argmax for extracting classification label
>>> print("[TSK] ACC: {:.4f}".format(accuracy_score(y_test, y_pred)))  # print ACC
train_on_batch(self, input, target)

Define how to update the model with one batch of data. This method can be overridden to implement a custom training strategy; see the sketch after the parameter list below.

Parameters
  • input (torch.tensor) – Feature matrix with the size of \([N,D]\), \(N\) is the number of samples, \(D\) is the input dimension.

  • target (torch.tensor) – Target matrix with the size of \([N,C]\), \(C\) is the output dimension.
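
A hypothetical override sketch. The attribute names self.model, self.criterion, and self.optimizer are assumptions based on the constructor arguments, and the added L1 penalty is purely illustrative:

>>> class MyWrapper(Wrapper):
>>>     def train_on_batch(self, input, target):
>>>         out = self.model(input)
>>>         # illustrative extra penalty on the model output
>>>         loss = self.criterion(out, target) + 1e-4 * out.abs().mean()
>>>         self.optimizer.zero_grad()
>>>         loss.backward()
>>>         self.optimizer.step()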

fit(X, y)

Train the model with numpy arrays.

Parameters
  • X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).

  • y (numpy.array) – Label matrix \(Y\) with the size of \([N, C]\); for classification tasks \(C=1\), for regression tasks \(C\) is the output dimension of the model.

fit_loader(self, train_loader)

Train the model with a user-defined PyTorch DataLoader.

Parameters

train_loader (torch.utils.data.DataLoader) – Data loader whose outputs correspond to the inputs of train_on_batch. For example, if the dataloader yields two outputs, then train_on_batch should take two inputs. A usage sketch is given below.
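
A usage sketch with a standard PyTorch DataLoader (x_train and y_train as in the classification example above):

>>> import torch
>>> from torch.utils.data import DataLoader, TensorDataset
>>> ds = TensorDataset(torch.as_tensor(x_train, dtype=torch.float32),
>>>                    torch.as_tensor(y_train, dtype=torch.int64))
>>> wrapper.fit_loader(DataLoader(ds, batch_size=512, shuffle=True))  # each batch yields (input, target)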

predict(self, X, y=None)

Get the prediction of the model.

Parameters
  • X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).

  • y – Not used.

Returns

Prediction matrix \(\hat{Y}\) with the size of \([N, C]\), \(C\) is the output dimension of the model.

predict_proba(self, X, y=None)

For classification problems only (requires label_type="c"); returns the prediction after softmax.

Parameters
  • X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).

  • y – Not used.

Returns

Prediction matrix \(\hat{Y}\) with the size of \([N, C]\), \(C\) is the output dimension of the model.

save(self, path)

Save model.

Parameters

path (str) – Model save path.

load(self, path)

Load model.

Parameters

path (str) – Model save path.

[3] Cui Y, Wu D, Huang J. Optimize TSK fuzzy systems for classification problems: minibatch gradient descent with uniform regularization and batch normalization[J]. IEEE Transactions on Fuzzy Systems, 2020, 28(12): 3065-3075.

pytsk.gradient_descent.callbacks

class Callback

Similar to the callback mechanism in Keras, our package provides a simplified version of callbacks, which allows users to monitor metrics during training. We strongly recommend users to write their own custom callbacks; here we provide two examples, EvaluateAcc and EarlyStoppingACC. A minimal skeleton is also sketched after the hook list below.

on_batch_begin(self, wrapper)

Will be called before each batch.

on_batch_end(self, wrapper)

Will be called after each batch.

on_epoch_begin(self, wrapper)

Will be called before each epoch.

on_epoch_end(self, wrapper)

Will be called after each epoch.
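
A minimal custom callback skeleton, using the same wrapper attribute (cur_epoch) as the examples below:

>>> class PrintEpoch(Callback):
>>>     def on_epoch_end(self, wrapper):
>>>         print("Finished epoch {}".format(wrapper.cur_epoch))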

class EvaluateAcc(X, y, verbose=0)

Evaluate the accuracy during training.

Parameters
  • X (numpy.array) – Feature matrix with the size of \([N, D]\).

  • y (numpy.array) – Label matrix with the size of \([N, 1]\).

on_epoch_end(self, wrapper)
>>> def on_epoch_end(self, wrapper):
>>>     cur_log = {}
>>>     y_pred = wrapper.predict(self.X).argmax(axis=1)
>>>     acc = accuracy_score(y_true=self.y, y_pred=y_pred)
>>>     cur_log["epoch"] = wrapper.cur_epoch
>>>     cur_log["acc"] = acc
>>>     self.logs.append(cur_log)
>>>     if self.verbose > 0:
>>>         print("[Epoch {:5d}] Test ACC: {:.4f}".format(cur_log["epoch"], cur_log["acc"]))
Parameters

wrapper (Wrapper) – The training Wrapper.

class EarlyStoppingACC(X, y, patience=1, verbose=0, save_path=None)

Early-stopping by classification accuracy.

Parameters
  • X (numpy.array) – Feature matrix with the size of \([N, D]\).

  • y (numpy.array) – Label matrix with the size of \([N, 1]\).

  • patience (int) – Number of epochs with no improvement after which training will be stopped.

  • verbose (int) – Verbosity mode.

  • save_path (str) – If save_path=None, do not save models, else save the model with the best accuracy to the given path.

on_epoch_end(self, wrapper)

Calculate the validation accuracy and determine whether to stop training.

>>> def on_epoch_end(self, wrapper):
>>>     cur_log = {}
>>>     y_pred = wrapper.predict(self.X).argmax(axis=1)
>>>     acc = accuracy_score(y_true=self.y, y_pred=y_pred)
>>>     if acc > self.best_acc:
>>>         self.best_acc = acc
>>>         self.cnt = 0
>>>         if self.save_path is not None:
>>>             wrapper.save(self.save_path)
>>>     else:
>>>         self.cnt += 1
>>>         if self.cnt > self.patience:
>>>             wrapper.stop_training = True
>>>     cur_log["epoch"] = wrapper.cur_epoch
>>>     cur_log["acc"] = acc
>>>     cur_log["best_acc"] = self.best_acc
>>>     self.logs.append(cur_log)
>>>     if self.verbose > 0:
>>>         print("[Epoch {:5d}] EarlyStopping Callback ACC: {:.4f}, Best ACC: {:.4f}".format(cur_log["epoch"], cur_log["acc"], cur_log["best_acc"]))
Parameters

wrapper (Wrapper) – The training Wrapper.

pytsk.gradient_descent.utils

check_tensor(tensor, dtype)

Convert tensor into a torch.Tensor with the given dtype.

Parameters
  • tensor (numpy.array/torch.tensor) – Input data.

  • dtype (str) – PyTorch dtype string.

Returns

A torch.Tensor with the given dtype.
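
A usage sketch:

>>> import numpy as np
>>> x = check_tensor(np.zeros([4, 3]), "float32")  # numpy array -> float32 torch.Tensor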

reset_params(model)

Reset all parameters in model.

Parameters

model (torch.nn.Module) – PyTorch model.

class NumpyDataLoader(*inputs)

Convert numpy arrays into a dataloader.

Parameters

inputs (numpy.array) – Numpy arrays.
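
A usage sketch (assuming the resulting loader can be passed directly to Wrapper.fit_loader, so that each batch yields one array per input):

>>> import numpy as np
>>> loader = NumpyDataLoader(np.random.randn(100, 5), np.random.randint(0, 2, 100))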