Welcome to PyTSK’s documentation!
PyTSK is a package for conveniently developing TSK-type fuzzy neural networks. Its dependencies are as follows:
Scikit-learn [Necessary] for machine learning operations.
Numpy [Necessary] for matrix computing operations.
Scipy [Necessary] for matrix computing operations.
PyTorch [Necessary] for constructing and training fuzzy neural networks.
Faiss [Optional] for faster k-means clustering.
To run the code in the quick start, you also need to install the following package:
PMLB for downloading datasets.
Installation Guide
Dependency
PyTSK is a package for conveniently developing TSK-type fuzzy neural networks. Its dependencies are as follows:
Scikit-learn [Necessary] for machine learning operations.
Numpy [Necessary] for matrix computing operations.
PyTorch [Necessary] for constructing and training fuzzy neural networks.
Faiss [Optional] for faster k-means clustering.
To run the code in the quick start, you also need to install the following package:
PMLB for downloading datasets.
Pip install
Install PyTSK via pip:
pip install scikit-pytsk
Source code can be found in Github: https://github.com/YuqiCui/PyTSK
Quick Start
Training with gradient descent
Complete code can be found at: https://github.com/YuqiCui/PyTSK/blob/master/quickstart_gradient_descent.py
Import everything you need:
import numpy as np
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from torch.optim import AdamW
from pytsk.gradient_descent.antecedent import AntecedentGMF, antecedent_init_center
from pytsk.gradient_descent.callbacks import EarlyStoppingACC
from pytsk.gradient_descent.training import Wrapper
from pytsk.gradient_descent.tsk import TSK
Prepare dataset:
# Prepare dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2) # X: [n_samples, n_features], y: [n_samples,]
n_class = len(np.unique(y)) # Num. of classes
# split train-test
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("Train on {} samples, test on {} samples, num. of features is {}, num. of class is {}".format(
x_train.shape[0], x_test.shape[0], x_train.shape[1], n_class
))
# Z-score
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)
Define TSK parameters:
# Define TSK model parameters
n_rule = 30 # Num. of rules
lr = 0.01 # learning rate
weight_decay = 1e-8
consbn = False
order = 1
Construct the TSK model, for example, an HTSK model with LN-ReLU:
# --------- Define antecedent ------------
init_center = antecedent_init_center(x_train, y_train, n_rule=n_rule)
gmf = nn.Sequential(
AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center),
nn.LayerNorm(n_rule),
nn.ReLU()
)
# Setting high_dim=True is highly recommended.
# --------- Define full TSK model ------------
model = TSK(in_dim=X.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order, precons=None)
Define optimizer, split train-val, define earlystopping callback:
# ----------------- optimizer ----------------------------
ante_param, other_param = [], []
for n, p in model.named_parameters():
if "center" in n or "sigma" in n:
ante_param.append(p)
else:
other_param.append(p)
optimizer = AdamW(
[{'params': ante_param, "weight_decay": 0},
{'params': other_param, "weight_decay": weight_decay},],
lr=lr
)
# ----------------- split 10% data for earlystopping -----------------
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.1)
# ----------------- define the earlystopping callback -----------------
EACC = EarlyStoppingACC(x_val, y_val, verbose=1, patience=20, save_path="tmp.pkl")
Train TSK model:
wrapper = Wrapper(model, optimizer=optimizer, criterion=nn.CrossEntropyLoss(),
epochs=300, callbacks=[EACC])
wrapper.fit(x_train, y_train)
wrapper.load("tmp.pkl")
Evaluate model’s performance:
y_pred = wrapper.predict(x_test).argmax(axis=1)
print("[TSK] ACC: {:.4f}".format(accuracy_score(y_test, y_pred)))
Training with fuzzy clustering
Complete code can be found at: https://github.com/YuqiCui/PyTSK/blob/master/quickstart_fuzzy_clustering.py
Import everything you need:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from pytsk.cluster import FuzzyCMeans
Prepare dataset:
# Prepare dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2) # X: [n_samples, n_features], y: [n_samples,]
n_class = len(np.unique(y)) # Num. of classes
# split train-test
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("Train on {} samples, test on {} samples, num. of features is {}, num. of class is {}".format(
x_train.shape[0], x_test.shape[0], x_train.shape[1], n_class
))
# Z-score
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)
Define & train the TSK model:
# --------------- Fit and predict ---------------
n_rule = 20
model = Pipeline(
steps=[
("GaussianAntecedent", FuzzyCMeans(n_rule, sigma_scale="auto", fuzzy_index="auto")),
("Consequent", RidgeClassifier())
]
)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("ACC: {:.4f}".format(accuracy_score(y_test, y_pred)))
If you need to analyze the input of the consequent part:
# ---------------- get the input of consequent part for further analysis-----------------
antecedent = model.named_steps['GaussianAntecedent']
consequent_input = antecedent.transform(x_test)
If you want to grid-search all the important parameters:
param_grid = {
"Consequent__alpha": [0.01, 0.1, 1, 10, 100],
"GaussianAntecedent__n_rule": [10, 20, 30, 40],
"GaussianAntecedent__sigma_scale": [0.01, 0.1, 1, 10, 100],
"GaussianAntecedent__fuzzy_index": ["auto", 1.8, 2, 2.2],
}
search = GridSearchCV(model, param_grid, n_jobs=2, cv=5, verbose=10)
search.fit(x_train, y_train)
y_pred = search.predict(x_test)
print("ACC: {:.4f}".format(accuracy_score(y_test, y_pred)))
Models & Technique
TSK
A basic TSK fuzzy system is a combination of \(R\) rules; the \(r\)-th rule can be represented as:
\(\text{Rule}_r:~\text{IF}~x_1~\text{is}~X_{r,1}~\text{and}~ ... ~\text{and}~x_D~\text{is}~ X_{r,D}\\ ~~~~~~~~~~~~~\text{THEN}~ y_r=w_{r,1}x_1 + ... + w_{r,D}x_D + b_r,\)
where \(x_d\) is the \(d\)-th input feature and \(X_{r,d}\) is the membership function of the \(d\)-th input feature in the \(r\)-th rule. In this package, the IF part is called the antecedent and the THEN part is called the consequent. The antecedent outputs the firing levels of the rules (for readers unfamiliar with fuzzy systems, firing levels play a role similar to the attention weights in a Transformer or Mixture-of-Experts (MoE) model), and the consequent part outputs the final prediction.
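To make the combination explicit: with normalized firing levels \(\overline{f}_r(\mathbf{x})\) (defined in the API section for AntecedentGMF), the final prediction of a first-order TSK model is the firing-level-weighted sum of the rule consequents:
\[y(\mathbf{x}) = \sum_{r=1}^{R} \overline{f}_r(\mathbf{x})\left(w_{r,1}x_1 + ... + w_{r,D}x_D + b_r\right).\]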
To define a TSK model, we need to define both antecedent and consequent modules:
from pytsk.gradient_descent.antecedent import AntecedentGMF, antecedent_init_center
from pytsk.gradient_descent.tsk import TSK

# --------- Data format ------------
# X: feature matrix, [n_data, n_dim] each row represents a sample with n_dim features
# y: label matrix, [n_data, 1]
# --------- Define TSK model parameters ------------
n_rule = 10 # define num. of rules
n_class = 2 # define num. of class (model output dimension)
order = 1 # 0 or 1, zero-order TSK model or first-order TSK model
# --------- Define antecedent ------------
# run kmeans clustering to get initial rule centers
init_center = antecedent_init_center(X, y, n_rule=n_rule)
# define the antecedent Module
gmf = AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, init_center=init_center)
# --------- Define full TSK model ------------
model = TSK(in_dim=X.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order)
HTSK
Traditional TSK models tend to fail on high-dimensional problems, so the HTSK (high-dimensional TSK) model is recommended for problems of any dimensionality. More details about the HTSK model can be found in [1].
To define an HTSK model, we need to set high_dim=True when defining the antecedent:
init_center = antecedent_init_center(X, y, n_rule=n_rule)
gmf = AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, init_center=init_center, high_dim=True)
DropRule
Similar to Dropout, randomly dropping rules of a TSK model (DropRule) can improve the performance of TSK models [2,3,4].
To use DropRule, we need to add a Dropout layer after the antecedent output:
# --------- Define antecedent ------------
init_center = antecedent_init_center(X, y, n_rule=n_rule)
gmf = nn.Sequential(
AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center),
nn.Dropout(p=0.25)
)
# --------- Define full TSK model ------------
model = TSK(in_dim=X.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order)
Batch Normalization
Batch normalization (BN) can be used to normalize the input of the consequent part, and the experiments in [5] show that BN can speed up convergence and improve the performance of a TSK model.
To add the BN layer, we need to set precons=nn.BatchNorm1d(in_dim)
when defining the TSK model:
model = TSK(in_dim=X.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order, precons=nn.BatchNorm1d(in_dim))
Uniform Regularization
[5] also proposed a uniform regularization (UR), which can mitigate the “winner gets all” problem when training TSK models with mini-batch gradient descent. The “winner gets all” problem causes only a small number of rules to dominate the prediction, while the other rules contribute almost nothing. The uniform regularization loss is:
\[\ell_{UR} = \sum_{r=1}^{R}\left(\frac{1}{N}\sum_{n=1}^{N} f_{n,r} - \tau\right)^2,\]
where \(f_{n,r}\) represents the firing level of the \(n\)-th sample on the \(r\)-th rule and \(N\) is the batch size (a code sketch of this term is given after the snippet below). Experiments in [5] show that UR can significantly improve the performance of TSK fuzzy systems. The UR loss is defined at ur_loss. If you want to use UR during training, you can simply set a positive UR weight when initializing Wrapper
:
ur = 1.  # UR weight, must be > 0 to enable UR
ur_tau = 0.5  # a float between 0 and 1; 1/C is recommended for a C-class classification problem
wrapper = Wrapper(
model, optimizer=optimizer, criterion=criterion, epochs=epochs, callbacks=callbacks, ur=ur, ur_tau=ur_tau
)
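For reference, the UR term corresponds to the following computation. This is a minimal PyTorch sketch of the formula above, not the package's own ur_loss implementation:
import torch

def uniform_regularization(frs, tau):
    # frs: firing levels with the size of [N, R]; tau: expected average firing level per rule
    # mean firing level of each rule over the batch, squared distance to tau, summed over rules
    return torch.sum((frs.mean(dim=0) - tau) ** 2)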
Layer Normalization
Layer normalization (LN) can be used to normalize the firing levels of the antecedent part [6]. It can be shown that the scale of the firing levels decreases as more rules are used. Since the gradients of all TSK parameters depend on the firing levels, this causes a gradient vanishing problem and makes TSK models perform poorly, especially when SGD/SGDM is used as the optimizer. LN normalizes the firing levels, similar to the LN layer in a Transformer, which resolves the gradient vanishing problem and improves performance. Adding a ReLU activation further filters out the negative firing levels produced by LN, improving interpretability and robustness to outliers.
To add LN & ReLU, we can do:
# --------- Define antecedent ------------
init_center = antecedent_init_center(X, y, n_rule=n_rule)
gmf = nn.Sequential(
AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center),
nn.LayerNorm(n_rule),
nn.ReLU()
)
# --------- Define full TSK model ------------
model = TSK(in_dim=x_train.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order)
[6] Cui Y, Wu D, Xu Y, Peng R. Layer Normalization for TSK Fuzzy System Optimization in Regression Problems[J]. IEEE Transactions on Fuzzy Systems, submitted.
Deep learning
TSK models can also be used as the classifier/regressor of a deep neural network, which may improve the network's performance. To do that, we first need to get the intermediate output of the neural network for antecedent initialization, and then define the deep fuzzy system as follows (a short sketch of obtaining that intermediate output follows the code below):
# --------- Define antecedent ------------
# Note that X should be the output of NeuralNetworks, y is still the corresponding label
init_center = antecedent_init_center(X, y, n_rule=n_rule)
gmf = AntecedentGMF(in_dim=X.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center)
# ---------- Define deep learning + TSK -------------
model = nn.Sequential(
NeuralNetworks(),
TSK(in_dim=X.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order),
)
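A minimal sketch of obtaining that intermediate output, assuming NeuralNetworks() is the same user-defined feature extractor as above and that x_train, y_train and n_rule are defined as in the quick start:
import torch

net = NeuralNetworks()  # user-defined feature extractor (placeholder, as above)
net.eval()
with torch.no_grad():
    feats = net(torch.as_tensor(x_train, dtype=torch.float32)).cpu().numpy()  # intermediate output, [n_samples, feat_dim]
init_center = antecedent_init_center(feats, y_train, n_rule=n_rule)           # initialize the antecedent on these features
The gmf and model definitions then follow the snippet above, with in_dim=feats.shape[1].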
API: pytsk.cluster
pytsk.cluster
mainly provides fuzzy clustering algorithms. Each algorithm implements a transform
method, which converts the raw feature matrix \(X \in \mathbb{R}^{N,D}\) into the consequent input matrix \(P \in \mathbb{R}^{N,T}\) of a TSK fuzzy system, where \(D\) is the input dimension and \(N\) is the number of samples. For a zero-order TSK fuzzy system, \(T=R\), where \(R\) is the number of rules (equal to the number of clusters of the fuzzy clustering algorithm); for a first-order TSK fuzzy system, \(T = (D+1)\times R\).
- class pytsk.cluster.BaseFuzzyClustering
Parent:
object
The parent class of fuzzy clustering classes.
- set_params(self, **params)
Set attributes. Implemented to match the scikit-learn API.
- class pytsk.cluster.FuzzyCMeans(n_cluster, fuzzy_index='auto', sigma_scale='auto', init='random', tol_iter=100, error=1e-06, dist='euclidean', verbose=0, order=1)
Parent:
BaseFuzzyClustering, sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
The fuzzy c-means (FCM) clustering algorithm [1]. This implementation is adapted from the scikit-fuzzy package. When constructing a TSK fuzzy system, a fuzzy clustering algorithm is usually used to compute the antecedent parameters; after that, the consequent parameters can be computed by least-squares algorithms, such as Ridge regression [2]. How to use this class can be found in the Quick Start.
The objective function of the FCM is:
\[\begin{split}&J = \sum_{i=1}^{N}\sum_{j=1}^{C} U_{i,j}^m\|\mathbf{x}_i - \mathbf{v}_j\|_2^2\\ &s.t. \sum_{j=1}^{C}U_{i,j} = 1, i = 1,...,N,\end{split}\]where \(N\) is the number of samples, \(C\) is the number of clusters (which also corresponds to the number of rules of the TSK fuzzy system), \(m\) is the fuzzy index, \(\mathbf{x}_i\) is the \(i\)-th input vector, \(\mathbf{v}_j\) is the \(j\)-th cluster center vector, and \(U_{i,j}\) is the membership degree of the \(i\)-th input vector on the \(j\)-th cluster center. The FCM algorithm obtains the centers \(\mathbf{v}_j, j=1,...,C\) and the membership degrees \(U_{i,j}\).
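For intuition, one iteration of the standard FCM alternating updates can be sketched in numpy as follows. This is illustrative only; the package's own implementation (adapted from scikit-fuzzy) may differ in details such as initialization, distance metric and stopping criterion, and it stores \(U\) as \([R, N]\) in places, whereas this sketch uses \([N, C]\).
import numpy as np
from scipy.spatial.distance import cdist

def fcm_step(X, V, m=2.0, eps=1e-12):
    # X: [N, D] data, V: [C, D] current cluster centers, m: fuzzy index
    d = cdist(X, V) + eps                                                          # [N, C] distances to the centers
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)   # membership update
    Um = U ** m
    V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]                                   # center update
    return U, V_new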
- Parameters
n_cluster (int) – Number of clusters, equal to the number of rules \(R\) of a TSK model.
fuzzy_index (float/str) – Fuzzy index of the FCM algorithm, default "auto". If fuzzy_index="auto", the fuzzy index is computed as \(\min(N, D-1) / (\min(N, D-1)-2)\) (if \(\min(N, D-1)<3\), the fuzzy index is set to 2), according to [3]. Otherwise the given float value is used.
sigma_scale (float/str) – The scale parameter \(h\) that adjusts the actual standard deviation \(\sigma\) of the Gaussian membership function in the TSK antecedent part. If sigma_scale="auto", sigma_scale is set to \(\sqrt{D}\), where \(D\) is the input dimension [4]. Otherwise the given float value is used.
init (str/np.array) – The initialization strategy of the membership matrix \(U\). Supports "random" or a numpy array with the size of \([R, N]\), where \(R\) is the number of clusters/rules and \(N\) is the number of training samples. If init="random", the initial membership matrix is randomly initialized; otherwise the given matrix is used.
tol_iter (int) – The maximum number of iterations of the FCM algorithm.
error (float) – The error tolerance; the iteration stops before the maximum number of iterations is reached once the error falls below this value.
dist (str) – The distance type for the scipy.spatial.distance.cdist() function, default "euclidean". The distance can also be "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski", "kulczynski1", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule".
verbose (int) – If > 0, the loss of the FCM objective function is shown during the iterations.
order (int) – 0 or 1. Whether to construct a zero-order or a first-order TSK.
- fit(self, X, y=None)
Run the FCM algorithm.
- Parameters
X (numpy.array) – Input array with the size of \([N, D]\), where \(N\) is the number of training samples and \(D\) is the number of features.
y (numpy.array) – Not used. Pass None.
- predict(self, X, y=None)
Predict the membership degrees of
X
on each cluster.
- Parameters
X (numpy.array) – Input array with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the number of features.
y (numpy.array) – Not used. Pass None.
- Returns
return the membership degree matrix \(U\) with the size of \([N, R]\), where \(N\) is the number of samples of
X
, and \(R\) is the number of clusters/rules. \(U_{i,j}\) represents the membership degree of the \(i\)-th sample on the \(j\)-th cluster.
- transform(self, X, y=None)
Compute the membership degree matrix \(U\), and use \(X\) and \(U\) to get the consequent input matrix \(P\) using function
x2xp(x, u, order)
- Parameters
X (numpy.array) – Input array with the size of \([N, D]\), where \(N\) is the number of samples and \(D\) is the number of features.
y (numpy.array) – Not used. Pass None.
- Returns
Return the consequent input matrix \(P\) with the size of \([N, (D+1)\times R]\) for a first-order TSK (or \([N, R]\) for a zero-order TSK), where \(N\) is the number of samples, \(D\) is the number of features, and \(R\) is the number of clusters/rules.
- x2xp(X, U, order)
Convert the feature matrix \(X\) and the membership degree matrix \(U\) into the consequent input matrix \(X_p\).
Each row in \(X\in \mathbb{R}^{N,D}\) represents a \(D\)-dimension input vector. Suppose vector \(\mathbf{x}\) is one row; then the consequent input matrix \(P\) is computed as follows [5] for a first-order TSK:
\[\begin{split}&\mathbf{x}_e = (1, \mathbf{x}),\\ &\tilde{\mathbf{x}}_r = u_r \mathbf{x}_e,\\ &\mathbf{p} = (\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, ...,\tilde{\mathbf{x}}_R),\end{split}\]where \(\mathbf{p}\) is the corresponding row in \(P\), which is a \((D+1)\times R\)-dimension vector. Then the consequent parameters of TSK can be optimized by any linear regression algorithms.
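A minimal numpy sketch of this construction (illustrative, not necessarily the package's exact x2xp code):
import numpy as np

def x2xp_sketch(X, U, order=1):
    # X: [N, D] features, U: [N, R] membership degrees
    if order == 0:
        return U
    N = X.shape[0]
    Xe = np.concatenate([np.ones((N, 1)), X], axis=1)   # x_e = (1, x), shape [N, D+1]
    P = U[:, :, None] * Xe[:, None, :]                   # u_r * x_e for every rule, shape [N, R, D+1]
    return P.reshape(N, -1)                              # consequent input, shape [N, (D+1)*R]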
- Parameters
x (numpy.array) – size: \([N,D]\). Input features.
u (numpy.array) – size: \([N,R]\). Corresponding membership degree matrix.
order (int) – 0 or 1. The order of TSK models.
- Returns
If
order=0
, return \(U\) directly, else iforder=1
, return the matrix \(X_p\) with the size of \([N, (D+1)\times R]\). Details can be found at [2].
- compute_variance(X, U, V)
Compute the variance of the Gaussian membership functions in TSK fuzzy systems. After performing the FCM, one can use \(\mathbf{v}_j\) and \(U_{i,j}\) to construct the Gaussian-membership-function-based antecedent of a TSK fuzzy system. The center of the Gaussian membership function can be directly set to \(\mathbf{v}_j\), and the standard deviation of the Gaussian membership function can be computed as follows (a numpy sketch is given at the end of this entry):
\[\sigma_{r,d}=\left[\sum_{i=1}^N U_{i,r}(x_{i,d}-v_{r,d})^2 / \sum_{i=1}^N U_{i,r} \right]^{1/2},\]where \(v_{r,d}\) represents the cluster center of the \(d\)-th dimension in the \(r\)-th rule.
- Parameters
x (numpy.array) – Input matrix \(X\) with the size of \([N, D]\).
u (numpy.array) – Membership degree matrix \(U\) with the size of \([R, N]\).
v (numpy.array) – Cluster center matrix \(V\) with the size of \([R, D]\).
- Returns
The standard deviation matrix \(\Sigma\) with the size of \([R, D]\).
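A minimal numpy sketch of this computation (illustrative; note that \(U\) here follows the \([R, N]\) orientation used above):
import numpy as np

def compute_variance_sketch(X, U, V, eps=1e-12):
    # X: [N, D] data, U: [R, N] membership degrees, V: [R, D] cluster centers
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2   # [R, N, D] squared distances per dimension
    num = np.einsum('rn,rnd->rd', U, diff2)        # sum_i U_{i,r} (x_{i,d} - v_{r,d})^2
    den = U.sum(axis=1, keepdims=True) + eps       # sum_i U_{i,r}
    return np.sqrt(num / den)                      # [R, D] standard deviations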
API: pytsk.gradient_descent
This package contains all the APIs you need to build and train a fuzzy neural network.
pytsk.gradient_descent.antecedent
- class AntecedentGMF(in_dim, n_rule, high_dim=False, init_center=None, init_sigma=1.0, eps=1e-08)
Parent:
torch.nn.Module
The antecedent part with Gaussian membership functions. It takes data as input and outputs the corresponding firing levels of each rule. The firing level \(f_r(\mathbf{x})\) of the \(r\)-th rule is computed by the following equations; a minimal code sketch of this computation is given at the end of this class entry:
\[\begin{split}&\mu_{r,d}(x_d) = \exp(-\frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}),\\ &f_{r}(\mathbf{x})=\prod_{d=1}^{D}\mu_{r,d}(x_d),\\ &\overline{f}_r(\mathbf{x}) = \frac{f_{r}(\mathbf{x})}{\sum_{i=1}^R f_{i}(\mathbf{x})}.\end{split}\]
- Parameters
in_dim (int) – Number of features \(D\) of the input.
n_rule (int) – Number of rules \(R\) of the TSK model.
high_dim (bool) – Whether to use the HTSK defuzzification. If high_dim=True, HTSK is used; otherwise the original defuzzification is used. More details can be found at [1]. TSK models tend to fail on high-dimensional problems, so setting high_dim=True is highly recommended for any-dimensional problems.
init_center (numpy.array) – Initial centers of the Gaussian membership functions with the size of \([D,R]\). A common way is to run k-means clustering and set init_center as the obtained centers. You can simply run pytsk.gradient_descent.antecedent.antecedent_init_center to obtain the centers.
init_sigma (float) – Initial \(\sigma\) of the Gaussian membership function.
eps (float) – A constant to avoid the division zero error.
- init(self, center, sigma)
Change the value of init_center and init_sigma.
- Parameters
center (numpy.array) – Initial centers of the Gaussian membership functions with the size of \([D,R]\). A common way is to run k-means clustering and set init_center as the obtained centers. You can simply run pytsk.gradient_descent.antecedent.antecedent_init_center to obtain the centers.
sigma (float) – Initial \(\sigma\) of the Gaussian membership function.
- reset_parameters(self)
Re-initialize all parameters.
- forward(self, X)
Forward method of Pytorch Module.
- Parameters
X (torch.tensor) – Pytorch tensor with the size of \([N, D]\), where \(N\) is the number of samples, \(D\) is the input dimension.
- Returns
Firing level matrix \(U\) with the size of \([N, R]\).
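As a reference for how the equations above map to code, here is a minimal PyTorch sketch of the Gaussian firing-level computation. It works in log space for numerical stability; the high_dim branch averages the per-dimension log-memberships instead of summing them, which is one reading of the HTSK trick in [1] and an assumption about the actual implementation.
import torch

def gmf_firing_levels(X, centers, sigmas, high_dim=False, eps=1e-8):
    # X: [N, D] inputs; centers, sigmas: [D, R] Gaussian parameters
    z = -(X.unsqueeze(2) - centers.unsqueeze(0)) ** 2 / (2 * sigmas.unsqueeze(0) ** 2 + eps)  # [N, D, R] log-memberships
    log_f = z.mean(dim=1) if high_dim else z.sum(dim=1)   # HTSK averages over dimensions, plain TSK sums
    f = torch.exp(log_f)                                  # unnormalized firing levels, [N, R]
    return f / (f.sum(dim=1, keepdim=True) + eps)         # normalized firing levels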
Parent:
torch.nn.Module
The antecedent part with Gaussian membership functions, where rules share the membership functions on each feature [2]. The number of rules will be \(M^D\), where \(M\) is n_mf and \(D\) is the number of features (in_dim).
- Parameters
in_dim (int) – Number of features \(D\) of the input.
n_mf (int) – Number of membership functions \(M\) of each feature.
high_dim (bool) – Whether to use the HTSK defuzzification. If high_dim=True, HTSK is used; otherwise the original defuzzification is used. More details can be found at [1]. TSK models tend to fail on high-dimensional problems, so setting high_dim=True is highly recommended for any-dimensional problems.
init_center (numpy.array) – Initial center of the Gaussian membership function with the size of \([D,M]\).
init_sigma (float) – Initial \(\sigma\) of the Gaussian membership function.
eps (float) – A constant to avoid the division zero error.
Change the value of init_center and init_sigma.
- Parameters
center (numpy.array) – Initial center of the Gaussian membership function with the size of \([D,M]\).
sigma (float) – Initial \(\sigma\) of the Gaussian membership function.
Re-initialize all parameters.
Forward method of Pytorch Module.
- Parameters
X (torch.tensor) – Pytorch tensor with the size of \([N, D]\), where \(N\) is the number of samples, \(D\) is the input dimension.
- Returns
Firing level matrix \(U\) with the size of \([N, R]\), where \(R=M^D\).
- class AntecedentTriMF(in_dim, n_rule, init_center=None, init_left_dist=3.0, init_right_dist=3.0, eps=1e-08)
Parent:
torch.nn.Module
The antecedent part with triangle membership functions. It takes data as input and outputs the corresponding firing levels of each rule. The firing level \(f_r(\mathbf{x})\) of the \(r\)-th rule is computed by:
\[\begin{split}&\mu_{r,d}(x_d) = \max(0, \min(\frac{1}{d^{(l)}_{r,d}}x + 1 - \frac{m_{r,d}}{d^{(l)}_{r,d}}, -\frac{1}{d^{(r)}_{r,d}}x + 1 + \frac{m_{r,d}}{d^{(r)}_{r,d}})),\\ &f_{r}(\mathbf{x})=\prod_{d=1}^{D}\mu_{r,d}(x_d),\\ &\overline{f}_r(\mathbf{x}) = \frac{f_{r}(\mathbf{x})}{\sum_{i=1}^R f_{i}(\mathbf{x})},\end{split}\]where \(d^{(l)}_{r,d}\)/\(d^{(r)}_{r,d}\) is the distance between the center and the left/right vertex of the triangle membership function, respectively.
- Parameters
in_dim (int) – Number of features \(D\) of the input.
n_rule (int) – Number of rules \(R\) of the TSK model.
init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,R]\).
init_left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.
init_right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.
eps (float) – A constant to avoid the division zero error.
- init(self, center, left_dist, right_dist)
Change the value of init_center, init_left_dist and init_right_dist.
- Parameters
init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,R]\).
left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.
right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.
- reset_parameters(self)
Re-initialize all parameters.
- forward(self, X)
Forward method of Pytorch Module.
- Parameters
X (torch.tensor) – Pytorch tensor with the size of \([N, D]\), where \(N\) is the number of samples, \(D\) is the input dimension.
- Returns
Firing level matrix \(U\) with the size of \([N, R]\).
Parent:
torch.nn.Module
The antecedent part with triangle membership functions, where rules share the membership functions on each feature [2]. The number of rules will be \(M^D\), where \(M\) is n_mf and \(D\) is the number of features (in_dim).
- Parameters
in_dim (int) – Number of features \(D\) of the input.
n_mf (int) – Number of membership functions \(M\) of each feature.
init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,M]\).
init_left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.
init_right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.
eps (float) – A constant to avoid the division zero error.
Change the value of init_center, init_left_dist and init_right_dist.
- Parameters
init_center (numpy.array) – Initial center of the triangle membership function with the size of \([D,M]\).
left_dist (float) – Initial \(d^{(l)}\) of the triangle membership function.
right_dist (float) – Initial \(d^{(r)}\) of the triangle membership function.
Re-initialize all parameters.
Forward method of Pytorch Module.
- Parameters
X (torch.tensor) – Pytorch tensor with the size of \([N, D]\), where \(N\) is the number of samples, \(D\) is the input dimension.
- Returns
Firing level matrix \(U\) with the size of \([N, R]\), where \(R=M^D\).
- antecedent_init_center(X, y=None, n_rule=2, method='kmean', engine='sklearn', n_init=20)
This function runs k-means clustering to obtain the init_center for AntecedentGMF().
>>> init_center = antecedent_init_center(X, n_rule=10, method="kmean", n_init=20)
>>> antecedent = AntecedentGMF(X.shape[1], n_rule=10, init_center=init_center)
- Parameters
X (numpy.array) – Feature matrix with the size of \([N,D]\), where \(N\) is the number of samples, \(D\) is the number of features.
y (numpy.array) – None, not used.
n_rule (int) – Number of rules \(R\). This function will run a KMeans clustering to obtain \(R\) cluster centers as the initial antecedent center for TSK modeling.
method (str) – The current version only supports “kmean”.
engine (str) – “sklearn” or “faiss”. If “sklearn”, the sklearn.cluster.KMeans() function is used; otherwise faiss.Kmeans() is used. Faiss provides a faster k-means clustering implementation, so “faiss” is recommended for large datasets.
n_init (int) – Number of initializations of the k-means algorithm. Same as the parameter n_init in sklearn.cluster.KMeans() and the parameter nredo in faiss.Kmeans().
pytsk.gradient_descent.tsk
- class TSK(in_dim, out_dim, n_rule, antecedent, order=1, eps=1e-08, precons=None)
Parent:
torch.nn.Module
This module defines the consequent part of the TSK model and combines it with a pre-defined antecedent module. The input of this module is the raw feature matrix, and the output is the final prediction of the TSK model (a sketch of how the firing levels and rule consequents are combined is given at the end of this entry).
- Parameters
in_dim (int) – Number of features \(D\).
out_dim (int) – Number of output dimension \(C\).
n_rule (int) – Number of rules \(R\), must be equal to the n_rule of the antecedent module.
antecedent (torch.nn.Module) – An antecedent module, whose output dimension should be equal to the number of rules \(R\).
order (int) – 0 or 1. The order of the TSK model. If 0, a zero-order TSK is built; otherwise a first-order TSK.
eps (float) – A constant to avoid the division zero error.
precons (torch.nn.Module) – If None, the raw feature is used as the consequent input; if a PyTorch module is given, the consequent input is the output of that module. If you wish to use the BN technique mentioned in Models & Technique, you can set precons=nn.BatchNorm1d(in_dim).
- reset_parameters(self)
Re-initialize all parameters, including both consequent and antecedent parts.
- forward(self, X, get_frs=False)
- Parameters
X (torch.tensor) – Input matrix with the size of \([N, D]\), where \(N\) is the number of samples.
get_frs (bool) – If true, the firing levels (the output of the antecedent) will also be returned.
- Returns
If
get_frs=True
, return the TSK output \(Y\in \mathbb{R}^{N,C}\) and the antecedent output \(U\in \mathbb{R}^{N,R}\). Ifget_frs=False
, only return the TSK output \(Y\).
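For intuition about what this module computes, here is a hedged sketch of assembling a first-order TSK output from normalized firing levels and per-rule linear consequents. The tensor names and layout are illustrative, not the module's internal implementation.
import torch

def first_order_tsk_output(x, frs, cons_weight, cons_bias):
    # x: [N, D] inputs, frs: [N, R] normalized firing levels
    # cons_weight: [R, D, C] per-rule weights, cons_bias: [R, C] per-rule biases
    rule_out = torch.einsum('nd,rdc->nrc', x, cons_weight) + cons_bias  # [N, R, C] per-rule predictions
    return torch.einsum('nr,nrc->nc', frs, rule_out)                    # firing-level-weighted sum, [N, C]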
pytsk.gradient_descent.training
- ur_loss(frs, tau=0.5)
The uniform regularization (UR) proposed by Cui et al. [3]. UR loss is computed as \(\ell_{UR} = \sum_{r=1}^R (\frac{1}{N}\sum_{n=1}^N f_{n,r} - \tau)^2\), where \(f_{n,r}\) represents the firing level of the \(n\)-th sample on the \(r\)-th rule.
- Parameters
frs (torch.tensor) – The firing levels (output of the antecedent) with the size of \([N, R]\), where \(N\) is the number of samples and \(R\) is the number of rules.
tau (float) – The expectation \(\tau\) of the average firing level for each rule. For a \(C\)-class classification problem, we recommend setting \(\tau\) to \(1/C\), for a regression problem, \(\tau\) can be set as \(0.5\).
- Returns
A scalar value representing the UR loss.
- class Wrapper(model, optimizer, criterion, batch_size=512, epochs=1, callbacks=None, label_type='c', device='cpu', reset_param=True, ur=0, ur_tau=0.5, **kwargs)
This class provides a training framework for beginners to train their fuzzy neural networks.
- Parameters
model (torch.nn.Module) – The pre-defined TSK model.
optimizer (torch.Optimizer) – Pytorch optimizer.
criterion (torch.nn._Loss) – Pytorch loss function. For example, torch.nn.CrossEntropyLoss() for classification tasks and torch.nn.MSELoss() for regression tasks.
batch_size (int) – Batch size during training & prediction.
epochs (int) – Training epochs.
callbacks ([Callback]) – List of callbacks.
label_type (str) – Label type, “c” or “r”. When label_type="c", the label’s dtype will be changed to “int64”; when label_type="r", the label’s dtype will be changed to “float32”.
>>> from pytsk.gradient_descent import antecedent_init_center, AntecedentGMF, TSK, EarlyStoppingACC, EvaluateAcc, Wrapper
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.datasets import make_classification
>>> from sklearn.preprocessing import StandardScaler
>>> from torch.optim import AdamW
>>> import torch.nn as nn
>>> # ----------------- define data -----------------
>>> X, y = make_classification(random_state=0)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
>>> ss = StandardScaler()
>>> x_train = ss.fit_transform(x_train)
>>> x_test = ss.transform(x_test)
>>> # ----------------- define TSK model -----------------
>>> n_rule = 10  # define number of rules
>>> n_class = 2  # define output dimension
>>> order = 1  # first-order TSK is used
>>> precons = nn.BatchNorm1d(x_train.shape[1])  # BN on the consequent input (the BN technique in Models & Technique)
>>> weight_decay = 1e-8  # weight decay for pytorch optimizer
>>> lr = 0.01  # learning rate for pytorch optimizer
>>> init_center = antecedent_init_center(x_train, y_train, n_rule=n_rule)  # obtain the initial antecedent center
>>> gmf = AntecedentGMF(in_dim=x_train.shape[1], n_rule=n_rule, high_dim=True, init_center=init_center)  # define antecedent
>>> model = TSK(in_dim=x_train.shape[1], out_dim=n_class, n_rule=n_rule, antecedent=gmf, order=order, precons=precons)  # define TSK
>>> # ----------------- define optimizers -----------------
>>> ante_param, other_param = [], []
>>> for n, p in model.named_parameters():
>>>     if "center" in n or "sigma" in n:
>>>         ante_param.append(p)
>>>     else:
>>>         other_param.append(p)
>>> optimizer = AdamW(
>>>     [{'params': ante_param, "weight_decay": 0},  # antecedent parameters usually don't need weight_decay
>>>      {'params': other_param, "weight_decay": weight_decay},],
>>>     lr=lr
>>> )
>>> # ----------------- split 20% data for earlystopping -----------------
>>> x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)
>>> # ----------------- define the earlystopping callback -----------------
>>> EACC = EarlyStoppingACC(x_val, y_val, verbose=1, patience=40, save_path="tmp.pkl")  # Earlystopping
>>> TACC = EvaluateAcc(x_test, y_test, verbose=1)  # Check test acc during training
>>> # ----------------- train model -----------------
>>> wrapper = Wrapper(model, optimizer=optimizer, criterion=nn.CrossEntropyLoss(),
>>>     epochs=300, callbacks=[EACC, TACC], ur=0, ur_tau=1/n_class)  # define training wrapper, ur weight is set to 0
>>> wrapper.fit(x_train, y_train)  # fit
>>> wrapper.load("tmp.pkl")  # load best model saved by EarlyStoppingACC callback
>>> y_pred = wrapper.predict(x_test).argmax(axis=1)  # predict, argmax for extracting classification label
>>> print("[TSK] ACC: {:.4f}".format(accuracy_score(y_test, y_pred)))  # print ACC
- train_on_batch(self, input, target)
Define how to update the model with one batch of data. This method can be overridden for a custom training strategy; a sketch is given after the parameter list below.
- Parameters
input (torch.tensor) – Feature matrix with the size of \([N,D]\), \(N\) is the number of samples, \(D\) is the input dimension.
target (torch.tensor) – Target matrix with the size of \([N,C]\), \(C\) is the output dimension.
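A hedged sketch of a custom training step, obtained by subclassing Wrapper and overriding train_on_batch, for example to add gradient clipping. The attribute names self.model, self.optimizer and self.criterion, as well as the return convention, are assumptions about Wrapper's internals, not the documented API.
from torch.nn.utils import clip_grad_norm_
from pytsk.gradient_descent.training import Wrapper

class ClippedWrapper(Wrapper):
    def train_on_batch(self, input, target):
        # standard step: forward, loss, backward, gradient clipping, parameter update
        self.optimizer.zero_grad()
        output = self.model(input)
        loss = self.criterion(output, target)
        loss.backward()
        clip_grad_norm_(self.model.parameters(), max_norm=1.0)
        self.optimizer.step()
        return loss.item()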
- fit(X, y)
Train the model with numpy arrays.
- Parameters
X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).
y (numpy.array) – Label matrix \(Y\) with the size of \([N, C]\), for classification task, \(C=1\), for regression task, \(C\) is the number of the output dimension of
model
.
- fit_loader(self, train_loader)
Train the model with a user-defined Pytorch dataloader.
- Parameters
train_loader (torch.utils.data.DataLoader) – Data loader whose output should correspond to the inputs of train_on_batch. For example, if the dataloader has two outputs, then train_on_batch should also have two inputs.
- predict(self, X, y=None)
Get the prediction of the model.
- Parameters
X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).
y – Not used.
- Returns
Prediction matrix \(\hat{Y}\) with the size of \([N, C]\), \(C\) is the output dimension of the
model
.
- predict_proba(self, X, y=None)
For classification problems only (requires label_type="c"); returns the prediction after softmax.
- Parameters
X (numpy.array) – Feature matrix \(X\) with the size of \([N, D]\).
y – Not used.
- Returns
Prediction matrix \(\hat{Y}\) with the size of \([N, C]\), \(C\) is the output dimension of the
model
.
- save(self, path)
Save model.
- Parameters
path (str) – Model save path.
- load(self, path)
Load model.
- Parameters
path (str) – Model save path.
pytsk.gradient_descent.callbacks
- class Callback
Similar to the callback class in Keras, our package provides a simplified version of callbacks, which allows users to monitor metrics during training. We strongly recommend users to customize their own callbacks; here we provide two examples, EvaluateAcc and EarlyStoppingACC, plus a minimal custom-callback sketch after the method list below.
- on_batch_begin(self, wrapper)
Will be called before each batch.
- on_batch_end(self, wrapper)
Will be called after each batch.
- on_epoch_begin(self, wrapper)
Will be called before each epoch.
- on_epoch_end(self, wrapper)
Will be called after each epoch.
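A minimal custom-callback sketch that evaluates regression RMSE on a held-out set after every epoch, following the pattern of EvaluateAcc below. wrapper.predict and wrapper.cur_epoch are taken from the EvaluateAcc example; everything else is illustrative.
>>> import numpy as np
>>> from pytsk.gradient_descent.callbacks import Callback
>>> class EvaluateRMSE(Callback):
>>>     def __init__(self, X, y, verbose=0):
>>>         self.X, self.y, self.verbose = X, y, verbose
>>>         self.logs = []
>>>     def on_epoch_end(self, wrapper):
>>>         y_pred = wrapper.predict(self.X).ravel()
>>>         rmse = float(np.sqrt(np.mean((self.y.ravel() - y_pred) ** 2)))
>>>         self.logs.append({"epoch": wrapper.cur_epoch, "rmse": rmse})
>>>         if self.verbose > 0:
>>>             print("[Epoch {:5d}] Val RMSE: {:.4f}".format(wrapper.cur_epoch, rmse))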
- class EvaluateAcc(X, y, verbose=0)
Evaluate the accuracy during training.
- Parameters
X (numpy.array) – Feature matrix with the size of \([N, D]\).
y (numpy.array) – Label matrix with the size of \([N, 1]\).
- on_epoch_end(self, wrapper)
>>> def on_epoch_end(self, wrapper):
>>>     cur_log = {}
>>>     y_pred = wrapper.predict(self.X).argmax(axis=1)
>>>     acc = accuracy_score(y_true=self.y, y_pred=y_pred)
>>>     cur_log["epoch"] = wrapper.cur_epoch
>>>     cur_log["acc"] = acc
>>>     self.logs.append(cur_log)
>>>     if self.verbose > 0:
>>>         print("[Epoch {:5d}] Test ACC: {:.4f}".format(cur_log["epoch"], cur_log["acc"]))
- class EarlyStoppingACC(X, y, patience=1, verbose=0, save_path=None)
Early-stopping by classification accuracy.
- Parameters
X (numpy.array) – Feature matrix with the size of \([N, D]\).
y (numpy.array) – Label matrix with the size of \([N, 1]\).
patience (int) – Number of epochs with no improvement after which training will be stopped.
verbose (int) – verbosity mode.
save_path (str) – If
save_path=None
, do not save models, else save the model with the best accuracy to the given path.
- on_epoch_end(self, wrapper)
Calculate the validation accuracy and determine whether to stop training.
>>> def on_epoch_end(self, wrapper):
>>>     cur_log = {}
>>>     y_pred = wrapper.predict(self.X).argmax(axis=1)
>>>     acc = accuracy_score(y_true=self.y, y_pred=y_pred)
>>>     if acc > self.best_acc:
>>>         self.best_acc = acc
>>>         self.cnt = 0
>>>         if self.save_path is not None:
>>>             wrapper.save(self.save_path)
>>>     else:
>>>         self.cnt += 1
>>>         if self.cnt > self.patience:
>>>             wrapper.stop_training = True
>>>     cur_log["epoch"] = wrapper.cur_epoch
>>>     cur_log["acc"] = acc
>>>     cur_log["best_acc"] = self.best_acc
>>>     self.logs.append(cur_log)
>>>     if self.verbose > 0:
>>>         print("[Epoch {:5d}] EarlyStopping Callback ACC: {:.4f}, Best ACC: {:.4f}".format(cur_log["epoch"], cur_log["acc"], cur_log["best_acc"]))
pytsk.gradient_descent.utils
- check_tensor(tensor, dtype)
Convert
tensor
into adtype
torch.Tensor.- Parameters
tensor (numpy.array/torch.tensor) – Input data.
dtype (str) – PyTorch dtype string.
- Returns
A
dtype
torch.Tensor.
- reset_params(model)
Reset all parameters in
model
.- Parameters
model (torch.nn.Module) – Pytorch model.
- class NumpyDataLoader(*inputs)
Convert numpy arrays into a dataloader. A hedged usage sketch is given after the parameter list.
- Parameters
inputs (numpy.array) – Numpy arrays.
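A usage sketch, assuming a NumpyDataLoader instance can be passed directly to Wrapper.fit_loader; if it is instead a Dataset-style object, wrap it in torch.utils.data.DataLoader first. x_train, y_train and wrapper are assumed to be defined as in the Wrapper example above.
from pytsk.gradient_descent.utils import NumpyDataLoader

loader = NumpyDataLoader(x_train, y_train)  # each batch should yield (input, target), matching train_on_batch
wrapper.fit_loader(loader)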