Table 2 The descriptions and main parameter settings of the employed ML models

From: A cross-sectional study: a breathomics based pulmonary tuberculosis detection method

| ML models | Descriptions | Main parameter settings ᵃ |
| --- | --- | --- |
| RF | A meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting | n_estimators = 100, max_features = 0.5, min_samples_split = 4, min_samples_leaf = 10, criterion = "entropy" |
| SVM | Finds the separating hyperplane that divides the training data set correctly and has the maximum geometric margin | penalty = "l2", loss = "squared_hinge", tol = 1e-5, C = 5.0, max_iter = 1e5 |
| LR | Estimates the probability of an event occurring based on a given dataset of independent variables | tol = 1e-5, C = 5.0, max_iter = 1e4 |
| XGB | A boosting algorithm based on gradient-boosted decision trees | booster: "gbtree", max_depth: 8, n_estimators: 100, min_child_weight: 3, gamma: 0.15, lambda: 2 |
| DT | Employs a divide-and-conquer strategy, conducting a greedy search to identify the optimal split points within a tree | criterion = "gini", splitter = "best", min_samples_split = 2, min_samples_leaf = 1 |

ᵃ These models were implemented with the Python packages xgboost (https://xgboost.readthedocs.io/en/stable/python/python_intro.html) and sklearn (https://scikit-learn.org/stable/user_guide.html)
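
The settings in Table 2 could be mapped onto the two packages roughly as follows. This is a minimal sketch under several assumptions, not the authors' code. The table does not name the estimator classes, so the choices here are guesses: in particular, `penalty` and `loss` are arguments of scikit-learn's `LinearSVC`, so that class is assumed for the SVM row. The `max_iter` values are written as integers because scikit-learn expects an integer there, and `lambda` is passed as `reg_lambda`, the name used by xgboost's scikit-learn-style wrapper. The names `PARAMS` and `build_models` are hypothetical.

```python
# Hyperparameters transcribed from Table 2. The dict and function names are
# illustrative and do not come from the paper.
PARAMS = {
    "RF": dict(n_estimators=100, max_features=0.5, min_samples_split=4,
               min_samples_leaf=10, criterion="entropy"),
    # max_iter is listed as 1e+5 and 1e+4; scikit-learn expects an integer.
    "SVM": dict(penalty="l2", loss="squared_hinge", tol=1e-5, C=5.0,
                max_iter=100_000),
    "LR": dict(tol=1e-5, C=5.0, max_iter=10_000),
    # "lambda" is a Python keyword; xgboost's scikit-learn-style wrapper
    # exposes the same setting as reg_lambda.
    "XGB": dict(booster="gbtree", max_depth=8, n_estimators=100,
                min_child_weight=3, gamma=0.15, reg_lambda=2),
    "DT": dict(criterion="gini", splitter="best", min_samples_split=2,
               min_samples_leaf=1),
}


def build_models():
    """Instantiate one unfitted classifier per table row."""
    # Imported lazily so the parameter table can be inspected even when the
    # third-party packages are not installed.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    classes = {
        "RF": RandomForestClassifier,
        "SVM": LinearSVC,  # assumption: penalty/loss match LinearSVC, not SVC
        "LR": LogisticRegression,
        "XGB": XGBClassifier,
        "DT": DecisionTreeClassifier,
    }
    return {name: cls(**PARAMS[name]) for name, cls in classes.items()}
```

Calling `build_models()` returns a dict of unfitted estimators keyed by the table's abbreviations; each can then be trained with its usual `fit(X, y)` method on whatever feature matrix and labels are available. The DT row matches scikit-learn's default values for those four arguments, so it is listed mainly for completeness.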