Table 3 The frequency of H. pylori risk factors being chosen for all feature selection methods

From: Helicobacter pylori (H. pylori) risk factor analysis and prevalence prediction: a machine learning-based approach

| Feature | All* | Multivariate LR |
| --- | --- | --- |
| Residence | 100% (± 0%) | X |
| Allergies | 43.16% (± 3.82%) | |
| Parasites | 44.42% (± 4.09%) | |
| Cooking area | 64.32% (± 3.74%) | |
| Dewormed status | 60.42% (± 4.11%) | X |
| Cow | 38.63% (± 3.58%) | X |
| Smoking | 73.26% (± 3.5%) | X |
| Cat: lives inside | 47.89% (± 3.86%) | |
| Cat: kept outside | 50.63% (± 3.91%) | |
| Dog: lives inside | 54% (± 3.84%) | |
| Dog: kept outside | 26% (± 2.78%) | |
| Electricity use: sometimes | 88% (± 2.71%) | X |
| Electricity use: never | 89.47% (± 3.17%) | X |
| Floor in home: wood | 44.32% (± 3.78%) | |
| Floor in home: mud | 27.37% (± 2.82%) | |
| Floor in home: other | 52% (± 3.76%) | |
| Waste disposal: pit | 44.84% (± 3.88%) | |
| Waste disposal: open field | 79.05% (± 3.2%) | X |
| Waste disposal: burn | 29.26% (± 2.98%) | |
| Age: 6–10 years | 32.63% (± 3.35%) | |
| Age: 11–15 years | 30.11% (± 3.54%) | |
| Family size: 4–5 | 35.16% (± 3.47%) | |
| Family size: > 5 | 46% (± 3.94%) | |
| Toilet: pit | 76.74% (± 3.73%) | X |
| Toilet: open field | 54.21% (± 3.57%) | |
| Water source: well | 44.11% (± 3.38%) | |
| Water source: river or rain water | 38.63% (± 3.45%) | |

*Results from ranking-based, subset-based, and SFFS feature selection methods are combined. The first column lists the features. The second column shows the average frequency (± 1 standard error) with which each feature was selected across all feature selection methods and cross-validation folds. The third column marks (X) the features that the multivariate logistic regression approach determined to be significant. Bold, italic, and bold italic numbers indicate features selected more than 75 percent, 60–75 percent, and 50–60 percent of the time, respectively.
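
As an illustration of how a selection frequency with a standard error might be computed, the sketch below takes one 0/1 flag per run (one run per feature selection method and cross-validation fold) and returns the mean frequency and its standard error in percent. The table does not state how the standard error was calculated or how runs were pooled, so the formula used here (sample standard deviation divided by the square root of the number of runs) is an assumption, as are the function names `selection_frequency` and `highlight_tier`, the example flags, and the handling of the exact 60 and 50 percent boundaries.

```python
import math
import statistics


def selection_frequency(flags):
    """Return (frequency, standard error) in percent for 0/1 selection flags.

    Each flag records whether a feature was picked in one run.
    Assumption: SE = sample standard deviation / sqrt(number of runs).
    """
    if len(flags) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.mean(flags)
    se = statistics.stdev(flags) / math.sqrt(len(flags))
    return 100.0 * mean, 100.0 * se


def highlight_tier(freq_percent):
    """Map a frequency to the highlighting tier described in the table note.

    Assumption: the boundaries 60 and 50 belong to the tier that names them.
    """
    if freq_percent > 75:
        return "bold"
    if freq_percent >= 60:
        return "italic"
    if freq_percent >= 50:
        return "bold italic"
    return None


# Hypothetical example: a feature picked in 3 of 4 runs.
freq, se = selection_frequency([1, 1, 1, 0])
print(f"{freq:.2f}% (± {se:.2f}%)", highlight_tier(freq))
```

Applied to the reported frequencies, `highlight_tier` would place Residence (100%) in the first tier, Smoking (73.26%) in the second, and Dog: lives inside (54%) in the third.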