
Table 4 Distribution of the dependent variable (STI) before and after applying the imbalanced-data handling techniques

From: Spatial distribution and machine learning prediction of sexually transmitted infections and associated factors among sexually active men and women in Ethiopia, evidence from EDHS 2016

| Sampling methods | Class 1 (STI) | Class 2 (No_STI) | Total |
| --- | --- | --- | --- |
| Before balancing (unbalanced data) | 667 (3.2%) | 20,132 (96.8%) | 20,799 (100%) |
| Under sampling (balancing) | 534 (50.2%) | 530 (49.8%) | 1,064 (100%) |
| Oversampling (balancing) | 16,152 (50.1%) | 16,106 (49.9%) | 32,258 (100%) |
| ROSE sampling (balancing) | 8,338 (50.1%) | 8,302 (49.9%) | 16,640 (100%) |
| Both under and over (balancing) | 10,350 (49.8%) | 10,449 (50.2%) | 20,799 (100%) |
| SMOTE sampling (balancing) | 1,068 (50%) | 1,068 (50%) | 2,136 (100%) |
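
The class shares reported in Table 4 follow directly from the class counts, so they can be recomputed as a quick consistency check. The sketch below is illustrative only and is not the authors' code: the counts are copied from the table, while the function name `class_shares` and the dictionary `counts` are assumptions introduced here.

```python
# Counts copied from Table 4 as (STI, No_STI) per sampling method.
# The names `counts` and `class_shares` are hypothetical, chosen for this sketch.
counts = {
    "Before balancing": (667, 20132),
    "Under sampling": (534, 530),
    "Oversampling": (16152, 16106),
    "ROSE sampling": (8338, 8302),
    "Both under and over": (10350, 10449),
    "SMOTE sampling": (1068, 1068),
}


def class_shares(sti, no_sti):
    """Return (total, % STI, % No_STI), with percentages rounded to one decimal."""
    total = sti + no_sti
    pct_sti = round(100 * sti / total, 1)
    pct_no_sti = round(100 * no_sti / total, 1)
    return total, pct_sti, pct_no_sti


# Print one line per sampling method: total, then the two class shares.
for method, (sti, no_sti) in counts.items():
    total, pct_sti, pct_no_sti = class_shares(sti, no_sti)
    print(f"{method:<22}{total:>8,}{pct_sti:>8.1f}%{pct_no_sti:>8.1f}%")
```

Under these assumptions, the recomputed totals and one-decimal percentages agree with every row of the table; for example, the unbalanced data give 667 / 20,799 ≈ 3.2% STI cases.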