Improvement of Bagging performance for classification of imbalanced datasets using evolutionary multi-objective optimization


Seyed Ehsan Roshan, Shahrokh Asadi
Data Mining Laboratory, Department of Engineering, College of Farabi, University of Tehran, Tehran, Iran

Keywords: Multi-objective evolutionary optimization, Imbalanced datasets, Ensemble learning, Bagging, Undersampling, Diversity

Abstract

Today, the classification of imbalanced datasets, in which the samples of one class outnumber those of the other classes, has received much attention owing to its wide range of real-world applications. The Bagging ensemble method, one of the most popular ensemble learning algorithms, can perform better on imbalanced problems when combined with undersampling methods. In Bagging, the diversity of the classifiers, the performance of the classifiers, an appropriate number of bags (classifiers), and balanced training sets are the key factors in successfully handling imbalanced problems. In this paper, inspired by evolutionary undersampling (a recent undersampling method that searches for subsets of the majority class samples) and taking the mentioned factors into account, i.e., diversity, classifier performance, number of classifiers and balanced training sets, a multi-objective optimization undersampling method is proposed. The proposed method uses a multi-objective evolutionary algorithm to produce a set of diverse, well-performing and (near) balanced bags. Accordingly, it makes it possible to generate diverse, well-performing classifiers and to determine the number of classifiers in the Bagging algorithm. Moreover, two different strategies are employed in the proposed method to improve diversity. To confirm the efficiency of the proposed method, its performance is measured over 33 imbalanced datasets using AUC and compared with 6 well-known ensemble learning algorithms. Non-parametric statistical analysis of the results demonstrates the dominance of the proposed method over the other employed techniques.

1. Introduction

The class imbalance problem exists in many real-world domains such as medical science (Krawczyk et al., 2016), biology (Anand et al., 2010), anomaly detection (Khreich et al., 2010), face recognition (Liu and Chen, 2005), text mining (Munkhdalai et al., 2015), and sentiment analysis (Xu et al., 2015). In these problems, the samples belonging to one class (the positive or minority class) are fewer than the samples belonging to the other class (the negative or majority class) (Chawla et al., 2004). Most classifier learning algorithms, which aim to maximize accuracy, perform poorly on class imbalance problems: to simplify learning, they incline toward the majority samples and misclassify the minority class samples. Detecting minority samples is thus harder than detecting majority ones (Batista et al., 2004; Sun et al., 2009; Weiss, 2004) and, consequently, of higher importance. Imbalanced data poses numerous challenges for standard classification algorithms (Haixiang et al., 2017): (1) standard

classifiers such as logistic regression, Support Vector Machines (SVM) and decision trees, which have acceptable performance on balanced datasets, provide sub-optimal solutions for imbalanced problems: they perform satisfactorily on the majority samples, while the minority samples are misclassified in most cases. (2) Using global performance metrics such as the standard accuracy rate to guide the learning process may bias the classification results toward the majority class. (3) In class imbalance problems, samples of the minority class may be mistaken for noise and vice versa, since both are rare. (4) Although imbalanced class distributions are not always hard to learn (e.g., when the classes are separable), overlap between the minority class and the other classes may lead to almost equal prior probabilities for each class in the overlapped areas, making it very hard or even impossible to distinguish the two classes. (5) Moreover, small disjuncts, lack of density, and small sample size combined with high feature dimensionality are also challenging, since they make learning models fail in



the detection of minority class samples. To solve imbalance problems, researchers have proposed various efficient methods in recent years, which fall into four groups (Galar et al., 2012): (1) Data-level (external) approaches rebalance the class distribution by resampling the data space (Batista et al., 2004; Stefanowski and Wilk, 2008). They reduce the effect of class imbalance by adding a preprocessing step; hence, data-level approaches are independent of the classifier, which makes them more versatile. It has been experimentally shown that preprocessing is a suitable solution for balancing the class distribution (Batista et al., 2004; Batuwita and Palade, 2010). The simplest approaches in this context are random oversampling and random undersampling. With random oversampling, overfitting can be expected, since it exactly copies samples of the minority class; the main problem with random undersampling is the possibility of discarding useful data, since it randomly removes samples of the majority class (López et al., 2013). In recent years, several sampling methods have been proposed (Chawla et al., 2002; Hu et al., 2009; Bunkhumpornpat et al., 2009; Sáez et al., 2015; Díez-Pastor et al., 2015a; Lin et al., 2017; Tsai et al., 2019), and an extensive analysis of data-level methods is performed in Fotouhi et al. (2019). (2) Algorithm-level (internal) approaches try to adapt existing classifier learning algorithms to bias learning toward the minority class (Barandela et al., 2003; Lin et al., 2002; Liu et al., 2000). These methods need special knowledge of the corresponding classifier and the application domain to understand why the classifier fails when facing imbalanced datasets. In this context, methods such as the Hellinger Distance Decision Tree (HDDT) (Cieslak and Chawla, 2008) and the Class Confidence Proportion Decision Tree (CCPDT) (Liu et al., 2010) have been presented. (3) Cost-sensitive learning frameworks lie between data-level and algorithm-level approaches (Chawla et al., 2008; Ling et al., 2006; Zhang et al., 2008). By considering misclassification costs for the minority class, they bias the classifier toward the minority class and try to minimize the total cost of errors in both classes; their main drawback is the need to define the misclassification costs. AdaCost (Fan et al., 1999), the cost-sensitive boosting variants of Sun et al. (2007), and cost-sensitive decision trees (Ling et al., 2006) have been proposed in this context. (4) Ensembles. In recent years, ensemble learning algorithms have been proposed as a solution to class imbalance problems (Van Hulse et al., 2009; Wang and Yao, 2013). Ensemble learning algorithms train several base classifiers on the same problem and then combine them (Zhou, 2012), where the combined classifier outperforms the individual ones; it has been shown that ensembles of classifiers perform better than individual classifiers (Brown et al., 2005; Tumer and Ghosh, 1996). Although ensemble learning algorithms perform well on balanced problems, on their own they do not handle imbalanced problems appropriately, since they are designed to maximize the accuracy measure. Accordingly, the literature suggests combining them with other methods, such as algorithm-level, data-level and cost-sensitive ones, to handle imbalanced datasets. Generally, according to Galar et al.
(2012), the cost-sensitive and algorithm-level methods are problem-dependent, while the data-level and ensemble ones are versatile. Ensemble methods are usually modified to handle imbalanced problems by combining them with data-level methods (Khoshgoftaar et al., 2010), with the objective of training the base classifiers on balanced datasets. Besides helping ensemble methods solve imbalanced problems, training the classifiers on different datasets also improves diversity. Diversity is one of the most important factors in ensemble methods: to enhance ensemble performance, along with accurate

classifiers, their diversity is also important, meaning that the classifiers should produce different outputs and errors on the same samples (Díez-Pastor et al., 2015a). Increasing diversity has a significant impact on the performance of ensemble methods on imbalanced problems; Díez-Pastor et al. (2015b) thoroughly discuss diversity-enhancing techniques and their effect on imbalanced learning. In addition to diversity, the number of classifiers is important in ensemble learning algorithms: too few classifiers may result in unsuitable ensemble performance, while too many may include redundant classifiers, resulting in low diversity (Kuncheva and Whitaker, 2003). A high number of classifiers also reduces classification speed and increases memory requirements (Guo et al., 2018a). To rectify these obstacles, pruning methods have been presented, which remove redundant classifiers while preserving useful ones (Krawczyk and Wozniak, 2018; Zhou and Tang, 2003). Pruning has usually been ignored in imbalanced problems and is less investigated in the literature (Krawczyk and Wozniak, 2018), although, given the issues mentioned above, generating an appropriate number of classifiers can be useful in solving imbalanced problems. The most common ensemble learning algorithms are Bagging (Breiman, 1996) and Boosting (Freund and Schapire, 1996). In Bagging, the classifiers are learned in parallel on training sets (known as bags) created using bootstrap sampling; Boosting algorithms, on the other hand, iteratively focus on difficult samples by increasing the weights of samples wrongly classified by the base classifier. The comparison of ensembles in Khoshgoftaar et al. (2010) demonstrates that Bagging-based methods are more efficient than Boosting-based ones. It has also been shown that combining Bagging with undersampling methods yields higher performance than combining it with oversampling techniques (Khoshgoftaar et al., 2010). However, undersampling methods cannot guarantee classifier performance despite improving the diversity among classifiers (Sun et al., 2018). In García and Herrera (2009), an evolutionary approach is presented for selecting the best subset of majority class samples, and the obtained results show that evolutionary undersampling outperforms plain undersampling methods, specifically in problems with high IR. Undersampling methods are combined with Bagging to generate balanced sets for training classifiers on imbalanced problems; in addition to improving Bagging's performance on such problems, this increases the diversity among the classifiers. In these methods, the majority class samples must be selected accurately, to avoid both removing useful samples and reducing Bagging's performance on the majority class. Above all, the number of bags should be selected appropriately, since a badly chosen number of bags can generate similar classifiers with high complexity and consequently reduce diversity. In this paper, a new evolutionary undersampling-based Bagging method is presented to solve imbalanced problems, inspired by evolutionary undersampling (EUS) and considering two important challenges, i.e., diversity and an appropriate number of classifiers.
The proposed technique tries to properly select the majority class samples for each bag and to generate diverse, well-performing and (almost) balanced bags for training the classifiers, using a multi-objective evolutionary algorithm and taking three objectives into account: (1) performance, (2) diversity and (3) the imbalance ratio (IR). In doing so, redundant classifiers are prevented by determining the number of classifiers. The rest of the paper is organized as follows. Section 2 provides a survey of existing ensemble-based methods for imbalanced datasets. Section 3 reviews the methods the proposal builds on, Section 4 describes the proposed method, and Section 5 states the experimental framework and results.


2. Related works

Imbalance problems in real-world applications are one of the recent challenges for researchers in the machine learning field (Raghuwanshi and Shukla, 2019b). In recent years, many efficient methods for solving imbalance problems have been proposed, among which ensemble methods have always been one of the most successful (Lopez-Garcia et al., 2019). The combination of ensembles with sampling methods has been proposed for solving imbalance problems in recent years and is briefly discussed in this section.

OverBagging (Wang and Yao, 2009) and UnderBagging (UB) (Barandela et al., 2003) use oversampling and undersampling, respectively, to balance the bags before training each classifier. SMOTEBagging (Wang and Yao, 2009) hybridizes Bagging and SMOTE, where SMOTE is used for balancing. In the Partitioning method (Molinara et al., 2007), which is similar to ensemble-based undersampling, the samples of the majority class are divided into several disjoint subsets, and each classifier is learned on one subset of the majority class together with all minority class samples. In EasyEnsemble (Liu et al., 2009), which performs similarly to UB, several subsets of the majority class are sampled; classifiers are then learned using AdaBoost over each of them, and their outputs are finally combined. In SMOTEBoost (Chawla et al., 2003), after each round of Boosting, SMOTE is employed to generate new artificial samples of the minority class. In RUSBoost (Seiffert et al., 2010), after each round of Boosting, random undersampling is used to remove samples of the majority class. The goal of EUSBoost (Galar et al., 2013), which is based on RUSBoost, is to improve the original method using evolutionary undersampling; it tries to enhance diversity by using different subsets of the majority class to train each base classifier in each iteration. Nanni et al. (2015) proposed the HardEnsemble approach, which follows a strategy similar to EasyEnsemble with the difference that only undersampled and oversampled datasets are taken into account; HardEnsemble consists of 150 classifiers, of which 100 are trained on undersampled datasets and the remaining 50 on oversampled datasets. Sun et al. (2015) presented an ensemble approach in which the imbalanced dataset is first converted into several balanced subsets using random splitting or clustering as data balancing methods, and several classifiers are then trained on the obtained training sets. Random Balance Boost (RB-Boost) (Díez-Pastor et al., 2015a) has a philosophy similar to SMOTEBoost and RUSBoost: the base classifiers are trained on training sets obtained from a data-level technique called random balance (Díez-Pastor et al., 2015a), in which the number of samples removed by undersampling equals the number generated by SMOTE, and artificial samples are generated according to a weighted proportion of the total number of samples. Similar to RB-Boost, Random Balance Bagging (RB-Bagging) (Díez-Pastor et al., 2015a) combines undersampling and SMOTEBagging; in RB-Bagging, each new bag is obtained using random balance, so the size of each bag equals the size of the original training set. In 2016, Lu et al. (2016) proposed a method named HS-Bagging, in which undersampling and SMOTE are employed as a preprocessing step in each iteration of Bagging; this method selects an optimal sampling rate for SMOTE and undersampling according to the performance on out-of-bag samples. Gong and Kim (2017) proposed the RHBoost ensemble, a combination of AdaBoost and hybrid sampling (i.e., undersampling and ROSE sampling). Wang et al. (2017) presented an ensemble method named Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), a combination of Bagging, SMOTE and SVM; the main idea of BEBS is to integrate different SVMs, revising the initial decision border by pushing artificial minority samples in the correct direction. Tang and He (2017) proposed two ensemble sampling approaches (under and over), both based on a class imbalance criterion named the Generalized Imbalance Ratio (GIR). In this method, the two sampling approaches convert the imbalance problem into several balanced sub-problems for training the classifiers; in each sub-problem, several weak classifiers are trained using Boosting and finally combined through Bagging. Feng et al. (2018) suggested an ensemble approach based on margin theory, a combination of Bagging, undersampling and margin theory that focuses on samples with low margins. Guo et al. (2018b) proposed the Hybrid-based Ensemble (HE) for solving two-class imbalance problems. This technique is composed of two steps: (1) learning several projection matrices from balanced data obtained by undersampling the original training set, and constructing new training sets by projecting the original training set into the different spaces defined by the matrices; (2) undersampling several subsets from each new training set and training a classifier on each subset. Sun et al. (2018) suggested an ensemble approach using EUS and Bagging, where each bag is generated by applying EUS, resulting in accurate and diverse classifiers and consequently improved Bagging performance. Wu et al. (2018) used a combination of EasyEnsemble and SMOTE for solving imbalance problems. Chen et al. (2018) suggested an ensemble learning approach to handle imbalance problems and enhance diversity: inspired by the localized generalization error model, their method generates artificial samples located inside the local area of the training samples and trains the base classifiers on the union of the original training samples and the artificial neighborhood samples, so that the base classifiers are trained on balanced datasets. Ajilisa et al. (2018) proposed a hybrid clustering-based undersampling using a genetic algorithm (GA) and AdaBoost, in which the GA is used for attaining optimal clusters. Collell et al. (2018) tried to solve imbalance problems using Bagging based on threshold-moving; instead of rebalancing, this technique selects the best threshold for predicting each class. Raghuwanshi and Shukla (2019a) proposed a new UnderBagging based on the kernelized extreme learning machine (ELM), in which the kernelized ELM is employed as the component classifier and the outputs of the component classifiers are combined using majority voting and soft voting.

Table 1 shows the presented ensemble methods, classified according to the type of ensemble and the employed sampling method (undersampling (US) or oversampling (OS)). Exploring the literature on ensemble methods with respect to the existing challenges in the classification of imbalanced data (diversity, number of classifiers, and the negative effect of IR), one finds that these methods usually enhance ensemble performance through the influence of diversity and by countering the negative effect of IR on classifier performance. Despite improving diversity and creating appropriate sets, these methods cannot determine the proper number of training sets; thus, the sets selected for training the classifiers may generate similar classifiers and consequently reduce diversity. The most important difference between the proposed method and other ensemble learning methods is the way it handles the challenges of imbalanced data. The proposed method prevents redundant classifiers to the possible extent, using an evolutionary multi-objective method that considers the existing challenges of imbalanced data simultaneously (i.e., diversity, number of classifiers and appropriate training sets), by creating diverse classifiers and determining a (near) optimal number of classifiers in order to enhance the Bagging algorithm's performance.

3. Methodology

The proposed method improves Bagging's performance on imbalanced problems using a multi-objective optimization approach. The two methods it builds on are briefly explained in this section.


Table 1. Classification of ensemble methods based on the type of ensemble and the sampling method employed (US: undersampling, OS: oversampling).

Ref                             | Algorithm name             | Ensemble          | US | OS
Barandela et al. (2003)         | UnderBagging               | Bagging           | ✓  | –
Chawla et al. (2003)            | SMOTEBoost                 | Boosting          | –  | ✓
Molinara et al. (2007)          | Partitioning               | Bagging           | ✓  | –
Wang and Yao (2009)             | OverBagging                | Bagging           | –  | ✓
Wang and Yao (2009)             | SMOTEBagging               | Bagging           | –  | ✓
Liu et al. (2009)               | EasyEnsemble               | Bagging, Boosting | ✓  | –
Seiffert et al. (2010)          | RUSBoost                   | Boosting          | ✓  | –
Galar et al. (2013)             | EUSBoost                   | Boosting          | ✓  | –
Nanni et al. (2015)             | HardEnsemble               | Bagging, Boosting | ✓  | ✓
Sun et al. (2015)               | –                          | Bagging           | ✓  | –
Díez-Pastor et al. (2015a)      | RB-Boost                   | Boosting          | ✓  | ✓
Díez-Pastor et al. (2015a)      | RB-Bagging                 | Bagging           | ✓  | ✓
Lu et al. (2016)                | HS-Bagging                 | Bagging           | ✓  | ✓
Gong and Kim (2017)             | RHBoost                    | Boosting          | ✓  | ✓
Wang et al. (2017)              | BEBS                       | Bagging           | –  | ✓
Tang and He (2017)              | –                          | Bagging, Boosting | ✓  | ✓
Samuel et al. (2017)            | Easy-SMT                   | Bagging, Boosting | ✓  | ✓
Feng et al. (2018)              | –                          | Bagging           | ✓  | –
Guo et al. (2018b)              | Hybrid-based Ensemble (HE) | Bagging           | ✓  | –
Sun et al. (2018)               | EUS-Bag                    | Bagging           | ✓  | –
Ajilisa et al. (2018)           | GABoost                    | Boosting          | ✓  | –
Collell et al. (2018)           | PT-Bagging                 | Bagging           | –  | –
Raghuwanshi and Shukla (2019a)  | UBKELM                     | Bagging           | ✓  | –

3.1. Bagging

The Bagging algorithm, whose name abbreviates Bootstrap AGGregatING, was proposed by Breiman (1996) and is one of the oldest, simplest and most effective ensemble-based algorithms. Since in ensemble learning the independence of the base classifiers reduces the error, obtaining (as far as possible) independent classifiers is important in Bagging. In Bagging, each subset of the training set is used to train one classifier, and the results of the classifiers are finally combined through voting: for each given sample, the class on which most of the classifiers agree is chosen. Bagging uses bootstrap sampling to obtain the data subsets for training the base classifiers, which enables it to generate different base learners. More precisely, given a training dataset with m samples, a set for training a classifier is constituted by sampling m samples with replacement; some samples may appear more than once, while others may never appear. Applying this process T times, T sets of m training samples are obtained, and one base learner is trained on each set.
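The bootstrap-and-vote procedure just described can be summarized in a few lines. The sketch below is illustrative only; the function name and the choice of decision trees as base learners are ours, not the paper's:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit_predict(X_train, y_train, X_test, T=10, seed=0):
    """Train T base learners on bootstrap replicates and combine them by majority vote."""
    rng = np.random.default_rng(seed)
    m = len(X_train)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)  # m draws with replacement: one "bag"
        learners.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    votes = np.array([clf.predict(X_test) for clf in learners])
    # the predicted class of each test sample is the one most classifiers agree on
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```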

3.2. Multi-objective optimization concept

Many real-world optimization problems need to be optimized with respect to more than one objective. These problems are called multi-objective optimization problems and arise in a vast range of areas such as traffic control (Khamis and Gomaa, 2012, 2014; Khamis et al., 2012), multi-class imbalance problems (Fernandes and de Carvalho, 2019), discretization problems (Tahan and Asadi, 2018a,b), and scheduling problems (Zarandi and Kayvanfar, 2015). In multi-objective problems, several objectives should be optimized simultaneously, and usually there is no single optimal solution for all of them (Zhou et al., 2011). Consider the following m-objective minimization problem:

minimize f(x) = (f_1(x), f_2(x), ..., f_m(x)),   s.t. x ∈ Ω    (1)

where f(x) is the objective vector, f_i(x) is the ith objective to be minimized, x is the decision vector, and Ω is the feasible region in the decision space. A feasible solution x ∈ Ω is dominated by another solution y ∈ Ω (in other words, y dominates x, i.e., y is better than x) when the following two conditions hold:

∀i: f_i(y) ≤ f_i(x)   and   ∃j: f_j(y) < f_j(x)    (2)

If no solution y exists that dominates x, then x is a Pareto-optimal solution of the multi-objective optimization problem. The set of Pareto-optimal solutions constitutes the Pareto set, and the projection of the Pareto-optimal solution set into the objective space is called the Pareto front. The above definition can also be applied within a solution set (or population): if no solution y in the population dominates x, then x is a non-dominated solution of that population.
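A minimal sketch of the dominance relation of Eq. (2) and of extracting the non-dominated solutions of a population (minimization assumed; the helper names are ours):

```python
def dominates(fy, fx):
    """True if objective vector fy dominates fx (Eq. (2), minimization):
    fy is no worse in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(fy, fx)) and any(a < b for a, b in zip(fy, fx))

def nondominated(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```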

Several multi-objective algorithms have been developed in the literature, among which the non-dominated sorting genetic algorithm II (NSGA-II), proposed by Deb et al. (2002), is one of the most effective and well-known evolutionary multi-objective (EMO) algorithms.

3.2.1. NSGA-II

The task of evolutionary multi-objective optimization algorithms is to find the Pareto-optimal solutions or near-optimal Pareto solutions. Similar to other evolutionary algorithms, NSGA-II first generates an initial population, usually at random. A population of offspring is then generated from the current population through selection, crossover and mutation, and the next generation is formed from the current population and the offspring. This process is repeated until the stopping criteria are met. NSGA-II has two features that make it a high-performance EMO algorithm: the first is its fitness assignment, based on the Pareto ranking and crowding distance of each solution, and the second is its elitist generation update mechanism (Ishibuchi and Nojima, 2007). Each solution in the current population is ranked as follows. First, all non-dominated solutions of the current population are assigned rank 1 and are temporarily removed from it. Then, all non-dominated solutions of the reduced population are assigned rank 2 and are likewise removed. This process is repeated until all solutions have been removed, i.e., until every solution has been assigned a rank; solutions with lower ranks are better than those with higher ranks. Among solutions of the same rank, another criterion named the crowding distance is considered, which measures the distance between adjacent solutions of the same rank in the objective space (for details, refer to Deb et al., 2002); solutions with a larger crowding distance (less crowded) are preferred over those with a smaller one (more crowded). A tournament selection mechanism based on Pareto ranking and crowding distance is used to select pairs of parents (solutions) from the population (Ishibuchi and Nojima, 2007). When a new population is created, the current and offspring populations are combined into a unified population, and each solution in this combined population is evaluated, as in the selection step, according to Pareto ranking and crowding distance. The next population consists of the best solutions of the combined population, up to a predetermined size (e.g., the population size); elitism is applied in NSGA-II in this way. An outline of NSGA-II is as follows:

Step 1: Generate the initial population with Npop solutions, where Npop is the population size.
Step 2: Generate the offspring population by repeating the following procedure Npop times:



(1) Select a pair of parents (solutions) from the current population using the tournament selection mechanism.
(2) Generate offspring from the parents using crossover and mutation.
Step 3: Combine the current and offspring populations into a unified population. Then, select the Npop best solutions of the combined population to create the next population.

Step 4: If the predefined stopping criteria are not met, go to Step 2; otherwise, terminate the algorithm. All non-dominated solutions of the combined population of Step 3 are then selected as the final solution.
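The two ranking ingredients used in Steps 2 and 3, Pareto ranking and crowding distance, can be sketched as follows. This is a simplified illustration under the minimization convention of Eq. (1), reusing the dominates helper from the previous sketch; it is not the authors' implementation:

```python
import numpy as np

def pareto_ranks(F):
    """Rank 1 = non-dominated; rank r+1 = non-dominated once ranks <= r are removed."""
    ranks = np.zeros(len(F), dtype=int)
    remaining, r = set(range(len(F))), 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = r
        remaining -= front
        r += 1
    return ranks

def crowding_distance(F):
    """Distance to neighbors within one front; boundary points get infinity."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    d = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf
        span = F[order[-1], k] - F[order[0], k]
        if span > 0 and n > 2:
            d[order[1:-1]] += (F[order[2:], k] - F[order[:-2], k]) / span
    return d
```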

4. The proposed method

Given the weak performance of the Bagging algorithm on imbalanced data, and considering the existing challenges of the investigated datasets, i.e., IR, diversity and the number of classifiers, this section presents an evolutionary multi-objective approach to improve Bagging's performance. In this approach, diverse, well-performing and almost balanced bags are generated: on the one hand, the diversity of the classifiers is improved, and on the other hand, since the generated bags are (almost) balanced, the performance of the classifiers is enhanced. Finally, since the bags on the obtained Pareto front are selected as the final solutions, the number of classifiers is determined as well, which prevents generating redundant bags. The main goal in this paper is to simultaneously optimize the objectives diversity, IR and the Area Under the ROC Curve (AUC) in order to generate optimal bags for training the classifiers. To do so, the non-dominated sorting genetic algorithm II (NSGA-II) proposed by Deb et al. (2002) is employed. Generally, the proposed method can be implemented in three ways, each with a different number of objectives and mode of diversity improvement; the experiments also highlight the importance of diversity in the proposed algorithm by comparing them. In the following, the structure of the proposed method is first described, and then the chromosome coding and the three versions of the proposed method are explained. The pseudo-code of the proposed algorithm is presented in Algorithm 1. Generally, the proposed algorithm comprises two main parts: (1) generating optimal bags (diverse, well-performing and almost balanced); and (2) using the bags generated in the previous step in the Bagging algorithm to train the classifiers. The following subsections describe the main steps of the proposed algorithm.

4.1. Chromosome coding

In the proposed method, each chromosome represents a single bag. The simplest way to represent a bag is a binary vector with values 1 (presence of a sample) and 0 (absence of a sample). Let N denote the number of training samples; the length of each chromosome then equals N. Each bag consists of two parts: all samples of the minority class, and a subset of the majority class, where each majority sample is included in a bag with probability p. The crossover and mutation operators are applied only to the majority class part. Fig. 1 depicts the chromosome representation used in this study.

Fig. 1. Bag representation in the proposed method.
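A sketch of this encoding (a hypothetical helper of ours, with p = 0.4 as used later in the experimental setup): every minority gene is fixed to 1, and each majority gene is an independent Bernoulli draw.

```python
import numpy as np

def random_bag(y, p=0.4, rng=None):
    """Binary chromosome of length N: 1 keeps a training sample in the bag, 0 drops it."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(y)
    minority = min(set(y), key=lambda c: np.sum(y == c))  # label of the rarer class
    chrom = (y == minority).astype(int)                   # minority genes fixed to 1
    maj = np.where(y != minority)[0]
    chrom[maj] = (rng.random(maj.size) < p).astype(int)   # majority genes ~ Bernoulli(p)
    return chrom
```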

4.2. Fitness function

The fitness function is used to select the best bags and differs across the three versions of the proposed method. The first and third versions are called 2Obj and 2Obj*, respectively. Both include the two objectives AUC and IR; however, in 2Obj* an alternative ranking algorithm is used instead of the crowding distance. The fitness function used in 2Obj and 2Obj* is given by Eq. (3):

minimize f(B_i) = (f_1(B_i), f_2(B_i))    (3)

The second version of the proposed method is called 3Obj. In this version, in addition to AUC and IR, a third objective is taken into account, namely diversity. The fitness function used in this version is given by Eq. (4):

minimize f(B_i) = (f_1(B_i), f_2(B_i), f_3(B_i))    (4)

In Eqs. (3) and (4), f_j and B_i denote the jth objective function and the ith bag, respectively. In the following, the conflict between IR and AUC is first illustrated through a correlation analysis, and then the three versions of the proposed method are described.

4.2.1. Correlation analysis between IR and AUC

To show the conflict between IR and AUC, a correlation analysis is used. To this end, the performance of 5 standard classifiers (SVM, K-Nearest Neighbors (KNN), Bagging, C4.5 and AdaBoost-M1) over 43 imbalanced datasets with different IRs, available from KEEL (Alcalá-Fdez et al., 2009), is measured using the AUC measure. The classifiers' performance over the datasets is presented in the Appendix. Fig. 2 shows the classifiers' performance as a function of IR, where the regression line of each classifier indicates a reduction in performance as IR increases. The correlation coefficients between the performance of the classifiers and IR are shown in Table 2. As can be seen, the correlation between IR and the AUC of the classifiers is negative; accordingly, one can conclude that a high IR has a negative effect on classifier performance. As already mentioned, in the chromosome coding scheme, the samples of the minority class are kept fixed and the crossover and mutation operations are applied to the majority class. The resampling rate of the majority class samples is therefore important, since removing too many majority class samples can lead to their misclassification (Galar et al., 2013). Based on the above, finding an appropriate balance rate can have a positive effect on AUC.
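The correlation analysis itself reduces to one library call. The arrays below are illustrative placeholders, not the values behind Table 2:

```python
import numpy as np
from scipy.stats import pearsonr

ir  = np.array([9.22, 10.39, 15.47, 22.81, 39.15, 128.87])  # hypothetical dataset IRs
auc = np.array([0.96, 0.88, 0.85, 0.83, 0.80, 0.67])        # a classifier's AUC on them

r, p = pearsonr(ir, auc)
print(f"correlation coefficient = {r:.4f} (p = {p:.4f})")   # a negative r, as in Table 2
```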


Fig. 2. The classifiers' performance according to IR.

Table 2. Correlation coefficient between the AUC of the classifiers and IR.

                         | Bagging | C4.5    | AdaB-M1 | 1-NN    | SVM
Correlation coefficient  | −0.4    | −0.3816 | −0.4112 | −0.3585 | −0.2888

4.2.2. The first version of the proposed method (2Obj)

The fitness function used in this version comprises the two objectives AUC and IR, as follows.

The first objective function: In handling imbalanced problems, the performance evaluation measure is a key factor. The accuracy rate is commonly used as an evaluation measure in a large variety of problems, but not for imbalanced data, since it does not distinguish between the numbers of correctly classified examples of the different classes and may therefore lead to wrong conclusions. In the proposed method, the performance of each bag is estimated with leave-one-out using a 1NN classifier and the AUC measure. The AUC is calculated from the confusion matrix (Table 3) according to Eq. (5). TP (TN) denotes the number of minority (majority) class samples that are correctly classified, while FP (FN) denotes the number of majority (minority) class samples that are wrongly classified.

Table 3. Confusion matrix for a binary class problem.

                | True predicted      | False predicted
Positive class  | True Positive (TP)  | False Negative (FN)
Negative class  | False Positive (FP) | True Negative (TN)

AUC = (1 + TP_rate − FP_rate) / 2    (5)

Since the goal in the proposed method is to minimize the objective functions, 1 − AUC is taken as the value of this objective.

The second objective function: The IR is considered as the second objective function, measured for each bag as the ratio of the number of majority class samples to minority class samples (Eq. (6)), denoted N_Maj and N_Min, respectively. The number of majority class samples differs between bags, while the number of minority class samples is fixed.

IR = N_Maj / N_Min    (6)
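Both objectives are cheap to evaluate once a bag's leave-one-out predictions are available. A sketch under our own naming, assuming binary labels with the minority class coded as 1:

```python
import numpy as np

def auc_objective(y_true, y_pred, minority=1):
    """1 - AUC, with AUC = (1 + TP_rate - FP_rate) / 2 as in Eq. (5)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp_rate = np.mean(y_pred[y_true == minority] == minority)  # minority recall
    fp_rate = np.mean(y_pred[y_true != minority] == minority)  # majority false alarms
    return 1.0 - (1.0 + tp_rate - fp_rate) / 2.0

def ir_objective(chrom, y, minority=1):
    """Eq. (6): ratio of majority to minority samples inside the bag."""
    sel = np.asarray(chrom, dtype=bool)
    n_min = np.sum(np.asarray(y)[sel] == minority)
    return (np.sum(sel) - n_min) / n_min
```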

4.2.3. The second version of the proposed method (3Obj)

In Zhou (2012), different methods are presented for generating diverse classifiers, including data sample manipulation. In this mechanism, several data samples are first generated from a given dataset, and individual classifiers are then trained on these samples; generally, data sample manipulation is based on a sampling approach. In this version, a third objective is used in addition to AUC and IR to improve diversity, as follows. For imbalanced data, diversity among the classifiers of an ensemble has a positive effect on performance. In Kuncheva and Whitaker (2003), several measures are proposed to calculate the diversity among classifiers; in this paper, the disagreement measure is used owing to its simplicity (Eq. (7)):

Dis_{i,j} = (N^{01} + N^{10}) / (N^{11} + N^{00} + N^{01} + N^{10})    (7)

Since this measure is employed to compute the diversity between two classifiers, it is used here to measure the diversity between two bags as well. In Eq. (7), N^{ab} denotes the number of positions with value a in the ith bag and value b in the jth bag. In the proposed method, the diversity of each bag must be measured; accordingly, based on the disagreement measure, the diversity of each bag with respect to the whole population is computed using Algorithm 2 as follows. The inputs of the diversity measurement function are min_err_bag and bags, where min_err_bag is the bag with the least error rate (the error rate of each bag is calculated by the AUC function) and bags is the population of solutions (bags). The initial value of comp includes min_err_bag and the vector ref (a vector of length equal to the number of training samples whose values are all 1); comp is employed to keep those bags whose diversity values have already been calculated (Line 2). The diversity value of min_err_bag is obtained by calculating the inverse of the disagreement between it and ref (Line 3). For each bag, the value DisA is calculated, which is the average disagreement between that bag and the members of comp, and it is added to DisA_of_remaining_bags (Lines 4–12). The bag with the largest DisA is added to the comp set, and its diversity value is set to the inverse of its DisA (Lines 13–15).

4.2.4. The third version of the proposed method (2Obj*)

This version is similar to 2Obj, but an endeavor is made to improve diversity. It focuses on the crowding distance of the NSGA-II algorithm, which is used there to keep the solutions diverse and to select dispersed solutions within a front; however, the crowding distance cannot always yield high-quality solutions (Vachhani et al., 2015). Moreover, if only the two objectives AUC and IR are taken into account, the crowding distance may not be a good measure for ranking the bags, since (in our case) diversity among the values of the solutions (i.e., the bags) is preferable. It is also assumed that bags obtained according to the two objectives AUC and IR can already lead to appropriate classifier performance, so the diversity between the classifiers may be reduced; it is therefore preferred that the most diverse bags be placed at better ranks. Accordingly, considering the nature of the solutions, Algorithm 3 is used in this third strategy to rank the solutions within each front. As with the crowding distance, the first and last points (bags) of a front are assigned an extreme value, and the diversity between these two points (Q_BP) is determined using the Q-statistic (Kuncheva and Whitaker, 2003) (Eq. (8)). This measure, like the disagreement measure, is used for calculating the diversity among bags; its value ranges from −1 to 1, and the smaller its value, the higher the diversity. Since a positive rank value is used in the comparison, its difference from 1 is considered as the bag's rank. Then, for each point of the Pareto front (except the first and last ones), the Q-statistic between it and the two boundary points is calculated, and the difference between this value and Q_BP is taken as that point's rank. In effect, the first and last points (bags) serve as reference points for calculating the diversity of each bag: since the boundary points are always kept as solutions, bags that are diverse with respect to these points are preferred. In Algorithm 3, l is the number of bags in the front, div_rank is calculated instead of the crowding distance value, and B_i denotes the ith bag.

Q_av = 1 − (2 / (l(l − 1))) Σ_{i=1}^{l−1} Σ_{j=i+1}^{l} (N^{11} N^{00} − N^{10} N^{01}) / (N^{11} N^{00} + N^{10} N^{01})    (8)
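The two pairwise diversity measures used by 3Obj and 2Obj* share the same four counts. A sketch with our own helper names (Eq. (8) averages the pairwise Q values over all bag pairs of a front):

```python
import numpy as np

def pair_counts(bi, bj):
    """N^11, N^00, N^10, N^01 between two binary bag vectors of equal length."""
    bi, bj = np.asarray(bi, dtype=bool), np.asarray(bj, dtype=bool)
    return (np.sum(bi & bj), np.sum(~bi & ~bj),
            np.sum(bi & ~bj), np.sum(~bi & bj))

def disagreement(bi, bj):
    """Eq. (7): fraction of positions on which the two bags differ."""
    n11, n00, n10, n01 = pair_counts(bi, bj)
    return (n01 + n10) / (n11 + n00 + n01 + n10)

def q_statistic(bi, bj):
    """Pairwise Q-statistic in [-1, 1]; smaller values mean higher diversity."""
    n11, n00, n10, n01 = pair_counts(bi, bj)
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)
```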


Table 4. A brief description of the imbalanced datasets.

#   | Dataset               | IR     | # of attributes | # of instances
1   | Glass04vs5            | 9.22   | 9               | 92
2   | Ecoli0346vs5          | 9.25   | 7               | 205
3   | Ecoli0347vs56         | 9.28   | 7               | 257
4   | Yeast05679vs4         | 9.35   | 8               | 528
5   | Ecoli067vs5           | 10.00  | 6               | 220
6   | Vowel0                | 10.1   | 13              | 988
7   | Glass016vs2           | 10.29  | 9               | 192
8   | Glass2                | 10.39  | 9               | 214
9   | Ecoli0147vs2356       | 10.59  | 7               | 336
10  | Led7Digit02456789vs1  | 10.97  | 7               | 433
11  | Ecoli01vs5            | 11.00  | 6               | 240
12  | Glass06vs5            | 11.00  | 9               | 108
13  | Glass0146vs2          | 11.06  | 9               | 205
14  | Ecoli0147vs56         | 12.28  | 6               | 332
15  | Cleveland0vs4         | 12.62  | 13              | 177
16  | Ecoli0146vs5          | 13.00  | 6               | 280
17  | Ecoli4                | 13.84  | 7               | 336
18  | Yeast1vs7             | 13.87  | 8               | 459
19  | Shuttle0vs4           | 13.87  | 9               | 1829
20  | Glass4                | 15.47  | 9               | 214
21  | Page-blocks13vs4      | 15.85  | 10              | 472
22  | Abalone9-18           | 16.68  | 8               | 731
23  | Glass016vs5           | 19.44  | 9               | 184
24  | Shuttle2vs4           | 20.5   | 9               | 129
25  | Yeast1458vs7          | 22.1   | 8               | 693
26  | Glass5                | 22.81  | 9               | 214
27  | Yeast2vs8             | 23.10  | 8               | 482
28  | Yeast4                | 28.41  | 8               | 1484
29  | Yeast1289vs7          | 30.56  | 8               | 947
30  | Yeast5                | 32.78  | 8               | 1484
31  | Ecoli0137vs26         | 39.15  | 7               | 281
32  | Yeast6                | 39.15  | 8               | 1484
33  | Abalone19             | 128.87 | 8               | 4174

5. Framework of experiments

5.1. Datasets

In this study, 33 imbalanced datasets available at KEEL are used to evaluate the proposed method. Table 4 shows the characteristics of the datasets, including the number of attributes, the number of samples, and the proportion of the number of majority class samples to minority class samples (IR). Some of the used datasets are originally multi-class and have been converted into binary datasets, where the union of one or more classes is labeled as the minority (positive) class and the union of the remaining classes as the majority (negative) class.

5.2. Tuning of experiments

The AUC metric is estimated using 5-fold cross validation: each dataset is divided into 5 folds with the same number of samples, and for each fold the learning algorithms are trained on the other 4 folds and tested on it. This process is repeated 3 times. The dataset partitions are available at KEEL (Alcalá-Fdez et al., 2011). Six different ensemble methods are compared with the proposed method; Table 5 lists them together with a summary of their characteristics. It should also be pointed out that in all methods, the C4.5 decision tree is used as the weak learner.
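This evaluation protocol (5-fold cross validation repeated 3 times, scored by AUC) maps directly onto standard tooling. A sketch using scikit-learn, with its CART tree standing in for C4.5 (the paper uses the KEEL partitions and implementations instead):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

def mean_auc(X, y, seed=0):
    """Mean AUC over 3 repetitions of stratified 5-fold CV (X, y: numpy arrays, y in {0, 1})."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=seed)
    scores = []
    for tr, te in cv.split(X, y):
        model = DecisionTreeClassifier(random_state=seed).fit(X[tr], y[tr])
        scores.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))
    return float(np.mean(scores))
```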


Table 5. The employed state-of-the-art methods to be compared with the proposed method.

Algorithm            | Strategy
SMOTEBagging (SBAG)  | Using SMOTE for generating positive samples in Bagging
SMOTEBoost (SBO)     | Hybridization of SMOTE and AdaBoost.M2
UnderBagging (UB)    | Hybridization of Bagging and undersampling
EasyEnsemble (EASY)  | Hybridization of undersampling with Bagging and AdaBoost
EUSBoost (EUS)       | Hybridization of AdaBoost.M2 with evolutionary undersampling
RUSBoost (RUS)       | Hybridization of AdaBoost.M2 with undersampling


Table 6. The tuned parameters.

Algorithm        | Parameters
C4.5             | Prune = True, Confidence level = 0.25, Minimum number of item-sets per leaf = 2
SMOTE            | Number of neighbors k = 5, Quantity = balance
Proposed method  | Population size = 50, Generations = 20, Crossover probability = 0.6, Mutation probability = 0.4

The parameters may affect any algorithm's performance, and different combinations of parameters can yield different outcomes; therefore, an endeavor is made in this study to select the values that yield the best performance. The number of classifiers can affect the performance of ensemble learning algorithms positively or negatively. According to Galar et al. (2012), ensemble methods based on Boosting and Bagging have acceptable performance with 10 and 40 classifiers, respectively; so, in the experiments, 10 classifiers are used for the Boosting-based methods and 40 for the Bagging-based ones. The implementations of the algorithms of Table 5 are available in the KEEL software. Table 6 shows the tuned parameters for C4.5 and the preprocessing methods, as well as for the proposed method. The parameters of EUSBoost are tuned as in Galar et al. (2013). In the proposed method, the parameter values are determined through trial and error, and the probability of each majority class sample being present in each bag is set to 0.4.

Table 7. The average number of classifiers generated by the three versions of the proposed method, with the per-dataset rank of each version.

                      | # of classifiers      | Rank
Dataset               | 2Obj | 2Obj* | 3Obj   | 2Obj | 2Obj* | 3Obj
Glass04vs5            | 6    | 9     | 10     | 1    | 2     | 3
Ecoli0346vs5          | 7    | 6     | 7      | 2    | 1     | 2
Ecoli0347vs56         | 8    | 5     | 11     | 2    | 1     | 3
Yeast05679vs4         | 13   | 9     | 24     | 2    | 1     | 3
Ecoli067vs5           | 6    | 5     | 13     | 2    | 1     | 3
Vowel0                | 6    | 6     | 16     | 1    | 1     | 3
Glass016vs2           | 8    | 7     | 23     | 2    | 1     | 3
Glass2                | 8    | 5     | 17     | 2    | 1     | 3
Ecoli0147vs2356       | 8    | 6     | 21     | 2    | 1     | 3
Led7Digit02456789vs1  | 8    | 6     | 13     | 2    | 1     | 3
Ecoli01vs5            | 9    | 4     | 9      | 2    | 1     | 2
Glass06vs5            | 7    | 6     | 6      | 3    | 1     | 1
Glass0146vs2          | 12   | 11    | 26     | 2    | 1     | 3
Ecoli0147vs56         | 8    | 7     | 18     | 2    | 1     | 3
Cleveland0vs4         | 7    | 6     | 22     | 2    | 1     | 3
Ecoli0146vs5          | 9    | 6     | 12     | 2    | 1     | 3
Ecoli4                | 7    | 6     | 13     | 2    | 1     | 3
Shuttle0vs4           | 5    | 5     | 10     | 1    | 1     | 2
Yeast1vs7             | 9    | 7     | 18     | 2    | 1     | 3
Glass4                | 8    | 9     | 7      | 2    | 3     | 1
Page-blocks13vs4      | 9    | 10    | 11     | 1    | 2     | 3
Abalone9-18           | 13   | 14    | 16     | 1    | 2     | 3
Glass016vs5           | 7    | 8     | 10     | 1    | 2     | 3
Shuttle2vs4           | 5    | 5     | 7      | 1    | 1     | 2
Yeast1458vs7          | 12   | 10    | 10     | 2    | 1     | 1
Glass5                | 7    | 8     | 12     | 1    | 2     | 3
Yeast2vs8             | 7    | 6     | 18     | 2    | 1     | 3
Yeast4                | 11   | 7     | 18     | 2    | 1     | 3
Yeast1289vs7          | 10   | 10    | 16     | 1    | 1     | 2
Yeast5                | 9    | 6     | 16     | 2    | 1     | 3
Ecoli0137vs26         | 4    | 4     | 10     | 1    | 1     | 2
Yeast6                | 11   | 6     | 11     | 3    | 1     | 2
Abalone19             | 11   | 7     | 27     | 2    | 1     | 3
Average               | 8.3  | 7.03  | 14.48  | 1.75 | 1.21  | 2.6

5.3. Experimental results

In this subsection, the results of the experiments are presented, including the number of classifiers obtained by the proposed method over the datasets and the ensemble methods' performance over each dataset. Then, to determine whether the employed algorithms differ significantly, a statistical analysis is conducted over the obtained results in two steps: first, the Friedman test is used to test the null hypothesis that the methods perform similarly; if the null hypothesis is rejected, the algorithms have statistically different performance, and a post-hoc test is then used to analyze the detailed differences among them.

5.3.1. The results of the methods' performance and number of classifiers

Table 8 presents the AUC obtained by the proposed method and the 6 state-of-the-art methods. Each row corresponds to the results of each method over one dataset, and the last row reports the number of times each method attains the first rank. According to the obtained results, the three versions of the proposed method take the first three ranks: 3Obj ranks 1st, and 2Obj* and 2Obj rank 2nd and 3rd, respectively. To show the differences in AUC among the methods, the difference of each method with EUS is shown in Fig. 3; positive and negative values respectively indicate better and worse performance, while a zero value signifies equal performance. According to the results over the test datasets, one can conclude that the proposed method significantly outperforms the other employed techniques. Also, the proposed method and EUS, which select the subsets of majority class samples evolutionarily, perform better than the methods that select them randomly. Figs. 4, 5, 6, 7, 8 and 9 present the obtained numbers of classifiers, demonstrating the proper performance of the proposed method in generating a (near) optimal number of classifiers. These figures show the number of bags (i.e., the number of classifiers) on a given Pareto front belonging to a single run; for simplicity, only the results of 2Obj are presented, with each figure plotted in the two dimensions AUC and IR. Also, Table 7 shows the average number of classifiers generated by the three versions of the proposed method, accompanied by the rank of each version according to the number of classifiers. To investigate the number of classifiers generated by each version, the horizontal axis of Fig. 10 lists the datasets of Table 4 in ascending order of the number of samples and the vertical axis shows the number of classifiers. The regression lines show that, for all three versions, the number of classifiers tends to increase with the number of samples, while in Fig. 11 the number of classifiers is roughly constant with increasing IR for 3Obj and increasing for the other two versions.

5.4. Non-parametric statistical tests

In order to investigate whether there are significant differences between the performances of the algorithms, statistical tests are conducted over their results. To do so, the two-stage approach proposed by Demšar (2006), used in many recently published classification studies (Abbaszadeh et al., 2018; Asadi, 2019; Asadi and Shahrabi, 2017; Mansourypoor and Asadi, 2017; Mehmanpazir and Asadi, 2017; Ronoud and Asadi, 2019), is employed. First, a statistical test based on ranking the algorithms by their performance is applied, with the null hypothesis of equal algorithm performance; if the null hypothesis is rejected, one can conclude that the employed algorithms have statistically significantly different performance. The best algorithm of this step (the one with the highest rank) is called the control algorithm and is then pairwise compared with the other algorithms. For the pairwise comparisons, several non-parametric post-hoc statistical tests are employed, including Holm (1979), Hochberg (1988), Hommel (1988), Holland and Copenhaver (1987), Rom (1990), Finner (1993) and Li (2008).
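As an illustration of the first stage, the Friedman test is available in SciPy. The score matrix below is a random placeholder standing in for the per-dataset AUC values of Table 8, not the actual results:

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
auc = rng.random((33, 9))            # hypothetical: 33 datasets x 9 methods
stat, p = friedmanchisquare(*auc.T)  # one sample of 33 scores per method
print(f"Friedman statistic = {stat:.3f}, p = {p:.4f}")
# If p falls below the significance level, the null hypothesis of equal
# performance is rejected and the post-hoc pairwise tests are applied.
```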


Fig. 3. AUC comparison of different methods.

Table 8. The average AUC of the three versions of the proposed method and the 6 state-of-the-art methods.

Dataset               | RUS    | SBO    | UB     | SBAG   | EASY   | EUS    | 2Obj   | 2Obj*  | 3Obj
Glass04vs5            | 0.9941 | 0.9858 | 0.9941 | 0.9817 | 0.9941 | 0.9941 | 0.9915 | 0.9941 | 0.9938
Ecoli0346vs5          | 0.9086 | 0.9005 | 0.9065 | 0.9275 | 0.8678 | 0.9047 | 0.9058 | 0.9080 | 0.9479
Ecoli0347vs56         | 0.8928 | 0.8784 | 0.8803 | 0.8738 | 0.8659 | 0.8902 | 0.9097 | 0.9130 | 0.9177
Yeast05679vs4         | 0.8032 | 0.7867 | 0.7949 | 0.8144 | 0.7702 | 0.8067 | 0.8247 | 0.8365 | 0.8236
Ecoli067vs5           | 0.8792 | 0.8658 | 0.8858 | 0.8450 | 0.8567 | 0.8825 | 0.9489 | 0.9217 | 0.9060
Vowel0                | 0.9509 | 0.9887 | 0.9470 | 0.9876 | 0.9449 | 0.9537 | 0.9604 | 0.9765 | 0.9761
Glass016vs2           | 0.6179 | 0.5820 | 0.7331 | 0.6700 | 0.6595 | 0.7488 | 0.7783 | 0.7769 | 0.7570
Glass2                | 0.6877 | 0.7501 | 0.7753 | 0.8045 | 0.7247 | 0.7138 | 0.7950 | 0.8666 | 0.8279
Ecoli0147vs2356       | 0.8628 | 0.8883 | 0.8472 | 0.8895 | 0.8622 | 0.8902 | 0.9146 | 0.9088 | 0.9085
Led7Digit02456789vs1  | 0.8763 | 0.7037 | 0.8880 | 0.8830 | 0.8734 | 0.8552 | 0.8963 | 0.8816 | 0.9155
Ecoli01vs5            | 0.8705 | 0.8727 | 0.8265 | 0.8523 | 0.8910 | 0.8932 | 0.8958 | 0.9145 | 0.9865
Glass06vs5            | 0.9916 | 0.9732 | 0.9139 | 0.9800 | 0.8470 | 0.9950 | 0.9308 | 0.9347 | 0.9952
Glass0146vs2          | 0.7197 | 0.6780 | 0.7707 | 0.7501 | 0.7533 | 0.7736 | 0.8665 | 0.8189 | 0.7204
Ecoli0147vs56         | 0.8642 | 0.8716 | 0.8565 | 0.8407 | 0.8411 | 0.8858 | 0.8693 | 0.8924 | 0.9425
Cleveland0vs4         | 0.8238 | 0.8073 | 0.7753 | 0.7929 | 0.8022 | 0.8280 | 0.9004 | 0.8694 | 0.9290
Ecoli0146vs5          | 0.9295 | 0.9045 | 0.8865 | 0.9147 | 0.8282 | 0.8923 | 0.8665 | 0.9464 | 0.9347
Ecoli4                | 0.9309 | 0.9107 | 0.8899 | 0.9232 | 0.8782 | 0.9273 | 0.9565 | 0.9418 | 0.9562
Shuttle0vs4           | 1.0000 | 1.0000 | 1.0000 | 0.9998 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
Yeast1vs7             | 0.7552 | 0.7210 | 0.7580 | 0.6896 | 0.7167 | 0.7651 | 0.7852 | 0.8204 | 0.6753
Glass4                | 0.8806 | 0.9192 | 0.8728 | 0.8862 | 0.8813 | 0.9073 | 0.9327 | 0.9600 | 0.8769
Page-blocks13vs4      | 0.9684 | 0.9877 | 0.9790 | 0.9884 | 0.9711 | 0.9906 | 0.9924 | 0.9910 | 0.9977
Abalone9-18           | 0.7276 | 0.7339 | 0.7302 | 0.7294 | 0.7223 | 0.7424 | 0.7731 | 0.7892 | 0.7612
Glass016vs5           | 0.9743 | 0.9181 | 0.9429 | 0.8800 | 0.9524 | 0.9886 | 0.9658 | 0.9594 | 0.9887
Shuttle2vs4           | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9905 | 1.0000 | 0.9639 | 1.0000 | 1.0000
Yeast1458vs7          | 0.6343 | 0.5637 | 0.6135 | 0.6243 | 0.5800 | 0.6282 | 0.6926 | 0.7077 | 0.6291
Glass5                | 0.9837 | 0.9789 | 0.9488 | 0.8915 | 0.9520 | 0.9878 | 0.9448 | 0.9937 | 0.9950
Yeast2vs8             | 0.7995 | 0.7750 | 0.7651 | 0.8019 | 0.7320 | 0.7614 | 0.8022 | 0.8510 | 0.7949
Yeast4                | 0.8377 | 0.7150 | 0.8478 | 0.7730 | 0.8317 | 0.8311 | 0.8909 | 0.8500 | 0.7666
Yeast1289vs7          | 0.6907 | 0.6448 | 0.7407 | 0.6432 | 0.6798 | 0.7132 | 0.7276 | 0.7192 | 0.7449
Yeast5                | 0.9492 | 0.9104 | 0.9546 | 0.9661 | 0.9502 | 0.9382 | 0.9459 | 0.9638 | 0.9658
Ecoli0137vs26         | 0.8254 | 0.8360 | 0.7451 | 0.8306 | 0.7281 | 0.8230 | 0.8467 | 0.9254 | 0.8727
Yeast6                | 0.8555 | 0.7966 | 0.8678 | 0.8387 | 0.8573 | 0.8661 | 0.9264 | 0.8519 | 0.7960
Abalone19             | 0.6629 | 0.5252 | 0.7081 | 0.5602 | 0.6980 | 0.6878 | 0.7267 | 0.7137 | 0.7211
1st rank              | 3      | 3      | 3      | 2      | 2      | 3      | 9      | 12     | 13



Fig. 4. The non-dominated solutions obtained by the proposed method for the imbalanced datasets (1).

Table 9 presents the ranking of the methods according to AUC using the Friedman test. As can be observed from Table 9, the null hypothesis is rejected, i.e., the employed algorithms have statistically significantly different performance. The 2Obj* approach (the third version of the proposed method, with the two objectives AUC and IR along with the diversity enhancement strategy) ranks first among the algorithms in terms of AUC and is consequently set as the control algorithm. Next, this algorithm is pairwise compared with the other employed algorithms using the post-hoc statistical tests; the obtained results are reported in Table 10. As can be seen, the control algorithm outperforms each of the other algorithms at the 0.05 significance level.


Fig. 5. The non-dominated solutions obtained by the proposed method for the imbalanced datasets (2).

The effect of diversity on global performance measures has already been illustrated in the literature.


Fig. 6. The non-dominated solutions obtained by the proposed method for the imbalanced datasets (3).

the Bagging’s performance, evaluation of diversity effect in classifying imbalanced data are performed. According to the obtained results, one can conclude that two approaches 2Obj* and 3Obj with concentrating on improving the diversity have better performance compared to the other employed algorithms. Moreover, the conducted evaluation using Friedman test shows that the three versions of the proposed method attained higher ranks compared

to the other techniques, where 3Obj and 2Obj* obtained the topmost ranks owing to their improved diversity. The post-hoc tests carried out in the second step show that the algorithms perform differently, with 2Obj* and 3Obj achieving the best performances, respectively. Accordingly, one can conclude that, unlike for balanced datasets, diversity generally has a positive effect on classifier performance on imbalanced datasets. Also, the


Fig. 7. Non-dominated solutions obtained by the proposed method for the imbalanced datasets (4).

diversity-improvement strategies employed in the proposed method, together with the conducted statistical tests, support this claim.
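Because this argument hinges on measuring diversity, a small sketch may help: it computes the average pairwise disagreement between ensemble members, one of the diversity measures surveyed by Kuncheva and Whitaker (2003), from a made-up 0/1 prediction matrix.

import numpy as np

def avg_pairwise_disagreement(preds: np.ndarray) -> float:
    """preds: (n_classifiers, n_samples) matrix of 0/1 class predictions.
    Returns the fraction of samples on which a pair of classifiers
    disagrees, averaged over all classifier pairs."""
    n = preds.shape[0]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.mean(preds[i] != preds[j])  # disagreement rate of the pair
            pairs += 1
    return total / pairs

# Made-up predictions of 3 classifiers on 6 samples.
preds = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
])
print(avg_pairwise_disagreement(preds))  # higher value = more diverse ensemble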

5.5. Discussion

In this subsection, the methods used in the experiments are discussed, and some important points regarding the proposed method and the other methods are highlighted.

5.5.1. Investigation of the methods used in the experiments

The UnderBagging method (Barandela et al., 2003) uses undersampling to remove majority class samples. Although this improves diversity, the literature and the experiments show that its performance is not satisfactory: useful information may be removed along with the majority samples, and an inappropriate choice of the number of bags can negatively affect its performance (a minimal sketch of the scheme follows).
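The sketch below (assuming scikit-learn; the bag count, base learner, and toy data are illustrative choices, not the original settings) keeps all minority samples in every bag, randomly undersamples an equal number of majority samples, and trains one decision tree per bag with majority voting at prediction time.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def underbagging(X, y, n_bags=10, minority_label=1):
    """Train one tree per balanced bag (all minority + sampled majority)."""
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    ensemble = []
    for _ in range(n_bags):
        # Random undersampling: as many majority samples as minority samples.
        sampled = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sampled])
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def predict(ensemble, X):
    """Majority vote over the bagged trees."""
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Tiny synthetic imbalanced dataset (IR = 9): 180 negatives, 20 positives.
X = rng.normal(size=(200, 4))
y = np.zeros(200, dtype=int)
y[:20] = 1
X[:20] += 1.5  # shift the minority class so it is learnable
models = underbagging(X, y, n_bags=10)
print(predict(models, X[:5]))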


Fig. 8. Non-dominated solutions obtained by the proposed method for the imbalanced datasets (5).

SMOTEBagging (Wang and Yao, 2009) creates artificial minority class samples in each iteration, which improves diversity. However, on average it performs about the same as UnderBagging in the experiments. Its disadvantages are the cost of finding the k nearest neighbors of the minority class samples and interpolating between them to create the artificial samples, as well as the fact that the number of bags is again not determined.
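To make the interpolation step concrete, here is a minimal SMOTE-style sketch in the spirit of Chawla et al. (2002): each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbors. The function name, k, and the toy data are illustrative; this is a simplification, not the SMOTEBagging implementation.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation
    between a chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    # k+1 neighbors because the nearest neighbor of a point is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neigh = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))          # pick a minority sample
        j = neigh[i][rng.integers(1, k + 1)]  # pick one of its k neighbors
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(1).normal(size=(20, 3))  # toy minority samples
print(smote_like(X_min, n_new=5).shape)  # (5, 3)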

SMOTEBoost (Chawla et al., 2003) and RUSBoost (Seiffert et al., 2010) apply SMOTE and random undersampling, respectively, in each iteration of AdaBoost. Although the two behave similarly, the results show that RUSBoost outperforms SMOTEBoost. The disadvantages already mentioned for SMOTEBagging and UnderBagging also apply to both of them. EasyEnsemble (Liu et al., 2009) uses an undersampling technique to generate bags and trains an AdaBoost ensemble on each balanced bag. Since many classifiers are trained for every bag, the proper selection of the number of bags affects its training time and performance.


Fig. 9. Non-dominated solutions obtained by the proposed method for the imbalanced datasets (6).

Fig. 10. Regression line fitting the number of classifiers to the number of samples.

EUSBoost (Galar et al., 2013) selects subsets of the majority class to train the classifiers through evolutionary undersampling in each iteration of AdaBoost. By improving diversity, this method yields higher performance than the other methods. In EUSBoost, however, the training set weight distribution has little effect on the evolutionary undersampling process and on the selection of training samples for the classifiers (Sun et al., 2018).
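The evolutionary undersampling idea (García and Herrera, 2009) can be sketched as a genetic search over binary masks of the majority class. The sketch below is a simplified illustration, not the exact EUSBoost procedure: the fitness is plain validation AUC of a decision tree, and the population size, operators, and mutation rate are arbitrary choices.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

def fitness(mask, X, y, maj_idx, min_idx, X_val, y_val):
    """Validation AUC of a tree trained on all minority + masked majority samples."""
    if mask.sum() == 0:
        return 0.0  # degenerate mask: no majority sample kept
    keep = np.concatenate([min_idx, maj_idx[mask.astype(bool)]])
    clf = DecisionTreeClassifier(random_state=0).fit(X[keep], y[keep])
    return roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])

def evolutionary_undersampling(X, y, X_val, y_val, pop_size=20, gens=25, seed=0):
    """Genetic search over binary masks of the majority class (1 = keep sample)."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == 1)
    maj_idx = np.flatnonzero(y == 0)
    n = len(maj_idx)
    pop = rng.integers(0, 2, size=(pop_size, n))  # random initial masks
    for _ in range(gens):
        scores = np.array([fitness(m, X, y, maj_idx, min_idx, X_val, y_val) for m in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the better half
        # Offspring: uniform crossover of two random elite parents + bit-flip mutation.
        parents = elite[rng.integers(len(elite), size=(pop_size - len(elite), 2))]
        cross = rng.integers(0, 2, size=(len(parents), n)).astype(bool)
        children = np.where(cross, parents[:, 0], parents[:, 1])
        children = np.where(rng.random(children.shape) < 0.02, 1 - children, children)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(m, X, y, maj_idx, min_idx, X_val, y_val) for m in pop])
    return pop[int(np.argmax(scores))]  # best mask found

# Toy imbalanced data (10% positives), split into train and validation parts.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.zeros(300, dtype=int)
y[::10] = 1                # every 10th sample is positive
X[y == 1] += 1.2           # shift positives so they are learnable
mask = evolutionary_undersampling(X[:200], y[:200], X[200:], y[200:])
print("majority samples kept:", int(mask.sum()), "of", int((y[:200] == 0).sum()))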


Fig. 11. Simple regression line fitting the number of classifiers to IR.

Fig. 12. Regression lines fitting AUC against IR.

Table 9
The average ranks obtained by each method in the Friedman test (P-value computed by Friedman test = 0.05, Friedman statistic = 4.392157).

Algorithm   Ranking
3Obj        3.2206
2Obj*       2.5735
2Obj        3.3676
EUS         4.7794
SBAG        6.0588
RUS         5.4559
SBO         6.5735
UB          5.8382
EASY        7.1324

5.5.2. Comparison and performance analysis of the proposed method

(1) According to the conducted experiments, the performance of Bagging on imbalanced datasets is affected by the quality of the training bags. Therefore, by generating proper bags, one can attain acceptable performance with a suitable number of bags.

(2) In the presented ensemble learning algorithms that employ data-level approaches, the suitable number of bags for training the classifiers is not determined; therefore, redundant classifiers may be generated and diversity reduced.

Table 10
The post-hoc comparison for α = 0.05.

Algorithm   Holm/Hochberg/Hommel   Holland    Rom        Finner     Li
EASY        0.00625                0.006391   0.006574   0.006391   0.035265
SBO         0.007143               0.007301   0.007513   0.012741   0.035265
SBAG        0.008333               0.008512   0.008764   0.019051   0.035265
UB          0.01                   0.010206   0.010515   0.025321   0.035265
RUS         0.0125                 0.012741   0.013109   0.03155    0.035265
EUS         0.016667               0.016952   0.016667   0.037739   0.035265
2Obj        0.025                  0.025321   0.025      0.043889   0.035265
3Obj        0.05                   0.05       0.05       0.05       0.05

Table A.1
Performance of the standard classifiers on imbalanced datasets (1).

Data set          IR     Bagging   C4.5     AdaB-M1   1-NN     SVM
glass1            1.82   0.755     0.7327   0.7973    0.7888   0.5000
ecoli0vs1         1.86   0.986     0.9832   0.9657    0.9626   0.9604
wisconsin         1.86   0.974     0.9483   0.9602    0.9551   0.9698
pima              1.90   0.728     0.7033   0.6874    0.6643   0.7137
iris0             2.00   0.975     0.9900   0.9900    1.0000   1.0000
glass0            2.06   0.817     0.8167   0.7911    0.8270   0.5325
yeast1            2.46   0.7       0.6684   0.6615    0.6340   0.5753
vehicle1          2.52   0.66      0.6599   0.6911    0.6201   0.5368
vehicle2          2.52   0.956     0.9485   0.9777    0.9404   0.8984
vehicle3          2.52   0.67      0.6654   0.6997    0.6695   0.5000
haberman          2.68   0.551     0.5757   0.5471    0.5473   0.5000
glass0123vs456    3.19   0.8959    0.9155   0.8703    0.9124   0.8713
vehicle0          3.23   0.9538    0.9247   0.9399    0.9112   0.9408
ecoli1            3.36   0.855     0.8586   0.8677    0.7969   0.8159
new-thyroid2      4.92   0.94      0.9488   0.9687    0.9802   0.7571
new-thyroid1      5.14   0.952     0.9488   0.9687    0.9774   0.7714
ecoli2            5.46   0.86      0.8623   0.8605    0.9062   0.7741
segment0          6.01   0.98      0.9838   0.9934    0.9952   0.9889
glass6            6.38   0.836     0.8132   0.8725    0.8713   0.8752
yeast3            8.11   0.8589    0.8567   0.8530    0.8181   0.7224
ecoli3            8.19   0.786     0.7280   0.7246    0.7448   0.5000
page-blocks0      8.77   0.927     0.9217   0.9295    0.8710   0.6989

Table 11
The number of wins, losses, and equals of each algorithm compared to 2Obj*.

Algorithm   Wins   Equals   Losses
EASY        1      2        30
SBO         2      1        30
SBAG        5      2        26
UB          3      3        27
RUS         4      3        26
EUS         3      3        27
2Obj        12     1        20
3Obj        12     2        19

Table 12
The number of times that each algorithm ranks 1st.

Algorithm   2Obj*   3Obj   2Obj   EUS   RUS   UB   SBAG   SBO   EASY
Ranks 1st   12      12     10     3     3     3    2      3     3

Since the proposed method prevents the existence of redundant classifiers by generating (near) optimal classifiers, pre- and post-pruning methods are not needed.

(3) In the Bagging methods used for comparison (UB, SBAG, and EASY), 40 classifiers are considered, while the average number of classifiers obtained by the proposed method is less than 40.

(4) According to the Friedman test and the non-parametric post-hoc statistical tests, the dominancy of the proposed algorithm over EASY, SBO, SBAG, UB, RUS, and EUS is statistically confirmed (Tables 9 and 10).

(5) According to Table 7, one can conclude that the two approaches 2Obj* and 3Obj, which concentrate on improving diversity, perform better than the other employed algorithms. Moreover, the evaluation conducted using the Friedman test shows that the three versions of the proposed method attained higher ranks than the other techniques, where 3Obj and 2Obj* obtained the topmost ranks owing to their improved diversity.

(6) Increasing the IR has a negative effect on the AUC of all methods. Fig. 12 shows the regression line with a negative gradient: as the IR increases, the AUC of all methods is likely to decrease (a fitting sketch follows this list).

(7) Tables 11 and 12 present the number of wins, losses, and equals of the algorithms compared to 2Obj* and the number of times each algorithm ranks 1st, respectively. These tables summarize the data provided in Table 7 and show the dominancy of the proposed algorithm over the other approaches.
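Point (6) above refers to fitting a simple regression line of AUC against IR. The sketch below shows how such a line with a negative slope can be obtained with NumPy; the (IR, AUC) pairs are made-up stand-ins, not the actual per-dataset results.

import numpy as np

# Made-up (IR, AUC) pairs standing in for the per-dataset results.
ir = np.array([1.8, 2.5, 5.1, 9.1, 16.7, 28.4, 39.2, 128.9])
auc = np.array([0.95, 0.93, 0.91, 0.88, 0.84, 0.80, 0.78, 0.70])

slope, intercept = np.polyfit(ir, auc, deg=1)  # least-squares line AUC ~ a*IR + b
print(f"AUC = {slope:.4f} * IR + {intercept:.4f}")  # negative slope: AUC drops as IR grows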

Table A.2
Performance of the standard classifiers on imbalanced datasets (2).

Data set            IR       Bagging   C4.5     AdaB-M1   1-NN     SVM
yeast2vs4           9.08     0.8446    0.8327   0.8505    0.8501   0.7271
vowel0              10.10    0.975     0.9706   0.9706    1.0000   0.7744
glass016vs2         10.29    0.523     0.6186   0.5993    0.5767   0.5000
glass2              10.39    0.5229    0.6688   0.6538    0.6008   0.5000
ecoli4              13.84    0.89      0.8139   0.8203    0.8702   0.6750
yeast1vs7           13.87    0.565     0.5942   0.5752    0.6457   0.5000
shuttle0vs4         13.87    0.9997    0.9997   0.9997    0.9960   0.9960
glass4              15.47    0.759     0.7925   0.7900    0.8183   0.5333
page-blocks13vs4    15.85    0.9786    0.9978   0.9978    0.9800   0.7155
abalone9-18         16.68    0.6787    0.5754   0.7434    0.6093   0.5000
glass016vs5         19.44    0.902     0.8914   0.8914    0.8357   0.5000
shuttle2vs4         20.50    0.833     1.0000   0.9000    0.9500   0.9500
yeast1458vs7        22.10    0.5       0.5000   0.5447    0.5728   0.5000
glass5              22.81    0.7896    0.8976   0.8976    0.8927   0.5000
yeast2vs8           23.10    0.5       0.5000   0.7489    0.7685   0.7739
yeast4              28.41    0.619     0.5952   0.6315    0.6671   0.5000
yeast1289vs7        30.56    0.4977    0.6156   0.6079    0.5525   0.5000
yeast5              32.78    0.8744    0.8833   0.8715    0.8462   0.5000
ecoli0137vs26       39.15    0.706     0.7481   0.6427    0.8427   0.8481
yeast6              39.15    0.7839    0.7115   0.6955    0.7482   0.5000
abalone19           128.87   0.5       0.5000   0.5162    0.4963   0.5000

6. Conclusion

In this research, the performance of Bagging has been improved by considering the positive effect of diversity on AUC, suitable training sets for the classifiers, and the generation of a proper number of classifiers (to avoid decreasing diversity, slowing classification, and increasing memory requirements). In the proposed method, diverse and accurate classifiers (on both the positive and negative classes) are generated by producing diverse, well-performing, and (almost) balanced training bags on the Pareto front. On the other hand, since the number of classifiers trained in Bagging equals the number of bags, one can determine the (near) optimal number of bags and consequently avoid generating redundant classifiers.

Generally speaking, the proposed evolutionary multi-objective method comprises two objectives, i.e., AUC and imbalance rate (IR), and employs two different strategies to enhance diversity. The conducted experiments showed that performance improves as diversity is enhanced. To evaluate the proposed method, 6 well-known ensemble learning algorithms and 33 imbalanced datasets were employed. The obtained results demonstrated that the proposed method outperformed the other techniques by improving diversity through the enhancement strategies. Moreover, the dominancy of the proposed method was confirmed by non-parametric statistical tests. For future studies, one can extend the proposed method to multiclass datasets; applying other meta-heuristics and comparing their results with those of the proposed method is another possible direction.

Appendix

See Tables A.1 and A.2.


References

Abbaszadeh, P., Alipour, A., Asadi, S., 2018. Development of a coupled wavelet transform and evolutionary Levenberg-Marquardt neural networks for hydrological process modeling. Comput. Intell. 34, 175–199.
Ajilisa, O., Jagathyraj, V., Sabu, M., 2018. GABoost: A clustering based undersampling algorithm for highly imbalanced datasets using genetic algorithm. In: International Conference on Innovations in Bio-Inspired Computing and Applications. Springer, pp. 235–246.
Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., 2011. KEEL Data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult.-Valued Logic Soft Comput. 1–10.
Alcalá-Fdez, J., Sánchez, L., Garcia, S., del Jesus, M.J., Ventura, S., Garrell, J.M., Otero, J., Romero, C., Bacardit, J., Rivas, V.M., 2009. KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 13, 307–318.
Anand, A., Pugalenthi, G., Fogel, G.B., Suganthan, P., 2010. An approach for classification of highly imbalanced data using weighting and undersampling. Amino Acids 39, 1385–1391.
Asadi, S., 2019. Evolutionary fuzzification of RIPPER for regression: Case study of stock prediction. Neurocomputing 331, 121–137.
Asadi, S., Shahrabi, J., 2017. Complexity-based parallel rule induction for multiclass classification. Inform. Sci. 380, 53–73.
Barandela, R., Valdovinos, R.M., Sánchez, J.S., 2003. New applications of ensembles of classifiers. Pattern Anal. Appl. 6, 245–256.
Barandela, R., Sánchez, J.S., García, V., 2003. Strategies for learning in class imbalance problems.
Batista, G.E., Prati, R.C., Monard, M.C., 2004. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6, 20–29.
Batuwita, R., Palade, V., 2010. Efficient resampling methods for training support vector machines with imbalanced datasets. In: Neural Networks (IJCNN), the 2010 International Joint Conference on. IEEE, pp. 1–8.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123–140.
Brown, G., Wyatt, J., Harris, R., Yao, X., 2005. Diversity creation methods: a survey and categorisation. Inf. Fusion 6, 5–20.
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C., 2009. Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 475–482.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artificial Intelligence Res. 16, 321–357.
Chawla, N.V., Cieslak, D.A., Hall, L.O., Joshi, A., 2008. Automatically countering imbalance and its empirical relationship to cost. Data Min. Knowl. Discov. 17, 225–252.
Chawla, N.V., Japkowicz, N., Kotcz, A., 2004. Special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 6, 1–6.
Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W., 2003. SMOTEBoost: Improving prediction of the minority class in boosting. In: European Conference on Principles of Data Mining and Knowledge Discovery. Springer, pp. 107–119.
Chen, Z., Lin, T., Xia, X., Xu, H., Ding, S., 2018. A synthetic neighborhood generation based ensemble learning for the imbalanced data classification. Appl. Intell. 48, 2441–2457.
Cieslak, D.A., Chawla, N.V., 2008. Learning decision trees for unbalanced data. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp. 241–256.
Collell, G., Prelec, D., Patil, K.R., 2018. A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data. Neurocomputing 275, 330–340.
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197.
Demšar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Díez-Pastor, J.F., Rodríguez, J.J., García-Osorio, C., Kuncheva, L.I., 2015a. Random balance: ensembles of variable priors classifiers for imbalanced data. Knowl.-Based Syst. 85, 96–111.
Díez-Pastor, J.F., Rodríguez, J.J., García-Osorio, C.I., Kuncheva, L.I., 2015b. Diversity techniques improve the performance of the best imbalance learning ensembles. Inform. Sci. 325, 98–117.
Fan, W., Stolfo, S.J., Zhang, J., Chan, P.K., 1999. AdaCost: misclassification cost-sensitive boosting. In: ICML, pp. 97–105.
Feng, W., Huang, W., Ren, J., 2018. Class imbalance ensemble learning based on the margin theory. Appl. Sci. 8 (815).
Fernandes, E.R., de Carvalho, A.C., 2019. Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning. Inform. Sci. 494, 141–154.
Finner, H., 1993. On a monotonicity problem in step-down multiple test procedures. J. Amer. Statist. Assoc. 88, 920–923.
Fotouhi, S., Asadi, S., Kattan, M.W., 2019. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J. Biomed. Inform.
Freund, Y., Schapire, R.E., 1996. Experiments with a new boosting algorithm. In: ICML, Bari, Italy, pp. 148–156.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F., 2012. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. C 42, 463–484.
Galar, M., Fernández, A., Barrenechea, E., Herrera, F., 2013. EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit. 46, 3460–3471.
García, S., Herrera, F., 2009. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evol. Comput. 17, 275–306.
Gong, J., Kim, H., 2017. RHSBoost: Improving classification performance in imbalanced data. Comput. Statist. Data Anal. 111, 1–13.
Guo, H., Liu, H., Li, R., Wu, C., Guo, Y., Xu, M., 2018a. Margin & diversity based ordering ensemble pruning. Neurocomputing 275, 237–246.
Guo, H., Zhou, J., Wu, C.-a., She, W., 2018b. A novel hybrid-based ensemble for class imbalance problem. Int. J. Artif. Intell. Tools 27, 1850025.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G., 2017. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 73, 220–239.
Hochberg, Y., 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800–802.
Holland, B.S., Copenhaver, M.D., 1987. An improved sequentially rejective Bonferroni test procedure. Biometrics 43, 417–423.
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70.
Hommel, G., 1988. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383–386.
Hu, S., Liang, Y., Ma, L., He, Y., 2009. MSMOTE: improving classification performance when training data is imbalanced. In: Computer Science and Engineering, 2009. WCSE'09. Second International Workshop on. IEEE, pp. 13–17.
Ishibuchi, H., Nojima, Y., 2007. Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. Internat. J. Approx. Reason. 44, 4–31.
Khamis, M.A., Gomaa, W., 2012. Enhanced multiagent multi-objective reinforcement learning for urban traffic light control. In: 2012 11th International Conference on Machine Learning and Applications. IEEE, pp. 586–591.
Khamis, M.A., Gomaa, W., 2014. Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework. Eng. Appl. Artif. Intell. 29, 134–151.
Khamis, M.A., Gomaa, W., El-Shishiny, H., 2012. Multi-objective traffic light control system based on Bayesian probability interpretation. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems. IEEE, pp. 995–1000.
Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A., 2010. Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Trans. Syst. Man Cybern. A 41, 552–568.
Khreich, W., Granger, E., Miri, A., Sabourin, R., 2010. Iterative Boolean combination of classifiers in the ROC space: An application to anomaly detection with HMMs. Pattern Recognit. 43, 2732–2752.
Krawczyk, B., Galar, M., Jeleń, Ł., Herrera, F., 2016. Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl. Soft Comput. 38, 714–726.
Krawczyk, B., Wozniak, M., 2018. Leveraging ensemble pruning for imbalanced data classification. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, pp. 439–444.
Kuncheva, L.I., Whitaker, C.J., 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51, 181–207.
Li, J.D., 2008. A two-step rejection procedure for testing multiple hypotheses. J. Statist. Plann. Inference 138, 1521–1527.
Lin, Y., Lee, Y., Wahba, G., 2002. Support vector machines for classification in nonstandard situations. Mach. Learn. 46, 191–202.
Lin, W.-C., Tsai, C.-F., Hu, Y.-H., Jhang, J.-S., 2017. Clustering-based undersampling in class-imbalanced data. Inform. Sci. 409, 17–26.
Ling, C.X., Sheng, V.S., Yang, Q., 2006. Test strategies for cost-sensitive decision trees. IEEE Trans. Knowl. Data Eng. 18, 1055–1067.
Liu, W., Chawla, S., Cieslak, D.A., Chawla, N.V., 2010. A robust decision tree algorithm for imbalanced data sets. In: Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, pp. 766–777.
Liu, Y.-H., Chen, Y.-T., 2005. Total margin based adaptive fuzzy support vector machines for multiview face recognition. In: Systems, Man and Cybernetics, 2005 IEEE International Conference on. IEEE, pp. 1704–1711.
Liu, B., Ma, Y., Wong, C.K., 2000. Improving an association rule based classifier. In: European Conference on Principles of Data Mining and Knowledge Discovery. Springer, pp. 504–509.
Liu, X.-Y., Wu, J., Zhou, Z.-H., 2009. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. B 39, 539–550.
López, V., Fernández, A., García, S., Palade, V., Herrera, F., 2013. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inform. Sci. 250, 113–141.
Lopez-Garcia, P., Masegosa, A.D., Osaba, E., Onieva, E., Perallos, A., 2019. Ensemble classification for imbalanced data based on feature space partitioning and hybrid metaheuristics. Appl. Intell. 1–16.
Lu, Y., Cheung, Y.-m., Tang, Y.Y., 2016. Hybrid sampling with bagging for class imbalance learning. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 14–26.
Mansourypoor, F., Asadi, S., 2017. Development of a reinforcement learning-based evolutionary fuzzy rule-based system for diabetes diagnosis. Comput. Biol. Med. 91, 337–352.
Mehmanpazir, F., Asadi, S., 2017. Development of an evolutionary fuzzy expert system for estimating future behavior of stock price. J. Ind. Eng. Int. 13, 29–46.
Molinara, M., Ricamato, M.T., Tortorella, F., 2007. Facing imbalanced classes through aggregation of classifiers. In: Image Analysis and Processing, 2007. ICIAP 2007. 14th International Conference on. IEEE, pp. 43–48.
Munkhdalai, T., Namsrai, O.-E., Ryu, K.H., 2015. Self-training in significance space of support vectors for imbalanced biomedical event data. BMC Bioinformatics 16 (S6).
Nanni, L., Fantozzi, C., Lazzarini, N., 2015. Coupling different methods for overcoming the class imbalance problem. Neurocomputing 158, 48–61.
Raghuwanshi, B.S., Shukla, S., 2019a. Class imbalance learning using underbagging based kernelized extreme learning machine. Neurocomputing 329, 172–187.
Raghuwanshi, B.S., Shukla, S., 2019b. SMOTE based class-specific extreme learning machine for imbalanced learning. Knowl.-Based Syst.
Rom, D.M., 1990. A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika 77, 663–665.
Ronoud, S., Asadi, S., 2019. An evolutionary deep belief network extreme learning-based for breast cancer diagnosis. Soft Comput. 23, 13139–13159.
Sáez, J.A., Luengo, J., Stefanowski, J., Herrera, F., 2015. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inform. Sci. 291, 184–203.
Samuel, O.W., Asogbon, G.M., Sangaiah, A.K., Fang, P., Li, G., 2017. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68, 163–172.
Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A., 2010. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. A 40, 185–197.
Stefanowski, J., Wilk, S., 2008. Selective pre-processing of imbalanced data for improving classification performance. In: International Conference on Data Warehousing and Knowledge Discovery. Springer, pp. 283–292.
Sun, B., Chen, H., Wang, J., Xie, H., 2018. Evolutionary under-sampling based bagging ensemble method for imbalanced data classification. Front. Comput. Sci. 12, 331–350.
Sun, Y., Kamel, M.S., Wong, A.K., Wang, Y., 2007. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit. 40, 3358–3378.
Sun, Z., Song, Q., Zhu, X., Sun, H., Xu, B., Zhou, Y., 2015. A novel ensemble method for classifying imbalanced data. Pattern Recognit. 48, 1623–1637.
Sun, Y., Wong, A.K., Kamel, M.S., 2009. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 23, 687–719.
Tahan, M.H., Asadi, S., 2018a. EMDID: Evolutionary multi-objective discretization for imbalanced datasets. Inform. Sci. 432, 442–461.
Tahan, M.H., Asadi, S., 2018b. MEMOD: a novel multivariate evolutionary multi-objective discretization. Soft Comput. 22, 301–323.
Tang, B., He, H., 2017. GIR-based ensemble sampling approaches for imbalanced learning. Pattern Recognit. 71, 306–319.
Tsai, C.-F., Lin, W.-C., Hu, Y.-H., Yao, G.-T., 2019. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inform. Sci. 477, 47–54.
Tumer, K., Ghosh, J., 1996. Error correlation and error reduction in ensemble classifiers. Connect. Sci. 8, 385–404.
Vachhani, V.L., Dabhi, V.K., Prajapati, H.B., 2015. Survey of multi objective evolutionary algorithms. In: Circuit, Power and Computing Technologies (ICCPCT), 2015 International Conference on. IEEE, pp. 1–9.
Van Hulse, J., Khoshgoftaar, T.M., Napolitano, A., 2009. An empirical comparison of repetitive undersampling techniques. In: Information Reuse & Integration, 2009. IRI'09. IEEE International Conference on. IEEE, pp. 29–34.
Wang, Q., Luo, Z., Huang, J., Feng, Y., Liu, Z., 2017. A novel ensemble method for imbalanced data learning: bagging of extrapolation-SMOTE SVM. Comput. Intell. Neurosci.
Wang, S., Yao, X., 2009. Diversity analysis on imbalanced data sets by using ensemble models. In: Computational Intelligence and Data Mining, 2009. CIDM'09. IEEE Symposium on. IEEE, pp. 324–331.
Wang, S., Yao, X., 2013. Relationships between diversity of classification ensembles and single-class performance measures. IEEE Trans. Knowl. Data Eng. 25, 206–219.
Weiss, G.M., 2004. Mining with rarity: a unifying framework. ACM SIGKDD Explor. Newsl. 6, 7–19.
Wu, Z., Lin, W., Ji, Y., 2018. An integrated ensemble learning model for imbalanced fault diagnostics and prognostics. IEEE Access 6, 8394–8402.
Xu, R., Chen, T., Xia, Y., Lu, Q., Liu, B., Wang, X., 2015. Word embedding composition for data imbalances in sentiment and emotion classification. Cogn. Comput. 7, 226–240.
Zarandi, M.F., Kayvanfar, V., 2015. A bi-objective identical parallel machine scheduling problem with controllable processing times: a just-in-time approach. Int. J. Adv. Manuf. Technol. 77, 545–563.
Zhang, S., Liu, L., Zhu, X., Zhang, C., 2008. A strategy for attributes selection in cost-sensitive decision trees induction. In: Computer and Information Technology Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference on. IEEE, pp. 8–13.
Zhou, Z.-H., 2012. Ensemble Methods: Foundations and Algorithms. CRC Press.
Zhou, A., Qu, B.-Y., Li, H., Zhao, S.-Z., Suganthan, P.N., Zhang, Q., 2011. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm Evol. Comput. 1, 32–49.
Zhou, Z.-H., Tang, W., 2003. Selective ensemble of decision trees. In: International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing. Springer, pp. 476–483.
