

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

An experimental study on evolutionary fuzzy classifiers designed for managing imbalanced datasets

Michela Antonelli, Pietro Ducange, Francesco Marcelloni*

Dipartimento di Ingegneria dell'Informazione, University of Pisa, 56122 Pisa, Italy

Article info

Abstract

Article history: Received 16 October 2013; received in revised form 7 February 2014; accepted 3 April 2014; available online 11 July 2014.

In this paper, we present an experimental study on a set of evolutionary fuzzy classifiers (EFCs) purposely designed to manage imbalanced datasets. Three of these EFCs represent the state-of-the-art of the main approaches to the evolutionary generation of fuzzy rule-based systems for imbalanced dataset classification. The fourth EFC is an extension of a multi-objective evolutionary learning (MOEL) scheme we have recently proposed for managing imbalanced datasets: the rule base and the membership function parameters of a set of FRBCs are concurrently learned by optimizing the sensitivity, the specificity and the complexity. By using non-parametric tests, we first compare the results obtained by the four EFCs in terms of area under the ROC curve. We show that our MOEL scheme outperforms two of the comparison algorithms and is statistically equivalent to the third. Further, the classifiers generated by our MOEL scheme are characterized by a lower number of rules than the ones generated by the other approaches. To validate the effectiveness of our MOEL scheme in dealing with imbalanced datasets, we also compare our results with the ones achieved, after rebalancing the datasets, by two state-of-the-art algorithms, namely FURIA and FARC-HD, proposed for generating fuzzy rule-based classifiers for balanced datasets. We show that our MOEL scheme is statistically equivalent to FURIA, which is associated with the highest accuracy rank in the statistical tests. However, the rule bases generated by FURIA are characterized by a low interpretability. Finally, we show that the results achieved by our MOEL scheme are statistically equivalent to the ones achieved by four state-of-the-art approaches, based on ensembles of non-fuzzy classifiers, appropriately designed for dealing with imbalanced datasets. © 2014 Published by Elsevier B.V.

Keywords: Genetic and evolutionary fuzzy systems; fuzzy rule-based classifiers; imbalanced datasets

1. Introduction

In the framework of classification systems, the automatic identification of patterns belonging to two classes or categories is denoted as "binary classification". In most cases, these two classes correspond, respectively, to a normal condition (negative or majority class) and to an anomalous or alert condition (positive or minority class) of a certain phenomenon or system. A number of real-world applications can be found in the context of binary classification, such as intrusion detection [1], medical diagnostics [2,3] and fault detection [4]. In order to automatically design an accurate model for classification tasks, we need an effective set of training examples. Usually, in the case of the previously cited applications of binary classification, the available data samples associated with the


* Corresponding author. Tel.: +39 0502217678; fax: +39 0502217600. E-mail address: [email protected] (F. Marcelloni).

http://dx.doi.org/10.1016/j.neucom.2014.04.070 0925-2312/© 2014 Published by Elsevier B.V.

negative classes are much more abundant than the ones corresponding to the positive classes, i.e. imbalanced datasets have to be handled [5]. Indeed, fortunately, anomalous conditions are in general rare and it is often difficult to collect a large number of representative examples. The issue of learning classifiers from imbalanced datasets is still a hot topic in the machine learning research community [5,6]. Indeed, classical machine learning algorithms generate models that aim to maximize the percentage of correct classifications or to minimize the percentage of classification errors. If we use these algorithms with imbalanced datasets, the generated models are usually characterized by a bias towards the recognition of the majority class, so that the minority class is poorly recognized. Currently, a large number of contributions regarding classification with imbalanced datasets [7-13] are still being published in the specialized literature. As stated also in [7], the proposed approaches can be organized into four main categories, namely methods acting at the data level [14,15], approaches acting on the learning algorithms [16-18], cost-sensitive methodologies which


M. Antonelli et al. / Neurocomputing 146 (2014) 125–136

combine both algorithmic and data-level approaches [19-21], and techniques which consider ensembles of classifiers [7,8,22-24]. Since 2008, fuzzy rule-based classifiers (FRBCs) have attracted the attention of researchers also in the framework of binary classification with imbalanced datasets [15,17,25]. FRBCs have proved to be very suitable for classification tasks: on the one hand they can achieve very high accuracy, and on the other hand they can explain through the rules how the classification is performed. In specific application domains, this feature is very appealing, because it allows understanding which features, and which values of these features, actually discriminate a class from the other classes. With the aim of managing imbalanced datasets, however, some appropriate strategies have to be exploited for designing the FRBC structure, that is, for designing the rule base (RB) and the data base (DB). The first study on the use of FRBCs with imbalanced datasets was presented in [15]. Here, the authors analyzed the synergy between preprocessing mechanisms for re-balancing the training instances and specific algorithms aimed at generating the RB. The results achieved by three state-of-the-art learning algorithms for FRBCs, using both the original imbalanced training sets and their rebalanced versions, were compared. The Synthetic Minority Oversampling Technique (SMOTE) [14] proved to be the best performing among the different preprocessing mechanisms. In [25], the authors have also studied the positive influence of re-balancing techniques when performing genetic rule selection for hierarchical FRBCs. A similar study has also been conducted in [26], where the influence of the Adaptive Inference System for FRBCs in the framework of imbalanced datasets has been analyzed. Also in this case, the authors have demonstrated that the use of SMOTE improves the accuracy of the generated FRBCs.
Recently, a number of contributions have exploited evolutionary optimization algorithms for generating FRBCs suitable for imbalanced datasets. These approaches are denoted as evolutionary fuzzy classifiers (EFCs). EFCs are a specific category of genetic and evolutionary fuzzy systems [27]. In this paper, we aim to perform an experimental comparison among a set of EFCs suitable for managing imbalanced datasets. We have selected, to the best of our knowledge, the three most recent EFCs, discussed in [11,28,29], which represent the latest advances in the framework of EFCs for imbalanced datasets. Further, we have compared the results achieved by these EFCs with the ones achieved by an extended version of the method we proposed in [17]. We carry out an extensive statistical analysis by using non-parametric statistical tests. First, we compare the four EFCs using twenty-two highly imbalanced datasets. We show that, even though we do not re-balance the training set, the set of FRBCs generated by our MOEL scheme outperforms two comparison EFCs and achieves results statistically equivalent to the remaining EFC. As regards the complexity of the generated FRBCs, the classifiers generated by our MOEL scheme are always characterized by a lower number of rules than the ones obtained by the three comparison EFCs. Further, we statistically compare the results achieved by our MOEL scheme with the ones achieved by two of the most interesting state-of-the-art methods for generating FRBCs, namely FURIA [30] and FARC-HD [31]. In this analysis, we consider forty-four imbalanced datasets and we rebalance the training sets for FURIA and FARC-HD, since these algorithms were not proposed for imbalanced datasets. FURIA turns out to be the algorithm with the highest performance rank in the tests. On the other hand, we show that the solutions generated by our MOEL scheme are statistically equivalent to the ones generated by FURIA.
Finally, by using non-parametric statistical tests, we also compare the results achieved by the FRBCs generated by our MOEL scheme with the ones achieved by four state-of-the-art approaches, based on boosting and bagging strategies, for generating ensembles of classifiers for imbalanced datasets. We show that the results are statistically equivalent, thus proving that our MOEL scheme produces classifiers which are not only interpretable, but also very efficient. In a nutshell, the main contributions of this paper are:

- A new version of an MOEL scheme for generating highly interpretable FRBCs for imbalanced datasets.
- A statistical comparison, in terms of accuracy and interpretability, among the proposed MOEL scheme and three EFCs recently proposed for dealing with imbalanced datasets.
- A statistical comparison among the results achieved by the FRBCs generated by our MOEL and by two state-of-the-art learning algorithms.
- A statistical comparison among the results achieved by the FRBCs generated by our MOEL and the ones achieved by ensembles of non-fuzzy classifiers purposely designed for dealing with imbalanced datasets.

The paper is organized as follows: Section 2 introduces some preliminary concepts and notations regarding FRBCs. In Section 3 we discuss the four EFCs considered in the experimental comparison. Section 4 presents the different statistical analyses and Section 5 draws some conclusions.

2. Fuzzy rule-based classifiers

Pattern classification consists of assigning a class C_k from a predefined set C = {C_1, ..., C_K} of classes to an unlabeled pattern. We consider a pattern as an F-dimensional point in a feature space R^F. Let X = {X_1, ..., X_F} be the set of input variables and U_f, f = 1, ..., F, be the universe of discourse of the f-th variable. Let P_f = {A_{f,1}, ..., A_{f,T_f}} be a fuzzy partition of T_f fuzzy sets on variable X_f. T_f defines the granularity of the partition P_f. The DB of an FRBC is the set of parameters which describe the partitions P_f of each input variable. The RB contains a set of M rules usually expressed as

R_m: IF X_1 is A_{1,j_{m,1}} AND ... AND X_F is A_{F,j_{m,F}} THEN Y is C_{j_m} with RW_m   (1)

where Y is the classifier output, C_{j_m} is the class label associated with the m-th rule, and j_{m,f} ∈ [1, T_f], f = 1, ..., F, identifies the index of the fuzzy set (among the T_f linguistic terms of partition P_f) selected for X_f in rule R_m. RW_m is the rule weight, i.e. a certainty degree of the classification in the class C_{j_m} for a pattern belonging to the fuzzy subspace delimited by the antecedent of the rule R_m. Let (x_t, y_t) be the t-th input-output pair, with x_t = [x_{t,1}, ..., x_{t,F}] ∈ R^F and y_t ∈ C. The strength of activation (matching degree of the rule with the input) of the rule R_m is calculated as

w_m(x_t) = \prod_{f=1}^{F} A_{f,j_{m,f}}(x_{t,f}),   (2)

where A_{f,j_{m,f}}(x) is the membership function (MF) associated with the fuzzy set A_{f,j_{m,f}}. The association degree with the class C_{j_m} is calculated as

h_m(x_t) = w_m(x_t) · RW_m   (3)

Two different definitions of the rule weight RW_m can commonly be found in the literature [32,33]:

1. The certainty factor:

CF_m = \frac{\sum_{x_t \in C_{j_m}} w_m(x_t)}{\sum_{t=1}^{N} w_m(x_t)}   (4)
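To make these definitions concrete, the following sketch computes the matching degree of Eq. (2) and the certainty factor of Eq. (4) for a single rule. The triangular MF shape and the rule/dataset encoding are illustrative assumptions, not the paper's implementation.

```python
def tri_mf(x, a, b, c):
    """Triangular MF with support [a, c] and core b (a < b < c assumed)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def matching_degree(x, rule_mfs):
    """Eq. (2): product of the antecedent membership degrees of one rule."""
    w = 1.0
    for xf, (a, b, c) in zip(x, rule_mfs):
        w *= tri_mf(xf, a, b, c)
    return w

def certainty_factor(rule_mfs, rule_class, X, y):
    """Eq. (4): activation mass of the rule's class over the total activation."""
    num = sum(matching_degree(x, rule_mfs) for x, label in zip(X, y)
              if label == rule_class)
    den = sum(matching_degree(x, rule_mfs) for x in X)
    return 0.0 if den == 0 else num / den
```

The penalized certainty factor of Eq. (5) can be obtained from the same two sums by subtracting the activation mass of the patterns outside the rule's class.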



2. The penalized certainty factor:

PCF_m = CF_m - \frac{\sum_{x_t \notin C_{j_m}} w_m(x_t)}{\sum_{t=1}^{N} w_m(x_t)}   (5)

An FRBC is also characterized by its reasoning method, which uses the information from the RB to determine the class label for a specific input pattern. Also in this case, two different approaches are often adopted in the literature:

1. The maximum matching: an input pattern is classified into the class corresponding to the rule with the maximum association degree calculated for the pattern.
2. The additive combination: an input pattern is classified into the class corresponding to the maximum total strength of vote. In particular, the total strength of vote for each class is computed as follows:

V_{C_k}(x_t) = \sum_{R_m \in RB, C_{j_m} = C_k} h_m(x_t)   (6)

where k ∈ {1, ..., K}. With this method, for a new pattern x_t, each fuzzy rule gives a vote for its consequent class. There exists also a normalized version of the additive combination, where the total strength of vote is calculated as

V'_{C_k}(x_t) = \frac{V_{C_k}(x_t)}{\max_{l=1...K} V_{C_l}(x_t)}   (7)

The RB of an FRBC can also consist of disjunctive normal form (DNF) rules [34]. A DNF rule is expressed as

R̂_m: IF X_1 is Â_{1,j_{m,1}} AND ... AND X_F is Â_{F,j_{m,F}} THEN Y is C_{j_m} with RW_m   (8)

where Â_{f,j_{m,f}} = {A_{f,j_1} or ... or A_{f,j_z}} is a set of z fuzzy sets, z < T_f, of the partition P_f joined by a disjunctive operator. The matching degree formula is similar to the one in (2), where the membership degree of each x_{t,f} to the specific Â_{f,j_{m,f}} is calculated by applying the disjunctive operator to the membership degrees to each fuzzy set in {A_{f,j_1}, ..., A_{f,j_z}}.

The two types of fuzzy rules that we have described are based on single granularities, i.e. a specific fuzzy partition P_f of T_f fuzzy sets must be defined for each X_f. Actually, in the literature there exist a number of contributions which deal with RBs defined on multiple granularities [33]. In this case, no specific partition P_f is defined for each X_f; on the contrary, a set of global partitions with different numbers of fuzzy sets is defined. Each rule has the same structure as R_m, but the fuzzy sets A_{f,j_{m,f}}, which describe the antecedent condition of each variable X_f, can be chosen among the fuzzy sets that define the different global partitions. Thus, different partitions of the linguistic variables can be used for different rules. Multiple granularities avoid fixing a priori the number of fuzzy sets, i.e. the granularity, of each linguistic variable. On the other hand, as stated in [35], the semantic interpretability of RBs defined on multiple granularities is lower than that of classical RBs defined on single granularities. Usually, to take the "don't care" condition into account, a new fuzzy set A_{f,0} (f = 1, ..., F) is defined for all the F input variables, both for rules defined on single and on multiple granularities. This fuzzy set is characterized by an MF equal to 1 on the overall universe. The term A_{f,0} allows generating rules that contain only a subset of the input variables.
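A minimal sketch of the two reasoning methods follows, with `None` standing for the "don't care" set A_{f,0} (membership 1 everywhere). The encoding of a rule as a `(membership functions, class label, rule weight)` triple is an illustrative assumption.

```python
def matching_degree(x, mfs):
    """Product of antecedent memberships; None encodes the "don't care" set."""
    w = 1.0
    for xf, mf in zip(x, mfs):
        if mf is not None:
            w *= mf(xf)
    return w

def classify_max_matching(x, rules):
    """Class of the rule with the highest association degree h_m = w_m * RW_m."""
    mfs, label, rw = max(rules, key=lambda r: matching_degree(x, r[0]) * r[2])
    return label

def classify_additive(x, rules):
    """Eq. (6): each rule adds its association degree to its consequent class."""
    votes = {}
    for mfs, label, rw in rules:
        votes[label] = votes.get(label, 0.0) + matching_degree(x, mfs) * rw
    return max(votes, key=votes.get)
```

Note that the normalization of Eq. (7) does not change the winning class, since it divides all votes by the same maximum.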

3. Evolutionary fuzzy classifiers for imbalanced datasets
A number of approaches based on evolutionary algorithms have been proposed to generate FRBCs for imbalanced datasets. Recently, some papers [27,36,37] have proposed to organize genetic and evolutionary fuzzy systems (GEFSs) into categories. The two coarsest categories, which group GEFSs, contain fuzzy systems generated by evolutionary parameter learning and by evolutionary parameter tuning, respectively. The former includes evolutionary rule base learning, evolutionary data base parameter learning, evolutionary rule selection, and simultaneous evolutionary learning/selection of rules and data base parameter learning. The latter includes data base parameter tuning and genetic adaptive inference engines. Obviously, a number of different subcategories can be derived considering different types of fuzzy rules, data base structures, inference engines, evolutionary algorithms and so on. An in-depth discussion of a specific taxonomy can be found in [27]. The four EFCs selected for our experimental analysis belong to different subcategories of the GEFS taxonomy. In particular, the first EFC is based on an embedded DB learning that concurrently selects relevant input variables, determines the granularities of the fuzzy partitions and generates the RB of an FRBC [29]. The second EFC deals with hierarchical FRBCs identified by means of an RB learning based on genetic programming and a subsequent genetic rule selection and MF parameter tuning [11]. The third EFC aims to generate accurate FRBCs, based on multiple granularities, by first using a state-of-the-art single-objective evolutionary RB learning process and then a post-processing step for MF parameter tuning [28]. Following the suggestions provided in [15], these three EFCs use the SMOTE oversampling algorithm [14] to re-balance the training set.
In SMOTE the minority class is oversampled by taking each minority class sample and introducing synthetic examples along the line segments joining any or all of the k minority class nearest neighbors. Depending upon the amount of oversampling required, neighbors from the k nearest neighbors are randomly chosen. The fourth EFC exploits a multi-objective evolutionary algorithm to concurrently generate the RBs and the DBs of a set of FRBCs characterized by different trade-offs between accuracy and interpretability. This method is an extension of the approach we proposed in [36]. Unlike the other three approaches, our EFC does not need to re-balance the training set. In the following, for each EFC, we briefly introduce the subcategory and describe the main aspects of the specific algorithm.

3.1. Evolutionary DB learning with embedded RB generation

In the evolutionary DB learning with embedded RB generation, an evolutionary algorithm evolves a population of DBs. The knowledge base corresponding to each DB is obtained by applying a heuristic RB learning method [27]. In the work discussed in [29], the evolutionary process concurrently selects the most relevant input variables and determines the correct granularities, i.e. the number T_f of triangular fuzzy sets for each linguistic input variable X_f. Once the DB has been determined, i.e. for each selected variable the corresponding granularity has been chosen, the RB is generated by using the Chi algorithm [38]. A standard genetic algorithm is used to define the DB by exploiting a chromosome C composed of two parts, namely the C_v and the C_g parts. The C_v part is a binary vector where each gene codifies whether the corresponding variable is selected for the DB. The C_g part is an integer vector where each value codifies the number of fuzzy sets used in the partition of the corresponding variable (if selected). The size of each part is equal to the number F of input variables. The binary tournament and the classical



one-point crossover are used as selection and crossover operators, respectively. As regards mutation, the flip-flop mutation operator is used for the C_v part. For the C_g part, the mutation operator randomly selects a variable X_f and then randomly increases or decreases its granularity. The authors denote the proposed approach as GA-FS+GL (genetic algorithm for feature selection and granularity learning). The following fitness function is used for evaluating the overall KB:

φ = ω_1 · (1 - AUC) + ω_2 · (N_g / F)   (9)

where AUC is the area under the Receiver Operating Characteristic (ROC) curve [39-41], N_g is the sum of the granularities of all the selected variables, and ω_1 and ω_2 are two appropriately defined weights (see [29] for more details). AUC is computed as

AUC = \frac{1 + TPR - FPR}{2}   (10)

where TPR and FPR are the True Positive Rate and the False Positive Rate, respectively [42]. Actually, in [43] a multi-objective version of GA-FS+GL, denoted as MGA-FS+GL, has been proposed by the same authors. In MGA-FS+GL, the AUC and N_g/F are considered as different objectives to be optimized by using the well-known NSGA-II algorithm [44]. We do not consider MGA-FS+GL in the experimental comparisons because, as stated by the authors themselves in [43], the results achieved by MGA-FS+GL are worse than the ones achieved by GA-FS+GL, both in terms of accuracy and in terms of interpretability.

3.2. Evolutionary RB learning and DB tuning of hierarchical fuzzy classifiers

Hierarchical FRBCs [45] are composed of a set of Q layers. For each layer(t, n(t)), where t = 1, ..., Q and n(t) is the number of fuzzy sets that compose the partition P_{t,f} of each input variable X_f, a specific DB(t, n(t)) and RB(t, n(t)) must be defined. The final hierarchical knowledge base is formed by the union of the knowledge bases of each layer. It follows that, for each variable X_f, Q different fuzzy partitions of n(t) fuzzy sets are generated and the final RB is composed of Q blocks of rules defined on different fuzzy partitions. In the work discussed in [11] a two-layer hierarchical FRBC is considered, where n(1) = 5 and n(2) = 9. For each layer, triangular and uniformly distributed fuzzy partitions have been used. In order to generate the hierarchical RB, the authors have used a modified version of GP-COACH [46]. GP-COACH is a GP-based RB learning algorithm where each chromosome codifies a DNF rule and the overall population represents the RB of the fuzzy classifier. The RB learning is achieved by exploiting a cooperative-competitive approach. The GP-based algorithm also includes a token competition strategy for maintaining the diversity in the population.
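The uniformly distributed triangular partitions used for the two layers (n(1) = 5 and n(2) = 9 fuzzy sets per variable) can be sketched as follows; the [0, 1] universe of discourse is an illustrative assumption.

```python
def uniform_triangular_partition(n, lo=0.0, hi=1.0):
    """n uniformly spaced triangular sets (a, b, c) forming a strong partition:
    cores are equally spaced and each support ends at the neighbours' cores."""
    step = (hi - lo) / (n - 1)
    cores = [lo + j * step for j in range(n)]
    return [(max(lo, b - step), b, min(hi, b + step)) for b in cores]

# Two-layer hierarchical DB as described in the text
layer1 = uniform_triangular_partition(5)   # n(1) = 5
layer2 = uniform_triangular_partition(9)   # n(2) = 9
```

The first and last sets are clipped at the universe extremes, so their cores coincide with the boundaries of the domain.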
GP-COACH exploits two evaluation functions: a local fitness, to evaluate the performance of each rule, and a global fitness, to evaluate the behavior of the overall RB. The definition of these functions can be found in [46]. With the aim of dealing with hierarchical RBs, in [11] GP-COACH has been modified by including a rule expansion process to determine, at each generation of the algorithm, rules defined on different granularities. Indeed, the population of GP-COACH is a mixed population that combines primary and secondary rules, where secondary rules have granularities different from the primary rules. Obviously, the genetic operators and the local and global fitness functions of the original GP-COACH have also been modified. We denote this variant of GP-COACH as GP-COACH-H in the following. Once the hierarchical RB has been generated by using GP-COACH-H, a post-processing step, which includes a genetic rule selection and a genetic MF parameter tuning, is performed. In order to carry out the genetic tuning, the authors use the linguistic 2-tuples representation [47], which allows the lateral displacement of the MF labels considering only one parameter. The CHC evolutionary algorithm is used to concurrently perform the rule selection and the genetic tuning. The chromosome used in the CHC is formed by two parts: (i) a binary part that codifies the indexes of the selected rules and (ii) a real part that codifies, for each MF of each input variable, the value of the lateral displacement. To evaluate each chromosome during the evolution of the CHC, the accuracy on the training set is used as fitness function.

3.3. Evolutionary MF parameter tuning based on the 2-tuples representation

In the work in [28], the authors discuss how the genetic tuning of the MF parameters, based on the linguistic 2-tuples representation [47], allows improving the accuracy of an FRBC previously generated for dealing with imbalanced datasets. Actually, in [28], two different methods for generating the initial FRBC have been compared, namely the Chi algorithm [38] and FH-GBML [48]. Further, two different versions of the genetic lateral tuning have been discussed, namely the Global Tuning of the Semantics and the Local Tuning of the Rules [47]. In our analysis, we consider FH-GBML with the first version of the tuning because this scheme is associated with the highest accuracies for the datasets characterized by a very high imbalance ratio. We recall that this index is defined as the ratio between the number of instances of the majority and minority classes [15]. FH-GBML uses a single-objective evolutionary algorithm for learning FRBCs based on multiple granularities. The algorithm handles the complete RB as a single individual and also includes a Genetic Cooperative-Competitive Learning scheme which allows a local optimization of each single rule.
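The lateral displacement underlying the 2-tuples representation mentioned above can be illustrated with the following sketch: each label keeps its shape and is shifted by a single parameter α ∈ [-0.5, 0.5], expressed in units of the distance between adjacent cores. The concrete three-set partition is an illustrative assumption.

```python
# Uniform 3-set strong partition on [0, 1]: cores at 0, 0.5, 1
partition = [(0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)]

def lateral_displacement(partition, alphas):
    """2-tuples lateral tuning sketch: shift each triangular label (a, b, c)
    by alpha * step, one real-coded gene alpha in [-0.5, 0.5] per label."""
    step = partition[1][1] - partition[0][1]  # distance between adjacent cores
    return [(a + al * step, b + al * step, c + al * step)
            for (a, b, c), al in zip(partition, alphas)]

shifted = lateral_displacement(partition, [0.0, 0.2, 0.0])  # move only the middle label
```

Encoding one real gene per label, instead of the three parameters of each triangle, is precisely what keeps the tuning chromosome compact.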
The conditions of each rule can be chosen among 14 linguistic terms, which correspond to Ruspini's strong partitions with two, three, four and five uniformly distributed fuzzy sets. Further, the "don't care" terms can also be chosen during the RB learning process. In order to directly deal with imbalanced datasets, the authors modified the fitness function optimized by FH-GBML: instead of using the percentage of correctly classified patterns, the algorithm directly optimizes the AUC. Nevertheless, in order to achieve better performance, the training set has been rebalanced by using SMOTE.

3.4. Multi-objective evolutionary fuzzy classifiers

In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used for generating fuzzy rule-based systems characterized by different trade-offs between accuracy and interpretability. A number of MOEA-based approaches have been proposed for designing fuzzy rule-based systems, both for regression and classification problems, and the term Multi-objective Evolutionary Fuzzy Systems has been coined. An extensive review on this topic has been recently published in [37]. In order to generate the structure of FRBCs for imbalanced datasets, we extend the multi-objective evolutionary RB learning scheme introduced in [17]. By executing a multi-objective evolutionary algorithm, we concurrently generate the RB and the MF parameters of the FRBCs from scratch. Actually, since we use a multi-objective optimization, we generate a set of different classifiers characterized by different trade-offs between accuracy and interpretability. The proposed MOEL scheme exploits the RB and DB coding proposed in [17] and in [49], respectively. The chromosome C is composed of two parts (C_RB, C_DB), which define the RB and the MF parameters of the input variables, respectively.


As shown in Fig. 1, the C_RB part of the chromosome is an integer vector which codifies the RB: for each rule R_m it contains the indexes j_{m,f} of the antecedent, for each input variable X_f, and the consequent class C_{j_m}. Fig. 2 shows the C_DB part, which consists of F vectors of real numbers: the f-th vector contains the cores [b_{f,2}, ..., b_{f,T_f-1}], which define the positions of the MFs for the linguistic variable X_f. Indeed, we adopt triangular fuzzy sets A_{f,j} defined by the tuple (a_{f,j}, b_{f,j}, c_{f,j}), where a_{f,j} and c_{f,j} correspond to the left and right extremes of the support of A_{f,j}, and b_{f,j} to the core. Since we adopt strong fuzzy partitions with, for j = 2, ..., T_f - 1, b_{f,j} = c_{f,j-1} and b_{f,j} = a_{f,j+1}, in order to define each fuzzy set of the partition it is sufficient to fix the positions of the cores b_{f,j} throughout the universe U_f of the f-th input variable. Since b_{f,1} and b_{f,T_f} coincide with the extremes of the universe, the partition of each input variable X_f is completely defined by T_f - 2 parameters. To generate new solutions we exploit mutation and crossover operators for both parts of the chromosome. As regards the C_RB part, we employ the one-point crossover and three appropriately defined mutation operators which, respectively, add, remove or modify one or more rules (see [17] for more details). As regards the C_DB part, we exploit the BLX-α crossover, with α = 0.5, and a mutation operator that randomly chooses a variable X_f, f ∈ [1, F], and a fuzzy set j ∈ [2, T_f - 1], and then replaces the value of b_{f,j} with a value randomly chosen within its definition interval. As multi-objective evolutionary algorithm we use the (2+2)M-PAES, which has been successfully employed in our previous works [49-51]. During the evolutionary process, the (2+2)M-PAES generates an approximation of the optimal Pareto front by concurrently maximizing the accuracy and the interpretability of the generated solutions.
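The decoding of the C_DB genes of one variable into its strong partition can be sketched as follows: the chromosome carries only the T_f - 2 free cores, and the supports follow from the strong-partition constraints. The [0, 1] universe is an illustrative assumption.

```python
def decode_partition(free_cores, lo=0.0, hi=1.0):
    """Rebuild the T_f triangular sets (a, b, c) from the free cores
    b_{f,2}, ..., b_{f,T_f-1}; b_{f,1} = lo and b_{f,T_f} = hi are fixed,
    and for inner sets a_{f,j} = b_{f,j-1} and c_{f,j} = b_{f,j+1}."""
    b = [lo] + list(free_cores) + [hi]
    return [(b[max(j - 1, 0)], b[j], b[min(j + 1, len(b) - 1)])
            for j in range(len(b))]
```

With this constraint the membership degrees of any input always sum to 1 across the partition, which is what makes the encoding compact: mutating one core reshapes the two adjacent fuzzy sets consistently.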
(2+2)M-PAES, which is a modified version of the well-known (2+2)PAES introduced in [52], is a steady-state multi-objective evolutionary algorithm which uses two current solutions s1 and s2 and stores the non-dominated solutions in an archive. Unlike the classical (2+2)PAES, which maintains the current solutions until they are replaced by solutions with particular characteristics, we randomly extract the current solutions at each iteration. If the archive contains a unique solution, s1 and s2 correspond to this unique solution. In Fig. 3 we show the pseudo-code of the proposed MOEL scheme. At the beginning, the archive is initialized as an empty structure and two initial current solutions s1 and s2 are randomly generated. At each iteration, the application of crossover and mutation operators produces two new candidate solutions o1 and o2 from the current solutions s1 and s2. These candidate solutions are added to the archive only if they are dominated by no solution contained in the archive; possible solutions in the archive dominated by the candidate solutions are removed. Typically, the size of the archive is fixed at the beginning of the execution of the (2+2)M-PAES. In this case, when the archive is


Fig. 3. Pseudo-code of the proposed MOEL scheme.

full and a new solution o_i, with i = 1, 2, has to be added to the archive, if it dominates no solution in the archive, then we insert o_i into the archive and remove the solution (possibly o_i itself) that belongs to the region with the highest crowding degree (these steps are implemented by the function insert_in_archive()). The crowding degree is calculated by using an adaptive grid defined on the objective space. If the region contains more than one solution, then the solution to be removed is randomly chosen. (2+2)M-PAES terminates after a given number Z of iterations. The candidate solution acceptance strategy generates an archive which contains only non-dominated solutions. On (2+2)M-PAES termination, the archive includes a set of solutions which approximate the Pareto front. In order to deal with imbalanced datasets, as experimented in [17], we split the accuracy into two objectives, namely TPR and FPR. We recall that TPR and FPR coincide, respectively, with the sensitivity and the complement to 1 of the specificity. Finally, we calculate the interpretability in terms of the complexity of the RB, i.e. as the total number of conditions in the antecedent parts of the rules in the RB. In the following, we denote our approach as PAES-3obj.
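The candidate acceptance step of the archive can be sketched as follows, representing each solution by its minimized objective vector, e.g. (FPR, 1 - TPR, RB complexity). The fixed-size pruning by crowding degree on the adaptive grid is omitted for brevity.

```python
def dominates(u, v):
    """Pareto dominance for minimized objectives: u is no worse everywhere
    and strictly better in at least one objective."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def update_archive(archive, candidate):
    """Add the candidate only if no archived solution dominates it,
    then drop any archived solution the candidate dominates."""
    if any(dominates(s, candidate) for s in archive):
        return archive
    return [s for s in archive if not dominates(candidate, s)] + [candidate]
```

Applying this rule to every candidate guarantees the invariant stated in the text: at termination the archive contains only mutually non-dominated solutions.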

Fig. 1. The CRB part of a chromosome.

4. Experimental study

Fig. 2. The CDB part of a chromosome.

In this section, we discuss three different experimental analyses. The first analysis regards the comparison between PAES-3obj and the three EFCs described in the previous section. The second analysis deals with the comparison between PAES-3obj and two well-known state-of-the-art algorithms for generating fuzzy rule-based classifiers, which are appropriately adapted for dealing with imbalanced datasets. Finally, we compare the results achieved by the FRBCs generated by PAES-3obj with the ones



achieved by four state-of-the-art ensembles of non-fuzzy classifiers purposely designed for dealing with imbalanced datasets.

4.1. Experimental setup

We executed PAES-3obj on 44 imbalanced datasets used in the most recent contribution regarding EFCs for imbalanced datasets [11]. We extracted these datasets from the KEEL repository.1 As regards the first analysis, we performed the comparison considering only the 22 common datasets, since only for these datasets were the results available for all three EFCs in the papers where they were introduced. Actually, since GP-COACH-H has been experimented by the authors with all the 44 datasets, we also performed a pairwise comparison between GP-COACH-H and PAES-3obj considering 44 datasets. We collected, for each comparison approach, the results corresponding to the best-performing configuration of the algorithm described in the specific paper. On the other hand, the second analysis has been performed considering all the 44 datasets. For the same reasons that justify the number of datasets used for the first analysis, the third analysis has also been performed considering 22 datasets. As shown in Table 1, the datasets are characterized by different numbers of input variables (from 7 to 13) and instances (from 92 to 4174), and by different values of the imbalance ratio. The last column of Table 1 specifies whether a dataset is included in the reduced set composed of 22 datasets. In [11,28,29], the authors have used a stratified five-fold cross-validation for GA-FS+GL, GP-COACH-H and FH-GBML, respectively. Thus, also for PAES-3obj we have carried out a stratified five-fold cross-validation and used the same data partitions as GA-FS+GL, GP-COACH-H and FH-GBML. Further, we have executed six trials for each fold with different seeds for the random number generator (30 trials in total).
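The stratified five-fold splitting can be sketched as follows: each fold preserves the class ratio, which matters for highly imbalanced data, where a plain random split may leave a fold without minority examples. The round-robin assignment is an illustrative choice, not the exact KEEL partitioning procedure.

```python
import random

def stratified_kfold(y, k=5, seed=0):
    """Return k lists of indices whose class proportions match the full set."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):   # round-robin over the folds
            folds[pos % k].append(i)
    return folds
```

In each of the five runs, one fold serves as test set and the remaining four as training set; repeating the runs with six different seeds yields the 30 trials mentioned above.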
In order to analyze the results of PAES-3obj, each three-dimensional Pareto front approximation is projected onto the FPR-TPR plane: each FRBC of the Pareto front approximation is represented as a point corresponding to the pair (FPR, TPR). These pairs give rise to the so-called ROC curve [39–41]. We recall that, in the framework of binary classifiers, ROC curve analysis is typically adopted to evaluate the capability of correctly classifying patterns. In particular, one classifier in the ROC space is better than (dominates) another if it is located more to the north-west (higher TPR and/or lower FPR). For this reason, in order to select a set of potentially optimal FRBCs, we extract the non-dominated solutions obtained on the training set in the FPR-TPR plane and use only these solutions for plotting the ROC curve. Similar to the works in [8,36,53], to evaluate the overall quality of the set of solutions generated by our approach, we calculate the AUC. Actually, in [36,53] the authors consider the Area Under the ROC Convex Hull. This area is obtained by eliminating from the FPR-TPR plane all the solutions which do not belong to the convex hull of the ROC curve. Indeed, when a cost function which linearly combines TPR and FPR is used for selecting a specific classifier [40], the solutions on the ROC convex hull are assumed to be the best-performing ones. Since in this work we do not assume any cost function for selecting a single optimal classifier, we consider all the non-dominated solutions in the FPR-TPR plane. The comparison algorithms, on the other hand, do not generate a set of FRBCs characterized by different values of TPR and FPR; they provide just one classifier with its associated sensitivity and specificity. In this case, the corresponding AUC is computed as in (10). We recall that a similar approach has also been followed in [53] for comparing classifiers generated by multi-objective evolutionary

Available at http://sci2s.ugr.es/keel/datasets.php

Table 1
Datasets used in the experiments.

Dataset                 # Instances   # Variables   Imbalance ratio   Reduced set
abalone19               4174          8             128.87            Yes
abalone9vs18            731           8             16.68             Yes
cleveland0vs4           177           13            12.62             No
ecoli0137vs26           281           7             39.15             Yes
ecoli0146vs5            280           6             13                No
ecoli0147vs2356         336           7             10.59             No
ecoli0147vs56           332           6             12.28             No
ecoli01vs235            244           7             9.17              No
ecoli01vs5              240           6             11                No
ecoli0234vs5            202           7             9.1               No
ecoli0267vs35           224           7             9.18              No
ecoli0346vs5            205           7             9.25              No
ecoli0347vs56           257           7             9.28              No
ecoli034vs5             200           7             9                 No
ecoli046vs5             203           6             9.15              No
ecoli067vs35            222           7             9.09              No
ecoli067vs5             220           6             10                No
ecoli4                  336           7             13.84             Yes
glass0146vs2            205           9             11.06             No
glass015vs2             172           9             9.12              No
glass016vs2             192           9             10.29             Yes
glass016vs5             184           9             19.44             Yes
glass04vs5              92            9             9.22              No
glass06vs5              108           9             11                No
glass2                  214           9             10.39             Yes
glass4                  214           9             15.47             Yes
glass5                  214           9             22.81             Yes
led7digit02456789vs1    443           7             10.97             No
page-blocks13vs4        472           10            15.85             Yes
shuttle0vs4             1829          9             13.87             Yes
shuttle2vs4             129           9             20.5              Yes
vowel0                  988           13            10.1              Yes
yeast0256vs3789         1004          8             9.14              No
yeast02579vs368         1004          8             9.14              No
yeast0359vs78           506           8             9.12              No
yeast05679vs4           528           8             9.35              Yes
yeast1289vs7            947           8             30.56             Yes
yeast1458vs7            693           8             22.1              Yes
yeast1vs7               459           8             13.87             Yes
yeast2vs4               514           8             9.08              Yes
yeast2vs8               482           8             23.1              Yes
yeast4                  1484          8             28.41             Yes
yeast5                  1484          8             32.78             Yes
yeast6                  1484          8             39.15             Yes

algorithms and classifiers generated using single-objective evolutionary algorithms and non-evolutionary state-of-the-art machine learning algorithms.

In order to verify whether there exist statistical differences among the AUCs on the test set associated with the classifiers generated by the different algorithms, we performed a statistical analysis. As suggested in [54], we applied non-parametric statistical tests for multiple comparisons by combining all the datasets: for each approach we generated a distribution consisting of the mean values of AUC calculated on the test set. We first applied the Friedman test in order to compute a ranking among the distributions [55]. Then, we applied the Iman and Davenport test in order to evaluate whether there exist statistically relevant differences among the mean values of AUC computed for the different algorithms [56]. If such differences exist, we apply a post-hoc procedure, namely the Holm test [57]. This test detects statistical differences between the control approach, i.e. the one with the lowest Friedman rank, and each of the remaining approaches.
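The AUC evaluation described above can be sketched as follows. We assume that Eq. (10), not reproduced in this excerpt, is the usual single-point formula AUC = (1 + TPR − FPR)/2, i.e. the average of sensitivity and specificity; the multi-point case is the trapezoidal area under the non-dominated (FPR, TPR) pairs anchored at (0, 0) and (1, 1):

```python
def non_dominated(points):
    """Keep (FPR, TPR) pairs not dominated by any other point: a point
    dominates another if its FPR is no higher and its TPR no lower,
    with the two points not identical."""
    nd = [p for p in points
          if not any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)]
    return sorted(set(nd))

def auc_from_points(points):
    """Trapezoidal area under the curve through the non-dominated points,
    anchored at (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + non_dominated(points) + [(1.0, 1.0)]
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def auc_single(tpr, fpr):
    """Single-classifier AUC: area under the two segments
    (0,0)-(FPR,TPR)-(1,1), i.e. (sensitivity + specificity) / 2."""
    return (1.0 + tpr - fpr) / 2.0

# A Pareto front projection with one dominated point (0.3, 0.8).
front = [(0.1, 0.7), (0.2, 0.9), (0.3, 0.8)]
area = auc_from_points(front)   # trapezoids over (0.1,0.7) and (0.2,0.9)
```

For a single (FPR, TPR) pair the two computations coincide, which is why the comparison algorithms, each returning one classifier, can be scored on the same scale as the Pareto fronts of PAES-3obj.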

4.2. Comparison among evolutionary fuzzy classifiers designed for imbalanced datasets

In this section, we present the results of the experimental comparison among the four previously described EFCs, namely GA-FS+GL, GP-COACH-H, FH-GBML and PAES-3obj. Table 2 summarizes the values of some common parameters used for the four algorithms in the experiments. The probabilities of applying the specific mutation and crossover operators can be found in [29,11,28] for


Table 2
Parameters used in the experiments.

Parameter               GA-FS+GL                     GP-COACH-H                        FH-GBML                           PAES-3obj
Reasoning method        Maximum matching             Normalized additive combination   Maximum matching                  Maximum matching
Rule weight             Penalized certainty factor   Certainty factor                  Penalized certainty factor        Certainty factor
# Fuzzy sets            Learnt by the GA             n(1) = 5 and n(2) = 9             Based on multiple granularities   3
Use of DNF rules        No                           Yes                               No                                No
# Fitness evaluations   500 x F                      30,000                            1000 + (5000 x F)                 5000

GA-FS+GL, GP-COACH-H and FH-GBML, respectively, and in [17] and [49] for, respectively, the CRB and the CDB parts of PAES-3obj. We observe that in [28,29] the authors report the accuracy results in terms of AUC calculated as in (10). On the contrary, in [11] the results are reported in terms of sensitivity and specificity, and so we re-calculated the values of the AUCs.

Table 3 shows, for each of the 22 datasets, the average values of the AUCs, computed both on the training set (AUCtr) and on the test set (AUCts), for the solutions generated by GA-FS+GL, GP-COACH-H, FH-GBML and PAES-3obj. We point out that the ROC curve on the test set is obtained by connecting the points selected on the training set. For each dataset, the highest AUC values are shown in bold. From Table 3, we observe that in most of the datasets the solutions generated by PAES-3obj are associated with the highest values of AUC, especially on the training set. This confirms the higher capability of the multi-objective evolutionary learning of exploring the search space.

Table 4 shows the results of the non-parametric statistical tests: for each algorithm, we show the Friedman rank and the Iman and Davenport p-value. If the p-value is lower than the level of significance α (in the experiments, α = 0.05), we can reject the null hypothesis and affirm that there exist statistical differences between the distributions associated with the approaches. Otherwise, no statistical difference exists among the distributions and the four EFCs are therefore statistically equivalent. We observe that the Iman and Davenport statistical hypothesis of equivalence is rejected and therefore statistical differences among the four algorithms are detected. Thus, we apply the Holm post-hoc procedure considering PAES-3obj as the control algorithm (associated with the lowest rank, in bold in the table).
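The test cascade just described (Friedman average ranks, then a step-down post-hoc correction on the pairwise p-values) can be sketched as follows. This is a generic reconstruction, not the statistical package used by the authors, and it takes the pairwise p-values as given rather than deriving them from the rank statistics:

```python
def average_ranks(scores):
    """scores[d][a]: AUC of algorithm a on dataset d (higher is better).
    Returns the Friedman average rank of each algorithm (rank 1 = best),
    with ties sharing the mean of the covered ranks."""
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda a: -row[a])
        i = 0
        while i < n_alg:                        # handle ties: average ranks
            j = i
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2.0 + 1.0
            for k in range(i, j + 1):
                totals[order[k]] += mean_rank
            i = j + 1
    return [t / len(scores) for t in totals]

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare the i-th smallest p-value against
    alpha/(m - i); once one hypothesis is retained, retain all later ones."""
    m = len(p_values)
    decisions = {}
    for i, (name, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1])):
        decisions[name] = p < alpha / (m - i) and all(decisions.values())
    return decisions
```

Feeding in the p-values reported in Table 4, `holm({'GP-COACH-H': 0.00239, 'GA-FS+GL': 0.00350, 'FH-GBML': 0.07983})` rejects the equivalence hypothesis for the first two algorithms only, matching the table.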
The statistical hypothesis of equivalence can be rejected for GA-FS+GL and GP-COACH-H, thus confirming that PAES-3obj outperforms these two EFCs in terms of AUC on the test set. On the contrary, even though the p-value is quite low, the statistical hypothesis of equivalence cannot be rejected for FH-GBML. This means that the AUCs of the solutions generated by PAES-3obj and FH-GBML are statistically equivalent. We have to highlight, however, some drawbacks of FH-GBML: first of all, as shown in Table 2, its evolutionary optimization has been carried out considering a higher number of fitness evaluations than PAES-3obj. Further, in order to achieve such comparable AUCs, the training set has had to be rebalanced by using SMOTE. Finally, while PAES-3obj returns a set of FRBCs characterized by different trade-offs between TPR, FPR and complexity, FH-GBML returns just one solution, which is, as shown in the following, much more complex than the ones generated by PAES-3obj.

As regards GA-FS+GL, FH-GBML and PAES-3obj, we also analyze the interpretability of the generated FRBCs. We do not analyze the interpretability of GP-COACH-H because in [11] the authors provide no information on the interpretability of the generated hierarchical FRBCs. On the other hand, since GP-COACH-H generates hierarchical FRBCs with DNF rules and adopts different granularities for the two layers, such solutions are not easily interpretable. Since in [28,29] the authors show only the number of rules of the generated FRBCs, we carried out the interpretability analysis considering only this measure. Actually,

Table 3
Average values of the AUC computed on the training (AUCtr) and test (AUCts) sets.

                    GA-FS+GL        GP-COACH-H      FH-GBML         PAES-3obj
Dataset             AUCtr   AUCts   AUCtr   AUCts   AUCtr   AUCts   AUCtr   AUCts
Abalone19           0.684   0.699   0.791   0.594   0.844   0.632   0.844   0.727
Abalone9-18         0.666   0.628   0.823   0.780   0.861   0.767   0.863   0.811
Ecoli0137vs26       0.901   0.790   0.968   0.851   0.982   0.810   0.998   0.892
Ecoli4              0.884   0.860   0.974   0.934   0.993   0.881   0.999   0.909
Glass016vs2         0.692   0.626   0.850   0.646   0.916   0.636   0.858   0.662
Glass016vs5         0.912   0.867   0.978   0.871   0.990   0.877   0.996   0.885
Glass2              0.708   0.729   0.882   0.609   0.916   0.680   0.865   0.705
Glass4              0.900   0.853   0.974   0.793   0.991   0.853   0.993   0.864
Glass5              0.907   0.755   0.959   0.856   0.997   0.888   0.996   0.898
Page-Blocks13vs4    0.949   0.937   0.978   0.884   0.998   0.993   0.991   0.951
Shuttle0vs4         1.000   0.999   0.999   0.996   1.000   1.000   1.000   0.996
Shuttle2vs4         0.996   0.992   0.997   1.000   1.000   0.980   1.000   0.937
Vowel0              0.950   0.928   0.962   0.946   0.990   0.966   0.938   0.903
Yeast05679vs4       0.815   0.793   0.834   0.785   0.863   0.798   0.904   0.835
Yeast1289vs7        0.734   0.750   0.807   0.696   0.829   0.714   0.840   0.688
Yeast1458vs7        0.673   0.641   0.775   0.588   0.798   0.637   0.802   0.645
Yeast1vs7           0.761   0.754   0.825   0.681   0.885   0.707   0.899   0.775
Yeast2vs4           0.900   0.884   0.936   0.897   0.968   0.934   0.974   0.944
Yeast2vs8           0.781   0.710   0.944   0.731   0.883   0.773   0.943   0.814
Yeast4              0.838   0.835   0.873   0.813   0.893   0.787   0.916   0.868
Yeast5              0.956   0.935   0.949   0.928   0.985   0.959   0.988   0.952
Yeast6              0.897   0.870   0.908   0.880   0.927   0.859   0.957   0.888

Table 4
Results of the non-parametric statistical tests on the AUC computed on the test set.

Algorithm    Friedman rank
GA-FS+GL     2.886
GP-COACH-H   2.931
FH-GBML      2.431
PAES-3obj    1.749

Iman and Davenport p-value: 0.00547 (hypothesis of equivalence rejected)

Holm post-hoc procedure
i   Algorithm    z-value   p-value   alpha/i   Hypothesis
3   GP-COACH-H   3.03614   0.00239   0.016     Rejected
2   GA-FS+GL     2.91937   0.00350   0.025     Rejected
1   FH-GBML      1.75162   0.07983   0.05      Not rejected

the total number of rules, together with the total number of conditions in the antecedent of each rule, is considered an interesting interpretability measure for evaluating the RB complexity [35]. As stated above, PAES-3obj generates a set of classifiers characterized by different trade-offs between FPR, TPR and complexity. For this reason, for each Pareto front approximation projected onto the FPR-TPR plane, we extract the minimum and the maximum number of rules among the solutions on the Pareto front approximation. Table 5 shows, for each dataset, the average values of the number of rules for the solutions generated by GA-FS+GL and FH-GBML, and


Table 5
Average number of rules for the EFCs generated by GA-FS+GL, FH-GBML and PAES-3obj.

Dataset             GA-FS+GL   FH-GBML   PAES-3obj (Min)   PAES-3obj (Max)
Abalone19           7.4        18.6      2.7               12.9
Abalone9-18         4.4        17.2      2.6               12.1
Ecoli0137vs26       4.2        15.8      2.4               7.6
Ecoli4              58.2       10.8      2.3               6.7
Glass016vs2         3.6        17.2      2.9               11.0
Glass016vs5         4.0        17.6      2.1               5.0
Glass2              8.2        16.6      2.9               8.9
Glass4              8.4        23.6      2.8               7.2
Glass5              5.2        15.0      2.4               5.8
Page-Blocks13vs4    6.6        16.8      2.6               6.5
Shuttle0vs4         14.8       49.8      2.0               3.3
Shuttle2vs4         4.2        17.4      2.0               2.1
Vowel0              4.4        30.4      2.3               8.4
Yeast05679vs4       9.4        13.8      2.5               11.3
Yeast1289vs7        13.8       16.4      3.0               11.6
Yeast1458vs7        3.8        14.2      2.7               11.0
Yeast1vs7           6.8        19.8      2.5               10.9
Yeast2vs4           2.2        16.8      2.6               8.8
Yeast2vs8           5.0        14.2      2.5               8.6
Yeast4              6.4        21.6      2.2               10.5
Yeast5              4.8        12.4      3.0               9.3
Yeast6              5.0        12.2      3.4               11.4

the average values of the minimum and maximum numbers of rules for the solutions generated by PAES-3obj. We observe that the solutions generated by GA-FS+GL are always characterized by a complexity higher than that of the simplest solutions of PAES-3obj. In most of the datasets, their average number of rules is lower than that of the most complex solutions of PAES-3obj. On the other hand, we can see from Table 5 that for eight datasets the complexity is equal to or higher than that of the most complex solutions of PAES-3obj. As regards FH-GBML, the FRBCs generated by this approach are the most complex ones: for each dataset, the number of rules of FH-GBML is always higher than the maximum number of rules of PAES-3obj.

We can conclude that PAES-3obj has two advantages with respect to the three EFCs used for comparison. First, the training set does not need to be re-balanced by using the SMOTE oversampling algorithm. This advantage stands out especially when dealing with datasets characterized by a large number of instances and a high imbalance ratio: the oversampling of the minority class becomes computationally expensive, the size of the dataset almost doubles and, consequently, the learning time increases. The second advantage regards the complexity of the generated solutions, which is, in most cases, lower than that of the solutions generated by the comparison approaches.

For the sake of completeness, we also carried out a pairwise statistical analysis between PAES-3obj and GP-COACH-H. We performed the Wilcoxon Signed-Ranks test [58] by using the AUC distributions on the test set associated with the 44 datasets in Table 1. In Table 6 we show the results of the Wilcoxon Signed-Ranks test, which confirm that the solutions generated by PAES-3obj statistically outperform the ones generated by GP-COACH-H.
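The rank sums R+ and R- reported for the Wilcoxon Signed-Ranks test can be sketched as follows; zero differences are simply dropped here, while published implementations differ in how they handle them:

```python
def wilcoxon_rank_sums(a, b):
    """Signed-rank sums for paired samples a (e.g. PAES-3obj AUCts per
    dataset) and b (e.g. GP-COACH-H AUCts): rank |a_i - b_i| in ascending
    order (zero differences dropped, tied magnitudes share averaged ranks),
    then sum the ranks where a_i > b_i (R+) and where a_i < b_i (R-)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):                       # average the ranks of ties
        j = i
        while j + 1 < len(diffs) and abs(diffs[ranked[j + 1]]) == abs(diffs[ranked[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[ranked[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus
```

With the 44 paired AUC values, this is the computation behind the R+ = 669.0 and R- = 321.0 entries of Table 6; the p-value then follows from the null distribution of the smaller rank sum.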
Indeed, the p-value is lower than the level of significance α (in the experiments, α = 0.05) and PAES-3obj achieves a higher rank sum than GP-COACH-H (R+ > R-).

4.3. Comparison between PAES-3obj and two state-of-the-art fuzzy rule-based classifiers

In this section, we compare the results achieved by using PAES-3obj with the ones achieved by two state-of-the-art algorithms for

Table 6
Results of the Wilcoxon Signed-Ranks test using 44 datasets.

Comparison                 R+      R-      p-value
PAES-3obj vs. GP-COACH-H   669.0   321.0   0.04208

generating FRBCs, namely FURIA [30] and FARC-HD [31]. The implementations of FURIA and FARC-HD are available in the WEKA [59] and KEEL [60] packages, respectively. The authors of these algorithms have demonstrated that they outperform a large number of classical classification algorithms, both based and not based on fuzzy rules. In order to allow these algorithms to deal with imbalanced datasets, we rebalanced the training sets with the SMOTE algorithm [14]. In particular, for all the 44 datasets, we collected the rebalanced training partitions available in the KEEL repository. Obviously, the test partitions have not been rebalanced and are the same as the ones used for PAES-3obj. We recall that, in a number of contributions, the SMOTE algorithm has been recognized as one of the most suitable algorithms for rebalancing the training set, allowing traditional classification algorithms to achieve good results on imbalanced datasets [9,61–65]. We executed both algorithms using the default parameters, i.e. those suggested in the papers where the two algorithms were introduced.

As regards FURIA (Fuzzy Unordered Rules Induction Algorithm), it was introduced in [30] as an extension of the RIPPER algorithm [66]. The main extensions regard: (i) the use of fuzzy rather than crisp rules, (ii) the exploitation of unordered rather than ordered rule sets, and (iii) the introduction of a novel rule stretching method to manage uncovered examples. FARC-HD (Fuzzy Association Rule-based Classification model for High-Dimensional datasets) was recently introduced in [31] and is a single-objective EFC which exploits association rule mining for generating FRBCs. FARC-HD is based on three stages. First, it mines all possible fuzzy association rules, building a search tree to list all frequent fuzzy item sets and limiting the depth of the branches in order to find a small number of short fuzzy rules.
Second, it uses a pattern weighting scheme to reduce the number of candidate rules, preselecting the most interesting ones, in order to decrease the computational cost of the third stage. Finally, a single-objective genetic algorithm, namely CHC, is used to select and tune a compact set of fuzzy association rules. We recall that the FRBCs generated by FARC-HD use the certainty factor as rule weight and the additive combination [32] as reasoning method.

Table 7 shows, for all the 44 datasets, the results corresponding to the solutions obtained by PAES-3obj, FURIA and FARC-HD. For each dataset, we present the mean AUCs, both on the training (AUCtr) and test (AUCts) sets. As regards PAES-3obj, we show the average values of the minimum and maximum numbers of rules of the generated solutions (Rulesmin and Rulesmax, respectively). Further, for FURIA and FARC-HD, we show the average values of the number of rules of the generated solutions (Rules). We notice that FURIA achieves, in most of the datasets, the highest values of AUC both on the training and test sets. In order to statistically validate this observation, in Table 8 we show the results of the non-parametric statistical tests for multiple comparisons considering the AUCs on the test set. We observe that the Iman and Davenport statistical hypothesis of equivalence is rejected and therefore statistical differences among the three algorithms are detected. Thus, we apply the Holm post-hoc procedure considering FURIA as the control algorithm. The statistical hypothesis of equivalence can be rejected for FARC-HD, thus confirming that FURIA outperforms this algorithm in terms of AUC on the test set. On the contrary, the statistical hypothesis of equivalence cannot be


Table 7
Average results achieved by PAES-3obj, FURIA and FARC-HD.

                        PAES-3obj                               FURIA                     FARC-HD
Dataset                 AUCtr   AUCts   Rulesmin   Rulesmax     AUCtr   AUCts   Rules     AUCtr   AUCts   Rules
abalone19               0.844   0.727   2.7        12.9         0.999   0.504   47.0      0.865   0.682   20.1
abalone9-18             0.863   0.811   2.6        12.1         0.997   0.734   34.6      0.854   0.754   20.7
cleveland0vs4           0.947   0.749   2.0        3.3          0.998   0.907   10.0      0.988   0.793   18.4
ecoli0137vs26           0.998   0.892   2.4        7.6          0.999   0.831   9.8       0.982   0.815   7.5
ecoli0146vs5            0.992   0.808   2.0        5.5          0.998   0.895   10.6      0.991   0.949   12.1
ecoli0147vs2356         0.979   0.892   3.3        8.4          0.994   0.906   15.8      0.964   0.872   14.7
ecoli0147vs56           0.996   0.895   3.6        9.5          0.997   0.934   13.2      0.982   0.884   12.1
ecoli01vs235            0.994   0.900   2.6        7.1          0.993   0.881   9.6       0.963   0.852   12.7
ecoli01vs5              0.999   0.913   2.1        6.4          0.997   0.848   8.6       0.990   0.890   10.6
ecoli0234vs5            0.998   0.880   2.1        6.3          0.996   0.850   9.4       0.989   0.936   10.6
ecoli0267vs35           0.991   0.874   3.0        8.3          0.995   0.813   11.6      0.961   0.884   12.7
ecoli0346vs5            0.997   0.903   2.1        6.5          0.998   0.966   9.2       0.983   0.934   9.9
ecoli0347vs56           0.991   0.893   3.4        8.0          0.997   0.922   12.2      0.976   0.894   12.7
ecoli034vs5             0.998   0.893   2.1        6.1          0.998   0.953   8.8       0.985   0.925   11.1
ecoli046vs5             0.998   0.907   2.1        7.3          0.998   0.905   8.4       0.987   0.901   11.2
ecoli067vs35            0.992   0.860   3.1        6.8          0.991   0.891   10.2      0.965   0.816   12.9
ecoli067vs5             0.996   0.879   2.0        6.7          0.999   0.923   11.6      0.968   0.880   10.6
ecoli4                  0.999   0.909   2.3        6.7          1.000   0.862   8.6       0.993   0.910   8.7
glass0146vs2            0.860   0.716   3.1        9.9          0.990   0.761   14.6      0.886   0.725   11.8
glass015vs2             0.838   0.617   3.3        10.0         0.980   0.817   14.0      0.934   0.743   14.1
glass016vs2             0.858   0.662   2.9        11.0         0.977   0.696   13.0      0.882   0.612   12.2
glass016vs5             0.996   0.885   2.1        5.0          0.996   0.863   8.0       0.983   0.841   8.7
glass04vs5              1.000   0.909   2.0        3.4          0.995   0.994   4.8       0.991   0.919   6.8
glass06vs5              0.997   0.897   2.0        4.5          0.999   0.979   6.2       0.999   0.966   8.0
glass2                  0.865   0.705   2.9        8.9          0.999   0.654   14.6      0.917   0.579   11.9
glass4                  0.993   0.864   2.8        7.2          0.999   0.895   8.0       0.996   0.848   10.2
glass5                  0.996   0.898   2.4        5.8          0.998   0.996   6.4       0.996   0.765   8.2
led7digit02456789vs1    0.967   0.916   2.8        6.4          0.982   0.902   12.6      0.946   0.900   16.2
page-blocks13vs4        0.991   0.951   2.6        6.5          0.999   0.994   6.4       0.996   0.980   7.6
shuttle0vs4             1.000   0.996   2.0        3.3          1.000   1.000   2.6       1.000   1.000   3.1
shuttle2vs4             1.000   0.937   2.0        2.1          0.997   0.996   3.4       1.000   0.995   4.7
vowel0                  0.938   0.903   2.3        8.4          0.999   0.977   9.8       1.000   0.965   14.2
yeast0256vs3789         0.864   0.801   2.5        12.7         0.987   0.814   40.2      0.832   0.804   14.4
yeast02579vs368         0.947   0.921   2.7        11.5         0.993   0.937   26.2      0.925   0.882   11.2
yeast0359vs78           0.841   0.738   2.8        11.5         0.980   0.745   27.6      0.856   0.729   20.8
yeast05679vs4           0.904   0.835   2.5        11.3         0.986   0.848   25.2      0.879   0.793   16.8
yeast1289vs7            0.840   0.688   3.0        11.6         0.999   0.638   45.0      0.831   0.749   16.3
yeast1458vs7            0.802   0.645   2.7        11.0         0.996   0.624   39.8      0.833   0.603   18.7
yeast1vs7               0.899   0.775   2.5        10.9         0.993   0.741   27.8      0.882   0.707   17.2
yeast2vs4               0.974   0.944   2.6        8.8          0.993   0.912   19.8      0.940   0.903   12.6
yeast2vs8               0.943   0.814   2.5        8.6          0.998   0.806   15.4      0.949   0.789   8.7
yeast4                  0.916   0.868   2.2        10.5         0.999   0.781   37.2      0.907   0.820   20.9
yeast5                  0.988   0.952   3.0        9.3          0.998   0.972   13.2      0.986   0.937   9.9
yeast6                  0.957   0.888   3.4        11.4         0.998   0.831   23.8      0.920   0.834   11.7

Table 8
Results of the non-parametric statistical tests on the AUC computed on the test set for PAES-3obj, FURIA and FARC-HD.

Algorithm   Friedman rank
FARC-HD     2.3409
FURIA       1.6591
PAES-3obj   2.0000

Iman and Davenport p-value: 0.00493 (hypothesis of equivalence rejected)

Holm post-hoc procedure
i   Algorithm   z-value   p-value   alpha/i   Hypothesis
2   FARC-HD     3.19801   0.00138   0.025     Rejected
1   PAES-3obj   1.59900   0.10982   0.05      Not rejected

rejected for PAES-3obj. This means that the AUCs of the classifiers generated by FURIA and PAES-3obj are statistically equivalent.

As regards the interpretability of the generated solutions, we can state that PAES-3obj generates solutions characterized by a lower complexity than the ones generated by the two comparison approaches. Indeed, we notice that, in most of the datasets, the average number of rules of the FRBCs generated by FURIA and FARC-HD is higher than the average value of the maximum numbers of rules of the FRBCs generated by PAES-3obj. Finally, we have to highlight that, even though the FRBCs generated by FURIA are associated with the lowest Friedman rank, they cannot be expressed in terms of linguistic values with an appropriate semantic meaning. As an example, the typical fuzzy rule generated by FURIA has the form

R_m: IF X_1 is [a_{1,m}, b_{1,m}, c_{1,m}, d_{1,m}] AND ... AND X_F is [a_{F,m}, b_{F,m}, c_{F,m}, d_{F,m}] THEN Y is C_{j_m} with RW_m

(11)

where a_{f,m}, d_{f,m} and b_{f,m}, c_{f,m} identify, respectively, the extremes of the support and of the core of the specific trapezoidal fuzzy set used in rule R_m for the condition on the variable X_f. Hence, the fuzzy sets which appear in the antecedents of the rules are not associated with linguistic fuzzy partitions of the input variables, thus losing the semantic interpretability [35].
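A minimal sketch of the trapezoidal condition in (11): the membership degree of a value in the set [a, b, c, d] and the firing degree of a rule antecedent. The product t-norm used here to combine conditions is our assumption for illustration; the exact inference details of FURIA are given in [30]:

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy set [a, b, c, d]:
    support [a, d], core [b, c], linear shoulders in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def firing_degree(x_vec, antecedents):
    """Combine the antecedent conditions of one rule with the product
    t-norm (an assumption here); each condition is a per-variable
    trapezoid (a, b, c, d)."""
    degree = 1.0
    for x, (a, b, c, d) in zip(x_vec, antecedents):
        degree *= trapezoid(x, a, b, c, d)
    return degree
```

Because each rule carries its own (a, b, c, d) per variable, two rules may use overlapping but distinct fuzzy sets for the same variable; this is precisely why no shared linguistic partition, and hence no linguistic label, can be attached to them.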


4.4. Comparison among PAES-3obj and four state-of-the-art classifier ensembles for imbalanced datasets

In this section, we compare the results achieved by the FRBCs generated by PAES-3obj with the ones obtained by a set of classifier ensembles appropriately designed for imbalanced datasets. Ensembles increase the accuracy of a single classifier by training different classifiers and combining their decisions to output a single class label. We extracted the results achieved by the best ensembles discussed in the review on ensembles for the class imbalance problem [7]. Here, the authors propose a taxonomy of classifier ensembles and perform a statistical comparison among the different ensembles. In particular, they focus on data-variation ensembles, which manipulate the training examples so as to train each classifier of the ensemble on a different training set. In this context, a number of approaches based on AdaBoost [67] and Bagging [68] are discussed. We recall that AdaBoost, the most prominent of the Boosting algorithms, uses the overall dataset to train each classifier in sequence and, after each iteration, gives more attention to the examples which are harder to classify by assigning them higher weights than the other examples. Bagging, on the other hand, trains different classifiers on bootstrapped replicas of the original dataset. The final classification decision is obtained by using a majority or weighted voting strategy.

We extracted the results of the best-performing ensembles of the different categories considered in the review. The first ensemble is a cost-sensitive version of AdaBoost, denoted as AdaC2, in which the misclassification costs are integrated into the weight update formula of AdaBoost [69]. The second approach is a boosting-based ensemble, denoted as RUSBoost [70], which removes instances from the majority class by randomly undersampling the dataset at each iteration.
The third ensemble, denoted as SMOTEBagging [71], exploits the SMOTE oversampling method [14] with different re-sampling rates at each iteration for creating each single classifier. The last ensemble, denoted as EasyEnsemble, uses Bagging as the main ensemble learning method and AdaBoost as the base learner for each single classifier [72]. Further, EasyEnsemble also performs an undersampling of the majority class at each iteration. A detailed description of the selected ensemble algorithms and of the configuration parameters used in the experiments can be found in [7].

Table 9 shows, for each of the 22 datasets, the average values of the AUCs, computed on the test set, for the solutions generated by AdaC2, RUSBoost, SMOTEBagging, EasyEnsemble and PAES-3obj. We notice that PAES-3obj achieves the highest average AUCs in 12 datasets. Also in this case, we perform non-parametric statistical tests to verify whether there exist actual differences among the approaches. In Table 10 we show the results of the non-parametric statistical tests for multiple comparisons considering the AUCs on the test set. We observe that the Iman and Davenport statistical hypothesis of equivalence is rejected and therefore statistical differences among the five algorithms are detected. Thus, we apply the Holm post-hoc procedure considering RUSBoost, which is associated with the lowest Friedman rank, as the control algorithm. The statistical hypothesis of equivalence can be rejected only for AdaC2, which proves to be less accurate than RUSBoost. As regards the other algorithms, SMOTEBagging, EasyEnsemble and PAES-3obj prove to be statistically equivalent to RUSBoost. In Table 11, we also show the results of the pairwise Holm post-hoc procedure. The procedure confirms the statistical equivalence among SMOTEBagging, EasyEnsemble and PAES-3obj. We have to highlight, however, that, unlike the other classifiers, the FRBCs generated by PAES-3obj are characterized by a high interpretability.
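The SMOTE interpolation idea [14], exploited by SMOTEBagging above and used in Section 4.3 to rebalance the training sets, can be sketched as follows. This is a simplified illustration with our own function name; the actual experiments relied on the pre-rebalanced KEEL partitions:

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling sketch: each synthetic example is a random
    point on the segment between a minority example and one of its k
    nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        n = rng.choice(neighbours)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, n)))
    return synthetic
```

Because the synthetic points lie between existing minority examples rather than duplicating them, the minority region is widened instead of merely reweighted, which is what makes SMOTE effective for the traditional classifiers discussed above; the drawback noted in Section 4.2 remains, namely that the training set roughly doubles in size at high imbalance ratios.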

Table 9
Average values of the AUC computed on the test set for AdaC2, RUSBoost, SMOTEBagging, EasyEnsemble and PAES-3obj.

Dataset             AdaC2    RUSBoost   SMOTEBagging   EasyEnsemble   PAES-3obj
Abalone19           0.5085   0.6306     0.5716         0.6956         0.7271
Abalone9-18         0.6954   0.6933     0.7451         0.6999         0.8106
Ecoli0137vs26       0.8154   0.7935     0.8281         0.7318         0.8916
Ecoli4              0.928    0.9418     0.9326         0.877          0.9093
Glass016vs2         0.5536   0.6167     0.5588         0.6129         0.6617
Glass016vs5         0.88     0.9886     0.8657         0.9429         0.885
Glass2              0.7308   0.7797     0.7787         0.7132         0.7048
Glass4              0.8706   0.9151     0.8742         0.8505         0.8636
Glass5              0.9732   0.9427     0.878          0.9488         0.8984
Page-Blocks13vs4    0.9978   0.9865     0.9876         0.9831         0.9509
Shuttle0vs4         0.9997   1          0.9997         1              0.9958
Shuttle2vs4         0.95     1          1              0.9875         0.9372
Vowel0              0.9706   0.9427     0.9883         0.941          0.9034
Yeast05679vs4       0.7816   0.8033     0.8182         0.746          0.8345
Yeast1289vs7        0.6376   0.7209     0.6575         0.7019         0.688
Yeast1458vs7        0.5426   0.5674     0.623          0.5496         0.6452
Yeast1vs7           0.7049   0.7149     0.6967         0.7212         0.7749
Yeast2vs4           0.9205   0.9333     0.8968         0.9411         0.9438
Yeast2vs8           0.6218   0.7893     0.7837         0.7168         0.8138
Yeast4              0.7204   0.8124     0.7733         0.8477         0.8684
Yeast5              0.8833   0.9587     0.9618         0.9531         0.9515
Yeast6              0.7163   0.8233     0.8358         0.8492         0.8881

Table 10
Results of the non-parametric statistical tests on the AUC computed on the test set for AdaC2, RUSBoost, SMOTEBagging, EasyEnsemble and PAES-3obj.

Algorithm      Friedman rank
AdaC2          3.7955
RUSBoost       2.4545
SMOTEBagging   2.9545
EasyEnsemble   3.1591
PAES-3obj      2.6364

Iman and Davenport p-value: 0.04380 (hypothesis of equivalence rejected)

Holm post-hoc procedure
i   Algorithm      z-value    p-value    alpha/i    Hypothesis
4   AdaC2          2.812715   0.004913   0.0125     Rejected
3   EasyEnsemble   1.477867   0.139443   0.016667   Not rejected
2   SMOTEBagging   1.048809   0.294266   0.025      Not rejected
1   PAES-3obj      0.381385   0.702918   0.05       Not rejected

5. Conclusions

In this paper, we have presented an experimental study on four evolutionary fuzzy classifiers purposely designed for dealing with imbalanced datasets. The first approach performs an evolutionary data base learning with an embedded rule base generation. The second approach first adopts a genetic programming-based RB learning and then performs a genetic rule selection and tuning of the membership function parameters. The third approach is based on a single-objective evolutionary method for learning the RB defined on multiple granularities; once the RB has been generated, a genetic tuning is performed to improve the accuracy of the classifier. These three approaches re-balance the training set using a state-of-the-art oversampling method. The fourth approach exploits a multi-objective evolutionary algorithm to concurrently learn the rule base and the membership function parameters of a set of fuzzy classifiers characterized by


Table 11
Results of the pairwise Holm post-hoc procedure.

i    Algorithms                      z-value    p-value    alpha/i    Hypothesis
10   AdaC2 vs. RUSBoost              2.812715   0.004913   0.005      Rejected
9    AdaC2 vs. PAES-3obj             2.43133    0.015044   0.005556   Not rejected
8    AdaC2 vs. SMOTEBagging          1.763906   0.077748   0.00625    Not rejected
7    RUSBoost vs. EasyEnsemble       1.477867   0.139443   0.007143   Not rejected
6    AdaC2 vs. EasyEnsemble          1.334848   0.181926   0.008333   Not rejected
5    EasyEnsemble vs. PAES-3obj      1.096482   0.272868   0.01       Not rejected
4    RUSBoost vs. SMOTEBagging       1.048809   0.294266   0.0125     Not rejected
3    SMOTEBagging vs. PAES-3obj      0.667424   0.504501   0.016667   Not rejected
2    SMOTEBagging vs. EasyEnsemble   0.429058   0.667881   0.025      Not rejected
1    RUSBoost vs. PAES-3obj          0.381385   0.702918   0.05       Not rejected

different trade-offs among sensitivity, specificity and complexity of the rule base.

By performing non-parametric statistical tests, we have shown that the multi-objective evolutionary learning scheme outperforms the first two approaches in terms of area under the ROC curve. The classifiers generated by the third approach are statistically equivalent to the ones generated by the multi-objective evolutionary learning scheme, even though the latter is associated with the best rank in the Friedman test. We have also carried out a statistical comparison between the results achieved by the fuzzy rule-based classifiers generated by the multi-objective evolutionary learning scheme and those achieved by two recent state-of-the-art methods, namely FURIA and FARC-HD. We have shown that the classifiers generated by the multi-objective evolutionary learning scheme are statistically equivalent to the ones produced by FURIA, which is associated with the highest accuracy rank. Finally, we have pointed out that the classifiers generated by the multi-objective evolutionary learning scheme are characterized by a quite low level of complexity, thus making them easily interpretable, while maintaining a very high accuracy level. Indeed, we have also shown that the results achieved by these classifiers are statistically equivalent to the ones obtained by four state-of-the-art approaches based on ensembles of non-fuzzy classifiers purposely designed for dealing with imbalanced datasets.

References

[1] M. Tavallaee, N. Stakhanova, A. Ghorbani, Toward credible evaluation of anomaly-based intrusion-detection methods, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 40 (5) (2010) 516–524.
[2] M. Antonelli, M. Cococcioni, B. Lazzerini, F. Marcelloni, Computer-aided detection of lung nodules based on decision fusion techniques, Pattern Anal. Appl. 14 (3) (2011) 295–310.
[3] T. Hachaj, M.R. Ogiela, Application of neural networks in detection of abnormal brain perfusion regions, Neurocomputing 122 (2013) 33–42.
[4] P. Ducange, M. Fazzolari, B. Lazzerini, F. Marcelloni, An intelligent system for detecting faults in photovoltaic fields, in: 11th International Conference on Intelligent Systems Design and Applications (ISDA), 2011, pp. 1341–1346.
[5] H. He, E. García, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284.
[6] N. Chawla, Data mining for imbalanced datasets: an overview, in: Data Mining and Knowledge Discovery Handbook, Springer US, 2010, pp. 875–886.
[7] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 42 (4) (2012) 463–484.
[8] U. Bhowan, M. Johnston, M. Zhang, X. Yao, Evolving diverse ensembles using genetic programming for classification with unbalanced data, IEEE Trans. Evol. Comput. 17 (3) (2013) 368–386.
[9] V. Lopez, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics, Inf. Sci. 250 (2013) 113–141.
[10] N. Garcia-Pedrajas, J. Perez-Rodriguez, A. de Haro-Garcia, OligoIS: scalable instance selection for class-imbalanced data sets, IEEE Trans. Cybern. 43 (1) (2013) 332–346.
[11] V. Lopez, A. Fernandez, M.J. del Jesus, F. Herrera, A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets, Knowl. Based Syst. 38 (2013) 85–104.
[12] V. López, I. Triguero, C.J. Carmona, S. García, F. Herrera, Addressing imbalanced classification with instance generation techniques: IPADE-ID, Neurocomputing 126 (2014) 15–28.
[13] M. Galar, A. Fernández, E. Barrenechea, F. Herrera, EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling, Pattern Recognit. 46 (12) (2013) 3460–3471.
[14] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.
[15] A. Fernández, S. García, M.J. del Jesús, F. Herrera, A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets, Fuzzy Sets Syst. 159 (18) (2008) 2378–2398.
[16] G. Wu, E. Chang, KBA: kernel boundary alignment considering imbalanced data distribution, IEEE Trans. Knowl. Data Eng. 17 (6) (2005) 786–795.
[17] P. Ducange, B. Lazzerini, F. Marcelloni, Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets, Soft Comput. 14 (7) (2010) 713–728.
[18] K. Napierala, J. Stefanowski, Identification of different types of minority class examples in imbalanced data, in: Hybrid Artificial Intelligent Systems, Springer, 2012, pp. 139–150.
[19] N.V. Chawla, D.A. Cieslak, L.O. Hall, A. Joshi, Automatically countering imbalance and its empirical relationship to cost, Data Min. Knowl. Discov. 17 (2) (2008) 225–252.
[20] W. Zong, G.-B. Huang, Y. Chen, Weighted extreme learning machine for imbalance learning, Neurocomputing 101 (2013) 229–242.
[21] B. Krawczyk, M. Woźniak, G. Schaefer, Cost-sensitive decision tree ensembles for effective imbalanced classification, Appl. Soft Comput. 14 (2014) 554–562.
[22] S.-C. Lin, Y.-c.I. Chang, W.-N. Yang, Meta-learning for imbalanced data and classification ensemble in binary classification, Neurocomputing 73 (1–3) (2009) 484–494.
[23] J. Blaszczynski, M. Deckert, J. Stefanowski, S. Wilk, IIvotes ensemble for imbalanced data, Intell. Data Anal. 16 (5) (2012) 777–801.
[24] B. Krawczyk, M. Woźniak, Diversity measures for one-class classifier ensembles, Neurocomputing 126 (2014) 36–44.
[25] A. Fernández, M.J. del Jesús, F. Herrera, Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets, Int. J. Approx. Reason. 50 (3) (2009) 561–577.
[26] A. Fernández, M.J. del Jesus, F. Herrera, On the influence of an adaptive inference system in fuzzy rule based classification systems for imbalanced data-sets, Expert Syst. Appl. 36 (6) (2009) 9805–9812.
[27] F. Herrera, Genetic fuzzy systems: taxonomy, current research trends and prospects, Evol. Intell. 1 (1) (2008) 27–46.
[28] A. Fernández, M.J. del Jesús, F. Herrera, On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets, Inf. Sci. 180 (8) (2010) 1268–1291.
[29] P. Villar, A. Fernandez, R.A. Carrasco, F. Herrera, Feature selection and granularity learning in genetic fuzzy rule-based classification systems for highly imbalanced data-sets, Int. J. Uncertain. Fuzziness Knowl. Based Syst. 20 (3) (2012) 369–397.
[30] J. Hühn, E. Hüllermeier, FURIA: an algorithm for unordered fuzzy rule induction, Data Min. Knowl. Discov. 19 (3) (2009) 293–319.
[31] J. Alcala-Fdez, R. Alcala, F. Herrera, A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning, IEEE Trans. Fuzzy Syst. 19 (5) (2011) 857–872.
[32] O. Cordon, M.J. del Jesus, F. Herrera, A proposal on reasoning methods in fuzzy rule-based classification systems, Int. J. Approx. Reason. 20 (1) (1999) 21–45.
[33] H. Ishibuchi, T. Nakashima, M. Nii, Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining (Advanced Information Processing), Springer-Verlag, Secaucus, NJ, USA, 2004.
[34] A. Gonzalez, R. Perez, Completeness and consistency conditions for learning fuzzy rules, Fuzzy Sets Syst. 96 (1998) 37–51.

M. Antonelli et al. / Neurocomputing 146 (2014) 125–136

[35] M. Gacto, R. Alcala, F. Herrera, Interpretability of linguistic fuzzy rule-based systems: an overview of interpretability measures, Inf. Sci. 181 (20) (2011) 4340–4360.
[36] P. Ducange, F. Marcelloni, Multi-objective evolutionary fuzzy systems, in: Proceedings of the 9th International Conference on Fuzzy Logic and Applications, WILF'11, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 83–90.
[37] M. Fazzolari, R. Alcalá, Y. Nojima, H. Ishibuchi, F. Herrera, A review of the application of multiobjective evolutionary fuzzy systems: current status and further directions, IEEE Trans. Fuzzy Syst. 21 (1) (2013) 45–65.
[38] Z. Chi, H. Yan, T. Pham, Fuzzy Algorithms: with Applications to Image Processing and Pattern Recognition, Advances in Fuzzy Systems, World Scientific, Singapore, 1996.
[39] J.A. Swets, Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1996.
[40] T. Fawcett, An introduction to ROC analysis, Pattern Recognit. Lett. 27 (8) (2006) 861–874.
[41] C. Ferri, J. Hernández-Orallo, R. Modroiu, An experimental comparison of performance measures for classification, Pattern Recognit. Lett. 30 (1) (2009) 27–38.
[42] J. Huang, C. Ling, Using AUC and accuracy in evaluating learning algorithms, IEEE Trans. Knowl. Data Eng. 17 (3) (2005) 299–310.
[43] P. Villar, A. Fernández, F. Herrera, Studying the behavior of a multiobjective genetic algorithm to design fuzzy rule-based classification systems for imbalanced data-sets, in: 2011 IEEE International Conference on Fuzzy Systems, 2011, pp. 1239–1246.
[44] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.
[45] O. Cordon, F. Herrera, I. Zwir, Linguistic modeling by hierarchical systems of linguistic rules, IEEE Trans. Fuzzy Syst. 10 (1) (2002) 2–20.
[46] F.J. Berlanga, A.J. Rivera, M.J. del Jesus, F. Herrera, GP-COACH: genetic programming-based learning of compact and accurate fuzzy rule-based classification systems for high-dimensional problems, Inf. Sci. 180 (8) (2010) 1183–1200.
[47] R. Alcalá, J. Alcalá-Fdez, F. Herrera, A proposal for the genetic lateral tuning of linguistic fuzzy systems and its interaction with rule selection, IEEE Trans. Fuzzy Syst. 15 (4) (2007) 616–635.
[48] H. Ishibuchi, T. Yamamoto, T. Nakashima, Hybridization of fuzzy GBML approaches for pattern classification problems, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 35 (2) (2005) 359–365.
[49] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Learning concurrently data and rule bases of Mamdani fuzzy rule-based systems by exploiting a novel interpretability index, Soft Comput. 15 (10) (2011) 1981–1998.
[50] M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni, Learning knowledge bases of multi-objective evolutionary fuzzy systems by simultaneously optimizing accuracy, complexity and partition integrity, Soft Comput. 15 (12) (2011) 2335–2354.
[51] M. Antonelli, P. Ducange, F. Marcelloni, Genetic training instance selection in multi-objective evolutionary fuzzy systems: a co-evolutionary approach, IEEE Trans. Fuzzy Syst. 20 (2) (2012) 276–290.
[52] J.D. Knowles, D.W. Corne, Approximating the nondominated front using the Pareto archived evolution strategy, Evol. Comput. 8 (2) (2000) 149–172.
[53] P. Wang, K. Tang, T. Weise, E. Tsang, X. Yao, Multiobjective genetic programming for maximizing ROC performance, Neurocomputing 125 (2014) 102–118.
[54] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3–18.
[55] M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc. 32 (200) (1937) 675–701.
[56] R.L. Iman, J.H. Davenport, Approximations of the critical region of the Friedman statistic, Commun. Stat.-Theory Methods Part A 9 (1980) 571–595.
[57] S. Holm, A simple sequentially rejective multiple test procedure, Scand. J. Stat. 6 (1979) 65–70.
[58] F. Wilcoxon, Individual comparisons by ranking methods, Biom. Bull. 1 (6) (1945) 80–83.
[59] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, Burlington, MA, USA, 2011.
[60] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework, Mult. Valued Logic Soft Comput. 17 (2–3) (2011) 255–287.
[61] G.E.A.P.A. Batista, R.C. Prati, M.C. Monard, A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explor. Newsl. 6 (1) (2004) 20–29.
[62] S. García, J. Derrac, I. Triguero, C.J. Carmona, F. Herrera, Evolutionary-based selection of generalized instances for imbalanced classification, Knowl. Based Syst. 25 (1) (2012) 3–12.
[63] H.-Y. Wang, Combination approach of SMOTE and biased-SVM for imbalanced datasets, in: IEEE International Joint Conference on Neural Networks (IJCNN), 2008, pp. 228–231.
[64] J. Luengo, A. Fernández, S. García, F. Herrera, Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling, Soft Comput. 15 (10) (2011) 1909–1936.
[65] C. Fernandez-Lozano, M. Gestal, N. Pedreira-Souto, L. Postelnicu, J. Dorado, C. Robert Munteanu, Kernel-based feature selection techniques for transport proteins based on star graph topological indices, Curr. Top. Med. Chem. 13 (14) (2013) 1681–1691.
[66] W.W. Cohen, Fast effective rule induction, in: Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, 1995, pp. 115–123.
[67] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1) (1997) 119–139.
[68] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140.
[69] Y. Sun, M.S. Kamel, A.K. Wong, Y. Wang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognit. 40 (12) (2007) 3358–3378.
[70] C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano, RUSBoost: a hybrid approach to alleviating class imbalance, IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 40 (1) (2010) 185–197.
[71] S. Wang, X. Yao, Diversity analysis on imbalanced data sets by using ensemble models, in: IEEE Symposium on Computational Intelligence and Data Mining, 2009, pp. 324–331.
[72] X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39 (2) (2009) 539–550.

Michela Antonelli received the M.Sc. degree in Computer Engineering and the Ph.D. degree in Information Engineering from the University of Pisa (Italy) in 2003 and 2007, respectively. Currently, she is a research fellow with the Department of Information Engineering of the University of Pisa. During her Ph.D., she developed a Computer-Aided Diagnosis system able to automatically detect lung nodules in CT scans. Since 2007, her research interests have moved to multi-objective evolutionary fuzzy systems, with the aim of designing fuzzy rule-based systems that are both accurate and interpretable. She has coauthored more than 20 papers in international journals and conference proceedings.

Pietro Ducange received the M.Sc. degree in Computer Engineering and the Ph.D. degree in Information Engineering from the University of Pisa, Pisa, Italy, in 2005 and 2009, respectively. Currently, he is a research fellow with the Department of Information Engineering, University of Pisa. His main research interests focus on designing fuzzy rule-based systems with different trade-offs between accuracy and interpretability by using multi-objective evolutionary algorithms. He has coauthored more than 30 papers in international journals and conference proceedings. He currently serves as a member of the Editorial Board of the following international journals: Soft Computing and International Journal of Swarm Intelligence and Evolutionary Computation.

Francesco Marcelloni received the Laurea degree in Electronics Engineering and the Ph.D. degree in Computer Engineering from the University of Pisa in 1991 and 1996, respectively. He is currently an associate professor at the University of Pisa. His main research interests include multi-objective evolutionary fuzzy systems, situation-aware service recommenders, energy-efficient data compression and aggregation in wireless sensor nodes, and monitoring systems for energy efficiency in buildings. He has co-edited three volumes and four journal special issues, and is (co-)author of a book and of more than 180 papers in international journals, books and conference proceedings. Currently, he serves as an associate editor of Information Sciences (Elsevier) and Soft Computing (Springer) and is on the Editorial Board of other international journals.