Constructing an enterprise ontology for an automotive supplier





Engineering Applications of Artificial Intelligence 21 (2008) 386–397 www.elsevier.com/locate/engappai

Eva Blomqvist, Annika Öhgren

School of Engineering, Jönköping University, P.O. Box 1026, SE-551 11 Jönköping, Sweden

Received 15 January 2007; received in revised form 17 September 2007; accepted 27 September 2007. Available online 19 November 2007

Abstract

This paper presents experiences and conclusions from ontology engineering applied in the automotive suppliers domain. The work focuses on construction of enterprise ontologies to support structuring of enterprise information and knowledge management. Two methods for ontology construction, developed by previous research activities, were used in parallel when developing an ontology for a company in the automotive supplier industry. One method is automatic and the other is manual. A conclusion was that the developed ontologies complemented each other well, and therefore the decision was made to merge them for use in the project. The resulting ontology will now be used in several pilot applications.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Knowledge engineering; Ontology engineering; Ontology evaluation; Ontology merging; Automobile industry

1. Introduction

When considering small-scale application cases the need for reducing effort and expert requirements in ontology engineering is obvious. Previous research has resulted in two different methods for efficiently constructing enterprise ontologies, a purely manual method and an automatic method (exploiting ontology patterns). These two methods have been used in parallel during a project with industrial partners. After evaluating the developed ontologies the conclusion was that they complemented each other well and that merging the ontologies would yield the best result. The merging process was performed and the resulting ontology will be used in several pilot applications.

Enterprise ontologies in this case are application ontologies within enterprises for structuring of information. In this paper ontology in general is defined as:

An ontology is a hierarchically structured set of concepts describing a specific domain of knowledge, that can be used to create a knowledge base. An ontology contains concepts, a subsumption hierarchy, arbitrary relations between concepts,

Corresponding author. Tel.: +46 36101593; fax: +46 36101799.

E-mail addresses: [email protected] (E. Blomqvist), [email protected] (A. Öhgren). 0952-1976/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.engappai.2007.09.004

and possibly other axioms. It may also contain other constraints and functions. Concepts are described both by terms, their synonyms, and also by the relations they have to other concepts. Whether a natural language definition of each concept is needed can be decided from case to case. The subsumption hierarchy imposes a generalisation relation between concepts, so that more specific concepts are placed lower in the hierarchy than more general ones. In addition to these relations there can be arbitrarily defined relations from one concept to another, either corresponding to the notion of attributes, or representing other types of associations. Ontologies can also be used and structured in many different ways. One of the most common ways to describe the level of generality of an ontology is by using the structure suggested by Guarino (1998), where a general top-level ontology can be specialised into a domain ontology or a task ontology. Domain and task ontologies can in turn be specialised, and combined, into application ontologies. Another categorisation of ontologies can be obtained through classifying them by their intended use, as in van Heijst et al. (1997). There are three main levels, terminological ontologies, information ontologies and knowledge modelling ontologies, where each level adds further complexity to the ontology structure. This paper is, as


stated earlier, concerned with enterprise ontologies, on the domain or application level, intended for structuring of enterprise information.

Ontologies are usually formally represented using specifically developed ontology representation languages. The semantics of these languages are commonly expressible in first-order logic, but the languages offer different features depending on what their developers considered important, so a language should be chosen based on the needs of the resulting ontology-based application. At least three traditions can be noted when discussing ontology languages: the information modelling tradition, where the focus is on objects and object properties; the description logics (DL) tradition, where the focus is on concepts and their roles; and the semantic network tradition, with less strict semantics, where the ontology is usually described as an arbitrary graph. A comprehensive survey of ontology languages can be found in Gómez-Pérez et al. (2004). Some incompatibilities exist between these traditions, so translating between different ontology languages is usually a non-trivial task. In this paper we will mainly use two FLogic-based languages and tools to represent the ontologies (OntoEdit (OntoEdit, 2004) and the KAON tool suite (KAON, 2005)), but in addition a translation of the resulting ontology into the OWL language has been attempted (a DL-based language which is now a standard for ontology representation on the web, supported by the W3C and described in OWL Web Ontology Language Overview, 2004).

The next section presents some background on ontology construction, evaluation, and merging, Section 3 describes experiences, Section 4 describes the applications that the ontology will be used in, and in Section 5 conclusions are drawn.

2. Ontology construction, evaluation, and merging

This section presents some background and existing approaches concerning the two ontology construction methods, ontology evaluation, and ontology merging.

2.1. Ontology construction

Methodologies for ontology development have been subject to research during a number of years, the research


results being a variety of different approaches. This section describes in brief the two methods developed and used in this experiment.

2.1.1. Manual approach

A previous paper (Öhgren and Sandkuhl, 2005) describes the evaluation and combination of existing manual methods in order to develop a methodology that better fits the requirements of small-scale application contexts. Three important factors were found: the methodology should be defined in full detail, it should cover the whole life cycle of the ontology, and it should consider reuse of already existing ontologies. Table 1 shows the results of this evaluation. Based on the evaluation, we propose an enhanced methodology especially for use in small-scale application contexts. The methodology can be viewed as a mix of some of the methodologies that were studied, taking the relevant parts from each methodology. Below, a short description is given of the proposed methodology consisting of four different phases: requirements analysis, building, implementation, and evaluation and maintenance.

Initially all formalities of the ontology are specified, e.g. the intended users and uses (usage scenarios), the purpose and scope. Here it is important to try to make a clear separation between what should be in the ontology and what should be excluded from it. It is also important to plan the main activities in the further development of the ontology: what tasks exist, how they will be performed, and what resources are needed (knowledge sources, software, personnel, etc.). In the requirements analysis there should also be a decision on a naming convention that will be used consistently. One step is also to check whether there are any existing ontologies that can be integrated. All this is collected in a requirements document.

A middle-out approach is used when building the ontology, and the process is considered to be iterative. The intention is to start out with some basic concepts and then specify, and if necessary generalise, in order to build up a taxonomical hierarchy. Then relationships, properties, and constraints are added to the ontology in turn. The ontology should be reviewed and revised in an iterative manner. To avoid language expressivity problems all the terms, relations, properties, and constraints are collected in a separate document.

Table 1
Evaluation of existing manual methodologies

Approach                                      Life-cycle coverage  Detailed definition     Reuse
Enterprise ontology (Uschold and King, 1995)  Whole life-cycle     No detailed guidelines  Late dev. stage
TOVE (Grüninger and Fox, 1995)                Whole life-cycle     No detailed guidelines  Not integrated
Unified approach (Uschold, 1996)              Whole life-cycle     Building very detailed  Not integrated
METHONTOLOGY (Fernandez et al., 1997)         Whole life-cycle     Fairly detailed         Late dev. stage
Sugumaran and Storey (2002)                   Focus on building    Building very detailed  Not integrated
Noy and McGuinnes (2001)                      Lacks parts          Building very detailed  Early dev. stage
Staab et al. (2001)                           Whole life-cycle     Fairly detailed         Early dev. stage


The implementation phase consists of implementing the ontology in an ontology editor tool. The ontology finally needs to be evaluated and tested to check that it fulfils the requirements given in the requirements document. It should also be evaluated according to criteria such as clarity, consistency, and reusability.

2.1.2. Automatic approach

In most real-world cases there are already different knowledge sources that can be incorporated in the ontology engineering process. Such existing knowledge sources can be documents, databases, taxonomies, and other things. The question is how to extract the knowledge incorporated in these sources and reformulate it into an ontology. A large drawback of the manual methods is also the tedious effort required to perform the ontology construction. By automating parts of the process the intention is to save both time and effort.

A commonly accepted way to reduce development effort and increase reuse, in for example software engineering, is by using patterns for the construction. The ontology community has not yet adopted the pattern idea on a broader scale. None of the existing manual methods involve patterns, nor does any semi-automatic approach, to the best of our knowledge. There are, though, specific methods being developed for exploiting ontology design patterns manually, as in Gangemi (2005) and W3CSWBPD (2004), and additionally some recent research (like NeON Website, 2007) aiming to address further issues. The suggested approach, as described in Blomqvist (2005), aims to automatically exploit ontology patterns of a slightly different kind. We define ontology patterns as follows:

An ontology design pattern is a self-contained ontology template intended for constructing an ontology component, consisting of a well-defined set of consistent ontology primitives.

This means that an ontology pattern for automatic use is in itself a partial ontology (for examples of such patterns see Blomqvist, 2005).
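To make the notion of a pattern as a partial ontology more concrete, the following Python sketch represents an ontology in the sense of Section 1 (concepts with synonyms, a subsumption hierarchy, and arbitrary relations) and builds one small pattern with it. The class names, the example concepts, and the synonym are hypothetical illustrations and are not taken from the pattern base in Blomqvist (2005).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A concept described by a term, its synonyms, and an optional parent."""
    name: str
    synonyms: set = field(default_factory=set)
    parent: Optional[str] = None  # more general concept; None marks a root


@dataclass
class Ontology:
    """Concepts plus non-taxonomic relations stored as (source, label, target)."""
    concepts: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)

    def add_concept(self, name, parent=None, synonyms=()):
        self.concepts[name] = Concept(name, set(synonyms), parent)

    def relate(self, source, label, target):
        # Both ends must already be concepts, so the structure stays self-contained.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both ends of a relation must be known concepts")
        self.relations.add((source, label, target))


# A hypothetical pattern: itself a small, self-contained ontology.
product_pattern = Ontology()
product_pattern.add_concept("Product", synonyms={"merchandise"})
product_pattern.add_concept("Feature")
product_pattern.add_concept("SelectedFeature", parent="Feature")
product_pattern.relate("Product", "hasFeature", "Feature")
```

Because a pattern and a full ontology share one representation in this sketch, accepted pattern parts could in principle be copied into a target ontology without conversion; whether the actual implementation works this way is not stated in the text.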
In a broad sense this definition can encompass all kinds of reusable assets represented as ontologies, but in addition ontology patterns may have a dimension of abstraction and consensual best-practice reuse, in which case we choose to denote such patterns ontology design patterns (in analogy with software design patterns).

The general idea of the semi-automatic approach is to exploit a building cycle resembling the general case-based reasoning philosophy commonly used in the field of artificial intelligence (see Aamodt and Plaza, 1994). Since the more basic parts of semi-automatic ontology construction have already been well researched, the suggested approach intends to use existing tools to extract terms and possible relations from a text corpus, as well as for different kinds of matching. The method, partially proposed in Blomqvist (2005), is still under refinement but the general idea is to take the extracted terms and relations, match them against ontology patterns, and depending on the result use parts of the

patterns to build the ontology. The accepted patterns are pruned and adapted to fit the case at hand, before they are included in the resulting ontology. In Blomqvist (2005) the method only incorporated the first two phases as described above, namely to represent the new case, extract past cases from the pattern base, and to reuse those cases when automatically building the ontology. Future developments will also include refinement of the existing phases as well as a specific ontology refinement phase and pattern learning, but this is so far not part of the implemented method used in this experiment.

2.2. Ontology evaluation

The need for evaluating and validating ontologies is growing rapidly, since ontologies have recently taken the step from research into real-world applications. The evaluation approaches that exist differ in their aims: some determine how to choose between several ontologies, while others aim at validating the content of a single ontology. The methods appropriate for this case deal with correctness but also comparison of ontologies. Intuitive metrics, like number of concepts and average number of subconcepts, give a general idea of the main differences between the ontologies. Approaches that measure cohesion of ontologies, as in Yao et al. (2005), also complement the general comparison.

Different methods deal with different phases of ontology construction. There are evaluation possibilities both while constructing the ontology and after the finished result is available, where the latter kind might also be used to choose between several ontologies. Several approaches exist for manually evaluating the correctness of a single ontology during development. One is described in Gómez-Pérez (1999), where the focus is on taxonomic knowledge. Another such approach is OntoClean (described in Guarino and Welty, 2002), which exposes inappropriate modelling choices by using metaproperties.
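The intuitive metrics mentioned above can be computed directly from a subsumption hierarchy. The Python sketch below is a minimal illustration that assumes a taxonomy given as a mapping from each concept to its single parent. The function name, the toy concepts, and the exact metric definitions (depth measured in edges and averaged over leaf concepts, subclasses averaged over all concepts) are assumptions and may differ from the definitions used in the evaluation reported in this paper.

```python
def taxonomy_metrics(parent_of):
    """Compute simple structural metrics for a non-empty taxonomy.

    parent_of maps each concept name to its parent, or to None for a root.
    """
    children = {name: [] for name in parent_of}
    for name, parent in parent_of.items():
        if parent is not None:
            children[parent].append(name)

    def depth(name):
        # Depth counted in edges from the concept up to its root.
        steps = 0
        while parent_of[name] is not None:
            name = parent_of[name]
            steps += 1
        return steps

    roots = [n for n, p in parent_of.items() if p is None]
    leaves = [n for n, kids in children.items() if not kids]
    total = len(parent_of)
    return {
        "concepts": total,
        "root_concepts": len(roots),
        "leaf_concepts": len(leaves),
        "avg_leaf_depth": sum(depth(n) for n in leaves) / len(leaves),
        "avg_subclasses": sum(len(k) for k in children.values()) / total,
    }


# Hypothetical toy taxonomy with two roots.
toy = {
    "Artefact": None,
    "Document": "Artefact",
    "RequirementSpecification": "Document",
    "Role": None,
}
metrics = taxonomy_metrics(toy)
```

Such counts only describe structure. They say nothing about whether the concepts are correct or relevant, which is why they complement rather than replace the other evaluation approaches.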
Although our focus is generally on being able to use incorrect and incomplete knowledge and ontologies instead of necessarily discarding such knowledge, it can still be valuable to know how reliable the ontology is. The OntoMetric framework described in Lozano-Tello and Gomez-Perez (2004) is aimed at comparing two or more ontologies. A multilevel framework of characteristics is used as a template for information concerning the ontologies. The evaluation is conducted by a domain expert using these characteristics to explore the suitability of the ontology in a specific case. The general characteristics used are concerned with the dimensions of content, language, methodology, tools and costs. Since these are focused on choosing ontologies, for example the methodology dimension (that might sound appealing to use) was not deemed appropriate in our case. Instead the focus was on the content dimension, to evaluate the result of our methodologies. Other similar but slightly more restricted approaches, also based on quality factors,


can be found in Supekar et al. (2004) and Davies et al. (2003). Information retrieval-related approaches such as those in Brewster et al. (2004) and Navigli et al. (2004) are not considered suitable for our case, since the semi-automatic construction method applies some of the same techniques suggested for the evaluation, and the evaluation might thereby be biased. As also noted in Brewster et al. (2004) and Porzel and Malaka (2004) the most appropriate way to evaluate an ontology is intuitively to apply it to the intended task, or similar tasks, and measure and evaluate its performance. Such possibilities are so far very seldom present, and as we shall see this is so far also true in our case.

2.3. Ontology merging

Ontology integration and merging is a field that has been the subject of research for quite some time now. Although most of the existing manual ontology construction methodologies have some step called "ontology integration", there are usually no details on how this should be performed. It is a quite complex process to integrate or merge two or more ontologies, and several different problems can arise from the merging process.

According to Sofia Pinto et al. (1999) there exist three different situations in which the term integration has been used. The first one is integration of ontologies by reusing available ontologies when building a new ontology. The second is merging different ontologies within the same subject into a single ontology that unifies all of them. The third and final situation is constructing an application that uses several ontologies. For our case it is the second situation that is most relevant, and in the remaining part of this paper we refer to it as ontology merging. Klein (2001) analyses the problems that exist when combining or integrating ontologies.
The author claims that there exist three different types of problems: mismatches between ontologies, ontology versioning problems, and other practical problems. The mismatches between ontologies can be on the language or on the ontology level. An example of a language level mismatch is that two ontologies written in different ontology languages might have different syntax and different logical representations of the same expression. There might also be a language expressivity problem, in which some things expressible in one language are simply not expressible in other languages. Ontology level mismatches are more related to interpretation of the ontologies, as well as to model coverage differences and terminological differences. Ontology versioning problems are related to the fact that it is important to keep track of changes in an ontology and that applications that depend on one version of the ontology might not work if the ontology is updated. Other practical problems might be that it is hard to find the terms that need


to be aligned, or that a specific mapping can have consequences that are difficult to see.

Several methodologies exist for performing ontology merging; since merging is a complex task, the methodologies are quite advanced. One methodology for merging or integrating ontological knowledge is ONIONS (ONtologic Integration Of Naive Sources), for details see Gangemi et al. (1996). The methodology is divided into six different steps and has been used in several cases within the medical domain. Another methodology for ontology merging was applied when developing the SENSUS ontology, see Knight and Luk (1994). Various online dictionaries and semantic networks, such as WordNet, were merged together using both manual and semi-automatic methods. The semi-automatic parts used, for example, similarity matching of the text definitions. In Kotis and Vouros (2004) the authors describe their approach to ontology merging. They take advantage of WordNet as well as description logics' reasoning services.

There are some ontology merging and integration tools available, such as Ontolingua (see Farquhar et al., 1997), Chimaera (as described in McGuinness et al., 2000), and PROMPT (see Noy and Musen, 2000). Ontolingua has three basic integration operations specified, which are intended to be used when considering reuse of ontologies. Inclusion relations are used when the whole ontology is included without modification. Restriction relations are used when there is a need to restrict included axioms. The last type of operation is polymorphic refinement, which is used when an operation should be used with different kinds of arguments, such as a vector in one case and a number in another. Chimaera is a tool used for editing, merging and diagnosing ontologies and is based on the Ontolingua ontology development environment described above. It can support the user in the merging task by suggesting which terms are candidates to be merged or to have a relationship.
It can also give candidates for reorganisation of the taxonomy by using heuristic strategies. PROMPT works similarly, by guiding the user with suggestions, conflict-resolution strategies, etc. Conflicts might be name conflicts, redundancies, or other types of inconsistencies.

3. Enterprise ontology development

The experiment which is the focus of this paper involved developing two ontologies for the same scope using two different methods, evaluation of both ontologies, and merging of the ontologies taking into account the evaluation results. In this section the manual and the automatic construction are described first, followed by a brief description of the evaluation performed. The section is concluded with a description of the merging of the resulting ontologies. The structure of the experiment is illustrated in Fig. 1. The ontology engineering process was part of the SEMCO research project, aiming at introducing semantic technologies into the development process of software-intensive electronic systems. The scope of the experiment


Fig. 1. The structure of the experiment (diagram boxes: Manual Ontology Construction, Automatic Ontology Construction, Ontology Evaluation, Ontology Merging).

was to construct a selected part of the enterprise ontology for one of the SEMCO project partners. The ontology is so far not a complete enterprise ontology but limited to describing the requirements engineering process, requirements and specifications with connections to products and parts, organisational concepts and project artefacts.

3.1. Manual construction

The manual construction followed the phases described in Section 2.1.1. A user requirements document was produced, which identified existing knowledge sources and defined usage scenarios. The possibility of finding other ontologies to integrate was also considered, but no relevant ontologies were found.

The available project documents were used in the first iteration of the building phase. A simple concept hierarchy was built, but natural language descriptions for each concept were deemed unnecessary at this point. It was quite hard to derive relations, constraints, and axioms from the documents, so after document analysis focus was switched to the other knowledge source: interviews with selected employees at the company.

Interviews with company employees were performed in two sessions. At the first session the interviewees discussed the top-level concepts, then went further down the subsumption hierarchy, discussing each concept. Feedback was given in the form of suggestions, such as "Restructure this" or "This concept is not that important". After the first interview session, the suggestions were considered and some were implemented. The second interview session was carried out similarly, resulting in minor corrections to the ontology.

Implementation of the ontology was integrated into the building phase; the ontology is quite simple and no language expressivity problems occurred. The last phase of the manual methodology (evaluation and maintenance) was

partly integrated into the building phase, where the interviewees reviewed the ontology. Other parts of the evaluation are described later in this paper, and maintenance has not yet been performed.

The resulting ontology has 8 concepts directly beneath the root of the subsumption hierarchy, and 224 concepts in total. Some of the most general concepts are illustrated in Fig. 2 through a screenshot from OntoEdit (tool details can be found at OntoEdit, 2004). The ontology representation language was not considered an issue, since the application intended to use the ontology was still in its planning stage. The tool was chosen because its internal representation conforms to the ontology definition (as stated in Section 1).

The resulting ontology contains a few major parts, as can be seen in Fig. 2. The figure shows only a small part of the ontology and some details are hidden to increase readability. Despite this, the division of the ontology into subject areas can be noted. Directly related to the focus of the ontology are the parts dealing with product parts and requirements. In addition one part deals with artefacts, which denote different things produced during the product development process, such as documents (e.g. requirement specifications). Another important part is the organisation units and the roles present in the organisation, which in turn participate in the processes of the organisation. This is included to be able to connect roles to both the request and realisation of different requirements and product parts within different process steps. Finally some supporting areas like quantities and measuring units are present in order to assist in describing certain requirements and product parts.

3.2. Semi-automatic construction

As input to the automatic construction process, 25 ontology patterns were developed (details and source links can be found in Blomqvist, 2005).
Most of the patterns originate in the data model patterns field, but goal structures and upper level ontologies were also partly used. The patterns were represented as small ontologies, enriched with synonyms from WordNet (WordNet, 2005). The text corpus used consisted of software development plans, software development process descriptions, and other similar documents provided by the company. Extraction of relevant terms and relations from these texts was performed using existing tools (in this case the research prototype Text-To-Onto as described in Maedche, 2003). This resulted in 190 terms, which were then used as input to the matching process.

The matching of the pattern concepts against the extracted terms was done using a string matching tool called SecondString (as described in Cohen et al., 2003). The list of correctly matched terms was also used to extract possible relations between those concepts (again using the Text-To-Onto tool). The score representing the number of acceptably matching relations was then


Fig. 2. Some of the top-level concepts and relations of the manually constructed ontology.

weighted together with the score of matched terms into a total score for each pattern. This resulted in 14 patterns with a score above the predefined threshold.

Finally, the resulting ontology was compiled from the accepted patterns. For each pattern each matched concept was included in the ontology, together with all its matched synonyms. Then all relations leading to and from the concept were considered. Using a set of heuristics some of the relations were added to the ontology. The resulting ontology contains 35 concepts directly beneath the root of the subsumption hierarchy and 85 concepts in total. Some top-level concepts are illustrated in Fig. 3 through a screenshot from the KAON tool suite (see KAON, 2005) where the ontology was implemented. As stated in Section 3.1, the final choice of ontology implementation language is still to be made, but the internal representation of this tool conforms to the ontology definition (as described in Section 1). When analysed individually the coverage of the ontology with respect to the extracted terms turned out to be only about 34%, which is a relatively low number. This, together with other characteristics of the ontology, is analysed further in the following sections.

The automatically constructed ontology is not really divided into subject areas in the way the manually constructed ontology is, as can be seen in Fig. 3. The figure shows only a small part of the ontology and some details are hidden to

increase readability. Despite this, we can note that products, parts and requirements also play a central role in this ontology. Roles, work and parties also appear in this ontology, which would loosely correspond to the organisational parts of the manually constructed ontology. Already in this small illustration the high number of general relations can be noted. For example, the relations in the figure tell us that a product will be produced in response to a work requirement, the product is asked for via a set of product requirements, the product has a set of features which can be either available or selected, and the product is described in some document.

3.3. Evaluation of the ontologies

Details on the evaluation setup and results can be found in Blomqvist et al. (2006); below only a brief overview is given in order to motivate the decision to finally combine the two ontologies.

General characteristics of the ontologies were collected, as illustrated by the first two columns in Table 2 ("M" denotes the manually created ontology and "A" the automatically created one). The results show that the automatically created ontology has a large number of root concepts; it lacks some abstract general notions to keep the concepts together in groups. Despite this, the concepts are much more strongly related through non-taxonomic


Fig. 3. Some top-level concepts and relations of the automatically constructed ontology.

Table 2
General characteristics

Characteristic               M    A    C
Number of concepts           224  85   379
Number of root concepts      8    35   5
Number of leaf concepts      180  64   273
Avg depth of inheritance     2.5  1.9  3.5
Avg number of rel. concepts  0.1  0.8  1.3
Avg number of attributes     0.0  0.5  0.1
Avg number of subclasses     1.0  0.6  1.0

relations and have more attribute relations (these are actually relations to attribute concepts, but will in the rest of this paper be denoted "attributes") than in the other ontology. The manually constructed ontology on the other hand contains a larger number of concepts, especially on the more specific levels. In comparison it is noted that the low coverage of the semi-automatically constructed ontology with respect to the input terms is mainly concerned with these most specific concepts, which cannot be found directly in the patterns. The manually constructed ontology also contains a top-level abstraction dividing the ontology into intuitive subject areas, but there are few attributes and non-taxonomic relations, since relations seem to be harder to elicit from interviews than concepts.

Next, the ontologies were checked for correctness using the taxonomic evaluation and the OntoClean method (as described in Section 2.2). This resulted in only a few errors

being found in each ontology. Some of the criteria in the taxonomic evaluation could not be evaluated since the application where the ontology is to be used is still being developed, and this makes it hard to determine correctness of scope and exact coverage of the ontology. Correctness was not the main focus of the overall evaluation, but the check still confirms that the methodologies used provide reasonable results.

Finally, the OntoMetric framework (as described in Section 2.2) was used to let domain experts evaluate the ontologies. The evaluation material and guidelines were prepared by the ontology engineers, but the evaluation team was formed solely by domain experts from the company in question. In addition to the evaluation of the OntoMetric characteristics, an interview was performed afterwards. The result shows that both ontologies contain an appropriate number of concepts, but the concepts in the manually constructed ontology are deemed more essential, most likely because they are more specific. The automatically constructed ontology lacks general abstract concepts to give it a comprehensible structure. On the other hand, the automatically constructed ontology contains more attributes, which describe and define the concepts and reduce the need for natural language definitions. The automatically created ontology also contains more relations than the manually created one, even such relations that the company might not have thought of itself but that are still found to be valid. The manually created ontology only contains relations that are explicitly


stated by the company. It is the non-taxonomic relations that give structure to the automatically created ontology, while the manual ontology relies on the specificity of the taxonomic structure and precise naming. The automatically created ontology is still perceived by the domain experts as having quite a large depth, since it presents more detailed divisions of the intermediate levels and more non-taxonomic relations. The manually created ontology has a larger number of subclasses per concept, but this is mostly the case at the lowest level, where many company-specific concepts exist.

The evaluation resulted in the conclusion that neither ontology contained many errors and that the two ontologies complement each other well. The automatically constructed ontology contains many relations and is nicely structured on the intermediate levels of the subsumption hierarchy. Meanwhile, the manually constructed ontology has more structure on the top level and more detailed concepts on the bottom levels. It was suggested, by both the ontology engineers and the domain experts involved in the evaluation, that the ontologies should be combined into one ontology, to use the beneficial parts of each.

3.4. Merging of the ontologies

Since both of the constructed ontologies were built for the same case, and were partly constructed using the same knowledge sources, it can be assumed that they use approximately the same terminology. The methodologies and tools described in Section 2.3 do not take such special features into account, but rather solve problems related to distinctions in definitions and conflicts between terms. A major function of the tools described is that they are able to suggest candidates that should be merged or have a relationship. In our case it was fairly obvious which terms should be connected and which terms could be merged into one concept.
Based on this the merging process was performed manually and the KAON tool-suite was used for the implementation. The reason for this was simple convenience, since the more complex relations and axioms of the automatically created ontology were already implemented using this tool. The process started with an ‘‘empty’’ ontology, where parts from each ontology were entered in turn, starting from the top of the subsumption hierarchy. First, the top-level concepts of the manually created ontology were added, together with all relations between them. The former top-level concepts of the automatically created ontology were thereby grouped as subconcepts of this structure. This step also resulted in some slight reorganisation of the top-level concepts, and the addition of some intermediate concepts, to make the two ontologies fit together and to get a more intuitive structure. It was also considered important that all the siblings of a concept are on the same level of generality. Second, the most specific concepts from the manually created ontology were inserted into the ontology, at the


bottom level of the subsumption hierarchy. The fit between the two ontologies was not always perfect, so some new intermediate concepts were introduced. As with the reorganisation at the top level, the aim was an intuitive structure with all siblings on the same level of generality. All relations and attributes from the manually created ontology that were not already present in the automatically created ontology were also included.

This process resulted in an ontology with 379 concepts, of which only 5 are placed directly beneath the root of the subsumption hierarchy. A summary of some general characteristics of the ontology is presented in Table 2 (in the table the combined ontology is denoted by "C"), together with the values of each ontology before the combination (the manually constructed one denoted by "M" and the automatically constructed one denoted by "A"). The average measures represent the average over all concepts in the ontology. The intermediate "glue" concepts that were added during the combination process amount to 18% of the total number of concepts.

During the merging process, care was taken to make sure that no new errors (of the type described in Section 3.3) were introduced. Because the merging was performed manually, language-level syntax mismatches could be ruled out. Language expressivity problems were avoided partly because neither of the ontologies is very complex and the axioms that do exist are fairly simple, and partly because both ontologies were expressed in FLogic-based languages (as described in Section 1). Since both ontologies were constructed for the same enterprise and task, ontology-level mismatches could also be avoided. The most interesting evaluation is still to be performed, though, since the applications where the ontology will be used are only partially developed. Applying the ontology is the only way to test how well it will actually perform its tasks.
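The merging steps described above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the procedure or tooling used in the project (the merge was carried out by hand in KAON): each taxonomy is reduced to a child-to-parent mapping, relations and attributes are left out, and all concept names, the `placement` mapping and the function names `merge` and `summarize` are hypothetical.

```python
from typing import Dict, Optional, Set

# A taxonomy maps each concept to its parent; None means "directly beneath the root".
Taxonomy = Dict[str, Optional[str]]


def merge(manual_top: Set[str], automatic: Taxonomy, placement: Dict[str, str],
          manual_specific: Taxonomy, glue: Taxonomy) -> Taxonomy:
    """Combine two taxonomies top-down, starting from an empty ontology."""
    merged: Taxonomy = {}
    # Step 1: the top-level concepts of the manual ontology form the new top level.
    for concept in sorted(manual_top):
        merged[concept] = None
    # Intermediate "glue" concepts chosen by the engineer to make the parts fit.
    merged.update(glue)
    # Step 2: former top-level concepts of the automatic ontology are grouped
    # under the new structure; all other automatic concepts keep their parent.
    for concept, parent in automatic.items():
        merged[concept] = placement[concept] if parent is None else parent
    # Step 3: the most specific manual concepts are added at the bottom,
    # unless the same term is already present.
    for concept, parent in manual_specific.items():
        merged.setdefault(concept, parent)
    # Sanity check: every parent must itself be a concept in the result.
    for concept, parent in merged.items():
        if parent is not None and parent not in merged:
            raise ValueError(f"{concept!r} has unknown parent {parent!r}")
    return merged


def summarize(merged: Taxonomy, glue: Taxonomy) -> Dict[str, float]:
    """Compute a few simple characteristics of the merged taxonomy."""
    total = len(merged)
    top_level = sum(1 for parent in merged.values() if parent is None)
    return {
        "concepts": total,
        "top_level": top_level,
        "glue_share": len(glue) / total,
        # Every concept below the top level contributes one subconcept link.
        "avg_subconcepts": (total - top_level) / total,
    }


# Hypothetical toy input, far smaller than the 379-concept ontology in the paper.
manual_top = {"Process", "Party", "Work product"}
automatic = {"Product": None, "Requirement": None,
             "Customer requirement": "Requirement", "Organisation": None}
placement = {"Product": "Work product", "Requirement": "Work product",
             "Organisation": "Party"}
manual_specific = {"Supplier": "Organisation",
                   "Customer requirement": "Requirement",
                   "Seat heating specification": "Specification"}
glue = {"Specification": "Work product"}

merged = merge(manual_top, automatic, placement, manual_specific, glue)
summary = summarize(merged, glue)
print(summary)
```

In this toy input one of ten concepts is a glue concept; the same ratio computed over the real merged ontology would correspond to the 18% figure reported above.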
The ontology was first implemented in the KAON tool-suite, which conforms to the ontology definition. The top-level concepts of the resulting ontology are shown in Fig. 4; some parts of the ontology are excluded from the figure for readability. The final ontology still contains many of the same subject areas at the top level, such as roles, processes and parties. In the combined ontology two new top categories were introduced to further group the concepts concerned with products and their features, namely work product and feature. A work product is anything that is produced by a process, e.g. requirements and specifications as well as the product itself. The feature concept groups all features of products and processes.

When the application scenarios in the following section were developed, a decision was made to export the ontology to Protégé (2006). This was done through the tools' import and export capabilities, still without introducing any mismatches, since Protégé conforms to a similar ontology representation formalism. In addition, an experiment was made to transform the ontology into the standard web ontology language OWL, but this time some problems arose. The most frequently


Fig. 4. The top-level concepts and relations of the resulting ontology.

encountered problem was the issue of classes as property values, which is not easily expressible in OWL (as described in Representing Classes As Property Values on the Semantic Web, 2005). The conclusion was that the Protégé internal ontology formalism will be used for our application scenarios.

4. SEMCO applications

There are several applications of the ontology envisioned in the SEMCO project, but so far only one of these has been implemented. This scenario is ontology-based artefact management, supporting reuse and comparison of artefacts between projects, as part of the more general framework of realising a domain repository for the development processes. To do this, an application for artefact management had to be developed. The artefact management tool was constructed as a plug-in for the ontology development environment Protégé and is currently called ArtifactManager.

The main idea of the artefact manager tool is to use the enterprise ontology to define and store metadata and attributes of an artefact, as well as a link to the artefact itself. The enterprise ontology provides the attributes and the metadata, and artefacts are attached to it as instances and connected to instantiated attributes. In this way the actual artefact is connected to a part of the enterprise ontology and in addition has attribute values corresponding to instances of enterprise ontology concepts. When artefacts have been stored in this way they can be searched, retrieved, and compared using their connection to the enterprise ontology. The search is divided into attribute search and ontology search, where attribute

search focuses on common keyword-based search of artefact attributes. The search possibility involving the ontology is based on searching for similar concept paths in the ontology as those associated with the artefact (the artefact metadata). The user can also create his own ontology (as a query) and compare it to the enterprise ontology, instead of selecting parts of the enterprise ontology itself. In that case the search is performed as ontology matching. The artefact management tab of the ArtifactManager plug-in is shown in Fig. 5. The ArtifactManager plug-in has been used by experienced engineers involved in the project, but not yet evaluated together with the intended users of the company. Thereby the final evaluation of the ontology in this scenario is also yet to be performed. Still, it can be noted that the final merged ontology covers all the major parts concerning common artefacts in the requirements engineering phase, which leads us to believe that it can be used directly without any further configuration, with the ArtifactManager plug-in. The second scenario envisioned in the project is the integration of feature models and enterprise ontologies, by supporting the feature metamodel in the enterprise ontology, with the aim of identifying similar requirements and product features in future projects. The features might be related also to the organisational elements in order to track responsibilities and expertise. In the end the aim is to try and generate internal requirements directly from detected features in the source documents (customer requirements), and the enterprise ontology and its feature model, based on semantic similarities between the source documents and stored requirements of previous projects.
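The ontology search described above, which looks for concept paths similar to those associated with an artefact, can be sketched as follows. The paper does not specify the similarity measure, so the shared-prefix measure used here is an assumption chosen for simplicity, and the taxonomy, artefact names and function names are hypothetical rather than taken from the ArtifactManager plug-in.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment of an enterprise ontology: concept -> parent (None = top level).
PARENT: Dict[str, Optional[str]] = {
    "Work product": None,
    "Requirement": "Work product",
    "Customer requirement": "Requirement",
    "Internal requirement": "Requirement",
    "Specification": "Work product",
    "Party": None,
}

# Hypothetical artefact metadata: artefact name -> concept it is attached to.
ARTEFACTS: Dict[str, str] = {
    "req_A.doc": "Customer requirement",
    "spec_B.doc": "Specification",
    "contact_list.xls": "Party",
}


def path_from_top(concept: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the concept path from the top level down to the given concept."""
    path: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        path.append(current)
        current = parent[current]
    return path[::-1]


def path_similarity(a: List[str], b: List[str]) -> float:
    """Assumed measure: length of the shared prefix divided by the longer path length."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def ontology_search(query_concept: str) -> List[Tuple[float, str]]:
    """Rank stored artefacts by the similarity of their concept path to the query."""
    query_path = path_from_top(query_concept, PARENT)
    scored = [
        (path_similarity(query_path, path_from_top(concept, PARENT)), name)
        for name, concept in ARTEFACTS.items()
    ]
    # Highest similarity first; ties are broken alphabetically by artefact name.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


results = ontology_search("Internal requirement")
for score, name in results:
    print(f"{score:.2f}  {name}")
```

With this measure an artefact attached to a sibling concept ranks above one that only shares the top-level category, which matches the intent of retrieving comparable artefacts from earlier projects. A full ontology-matching search, as used when the user supplies a query ontology, would need a richer comparison than this.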


Fig. 5. The Protégé ArtifactManager plug-in.

5. Conclusion

The general conclusion of the evaluation was that some strengths and weaknesses can be noted in both the manual and the automatic approach. The automatic approach will probably never capture the most specific (company-specific) concepts without further development (the last two phases of revision and pattern learning). Also, the method can only capture what is in the patterns on the upper levels, unless some more general architecture patterns could be exploited. The automatic approach has its strengths in relying on well-proven solutions and easily including complex relations and axioms. The manual approach gives a less structured result, with less complex relations and axioms. On the other hand, the manual approach has one big advantage, since it also captures the most specific concepts that the enterprise actually uses.

During the evaluation it was suggested, both by the domain experts and the ontology experts, that since the ontologies complement each other the best result might be obtained by combining them. The merging process was rather straightforward, as described in Section 3.4, but care still had to be taken to preserve the correctness of the two ontologies in the combined one. The resulting ontology has not yet been evaluated thoroughly; most important is of course to see how well it performs its tasks. Hopefully future evaluations can test the ontology against its goals and application scenarios, as presented in Section 4. Unfortunately this is not yet possible, since development of the full version of the pilot application in this project is still ongoing.

This case has resulted in valuable experiences. The automatic approach can be improved by more thorough evaluation of the patterns, for example by using OntoClean and by involving domain experts. Another improvement is to enrich the patterns with more axioms. As stated above, it has also been determined that the two remaining phases of the envisioned methodology cycle are essential for producing a good result. An automatic revision step might exploit additional information extraction techniques to improve the automatically constructed ontology by increasing the coverage with respect to the input.

The manual approach could be improved by using a larger set of knowledge acquisition techniques to elicit more complex information structures, both from documents and domain experts. The possibility of using manual ontology design patterns could also be considered, as discussed in Section 2.1.2. An additional specialisation of the manual methodology for different kinds of ontologies will also be a subject of study in the future.

The merging process in our case was not very complex, but if the ontologies to merge are more advanced and complex it might not be so easy to do this by hand. It might also be necessary to integrate or merge two ontologies that do not use the same terminology or were developed using completely different knowledge sources. In such cases one or more of the methodologies and tools described in Section 2.3 might be useful. The problems listed in Section 2.3 might also have to be considered more thoroughly in such cases.

The main conclusion is that each method has its strengths and weaknesses, and the resulting ontologies seemed to complement each other well, and were quite


easily merged into one single ontology. The next step is to use this parallel development and combination of results in other cases, to validate and generalise the results presented here. Perhaps a combination or integration of the methods will be possible in the future, to generate the best possible results.

Acknowledgements

This work is part of the project Semantic Structuring of Components for Model-based Software Engineering of Dependable Systems (SEMCO), supported by a grant from the Swedish KK-Foundation (Grant 2003/0241). This is an extended version of a paper presented at INCOM2006. Special thanks to the two reviewers for valuable comments on how to improve this extended version.

References

Aamodt, A., Plaza, E., 1994. Case-based reasoning: foundational issues, methodological variations, and system approaches. AICom Artificial Intelligence Communications, IOS Press 7, 39–59.
Blomqvist, E., 2005. Fully automatic construction of enterprise ontologies using design patterns: initial method and first experiences. In: Proceedings of the 4th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), Agia Napa, Cyprus.
Blomqvist, E., Öhgren, A., Sandkuhl, K., 2006. Ontology construction in an enterprise context: comparing and evaluating two approaches. In: Proceedings of the 8th International Conference on Enterprise Information Systems.
Brewster, C., Alani, H., Dasmahapatra, S., Wilks, Y., 2004. Data driven ontology evaluation. In: Proceedings of International Conference on Language Resources and Evaluation, Lisbon, Portugal.
Cohen, W., Ravikumar, P., Fienberg, S., 2003. A comparison of string distance metrics for name-matching tasks. In: Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), August 9–10, 2003, Acapulco, Mexico.
Davies, I., Green, P., Milton, S., Rosemann, M., 2003. Using meta models for the comparison of ontologies. In: Proceedings of Evaluation of Modeling Methods in Systems Analysis and Design Workshop, EMMSAD’03.
Farquhar, A., Fikes, R., Rice, J., 1997. Tools for assembling modular ontologies in ontolingua. In: AAAI97 Proceedings, pp. 436–441.
Fernandez, M., Gomez-Perez, A., Juristo, N., 1997. METHONTOLOGY: from ontological art towards ontological engineering. In: Proceedings of AAAI97 Spring Symposium Series, Workshop on Ontological Engineering, pp. 33–40.
Gangemi, A., 2005. Ontology design patterns for semantic web content. In: Proceedings of ISWC 2005, vol. 3729 of LNCS. Springer, Berlin, pp. 262–276.
Gangemi, A., Steve, G., Giacomelli, F., 1996. ONIONS: an ontological methodology for taxonomic knowledge integration. In: ECAI96’s Workshop on Ontological Engineering.
Gomez-Perez, A., 1999. Evaluation of taxonomic knowledge in ontologies and knowledge bases. In: Banff Knowledge Acquisition for Knowledge-Based Systems, KAW’99.
Gómez-Pérez, A., Fernández-López, M., Corcho, O., 2004. Ontological Engineering. Springer, Berlin.
Grüninger, M., Fox, M., 1995. Methodology for the Design and Evaluation of Ontologies. In: Proceedings of IJCAI’95, Workshop on Basic Ontological Issues in Knowledge Sharing, April 13, 1995.

Guarino, N., 1998. Formal ontology and information systems. In: Proceedings of FOIS98, pp. 3–15.
Guarino, N., Welty, C., 2002. Evaluating ontological decisions with ontoclean. Communications of the ACM 45 (2), 61–65.
KAON, 2005. http://kaon.semanticweb.org/.
Klein, M., 2001. Combining and relating ontologies: an analysis of problems and solutions. In: Gomez-Perez, A., Gruninger, M., Stuckenschmidt, H., Uschold, M. (Eds.), Workshop on Ontologies and Information Sharing, IJCAI’01, Seattle.
Knight, K., Luk, S., 1994. Building a large knowledge base for machine translation. In: AAAI94 Proceedings, pp. 773–778.
Kotis, K., Vouros, G., 2004. The HCONE approach to ontology merging. In: The Semantic Web: Research and Applications. First European Semantic Web Symposium, pp. 137–151.
Lozano-Tello, A., Gomez-Perez, A., 2004. ONTOMETRIC: a method to choose the appropriate ontology. Journal of Database Management 15 (2), 1–18.
Maedche, A., 2003. Ontology Learning for the Semantic Web. Kluwer Academic Publishers, Norwell.
McGuinness, D., Fikes, R., Rice, J., Wilder, S., 2000. An environment for merging and testing large ontologies. In: Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning (KR2000), Breckenridge, Colorado, USA.
Navigli, R., Velardi, P., Cucchiarelli, A., Neri, F., 2004. Automatic ontology learning: supporting a per-concept evaluation by domain experts. In: Workshop on Ontology Learning and Population, in the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain.
NeON Website, 2007. Available at: http://www.neon-project.org/.
Noy, N., McGuinness, D.L., 2001. Ontology development 101: a guide to creating your first ontology. Technical report, Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880.
Noy, N.F., Musen, M.A., 2000. PROMPT: algorithm and tool for automated ontology merging and alignment. In: 17th National Conference on Artificial Intelligence (AAAI-2000), Austin, Texas.
Öhgren, A., Sandkuhl, K., 2005. Towards a methodology for ontology development in small and medium-sized enterprises. In: IADIS Conference on Applied Computing, Algarve, Portugal.
OntoEdit, 2004. Prev. vers. of OntoStudio, from Ontoprise GmbH. http://www.ontoprise.de.
OWL Web Ontology Language Overview, 2004. Available at: http://www.w3.org/TR/owl-features/.
Porzel, R., Malaka, R., 2004. A task-based approach for ontology evaluation. In: Workshop on Ontology Learning and Population, in the 16th European Conference on Artificial Intelligence (ECAI 2004).
Protégé, 2006. http://protege.stanford.edu/.
Representing Classes As Property Values on the Semantic Web, 2005. Available at: http://www.w3.org/TR/swbp-classes-as-values/.
Sofia Pinto, H., Gomez-Perez, A., Martins, J.P., 1999. Some issues on ontology integration. In: Proceedings of the Workshop on Ontologies and Problem Solving Methods during IJCAI-99, Stockholm, Sweden.
Staab, S., Studer, R., Schnurr, H.-P., Sure, Y., 2001. Knowledge processes and ontologies. IEEE Intelligent Systems 16 (1), 26–34.
Sugumaran, V., Storey, V.C., 2002. Ontologies for conceptual modeling: their creation, use, and management. Data & Knowledge Engineering 42, 251–271.
Supekar, K., Patel, C., Lee, Y., 2004. Characterizing quality of knowledge on semantic Web. In: Proceedings of AAAI Florida AI Research Symposium (FLAIRS-2004), Miami Beach, Florida.
Uschold, M., 1996. Building ontologies: towards a unified methodology. In: Proceedings of Expert Systems ’96, the 16th Annual Conference of the British Computer Society Specialist Group on Expert Systems, Cambridge, UK.
Uschold, M., King, M., 1995. Towards a methodology for building ontologies. In: Workshop on Basic Ontological Issues in Knowledge Sharing. International Joint Conference on Artificial Intelligence.

van Heijst, G., Schreiber, A.T., Wielinga, B.J., 1997. Using explicit ontologies for KBS development. International Journal of Human–Computer Studies 46 (2–3), 183–292.
W3C-SWBPD, 2004. Semantic web best practices and deployment working group. Available at: http://www.w3.org/2001/sw/BestPractices/.
WordNet, 2005. Available at: http://wordnet.princeton.edu/. 2005-04-14.
Yao, H., Orme, A.M., Etzkorn, L., 2005. Cohesion metrics for ontology design and application. Journal of Computer Science 1 (1), 107–113.