The end result of the Ontology Summit 2007 was a framework for ontology assessment. This framework consisted of 7 dimensions:
Some of the purposes of this article are the following:
It is not the intention of this article to be sufficiently detailed as to allow one to make decisions about the use of any of the ontologies being assessed.
Each of the framework dimensions was subsequently elaborated in OntologySummit2007 Assessment Criteria with detailed assessment criteria so that each dimension could be assessed more or less uniformly on a scale of 1 to 5. For some dimensions not all of the 5 levels were used, and in one dimension (Expressiveness) the scale is 0.5 to 5.
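To make the scoring scheme concrete, the following is a minimal sketch of how one assessment record might be represented and range-checked. The dimension names come from this article; the class name `Assessment`, the helper `score_range`, and the example scores are hypothetical and are not part of the Ontology Summit framework. The check only enforces the overall bounds (1 to 5, or 0.5 to 5 for Expressiveness); it does not model the fact that some dimensions use fewer than 5 levels.

```python
from dataclasses import dataclass

# Dimension names as used in this article.
DIMENSIONS = (
    "Expressiveness",
    "Structure",
    "Granularity",
    "Automated Reasoning",
    "Intended Use",
    "Prescriptive vs. Descriptive",
    "Governance",
)


def score_range(dimension):
    """Return the (low, high) bounds for a dimension (hypothetical helper)."""
    if dimension not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension}")
    # Per the article, only Expressiveness starts at 0.5 instead of 1.
    return (0.5, 5.0) if dimension == "Expressiveness" else (1.0, 5.0)


@dataclass(frozen=True)
class Assessment:
    """One ontology scored on all seven dimensions (hypothetical structure)."""
    ontology: str
    scores: dict

    def __post_init__(self):
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")
        for dimension, value in self.scores.items():
            low, high = score_range(dimension)
            if not low <= value <= high:
                raise ValueError(
                    f"{dimension} score {value} is outside [{low}, {high}]"
                )


# Illustrative values only; these are not scores from the actual assessments.
example = Assessment("ExampleOntology", {d: 3 for d in DIMENSIONS})
```

A record with a Granularity score of 0.5, for instance, would be rejected with a `ValueError`, while the same value would be accepted for Expressiveness.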
At the Ontology Summit 2007 meeting a small number of ontologies were assessed at the Population Session. This initial assessment process furnished a number of assessment examples and helped to validate the framework. This article extends the assessments performed at that meeting. Starting with the ontologies that were assessed at the meeting, we added further ontologies from Schemaweb and Swoogle, for a total of 40 ontologies. The ontologies selected from Schemaweb were chosen based on popularity, while the ones selected from Swoogle were chosen randomly. Since the selection process was not entirely random, the statistics are not as useful as they might have been.
The following are the ontologies:
The assessments were based on published descriptions of the ontologies. As these are almost entirely informal, the assessments suffer from the same lack of formality.
The results of our assessments are shown at the end of this article. The assessments were analyzed by performing a factor analysis. The factor analysis was statistically significant with p-value equal to 0.00237. The factor is:
Of the 7 dimensions, 3 are not linear (Intended Use, Prescriptive vs. Descriptive and Governance), so a linear analysis is not appropriate for them. Not surprisingly, these are the dimensions with the smallest coefficients in the factor above. The other 4 dimensions are linear, and they were analyzed by computing the correlation matrix and by performing a factor analysis. The following is the correlation matrix:
                      Expressiveness   Structure   Granularity   Automated Reasoning
Expressiveness             1.0000000   0.6526347    -0.6049835             0.5990725
Structure                  0.6526347   1.0000000    -0.2488049             0.7349169
Granularity               -0.6049835  -0.2488049     1.0000000            -0.3039242
Automated Reasoning        0.5990725   0.7349169    -0.3039242             1.0000000
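The structure of the correlation matrix can be inspected with a few lines of code. The sketch below copies the published values and checks three things: that the matrix is symmetric, what range the correlations among the three non-Granularity dimensions fall in, and which dimension Granularity is most negatively correlated with. The variable names are chosen for illustration only.

```python
# Correlation matrix as published in this article (rows and columns in the
# same order as `names`).
names = ["Expressiveness", "Structure", "Granularity", "Automated Reasoning"]
R = [
    [1.0000000, 0.6526347, -0.6049835, 0.5990725],
    [0.6526347, 1.0000000, -0.2488049, 0.7349169],
    [-0.6049835, -0.2488049, 1.0000000, -0.3039242],
    [0.5990725, 0.7349169, -0.3039242, 1.0000000],
]

n = len(names)

# A correlation matrix must be symmetric with ones on the diagonal.
is_symmetric = all(R[i][j] == R[j][i] for i in range(n) for j in range(n))
has_unit_diagonal = all(R[i][i] == 1.0 for i in range(n))

# Pairwise correlations among the dimensions other than Granularity.
g = names.index("Granularity")
others = [i for i in range(n) if i != g]
pairs = [
    (names[i], names[j], R[i][j])
    for pos, i in enumerate(others)
    for j in others[pos + 1:]
]
low = min(r for _, _, r in pairs)
high = max(r for _, _, r in pairs)

# Dimension with which Granularity has the most negative correlation.
most_negative = names[min(others, key=lambda i: R[g][i])]

print(f"symmetric: {is_symmetric}, unit diagonal: {has_unit_diagonal}")
print(f"non-Granularity correlations range from {low:.2f} to {high:.2f}")
print(f"Granularity is most negatively correlated with {most_negative}")
```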
The following is the factor:
The proportion of variation accounted for by the factor is 0.55. The factor analysis is statistically significant with p-value 0.00154.
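For readers unfamiliar with this figure, it may help to recall how it is commonly defined. Assuming the usual convention for a single-factor model fitted to standardized variables (the article does not state how the value was computed, so this is an assumption), the proportion is the sum of squared loadings divided by the number of variables. The symbols $p$ (number of dimensions analyzed) and $\ell_i$ (loading of dimension $i$) are introduced here for illustration:

```latex
\text{proportion of variation} \;=\; \frac{1}{p}\sum_{i=1}^{p} \ell_i^{2},
\qquad p = 4 .
```

Under that convention, a proportion of 0.55 would correspond to a sum of squared loadings of roughly $0.55 \times 4 = 2.2$.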
Another approach to understanding the framework dimensions is to use a principal component analysis. The principal component is the eigenvector with the largest eigenvalue. Unlike the factor analysis above, in a principal component analysis one can compute all of the eigenvectors, in descending order by eigenvalue. Although only the first eigenvector is the principal component, the entire analysis is referred to by this name. The eigenvectors are normalized to have unit length, which is not usually done for a factor analysis. The eigenvectors are labeled v1, v2, v3 and v4 in the principal component analysis shown here:
                              v1          v2          v3          v4
Expressiveness         0.6630407  -0.1399413  -0.6618071  -0.3206320
Structure              0.3737721   0.3831627  -0.1121577   0.8371985
Granularity           -0.4194978   0.7440946  -0.4726879  -0.2165894
Automated Reasoning    0.4946646   0.5290803   0.5709625  -0.3865007
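As a sanity check on the table above, the following sketch verifies that the four published vectors behave as eigenvectors from a principal component analysis should: each has unit length and they are mutually orthogonal, up to the rounding of the printed values. It also lists the sign of each dimension's entry in every vector. This only inspects the published numbers; it does not reproduce the analysis itself, and the variable names are illustrative.

```python
# Rows are dimensions, columns are v1..v4, as published in this article.
labels = ["Expressiveness", "Structure", "Granularity", "Automated Reasoning"]
table = [
    [0.6630407, -0.1399413, -0.6618071, -0.3206320],
    [0.3737721, 0.3831627, -0.1121577, 0.8371985],
    [-0.4194978, 0.7440946, -0.4726879, -0.2165894],
    [0.4946646, 0.5290803, 0.5709625, -0.3865007],
]

# Transpose so that vectors[0] is v1, vectors[1] is v2, and so on.
vectors = [list(column) for column in zip(*table)]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Gram matrix: entry (i, j) is v_i . v_j. For an orthonormal set this
# should be (approximately) the identity matrix.
gram = [[dot(u, v) for v in vectors] for u in vectors]
max_deviation = max(
    abs(gram[i][j] - (1.0 if i == j else 0.0))
    for i in range(4)
    for j in range(4)
)
print(f"largest deviation from the identity: {max_deviation:.1e}")

# Sign of each dimension's entry in v1..v4.
for label, row in zip(labels, table):
    signs = " ".join("+" if value > 0 else "-" for value in row)
    print(f"{label:<20} {signs}")
```

The sign listing makes it easy to see which dimensions point in the opposite direction from the others within each vector.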
The factor and the principal component (v1 above) are qualitatively similar. The most striking similarity is that both distinguish the Granularity dimension as being different from the other dimensions. They differ in the values for the 3 other dimensions. In the factor analysis the values are nearly the same, while in the principal component analysis Expressiveness has a much higher value than Structure and Automated Reasoning. This is likely because the factor analysis computes a single factor while the principal component analysis computes 4 components. Indeed, in the other 3 components, the value for Expressiveness is negative.
In any case, all three analyses (correlation matrix, factor analysis and principal component analysis) agree that Granularity differs from the other 3 dimensions. The correlation matrix shows that the dimensions other than Granularity have pairwise correlation coefficients between about 0.60 and 0.73, while Granularity is negatively correlated with every other dimension, most strongly with Expressiveness. One possible interpretation is that the ontologies that were assessed tend to be either narrow and deep or broad and shallow, rather than narrow and shallow or broad and deep. The fact that there are few ontologies that are narrow and shallow may be the result of selection bias, since the sample of ontologies was only partly chosen at random. The scarcity of ontologies that are broad and deep should not be surprising, since building such ontologies is very expensive.
The following are the assessments shown pictorially using "radar" diagrams. As stated earlier in this article, these assessments are intended to provide a global picture of the landscape of ontologies today. It is not intended that the assessments will allow one to make decisions about the use of any of these ontologies.
The assessments are also available in these formats:
The authors would like to acknowledge significant input from John Sowa and Patrick Cassidy. However, the authors take full responsibility for any errors, omissions and misrepresentations.