Tagpedia: a Semantic Reference to Describe and Search for Web Resources
Social Web and Knowledge Management Workshop at the 17th World Wide Web Conference (WWW), 2008
Nowadays the Web represents a growing collection of an enormous amount of content, and the need for better ways to find and organize the available data is becoming a fundamental issue in dealing with information overload. Keyword-based Web searches are currently the preferred means of seeking content related to a specific topic. Search engines and collaborative tagging systems make the search for information possible by associating descriptive keywords with Web resources. All of them suffer from inconsistency and a consequent reduction in the recall and precision of searches, due to polysemy, synonymy and, in general, all the different lexical forms that can be used to refer to a particular meaning. A possible way to face or at least reduce these problems is to introduce semantics to characterize the contents of Web resources: each resource is described by one or more concepts instead of simple and often ambiguous keywords. To support this task, the availability of a global semantic resource of reference is fundamental. On the basis of our past experience with the semantic tagging of Web resources and the SemKey Project, we are developing Tagpedia, a general-domain "encyclopedia" of tags, semantically structured for generating semantic descriptions of contents over the Web, created by mining Wikipedia. In this paper, starting from an analysis of the weak points of non-semantic keyword-based Web searches, we introduce our idea of semantic characterization of Web resources, describing the structure and organization of Tagpedia. We introduce our first realization of Tagpedia, suggesting the possible improvements that can be carried out in order to exploit its full potential.
CSP Techniques for Solving Combinatorial Queries within Relational Databases.
M. Mouhoub and C. Feng. CSP Techniques for Solving Combinatorial Queries within Relational Databases. Intelligent Systems for Knowledge Management. Studies in Computational Intelligence, N. T. Nguyen and E. Szczerbicki editors, Springer, pages 131-151, 2009
A combinatorial query is a request for tuples from multiple relations that satisfy a conjunction of constraints on tuple attribute values. Managing combinatorial queries using traditional database systems is very challenging due to the combinatorial nature of the problem. Indeed, for queries involving a large number of constraints, relations and tuples, the response time needed to satisfy these queries becomes an issue. To overcome this difficulty in practice, we propose a new model integrating the Constraint Satisfaction Problem (CSP) framework into database systems. CSPs are very popular for solving combinatorial problems and have demonstrated their ability to tackle, in an efficient manner, real-life large-scale applications under constraints. In order to compare the response time of our CSP-based model with the traditional way of handling combinatorial queries, as implemented by MS SQL Server, we have conducted several experiments on large databases. The results are very promising and show the superiority of our method compared to the traditional one.
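As a rough illustration of the idea in this abstract, a combinatorial query can be treated as a CSP in which each relation is a variable whose domain is its set of tuples. The sketch below is not the authors' model: the function `solve`, the toy relations, and the constraint encoding are all hypothetical, and the search is plain backtracking that rejects a partial assignment as soon as a constraint fails.

```python
def solve(relations, constraints):
    """Pick one tuple per relation so that every constraint holds.

    relations:   list of lists of tuples (one list per relation)
    constraints: list of (scope, predicate); scope holds relation indices
    """
    solutions = []

    def consistent(assignment):
        depth = len(assignment)
        for scope, predicate in constraints:
            # Check a constraint once, when the last relation it mentions
            # has just been assigned.
            if max(scope) == depth - 1:
                if not predicate(*(assignment[i] for i in scope)):
                    return False
        return True

    def backtrack(assignment):
        if len(assignment) == len(relations):
            solutions.append(tuple(assignment))
            return
        for row in relations[len(assignment)]:
            assignment.append(row)
            if consistent(assignment):
                backtrack(assignment)
            assignment.pop()  # undo and try the next tuple

    backtrack([])
    return solutions


# Hypothetical toy data: (name, department) and (project, department).
employees = [("ann", "db"), ("bob", "ai")]
projects = [("p1", "db"), ("p2", "ai"), ("p3", "db")]
constraints = [((0, 1), lambda e, p: e[1] == p[1])]

matches = solve([employees, projects], constraints)
for employee, project in matches:
    print(employee[0], project[0])
```

With more relations and constraints, the early rejection of partial assignments is what avoids enumerating the full cross product.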
Using the Newly‐created ILE DBMS to Better Represent Temporal and Historical GIS Data
* Kantabutra, V., Owens, J. B., Ames, D. P., Burns, C. N., and Stephenson, B. (2010). “Using the Newly-created ILE DBMS to Better Represent Temporal and Historical GIS Data.” Transactions in GIS 14, s1: 39-58; doi: 10.1111/j.1467-9671.2010.01222.x.
This article introduces a type of DBMS called the Intentionally-Linked Entities (ILE) DBMS for use as the basis for temporal and historical Geographical Information Systems. ILE represents each entity in a database only once, thereby mostly eliminating redundancy and fragmentation, two major problems in Relational and other database systems. These advantages of ILE are realized by using relationship objects and pointers to implement all of the relationships among data entities in a native fashion using dynamically-allocated linked data structures. ILE can be considered to be a modern and extended implementation of the E/R data model. ILE also facilitates storage of things that are more faithful to the historical records, such as gazetteer entries of places with imprecisely known or unknown locations. This is difficult in Relational database systems but is a routine task using ILE because ILE is implemented using modern memory allocation techniques. We use the China Historical GIS (CHGIS) and other databases to illustrate the advantages of ILE. This is accomplished by modeling these databases in ILE and comparing them to the existing Relational implementations.
Kddonto: An Ontology for Discovery and Composition of Kdd Algorithms
In Proc. of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery, pages 13-24, Bled, Slovenia, Sep 7-11, 2009.
Co-authored with Claudia Diamantini and Domenico Potena.
Knowledge Discovery in Databases (KDD) is a highly complex process in which many tools are needed to achieve the discovery goal. This implies that a user has both to choose the algorithms most suitable to her goal and to compose them into a process. In order to support the user, in this paper we introduce KDDONTO, an ontology formalizing the domain of KDD algorithms. For the design of KDDONTO we follow a formal ontology-building methodology aimed at defining goal-oriented ontologies that satisfy quality requirements. We first identify the basic terms characterizing algorithms and, by analyzing them, we formally derive the classes and relations of the ontology. Finally, an OWL-DL implementation is proposed and its evaluation is discussed.
http://boole.diiga.univpm.it/paper/sokd09.pdf
Supporting Users in KDD Process Design: a Semantic Similarity Matching Approach
In Proc. of the Planning To Learn Workshop in ECAI2010, pages 27-34, Lisbon, Portugal, August 17, 2010.
Co-authored with Claudia Diamantini and Domenico Potena.
Data Mining has reached a quite mature and sophisticated stage, with a plethora of techniques to deal with complex data analysis tasks. In contrast, the capability of users to fully exploit these techniques has not increased proportionately. For this reason the definition of methods and systems supporting users in Knowledge Discovery in Databases (KDD) activities is gaining increasing attention among researchers. The present work fits into this mainstream, proposing a methodology and the related system to support users in the composition of tools for forming valid and useful KDD processes. The basic pillar of the methodology is a similarity matching technique devised to recognize valid algorithmic sequences on the basis of their input/output pairs. Similarity is based on a semantic description of algorithms, their properties and interfaces, and is measured by a suitable evaluation function. This allows the candidate processes to be ranked, so that users are provided with a criterion for choosing the most suitable process with respect to their requests.
http://boole.diiga.univpm.it/paper/planlearn2010.pdf
A Semantic-Aided Designer for Knowledge Discovery
In Proc. of the 2011 Int. Symposium on Collaboration Technologies and Systems (CTS 2011), IEEE, Philadelphia, USA, May 23-27, 2011, pages 86-74.
Co-authored with Claudia Diamantini and Domenico Potena.
Knowledge Discovery in Databases (KDD), like any scientific experimentation in e-Science, is a complex and computationally intensive process aimed at gaining knowledge from a huge set of data. Often performed in distributed settings, a KDD project usually involves a deep interaction between tools and several users with specific expertise, who can be either a co-located group or a geographically distributed virtual team of experts. Given the complexity of the process, such users need some support to achieve their goal of knowledge extraction. This paper introduces KDDesigner, a web-based semantic-driven tool aimed at supporting users in the collaborative design of a KDD process. In this paper we address semantic issues related to collaborative project management: tool localization, tool integration, interface matchmaking, process execution, team building and process versioning.
http://boole.diiga.univpm.it/paper/cts11.pdf
Packaging Data for Re-Use: Databases in Model Organism Biology.
forthcoming in 2010 in Howlett, P and Morgan, MS (eds) How Well Do ‘Facts’ Travel. Cambridge University Press.
Using Map-Based Visual Interfaces to Facilitate Knowledge Discovery in Digital Libraries
by Olga Buchel
Co-authored with Professor Kamran Sedig
In recent years there has been growing interest in supporting knowledge discovery activities using map-based visual interfaces. The goal is promising and ambitious, but not easy to achieve due to the lack of understanding of the cognitive factors involved in how information is transformed into knowledge. In this paper we present a map-based visual interface, VICOLEX (VIsual COLlection EXplorer), aimed at facilitating and supporting knowledge discovery and users’ cognitive activities by means of integrated visual representations coupled with interactions.
Instance-Based Classifiers to Discover the Gradient of Typicality in Data
Gagliardi, F. (2011) “Instance-Based Classifiers to Discover the Gradient of Typicality in Data”. In: Pirrone, R., Sorbello, F. (eds.) “AI*IA 2011: Artificial Intelligence Around Man and Beyond. XIIth International Conference of the Italian Association for Artificial Intelligence, Palermo, Italy, September 15-17, 2011. Proceedings”. LNCS vol. 6934, Springer Berlin, Heidelberg. Pp. 457-462. (ISBN: 978-3-642-23953-3) (DOI: 10.1007/978-3-642-23954-0_47) (Link: http://dx.doi.org/10.1007/978-3-642-23954-0_47 http://www.springer.com/978-3-642-23953-3)
One of the aims of machine learning and data mining is to discover useful and interesting knowledge from data. Usually instance-based (IB) classifiers are considered unsuitable for knowledge extraction tasks.
In this paper, by contrast, we consider the families of IB classifiers based on prototype methods and on nearest neighbours, and we show that some hybrid IB classifiers can infer a mixture of representative instances, varying from abstracted prototypes to previously observed atypical exemplars, which can be used to discover the “typicality structure” of learnt categories.
Experimental results show that one of the proposed hybrid classifiers (the Prototype exemplar learning classifier) detects a concise and meaningful set of representative instances varying from prototypical ones to atypical ones, which form a gradient of typicality.
This kind of class representation coheres with theories developed in cognitive science about how the human mind classifies.
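To make the notion of a "gradient of typicality" concrete, the sketch below ranks the instances of one class by their distance from the class centroid, from most to least typical. This is a deliberately simplified stand-in, not the Prototype exemplar learning classifier named in the abstract; the helper names `centroid` and `typicality_ranking` and the sample points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def typicality_ranking(points):
    """Order instances from most typical (nearest the centroid) to least."""
    c = centroid(points)
    return sorted(points, key=lambda p: math.dist(p, c))


# Hypothetical instances of a single class; the last one is an outlier.
instances = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
ranking = typicality_ranking(instances)
print(ranking)
```

A prototype-based method would keep only the centroid, while a nearest-neighbour method would keep every instance; a hybrid keeps a selection along this ranking.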
DiFaB - A Databased Visual Archive of Byzantium and the Challenges of Indexing Historical Material Culture
by Fani Gargova
together with S. Teetor, D. Terkl and U. Unterweger, in K. Kriz et al. (Eds.), Mapping Different Geographies (pp. 201-217). Heidelberg (2010).
The Digital Research Archive for Byzantium (DiFaB) is currently establishing a valuable tool for scholars in the field of Byzantine Studies while also aiming at interdisciplinary research. This paper introduces the specificities of the field, after which the project and working strategies are described. The long-established affinity of Byzantinists for topography can be linked with a strong interest in new mapping technologies, of which the DiFaB project plans to make use. Technical aspects such as compliance are discussed in their role as requisites for present and future interoperability and eventual co-operation. The currently established standards produce striking answers to certain problems; however, there is still room for advancement. By illustrating the problems encountered so far, this paper serves as an attempt to contribute to their further development. The paper further argues for the usefulness of mapping for Byzantine art history, with possible analogies to other cultural/historical sciences, and for the innovative potential of historical databases due to their visualisability and the effect of serendipity.
Re-thinking Organisms: The Epistemic Impact of Databases on Model Organism Biology.
co-authored with Rachel Ankeny, forthcoming in Studies in the History and the Philosophy of the Biological and Biomedical Sciences, Part C, 2012.
Updating a Biomedical Database: Writing, Reading and Invisible Contribution
in D. Barton, U. Papen (eds.), 2010, Anthropology of Writing : Understanding Textually-Mediated Worlds, London, Continuum, p. 47-66.
The development of information and communication technologies has multiplied our ability to produce, circulate and store large amounts of data. Over the last twenty years databases have become an essential part of biomedical research. For these databases to operate effectively, a link has to be made between very small amounts of biological material (only a few microlitres) and a wide range of personal data relating to the donors (age, sex, occupation, lifestyle, diet, etc.) and their state of health (clinical and biological data). Yet most studies on bioinformatics databases take this link for granted, as if it emerged naturally and automatically from the data collection process. However, the relationship between samples and data does not emerge in and of itself. This paper shows that the solidity of the link between different types of data is based on the daily work of writing. It also shows that ‘information’ is not the starting point of the work. On the contrary, the whole set of documents and writing practices is precisely a way of transforming data into information that has a polyvalent value: scientific, medical and legal.
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms in scattered places. Most of this information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates the study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large-scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction from unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named the integrated bio-entity network (IBN), where the vertices are the bio-entities and the edges represent their relationships. Under this framework, graph-theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses, that is, the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
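The abstract does not say how the most probable path is computed. One common way to find it, assuming each edge carries an independent probability, is to run Dijkstra's algorithm on the costs -log(p), since minimising the sum of these costs maximises the product of the probabilities. The sketch below follows that assumption; the function name, the directed-edge encoding, and the tiny network are hypothetical.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """Return (path, probability) for the most probable source-target path.

    graph: {node: {neighbour: probability in (0, 1]}}, treated as directed.
    """
    best = {source: 0.0}   # lowest known cost, i.e. sum of -log(p)
    previous = {}          # back-pointers for path reconstruction
    heap = [(0.0, source)]
    settled = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            break
        for neighbour, p in graph.get(node, {}).items():
            if p <= 0.0:
                continue  # an impossible edge cannot be on any path
            new_cost = cost - math.log(p)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


# Hypothetical network with made-up edge probabilities.
network = {
    "geneA": {"proteinB": 0.9, "pathwayC": 0.5},
    "proteinB": {"diseaseD": 0.8},
    "pathwayC": {"diseaseD": 0.9},
}
path, probability = most_probable_path(network, "geneA", "diseaseD")
print(path, round(probability, 2))
```

An indirect link such as geneA to diseaseD, found with high path probability, is the kind of hypothesis the abstract describes.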
Knowledge Discovery: Enhancing Data Mining and Decision Support Integration
There are six stages in the data mining process: business understanding, data understanding, data preparation, modelling, evaluation and deployment. The third stage, data cleaning and preparation, is one of the most important. Data cleaning and pre-processing involve the creation of the relevant data subset through data selection, as well as finding useful properties/features, generating new features, defining appropriate feature values and/or value discretization. However, data mining’s performance and the accuracy of its results are highly dependent on the format and availability of the data presented and also on the computational data mining tools. Experts are involved in most stages of a data mining project as described by CRISP-DM [Chapman, 2000]. The most informative attributes that influence the accuracy of data mining are computed prior to or during the data mining process. On the other hand, a complementary approach to such problem solving that does not rely on collecting observational data is decision making. In this approach the human decision maker builds alternative models and defines the preference ordering criteria. This information is then used to make a rational decision. This process can be supported by computational decision support systems. To improve the quality of decision support, better submodels are needed, modeling the underlying decision making processes in a more realistic way. In order to include as much information as possible, the submodels of the expert system are usually provided with many parameters describing different aspects of the decision making, in the hope that the characteristics that are truly important are included in the model.
In the context of classification, those descriptive parameters are termed features or attributes, and the selection of a good set of features/attributes is of key importance in the design of good classification models that will be used afterwards by the expert system. The roles of experts in data mining and decision support are different, but complementary [Lavrač and Bohanec, 2003]. In an integrated approach to data mining and decision support, the potential of experts can be exploited even better in all stages of the integrated problem solving process. The gap between the format of data as stored in the data sources and that required by newly developed data mining algorithms must be bridged before any novel machine learning and data modelling tools can be used to their full potential. Transforming this data into a format appropriate for mining is a key (and often very time-consuming) phase of the data mining process called data preparation. In this dissertation, I present a comprehensive survey of existing research on integrating data mining and decision support and on techniques for improving the performance of data mining algorithms and decision support. After the survey, a research proposal is put forward to study and investigate a method of integrating data mining and decision support so that both the data mining algorithms and the decision support model produce more accurate results. Finally, some preliminary work in this area is presented.
Pattern-Based Transformation Approach to Relational Domain Learning Using Dynamic Aggregation for Relational Attributes
Rayner Alfred and Dimitar Kazakov, Pattern-Based Transformation Approach to Relational Domain Learning Using Dynamic Aggregation for Relational Attributes, Proceedings of the 2006 International Conference on Data Mining, DMIN 2006, Las Vegas, Nevada, USA, June 26-29, 2006
Due to the widespread use of relational databases (mySQL, Oracle, DB2, MsSQL), most data are stored as multiple tables in what can be a very large database. As a result, more efficient algorithms for mining data from multi-relational domains need to be implemented. Inductive Logic Programming (ILP) techniques are useful for analyzing data in multi-relational databases. Unfortunately, even though not complex in structure, such business data are often large and contain highly non-determinate components, making them difficult for ILP learners geared towards structurally complex tasks. In this paper, we build a novel transformation-based approach to relational domain learning and describe the transformation process implemented through relational aggregation based on pattern distance. We present the prototype of “Dynamic Aggregation of Relational Attributes” (hence called DARA), which is capable of mapping a one-to-many relationship into a one-to-one relationship, while preventing loss of information, when handling classification tasks in relational domains. We show experimentally, in a multi-relational domain, that the approach yields a higher percentage of correctly classified instances, and we illustrate the set of rules extracted using our approach.
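The mapping from a one-to-many to a one-to-one relationship can be pictured as collapsing all non-target rows that share a target key into one fixed-length vector. The sketch below uses simple pattern counts for this; it is not the DARA algorithm itself, which the abstract describes as using aggregation based on pattern distance, and the function `summarise` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def summarise(non_target_rows, key_index=0):
    """Collapse many rows per target key into one pattern-count vector.

    Each row's non-key attributes form a 'pattern'; the result maps every
    key to the counts of each pattern, in the order given by `vocabulary`.
    """
    bags = defaultdict(Counter)
    for row in non_target_rows:
        key = row[key_index]
        pattern = tuple(v for i, v in enumerate(row) if i != key_index)
        bags[key][pattern] += 1
    vocabulary = sorted({p for bag in bags.values() for p in bag})
    vectors = {k: [bag[p] for p in vocabulary] for k, bag in bags.items()}
    return vocabulary, vectors


# Hypothetical non-target table: (customer_id, item, payment).
orders = [
    (1, "book", "card"),
    (1, "book", "card"),
    (1, "pen", "cash"),
    (2, "pen", "cash"),
]
vocabulary, vectors = summarise(orders)
print(vocabulary)
print(vectors)
```

Each customer now has exactly one row of features, which can be joined to the target table and passed to an ordinary single-table classifier.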
A Genetic-Based Feature Construction Method for Data Summarisation
Rayner Alfred, (2008) A genetic-based feature construction method for data summarisation. In: 4th International Conference on Advanced Data Mining and Applications (ADMA 2008), 8-10 October 2008, Chengdu, China.
The importance of input representation has already been recognised in machine learning. This paper discusses the application of genetic-based feature construction methods to generate input data for the data summarisation method called Dynamic Aggregation of Relational Attributes (DARA). Here, feature construction methods are applied in order to improve the descriptive accuracy of the DARA algorithm. The DARA algorithm is designed to summarise data stored in the non-target tables by clustering them into groups, where multiple records stored in non-target tables correspond to a single record stored in a target table. This paper addresses the question of whether or not the descriptive accuracy of the DARA algorithm benefits from the feature construction process. This involves solving the problem of constructing a relevant set of features for the DARA algorithm by using a genetic-based algorithm. This work also evaluates several scoring measures used as fitness functions to find the best set of constructed features.
Data Summarization Approach to Relational Domain Learning Based on Frequent Pattern to Support the Development of Decision Making
Rayner Alfred and Dimitar Kazakov, Data Summarization Approach to Relational Domain Learning Based on Frequent Pattern to Support the Development of Decision Making, X. Li, O.R. Zaiane, and Z. Li (Eds.): ADMA 2006, LNAI 4093, pp. 889 – 898, 2006. © Springer-Verlag Berlin Heidelberg 2006.
A new approach is needed to handle huge datasets stored in multiple tables in a very large database. Data mining and Knowledge Discovery in Databases (KDD) promise to play a crucial role in the way people interact with databases, especially decision support databases where analysis and exploration operations are essential. In this paper, we present related work in Relational Data Mining, and define the basic notions of data mining for decision support and the types of data aggregation as a means of categorizing or summarizing data. We then present a novel approach to relational domain learning to support the development of decision making models by introducing automated construction of a hierarchical multi-attribute model for decision making. We describe how a relational dataset can naturally be handled to support the construction of a hierarchical multi-attribute model by using relational aggregation based on pattern distance. We present the prototype of Dynamic Aggregation of Relational Attributes (hence called DARA), which is capable of supporting the construction of a hierarchical multi-attribute model for decision making. We show experimentally, in a multi-relational domain, that the approach yields a higher percentage of correctly classified instances, and we illustrate the set of rules extracted from the relational domains to support decision making.
Rules Extraction Based on Data Summarisation Approach Using DARA
Rayner Alfred. 2008. Rules Extraction Based on Data Summarisation Approach Using DARA. In Proceedings of the 4th international conference on Advanced Data Mining and Applications (ADMA '08), Changjie Tang, Charles X. Ling, Xiaofang Zhou, Nick J. Cercone, and Xue Li (Eds.). Springer-Verlag, Berlin, Heidelberg, 540-547. DOI=10.1007/978-3-540-88192-6_54 http://dx.doi.org/10.1007/978-3-540-88192-6_54
This paper contributes to the understanding and development of a data summarisation approach that summarises structured data stored in a non-target table that has many-to-one relations with the target table. In this paper, the feasibility of data summarisation techniques, borrowed from Information Retrieval theory, to summarise patterns obtained from data stored across multiple tables with one-to-many relations is demonstrated. The paper describes the Dynamic Aggregation of Relational Attributes (DARA) framework, which summarises data stored in non-target tables in order to facilitate data modelling efforts in a multi-relational setting. The application of the DARA algorithm to structured data is presented in order to show the adaptability of this algorithm to real-world problems.
