A data integration concept for an interdisciplinary research database
WILLMES, C. and BARETH, G. (2012): A data integration concept for an interdisciplinary research database. In: Proceedings of the Young Researchers forum on Geographic Information Science - GI Zeitgeist, ifgiPrints 44, Münster, Germany, March 2012, ISBN: 978-3-89838-663-0, Akademische Verlagsgesellschaft AKA, Heidelberg, pp. 67 - 72.
This paper presents an overview of the current state of development of an archaeological and a palaeoenvironmental data model for an interdisciplinary research database. The models are constructed iteratively by integrating heterogeneous data and adjusting the model where necessary. The integration concept is an iterative approach which combines several techniques for data model development, including semantic and syntactic integration and alignment, as well as semantic data linkage with external knowledge bases and models. The goal is to provide integrated spatio-temporal access to an existing wealth of data to facilitate research on the integrated data basis.
CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis
by Ben Wright
in BMC Bioinformatics
Background
The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices.
Results
Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searching for nearest common ancestors and minimum spanning clades, and filtering multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, NeXML, and NEXUS formats can be imported and their CDAO representations added to the triple-store.
Conclusions
CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.
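As a rough illustration of the "nearest common ancestor" query mentioned in the Results, the following is a minimal sketch in plain Python. It assumes a tree stored as (child, predicate, parent) triples; the predicate name `has_parent`, the node labels, and the function names are hypothetical and are neither CDAO vocabulary nor the CDAO-Store web service API.

```python
# Hypothetical (child, predicate, parent) triples describing one small tree.
# The predicate "has_parent" is an illustrative assumption, not a CDAO term.
TRIPLES = [
    ("human", "has_parent", "hominini"),
    ("chimp", "has_parent", "hominini"),
    ("hominini", "has_parent", "homininae"),
    ("gorilla", "has_parent", "homininae"),
    ("homininae", "has_parent", "root"),
]


def parent_map(triples):
    """Index the triples as a child -> parent dictionary."""
    return {child: parent for child, pred, parent in triples if pred == "has_parent"}


def ancestors(node, parents):
    """Return the path from node up to the root, with node itself first."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def nearest_common_ancestor(a, b, triples):
    """Return the first node on b's path to the root that also lies on a's path."""
    parents = parent_map(triples)
    on_path_of_a = set(ancestors(a, parents))
    for node in ancestors(b, parents):
        if node in on_path_of_a:
            return node
    return None  # the two nodes are not in the same tree
```

A real triple-store would typically answer such a query with a graph query language rather than in application code; the sketch only shows the traversal logic.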
CFP - International Journal of Data Engineering (IJDE)
by J. Stewart
Computer Science Journals (CSC Journals)
Computer Science Journals (CSC Journals) invites researchers, editors, scientists & scholars to publish their scientific research papers in the International Journal of Data Engineering (IJDE) Volume 3, Issue 3.
Data Engineering refers to the use of data engineering techniques and methodologies in the design, development and assessment of computer systems for different computing platforms and application environments. With the proliferation of different forms of data and their rich semantics, the need for sophisticated techniques has resulted in in-depth content processing, engineering analysis, indexing, learning, mining, searching, management, and retrieval of data.
International Journal of Data Engineering (IJDE) is a peer-reviewed scientific journal for sharing and exchanging research and results on problems encountered in today’s data engineering communities. IJDE especially encourages submissions that make efforts (1) to expose practitioners to the most recent research results, tools, and practices in data engineering topics; (2) to raise awareness in the research community of the data engineering problems that arise in practice; (3) to promote the exchange of data & information engineering technologies and experiences among researchers and practitioners; and (4) to identify new issues and directions for future research and development in the data & information engineering fields. IJDE targets researchers and practitioners working on data engineering and data management.
CSC Journals anticipate and invite papers on any of the following topics:
Annotation and Data Curation
Approximation and Uncertainty in Databases and Pro
Autonomic Databases
Data Engineering
Data Engineering Algorithms
Data Engineering for Ubiquitous Mobile Distributed
Data Engineering Models
Data Integration
Data Mining and Knowledge Discovery
Data Ontologies
Data Privacy and Security
Data Query Optimization in Databases
Data Streams and Sensor Networks
Data Warehousing
Database Tuning
Database User Interfaces and Information Visualiza
Knowledge Technologies
Metadata Management and Semantic Interoperability
OLAP and Data Grids
Personalized Databases
Query Processing in Databases
Scientific Biomedical and Other Advanced Database
Semantic Web
Social Information Management
Spatial Temporal and Multimedia Data Engineering
Web Data Engineering and Management
Important Dates - IJDE CFP - Volume 3, Issue 3.
Paper Submission: March 31, 2012
Author Notification: May 15, 2012
Issue Publication: June 2012
For complete details about IJDE archives publications, abstracting/indexing, editorial board and other important information, please refer to IJDE homepage (http://www.cscjournals.org/csc/journals/IJDE/description.php?JCode=IJDE).
We look forward to receiving your valuable papers. If you have further questions, please do not hesitate to contact us at cscpress@cscjournals.org. Our team is committed to providing a quick and supportive service throughout the publication process.
A complete list of journals can be found at http://www.cscjournals.org/csc/bysubject.php
Computer-based genealogy reconstruction in founder populations
Journal of Biomedical Informatics
This paper describes a software tool that reconstructs entire genealogies from data collected from different and heterogeneous sources, including municipal and parish records archived over centuries. The tool exploits a record linkage algorithm relying on a rule-based data matching approach. It applies a general strategy for managing the ambiguities due to missing, imprecise or erroneous input data. The process follows an iterative approach that combines automatic pedigree reconstruction with software-empowered human data revision to improve the quality and the accuracy of the results and to optimize the matching rules.
The paper discusses the results obtained by reconstructing the entire genealogy of the population of the Val Borbera, a geographically isolated valley in Northern Italy. The genealogy could be reconstructed from data going back as far as the XVI century. The resulting pedigree includes 75,994 trios, 58.9% of which belong to a single large family, reconstructed over 13 generations.
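The rule-based matching idea described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the tool's actual algorithm: the `Record` fields, the two-year tolerance, the scoring rules, and the "review" outcome (standing in for the human revision step) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields; real parish or municipal records carry far more detail.
    given_name: str
    surname: str
    birth_year: Optional[int]  # None when the source does not state it


def norm(text: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling noise is ignored."""
    return " ".join(text.lower().split())


def match_score(a: Record, b: Record, year_tolerance: int = 2) -> int:
    """Apply simple rules; 0 means rejected, higher means stronger evidence."""
    if norm(a.surname) != norm(b.surname):
        return 0  # rule 1: surnames must agree
    score = 1
    if norm(a.given_name) == norm(b.given_name):
        score += 1  # rule 2: agreeing given names add evidence
    if a.birth_year is None or b.birth_year is None:
        return score  # missing year: neither confirms nor rejects
    if abs(a.birth_year - b.birth_year) > year_tolerance:
        return 0  # rule 3: clearly different birth years reject the pair
    return score + 1


def classify(a: Record, b: Record) -> str:
    """Turn the score into a decision; ambiguous pairs go to a human reviewer."""
    score = match_score(a, b)
    if score >= 3:
        return "match"
    if score >= 1:
        return "review"
    return "no-match"
```

Routing ambiguous pairs to "review" rather than forcing a decision mirrors the iterative combination of automatic reconstruction and human revision that the abstract describes.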
Design and Implementation of a Query Planner for Data Integration
Koul, N., and Honavar, V. (2009). Design and Implementation of a Query Planner for Data Integration. In: Proceedings of the IEEE Conference on Tools with Artificial Intelligence.
Digital information and communication networks and scientific research substance – An investigation of meteorology
by Yi Shen
Proceedings of the American Society for Information Science and Technology
Volume 44, Issue 1, pages 1–4, 2007
Vispedia: Interactive Visual Exploration of Wikipedia Data Via Search-based Integration
by Leslie Wu
Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, non-uniform schema that require costly data integration into structured tables before visualization can begin. We present Vispedia, a Web-based visualization system that reduces the cost of this data integration. Users can browse Wikipedia, select an interesting data table, then use a search interface to discover, integrate, and visualize additional columns of data drawn from multiple Wikipedia articles. This interaction is supported by a fast path search algorithm over DBpedia, a semantic graph extracted from Wikipedia’s hyperlink structure. Vispedia can also export the augmented data tables produced for use in traditional visualization systems. We believe that these techniques begin to address the “long tail” of visualization by allowing a wider audience to visualize a broader class of data. We evaluated this system in a first-use formative lab study. Study participants were able to quickly create effective visualizations for a diverse set of domains, performing data integration as needed.
A Survey on Uncertainty Management In Data Integration
Matteo Magnani, Danilo Montesi. ACM Journal of Data and Information Quality, 2010.
Community Driven Requests for Proposals: Applying Semantics to match Customer Purchase Intents to Vendor Offers
Christophe Debruyne, Davor Meersman, Mathias Baert and Rami Hansenne
This paper presents a platform for requests for proposals and describes how ontologies drive the different components: the creation of a proposal, the annotation of vendor data, the transformation of vendor data into other formats and the semantic matching of a proposal against annotated vendor data. The ontology construction started from DOGMA, a methodology with its grounding in the linguistic representation of knowledge that is suitable for community participation in the creation process. The ontologies were created in a modular way, with general product and meta-models that can be extended depending on the domain. In the case of the pilot, the products were holiday packages, more precisely winter sports holiday packages.
Linkage query writer
by Reynold Xin
Demo Paper, VLDB 2009, Lyon, France.
We present Linkage Query Writer (LinQuer), a system for generating SQL queries for semantic link discovery over relational data. The LinQuer framework consists of (a) LinQL, a language for specification of linkage requirements; (b) a web interface and an API for translating LinQL queries to standard SQL queries; (c) an interface that assists users in writing LinQL queries. We discuss the challenges involved in the design and implementation of a declarative and easy-to-use framework for discovering links between different data items in a single data source or across different data sources. We demonstrate different steps of the linkage requirements specification and discovery process in several real-world scenarios and show how the LinQuer system can be used to create high-quality linked data sources.
A Structured Overview of Simultaneous Component Based Data Integration
Background: Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results.
Results: We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods.
Conclusion: Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. Especially, the type of weighting of the different matrices is essential for simultaneous component analysis. These types are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays.
A RDF-Based Data Integration Framework
by Hadi Saboohi
Co-authored with Amineh Amini, and Nasser Nemat Bakhsh.
Published in 2008.
Data integration is one of the main problems in distributed data sources. One approach is to provide an integrated mediated schema for various data sources. This research work aims at developing a framework for defining an integrated schema and querying it. The basic idea is to employ recent standard languages and tools to provide a unified data integration framework. RDF is used for integrated schema descriptions as well as providing a unified view of data. RDQL is used for query reformulation. Furthermore, description logic inference services provide the necessary means for satisfiability checking of concepts in the integrated schema. The framework has tools to display the integrated schema and query it, and provides enough flexibility to be used in different application domains.
Distributed and Scalable XML Document Processing Architecture for E-Commerce Systems
Co-authored with David Cheung, S. D. Lee, William Song, C. J. Tan
XML has become a very important emerging standard for E-commerce because of its flexibility and universality. Many software designers are actively developing new systems to handle information in XML formats. We propose a generic architecture for processing XML. We have designed an XML processing system using the latest technologies such as XML, XSLT, HTTP and Java Servlets. Our design is very generic, flexible, scalable, extensible and suitable for distributed network environments. A main application of the architecture and the system is to support data exchange in electronic commerce systems.
