• Volume 6, Issue 3, 2012 Table of Contents
    • Special Issue of Festschrift in Honor of Professor David Bell
    • Preface

      2012, 6(3):359-361.

    • Linear Time Baire Hierarchical Clustering for Enterprise Information Retrieval

      2012, 6(3):363-380.

      Abstract:The Baire or longest common prefix metric induces an ultrametric or tree topology. It has many interesting properties such as the following: the Baire distance, or metric, is also an ultrametric; associated with the tree topology is a hierarchically-structured, embedded set of clusters; the hierarchical clustering can be viewed in terms of density-based and grid-based structuring of the data. We are interested in using the hierarchical structuring of the data induced by the Baire metric for top-down search, in an information retrieval context. Enterprise search and retrieval requires exhaustivity of retrievals. Another requirement is that enterprise search supports situation awareness in order to implement different policies of access to, and use of, data. We show how situation awareness can be supported by the Baire metric, as used for structuring data in order to support enterprise search and retrieval.
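
The longest common prefix idea in the abstract above can be illustrated with a minimal sketch. It assumes that each item is represented as a digit string, that the distance is `base ** -k` for a shared prefix of length `k`, and that identical strings are at distance 0. The function names and the default base of 10 are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def common_prefix_length(x: str, y: str) -> int:
    """Number of leading characters that x and y share."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


def baire_distance(x: str, y: str, base: float = 10.0) -> float:
    """Assumed form of the Baire distance: base ** -(shared prefix length)."""
    if x == y:
        return 0.0
    return base ** -common_prefix_length(x, y)


def prefix_clusters(items, depth: int):
    """Group items by their first `depth` characters in a single pass."""
    clusters = defaultdict(list)
    for item in items:
        clusters[item[:depth]].append(item)
    return dict(clusters)


if __name__ == "__main__":
    data = ["1234", "1299", "1300", "2000"]
    print(baire_distance("1234", "1299"))
    print(prefix_clusters(data, 1))
    print(prefix_clusters(data, 2))
```

Each prefix length gives one level of the hierarchy: clusters at depth 2 are nested inside clusters at depth 1, and every level is built in one pass over the data, which is where a linear-time bound would come from under these assumptions.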

    • Fusion of Data and Knowledge for Safe UAV Landing

      2012, 6(3):381-398.

      Abstract:Autonomous Unmanned Aerial Vehicles (UAVs) have the potential to significantly improve current working practices for a variety of applications, including aerial surveillance and search-and-rescue. However, before UAVs can be widely integrated into civilian airspace, a number of technical challenges must be overcome, including the provision of an autonomous method of landing to be executed in the event of an emergency. A fundamental component of autonomous landing is safe landing zone detection, of which terrain classification is a major constituent. Presented in this paper is an extension of the Multi-Modal Expectation Maximization algorithm which combines data, in the form of multiple images of the same scene, with knowledge, in the form of historic training data and Ordnance Survey map information, to compute updated class parameters. These updated parameters are subsequently used to classify the terrain of an area based on the pixel data contained within the images. An image's contribution to the classification of an area is then apportioned according to its coverage of that area. Preliminary results are presented based on aerial imagery of the Antrim Plateau region in Northern Ireland, and they indicate potential in the approach used.

    • Assessing Disclosure Risk and Data Utility Trade-off in Transaction Data Anonymization

      2012, 6(3):399-417.

      Abstract:Organizations and businesses, including financial institutions and healthcare providers, are increasingly collecting and disseminating information about individuals in the form of transactions. A transaction associates an individual with a set of items, each representing a potentially confidential activity, such as the purchase of a stock or the diagnosis of a disease. Thus, transaction data need to be shared in a way that preserves individuals' privacy, while remaining useful in intended tasks. While algorithms for anonymizing transaction data have been developed, the issue of how to achieve a "desired" balance between disclosure risk and data utility has not been investigated. In this paper, we assess the balance offered by popular algorithms using the R-U confidentiality map. Our analysis and experiments shed light on how the joint impact on disclosure risk and data utility can be examined, which allows the production of high-quality anonymization solutions.

    • Measuring Software Requirements Evolution Caused by Inconsistency

      2012, 6(3):419-434.

      Abstract:It has been widely recognized that requirements evolution is unavoidable in any sizeable software project. Moreover, if requirements evolution is not managed properly, it may result in many troublesome problems during software development. For example, poor management of requirements evolution may lead to inconsistencies in requirements and incomparability between requirements and other work products. Repairing these problems can consume extra development resources. However, inconsistency is considered one of the concerns of requirements evolution. In this paper, we propose a family of logic-based measures for evaluating software requirements evolution caused by inconsistency handling. Each of these measures provides a distinctive quantitative perspective on requirements evolution. First, we provide a syntax-based measure of the change in requirements statements during requirements evolution. Then we provide a semantics-based approach to measuring the change in the expression ability of the requirements specification during evolution. Finally, we characterize three special kinds of requirements evolution based on these measures: the evolved requirements specification with minimal change, the evolved requirements specification with minimal significance change, and the evolved requirements specification with maximal plausibility.

    • Contextual Probability and Neighbourhood Counting

      2012, 6(3):435-452.

      Abstract:In this paper, we review the concept of contextual probability, the resulting notion of neighbourhood counting and the various specialisations of this notion which result in new functions for measuring similarity, such as all common subsequences. We also provide new results on the generalisation of the all common subsequences similarity. Contextual probability was originally proposed as an alternative way of reasoning. It was later found to be an alternative way of estimating probability, and it led to the introduction of the neighbourhood counting notion. This notion was then found to be a generic similarity metric that can be applied to different types of data.
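
As a rough illustration of the all common subsequences idea mentioned in the abstract above, the sketch below counts the distinct subsequences shared by two short strings. It is a brute-force enumeration with exponential cost, not the algorithm from the paper. Counting the empty subsequence is an assumption made here, and the function names are hypothetical.

```python
from itertools import combinations


def subsequences(s: str) -> set:
    """All distinct subsequences of s, including the empty one (exponential)."""
    indices = range(len(s))
    return {
        "".join(s[i] for i in chosen)
        for r in range(len(s) + 1)
        for chosen in combinations(indices, r)
    }


def acs_count(x: str, y: str) -> int:
    """Number of distinct subsequences common to x and y (assumed ACS count)."""
    return len(subsequences(x) & subsequences(y))


if __name__ == "__main__":
    print(acs_count("abc", "acb"))
    print(acs_count("abc", "xyz"))
```

A larger count means the two sequences share more ordered structure. Turning the count into a normalised similarity would need a further choice that the abstract does not specify.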

    • Web Data Extraction from Query Result Pages Based on Visual and Content Features

      2012, 6(3):453-472.

      Abstract:A rapidly increasing number of Web databases have now become accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements, navigation bars, etc. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been intensively studied. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant content, which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into different groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.