• Volume 7,Issue 3,2013 Table of Contents
    Select All
    Display Type: |
    • >Special Issue on Manifold Learning
    • Editorial of the Special Issue on Manifold Learning

      2013, 7(3):357-358.

      Abstract (3780) HTML (0) PDF 76.18 K (2267) Comment (0) Favorites

      Abstract:In many information analysis tasks, one is often confronted with thousands to millions dimensional data, such as images, documents, videos, web data, bioinformatics data, etc. Conventional statistical and computational tools are severely inadequate for processing and analysing high-dimensional data due to the curse of dimensionality, where we often need to conduct inference with a limited number of samples. On the other hand, naturally occurring data may be generated by structured systems with possibly much fewer degrees of freedom than the ambient dimension would suggest. Recently, various works have considered the case when the data is sampled from a submanifold embedded in the much higher dimensional Euclidean space. Learning with full consideration of the low dimensional manifold structure, or specifically the intrinsic topological and geometrical properties of the data manifold is referred to as manifold learning, which has been receiving growing attention in our community in recent years. This special issue is to attract articles that (a) address the frontier problems in the scientific principles of manifold learning, and (b) report empirical studies and applications of manifold learning algorithms, including but not limited to pattern recognition, computer vision, web mining, image processing and so on. A total of 13 submissions were received. The papers included in this special issue are selected based on the reviews by experts in the subject area according to the journal's procedure and quality standard. Each paper is reviewed by at least two reviewers and some of the papers were revised for two rounds according to the reviewers' comments. The special issue includes 6 papers in total: 3 papers on the foundational theories of manifold learning, 2 papers on graph-based methods, and 1 paper on the application of manifold learning to video compression. The papers on the foundational theories of manifold learning cover the topics about the generalization ability of manifold learning, manifold ranking, and multi-manifold factorization. In the paper entitled ``Manifold Learning: Generalizing Ability and Tangential Proximity'', Bernstein and Kuleshov propose a tangential proximity based technique to address the generalized manifold learning problem. The proposed method ensures not only proximity between the points and their reconstructed values but also proximity between the corresponding tangent spaces. The traditional manifold ranking methods are based on the Laplacian regularization, which suffers from the issue that the solution is biased towards constant functions. To overcome this issue, in the paper entitled ``Manifold Ranking using Hessian Energy'', Guan et al. propose to use the second-order Hessian energy as regularization for manifold ranking. In the paper entitled ``Multi-Manifold Concept Factorization for Data Clustering'', Li et al. incorporate the multi-manifold ensemble learning into concept factorization to better preserve the local structure of the data, thus yielding more satisfactory clustering results. The papers on graph-based methods cover the topics about label propagation and graph-based dimensionality reduction. In the paper entitled ``Bidirectional Label Propagation over Graphs'', Liu et al. propose a novel label propagation algorithm to propagate labels along positive and negative edges in the graph. The construction of the graph is novel against the conventional approach by incorporating the dissimilarity among data points into the affinity matrix. In the paper entitled ``Locally Regressive Projections'', Lijun Zhang proposes a novel graph-based dimensionality reduction method that captures the local discriminative structure of the data space. The key idea is to fit a linear model locally around each data point, and then use the fitting error to measure the performance of dimensionality reduction. In the last paper entitled ``Combining Active and Semi-Supervised Learning for Video Compression'', motivated from manifold regularization, Zhang and Ji propose a machine learning approach for video compression. Active learning is used to select the most representative pixels in the encoding process, and semi-supervised learning is used to recover the color video in the decoding process. One remarking property of this approach is that the active learning algorithm shares the same loss function as the semi-supervised learning algorithm, providing a unified framework for video compression. Many people have been involved in making this special issue possible. The guest editor would like to express his gratitude to all the contributing authors for their insightful work on manifold learning. The guest editor would like to thank the reviewers for their comments and useful suggestions in order to improve the quality of the papers. The guest editor would also like to thank Prof. Ruqian Lu, the editor-in-chief of the International Journal of Software and Informatics, for providing the precious opportunity to publish this special issue. Finally, we hope the reader will enjoy this special issue and find it useful.

    • Manifold Learning: Generalization Ability and Tangent Proximity

      2013, 7(3):359-390.

      Abstract (4552) HTML (0) PDF 4.26 M (3053) Comment (0) Favorites

      Abstract:One of the ultimate goals of Manifold Learning (ML) is to reconstruct an unknown nonlinear low-dimensional Data Manifold (DM) embedded in a high-dimensional observation space from a given set of data points sampled from the manifold. We derive asymptotic expansion and local lower and upper bounds for the maximum reconstruction error in a small neighborhood of an arbitrary point. The expansion and bounds are defined in terms of the distance between tangent spaces to the original DM and the Reconstructed Manifold (RM) at the selected point and its reconstructed value, respectively. We propose an amplification of the ML, called Tangent Bundle ML, in which proximity is required not only between the DM and RM but also between their tangent spaces. We present a new geometrically motivated Grassman & Stiefel Eigenmaps algorithm that solves this problem and gives a new solution for the ML also.

    • Manifold Ranking using Hessian Energy

      2013, 7(3):391-405.

      Abstract (3393) HTML (0) PDF 271.49 K (2616) Comment (0) Favorites

      Abstract:In recent years, learning on manifolds has attracted much attention in the academia community. The idea that the distribution of real-life data forms a low dimensional manifold embedded in the ambient space works quite well in practice, with applications such as ranking, dimensionality reduction, semi-supervised learning and clustering. This paper focuses on ranking on manifolds. Traditional manifold ranking methods try to learn a ranking function that varies smoothly along the data manifold by using a Laplacian regularizer. However, the Laplacian regularization suffers from the issue that the solution is biased towards constant functions. In this work, we propose using second-order Hessian energy as regularization for manifold ranking. Hessian energy overcomes the above issue by only penalizing accelerated variation of the ranking function along the geodesics of the data manifold. We also develop a manifold ranking framework for general graphs/hypergraphs for which we do not have an original feature space (i.e. the ambient space). We evaluate our ranking method on the COREL image dataset and a rich media dataset crawled from Last.fm. The experimental results indicate that our manifold ranking method is effective and outperforms traditional graph Laplacian based ranking method.

    • Multi-Manifold Concept Factorization for Data Clustering

      2013, 7(3):407-418.

      Abstract (4840) HTML (0) PDF 233.83 K (3410) Comment (0) Favorites

      Abstract:Clustering plays an important role in many fields, such as pattern recognition and data mining. Its goal is to group the collected data points into their respective clusters. To this end, a number of matrix factorization based methods have been developed to obtain satisfying clustering results by extracting the latent concepts in the data, e.g., concept factorization (CF) and locally consistent concept factorization (LCCF). LCCF takes into account the local manifold structure of the data, but it is nontrivial to estimate the intrinsic manifold, reflecting the true data structure. To address this issue, we in this paper present a novel method called Multi-Manifold Concept Factorization (MMCF) to derive more promising clustering performance. Specifically, we assume the intrinsic manifold lies in a convex hull of some predefined candidate manifolds. The basic idea is to learn a convex combination of a group of candidate manifolds, which is utilized to approximate the intrinsic manifold of the data. In this way, the low-dimensional data representation derived from MMCF is able to better preserve the locally geometrical structure of the data. To optimize the objective function, we develop an alternating algorithm and learn the manifold coefficients using the entropic mirror descent algorithm. The effectiveness of the proposed approach has been demonstrated through a set of evaluations on several real-world data sets.

    • Bidirectional Label Propagation over Graphs

      2013, 7(3):419-433.

      Abstract (2529) HTML (0) PDF 1.23 M (2835) Comment (0) Favorites

      Abstract:Graph-Based label propagation algorithms are popular in the state-of-the-art semi-supervised learning research. The key idea underlying this algorithmic family is to enforce labeling consistency between any two examples with a positive similarity. However, negative similarities or dissimilarities are equivalently valuable in practice. To this end, we simultaneously leverage similarities and dissimilarities in our proposed semi-supervised learning algorithm which we term Bidirectional Label Propagation (BLP). Different from previous label propagation mechanisms that proceed along a single direction of graph edges, the BLP algorithm can propagate labels along not only positive but also negative edge directions. By using an initial neighborhood graph and class assignment constraints inherent among the labeled examples, a set of class-specific graphs are learned, which include both positive and negative edges and thus reveal discriminative cues. Over the learned graphs, a convex propagation criterion is carried out to ensure consistent labelings along the positive edges and inconsistent labelings along the negative edges. Experimental evidence discovered in synthetic and real-world datasets validates excellent performance of the proposed BLP algorithm.

    • Locally Regressive Projections

      2013, 7(3):435-451.

      Abstract (2058) HTML (0) PDF 307.85 K (2172) Comment (0) Favorites

      Abstract:We propose a novel linear dimensionality reduction algorithm, namely Locally Regressive Projections (LRP). To capture the local discriminative structure, for each data point, a local patch consisting of this point and its neighbors is constructed. LRP assumes that the low dimensional representations of points in each patch can be well estimated by a locally fitted regression function. Specifically, we train a linear function for each patch via ridge regression, and use its fitting error to measure how well the new representations can respect the local structure. The optimal projections are thus obtained by minimizing the summation of the fitting errors over all the local patches. LRP can be performed under either supervised or unsupervised settings. Our theoretical analysis reveals the connections between LRP and the classical methods such as PCA and LDA. Experiments on face recognition and clustering demonstrate the effectiveness of our proposed method.

    • Combining Active and Semi-Supervised Learning for Video Compression

      2013, 7(3):453-468.

      Abstract (1958) HTML (0) PDF 4.52 M (2442) Comment (0) Favorites

      Abstract:Video compression algorithms manipulate video signals to dramatically reduce the storage and bandwidth required while maximizing perceived video quality. Typical video compression methods include discrete cosine transform, vector quantization, fractal compression, and discrete wavelet transform. Recently, a machine learning based approach has been proposed which converts the color images (frames) to gray scale images (frames) and the color information for only a few representative pixels is kept. A learning model is then trained to predict the color values for the gray scale pixels across frames. Selecting the most representative pixels is essentially an active learning problem, while colorization is a semi-supervised learning problem. In this paper, we propose to combine active and semi-supervised learning for video compression. The basic idea is to minimize the size of the covariance matrix of the regularized least squares estimates, in which the regression model assumes that each pixel can be reconstructed by the other pixels with similar spatial location and intensity value. The experimental results demonstrate the effectiveness of the proposed approach for video compression.

    • >Regular Papers
    • An Empirical Study of Lehman's Law on Software Quality Evolution

      2013, 7(3):469-481.

      Abstract (2729) HTML (0) PDF 692.06 K (3209) Comment (0) Favorites

      Abstract:Software systems must change to adapt to new functional requirements and nonfunctional requirements. According to Lehman's laws of software evolution, on the one side, the size and the complexity of a software system will continually increase in its life time; on the other side, the quality of a software system will decrease unless it is rigorously maintained and adapted. Lehman's laws of software evolution, especially of those on software size and complexity, have been widely validated. However, there are few empirical studies of Lehman's law on software quality evolution, despite the fact that quality is one of the most important measurements of a software product. This paper defines a metric---accumulated defect density---to measure the quality of evolving software systems. We mine the bug reports and measure the size and complexity growth of four evolution lines of Apache Tomcat and Apache Ant projects. Based on these studies, Lehman's law on software quality evolution is examined and evaluated.