16. SDM 2016:Miami, Florida, USA

Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, Florida, USA, May 5-7, 2016. SIAM 【DBLP Link

Paper Num: 98 || Session Num: 0

1. Front Matter.

Paper Link】 【Pages】:

【Authors】:

【Abstract】: Frontmatter includes welcome letter and table of contents.

【Keywords】:

2. Clustering in the Face of Fast Changing Streams.

Paper Link】 【Pages】:1-9

【Authors】: Liudmila Ulanova ; Nurjahan Begum ; Mohammad Shokoohi-Yekta ; Eamonn J. Keogh

【Abstract】: Clustering is arguably the most important primitive for data mining, finding use as a subroutine in many higher-order algorithms. In recent years, the community has redirected its attention from the batch case to the online case. This need to support online clustering is engendered by the proliferation of cheap ubiquitous sensors that continuously monitor various aspects of our world, from heartbeats as we exercise to the number of mosquitoes visiting a well in a village in Ethiopia. In this work, we argue that current online clustering solutions offer a room for improvement. To some degree they all have at least one of the following shortcomings: they are parameter-laden, only defined for certain distance functions, sensitive to outliers, and/or they are approximate. This last point requires clarification; in some sense almost all clustering algorithms are approximate. For example, in general, k-means only approximately optimizes its objective function. However, streaming versions of the k-means algorithm are further approximating this approximation, potentially leading to very poor solutions. In this work, we introduce an algorithm that mitigates these flaws. It is parameter-lite, defined for any distance function, insensitive to outliers and produces the same output as the batch version of the algorithm. We demonstrate the utility and effectiveness of our ideas with case studies in entomology, cardiology and biological audio processing.

【Keywords】:

3. Kernelized Sparse Self-Representation for Clustering and Recommendation.

Paper Link】 【Pages】:10-17

【Authors】: Xiao Bian ; Feng Li ; Xia Ning

【Abstract】: Sparse models have demonstrated substantial success in applications for data analysis such as clustering, classification and denoising. However, most of the current work is built upon the assumption that data is distributed in a union of subspaces, whereas limited work has been conducted on nonlinear datasets where data reside in a union of manifolds rather than a union of subspaces. To understand data nonlinearity using sparse models, in this paper, we propose to exploit the self-representation property of nonlinear data in an implicit feature space using kernel methods. We propose a kernelized sparse self-representation model, denoted as KSSR, and a novel Kernelized Fast Iterative Soft-Thresholding Algorithm, denoted as K-FISTA, to recover the underlying nonlinear structure among the data. We evaluate our method for clustering problems on both synthetic and real-world datasets, and demonstrate its superior performance compared to the other state-of-the-art methods. We also apply our method for collaborative filtering in recommender systems, and demonstrate its great potential for novel applications beyond clustering.

【Keywords】:

4. Multi-Domain Manifold Learning for Drug-Target Interaction Prediction.

Paper Link】 【Pages】:18-26

【Authors】: Ruichu Cai ; Zhenjie Zhang ; Srinivasan Parthasarathy ; Anthony K. H. Tung ; Zhifeng Hao ; Wen Zhang

【Abstract】: Drug-target interaction (DTI) provides novel insights about the genomic drug discovery, and is a critical technique to drug discovery. Recently, researchers try to incorporate different information about drugs and targets for prediction. However, the heterogeneous and high-dimensional data poses huge challenge to existing machine learning methods. In the last few years, extensive research efforts have been devoted to the utilization of manifold property on high dimensional data, e.g. dimension reduction methods preserving local structures of the manifolds. Motivated by the successes of these studies, we propose a general framework incorporating both manifold structures and known interaction/non-interaction information to predict the drug-target interactions. To overcome the challenges of domain scaling and information inconsistency, we formulate the problem with Semidefinite Programming (SDP), including new constraints to improve the robustness of the learning procedure. A variety of optimization techniques are also designed to enhance the scalability of the problem solver. Effectiveness of the method is evaluated by experiments on the benchmark dataset. Compared with state-of-the-art methods, the proposed methods generate much more accurate drug-target interaction prediction.

【Keywords】:

5. Finding Surprisingly Frequent Patterns of Variable Lengths in Sequence Data.

Paper Link】 【Pages】:27-35

【Authors】: Reza Sadoddin ; Jörg Sander ; Davood Rafiei

【Abstract】: We address the problem of finding ‘surprising’ patterns of variable length in sequence data, where a surprising pattern is defined as a subsequence of a longer sequence, whose observed frequency is statistically significant with respect to a given distribution. Finding statistically significant patterns in sequence data is the core task in some interesting applications such as Biological motif discovery and anomaly detection. We show that the presence of few ‘true’ surprising patterns in the data could cause a large number of highly-correlated patterns to stand statistically significant just because of those few significant patterns. Our approach to solving the ‘redundant patterns’ problem is based on capturing the dependencies between patterns through an ‘explain’ relationship where a set of patterns can explain the statistical significance of another pattern. This allows us to address the problem of redundancy by choosing a few ‘core’ patterns which explain the significance of all other significant patterns. We propose a greedy algorithm for efficiently finding an approximate core pattern set of minimum size. Using both synthetic and real-world sequential data, chosen from different domains including Medicine and Bioinformatics, we show that the proposed notion of core patterns very closely matches the notion of ‘true’ surprising patterns in data.

【Keywords】:

6. Identifying Connectivity Patterns for Brain Diseases via Multi-side-view Guided Deep Architectures.

Paper Link】 【Pages】:36-44

【Authors】: Jingyuan Zhang ; Bokai Cao ; Sihong Xie ; Chun-Ta Lu ; Philip S. Yu ; Ann B. Ragin

【Abstract】: There is considerable interest in mining neuroimage data to discover clinically meaningful connectivity patterns to inform an understanding of neurological and neuropsychiatric disorders. Subgraph mining models have been used to discover connected subgraph patterns. However, it is difficult to capture the complicated interplay among patterns. As a result, classification performance based on these results may not be satisfactory. To address this issue, we propose to learn non-linear representations of brain connectivity patterns from deep learning architectures. This is non-trivial, due to the limited subjects and the high costs of acquiring the data. Fortunately, auxiliary information from multiple side views such as clinical, serologic, immunologic, cognitive and other diagnostic testing also characterizes the states of subjects from different perspectives. In this paper, we present a novel Multi-side-View guided AutoEncoder (MVAE) that incorporates multiple side views into the process of deep learning to tackle the bias in the construction of connectivity patterns caused by the scarce clinical data. Extensive experiments show that MVAE not only captures discriminative connectivity patterns for classification, but also discovers meaningful information for clinical interpretation.

【Keywords】:

7. Regularized Weighted Linear Regression for High-dimensional Censored Data.

Paper Link】 【Pages】:45-53

【Authors】: Yan Li ; Bhanukiran Vinzamuri ; Chandan K. Reddy

【Abstract】: Survival analysis aims at modeling time to event data which occurs ubiquitously in many biomedical and healthcare applications. One of the critical challenges with modeling such survival data is the presence of censored outcomes which cannot be handled by standard regression models. In this paper, we propose a regularized linear regression model with weighted least-squares to handle the survival prediction in the presence of censored instances. We also employ the elastic net penalty term for inducing sparsity into the linear model to effectively handle high-dimensional data. As opposed to the existing censored linear models, the parameter estimation of our model does not need any prior estimation of survival times of censored instances. In addition, we propose a self-training framework which is able to improve the prediction performance of our proposed linear model. We demonstrate the performance of the proposed model using several real-world high-dimensional biomedical benchmark datasets and our experimental results indicate that our model outperforms other related competing methods and attains very competitive performance on various datasets.

【Keywords】:

8. Node Classification in Signed Social Networks.

Paper Link】 【Pages】:54-62

【Authors】: Jiliang Tang ; Charu C. Aggarwal ; Huan Liu

【Abstract】: Node classification in social networks has been proven to be useful in many real-world applications. The vast majority of existing algorithms focus on unsigned social networks (or social networks with only positive links), while little work exists for signed social networks. It is evident from recent developments in signed social network analysis that negative links have added value over positive links. Therefore, the incorporation of negative links has the potential to benefit various analytical tasks. In this paper, we study the novel problem of node classification in signed social networks. We provide a principled way to mathematically model positive and negative links simultaneously and propose a novel framework NCSSN for node classification in signed social networks. Experimental results on real-world signed social network datasets demonstrate the effectiveness of the proposed framework NCSSN. Further experiments are conducted to gain a deeper understanding of the importance of negative links for NCSSN.

【Keywords】:

9. Uncovering Multiple Diffusion Networks Using the First-Hand Sharing Pattern.

Paper Link】 【Pages】:63-71

【Authors】: Pei-Lun Liao ; Chung-Kuang Chou ; Ming-Syan Chen

【Abstract】: In our daily life, rumors are spread among people but diffusion processes and spreading paths behind rumors are usually hidden. The problem of finding this hidden process is getting more attention since after understanding the process, one can manipulate the diffusion speed of the process. In this work, we observe the pattern of information propagation that most nodes are inclined to share the first-hand information. In other words, the virality of an information piece will generally decay as it becomes rephrased or secondhand. We propose a generative model with the pattern and design the corresponding optimization method to infer both the hidden networks and transmission rates between nodes. Experimental results show that our model outperforms several state-of-the-art models on both synthetic and real datasets for network inference.

【Keywords】:

10. Integrating Community and Role Detection in Information Networks.

Paper Link】 【Pages】:72-80

【Authors】: Ting Chen ; Lu An Tang ; Yizhou Sun ; Zhengzhang Chen ; Haifeng Chen ; Guofei Jiang

【Abstract】: Community detection and role detection in information networks have received wide attention recently, where the former aims to detect the groups of nodes that are closely connected to each other and the latter aims to discover the underlying roles of nodes in the network. Traditional studies treat these two problems as orthogonal issues and propose algorithms for these two tasks separately. In this paper, we propose to integrate communities and roles in a unified model and detect both of them simultaneously for information networks. Intuitively, (1) correctly detecting the communities in a network will lead to the success of detecting roles of nodes, such as opinion leaders and followers in social networks; and (2) correctly identifying the roles of the nodes will lead to a better network modeling and thus a better detection of communities. A novel probabilistic network model, the Mixed Membership Community and Role model (MMCR), is then proposed, which models the latent community and role of each node at the same time, and the probability of links are defined accordingly. By testing our model on synthetic networks and two real-world networks, we demonstrate that our approach leads to better performance for both community detection and role detection. Moreover, our model has a better interpretation for link generation in networks according to the link prediction task.

【Keywords】:

11. Exploiting Emotional Information for Trust/Distrust Prediction.

Paper Link】 【Pages】:81-89

【Authors】: Ghazaleh Beigi ; Jiliang Tang ; Suhang Wang ; Huan Liu

【Abstract】: Trust and distrust networks are usually extremely sparse and the vast majority of the existing algorithms for trust/distrust prediction suffer from the data sparsity problem. In this paper, following the research from psychology and sociology, we envision that users' emotions such as happiness and anger are strong indicators of trust/distrust relations. Meanwhile the popularity of social media encourages the increasing number of users to freely express their emotions; hence emotional information is pervasively available and usually denser than the trust and distrust relations. Therefore incorporating emotional information could have the potentials to alleviate the data sparsity in the problem of trust/distrust prediction. In this study, we investigate how to exploit emotional information for trust/distrust prediction. In particular, we provide a principled way to capture emotional information mathematically and propose a novel trust/distrust prediction framework ETD. Experimental results on the real-world social media dataset demonstrate the effectiveness of the proposed framework and the importance of emotional information in trust/distrust prediction.

【Keywords】:

12. Online Prediction of User Actions through an Ensemble Vote from Vector Representation and Frequency Analysis Models.

Paper Link】 【Pages】:90-98

【Authors】: Changsung Moon ; Dakota Medd ; Paul Jones ; Steve Harenberg ; William Oxbury ; Nagiza F. Samatova

【Abstract】: The history of interactions between a user and a piece of technology can be represented as a sequence of actions. The ability to predict a user's next action is useful to many applications. For example, a user-interface that can anticipate the actions of a user is able to provide a more positive experience through just-in-time recommendations and pro-actively allocating or caching resources. Existing sequence prediction techniques have failed to address some of the challenges associated with this task, such as predicting an action that has never appeared for a given context. Techniques for an analogous task in the field of Natural Language Processing (NLP) avoid this issue; however, applying these NLP techniques directly to user action prediction would result in the loss of action frequency and action order, both of which are critically important. Therefore, we propose a method that unifies ideas from NLP with the task of sequence prediction. Our method, Frequency Vector (FVEC) prediction, is an online algorithm that predicts the top-N most likely next actions by combining scores from two models: a frequency analysis model and a vector representation model. In the frequency model, the score of an action is calculated based on the frequency that the action has occurred right after a given context. In the vector representation model, a vector for each action is learned, and a score for an action is calculated based on the similarity of its vector and the mean of the vectors for each action in a given context. Evaluations of FVEC on three real-world datasets resulted in a consistently higher prediction accuracy (and lower standard deviation) than all tested sequence prediction algorithms.

【Keywords】:

13. FairPlay: Fraud and Malware Detection in Google Play.

Paper Link】 【Pages】:99-107

【Authors】: Mahmudur Rahman ; Mizanur Rahman ; Bogdan Carbunar ; Duen Horng Chau

【Abstract】: Fraudulent behaviors in Google's Android app market fuel search rank abuse and malware proliferation. We present FairPlay, a novel system that uncovers both malware and search rank fraud apps, by picking out trails that fraudsters leave behind. To identify suspicious apps, FairPlay's PCF algorithm correlates review activities and uniquely combines detected review relations with linguistic and behavioral signals gleaned from longitudinal Google Play app data. We contribute a new longitudinal app dataset to the community, which consists of over 87K apps, 2.9M reviews, and 2.4M reviewers, collected over half a year. FairPlay achieves over 95% accuracy in classifying gold standard datasets of malware, fraudulent and legitimate apps. We show that 75% of the identified malware apps engage in search rank fraud. FairPlay discovers hundreds of fraudulent apps that currently evade Google Bouncer's detection technology, and reveals a new type of attack campaign, where users are harassed into writing positive reviews, and install and review other apps.

【Keywords】:

14. Synergies that Matter: Efficient Interaction Selection via Sparse Factorization Machine.

Paper Link】 【Pages】:108-116

【Authors】: Jianpeng Xu ; Kaixiang Lin ; Pang-Ning Tan ; Jiayu Zhou

【Abstract】: Collaborative filtering has been widely used in modern recommender systems to provide accurate recommendations by leveraging historical interactions between users and items. The presence of cold-start items and users has imposed a huge challenge to recommender systems based on collaborative filtering, because of the unavailability of such interaction information. The factorization machine is a powerful tool designed to tackle the cold-start problems by learning a bilinear ranking model that utilizes content information about users and items, exploiting the interactions with such content information. While a factorization machine makes use of all possible interactions between all content features to make recommendations, many of the features and their interactions are not predictive of recommendations, and incorporating them in the model will deteriorate the generalization performance of the recommender systems. In this paper, we propose an efficient Sparse Factorization Machine (SFM), that simultaneously identifies relevant user and item content features, models interactions between these relevant features, and learns a bilinear model using only these synergistic interactions. We have carried out extensive empirical studies on both synthetic and real-world datasets, and compared our method to other state-of-the-art baselines, including Factorization Machine. Experimental results show that SFM can greatly outperform other baselines.

【Keywords】:

15. A Spatial-Temporal Probabilistic Matrix Factorization Model for Point-of-Interest Recommendation.

Paper Link】 【Pages】:117-125

【Authors】: Huayu Li ; Richang Hong ; Zhiang Wu ; Yong Ge

【Abstract】: With the rapid development of Location-based Social Network (LBSN) services, a large number of Point-of-Interests (POIs) have been available, which consequently raises a great demand of building personalized POI recommender systems. A personalized POI recommender system can significantly help users to find their preferred POIs and assist POI owners to attract more customers. However, due to the complexity of users' checkin decision making process that is influenced by many different factors such as POI distance and region's prosperity, and the dynamics of user's preference, POI recommender systems usually suffer from many challenges. Although different latent factor based methods (e.g., probabilistic matrix factorization) have been proposed, most of them do not successfully incorporate both geographical influence and temporal effect together into latent factor models. To this end, in this paper, we propose a new Spatial-Temporal Probabilistic Matrix Factorization (STPMF) model that models a user's preference for POI as the combination of his geographical preference and other general interest in POI. Furthermore, in addition to static general interest of user, we capture the temporal dynamics of user's interest as well by modeling checkin data in a unique way. To evaluate the proposed STPMF model, we conduct extensive experiments with many state-of-the-art baseline methods and evaluation metrics on two real-world data sets. The experimental results clearly demonstrate the effectiveness of our proposed STPMF model.

【Keywords】:

16. Top-N Recommendation with Novel Rank Approximation.

Paper Link】 【Pages】:126-134

【Authors】: Zhao Kang ; Qiang Cheng

【Abstract】: The importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that it can provide a better approximation to original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-N recommendation accuracy substantially.

【Keywords】:

17. Vocal Competence Based Karaoke Recommendation: A Maximum-Margin Joint Model.

Paper Link】 【Pages】:135-143

【Authors】: Chu Guan ; Yanjie Fu ; Xinjiang Lu ; Hui Xiong ; Enhong Chen ; Yingling Liu

【Abstract】: In online karaoke, the decision process in choosing a song is different from that in music radio, because users usually prefer songs that meet their vocal competence besides their tastes. Traditional music recommendation methods typically model users' personalized preference for songs in terms of content and style. However, this can be improved by considering the degree of matching the vocal competence (e.g. pitch, volume, and rhythm) of users to the vocal requirements of songs. To this end, in this paper, we develop a karaoke recommender system by incorporating vocal competence. Along this line, we propose a joint modeling method named CBNTF by exploiting the mutual enhancement between non-negative tensor factorization (NTF) and support vector machine (SVM). Specifically, we first extract vocal (i.e., pitch, volume, and rhythm) ratings of a user for a song from his/her singing records. Since these vocal ratings encode users' vocal competence from three aspects, we treat these vocal ratings as a tensor, exploit an NTF method, and learn the latent features of users' vocal metrics. These factorized features are simultaneously fed into an SVM classifier and then we use the trained classifier to predict the overall rating of a user with respect to a song. In addition, we propose an enhanced objective function to exploit the mutual enhancement between NTF and SVM, and devise an effective method to solve this objective as a coupled least-squares optimization problem via a maximum margin framework. With the estimated model, we compute the similarity between users and songs in terms of pitch, volume and rhythm and recommend songs to users. Finally, we conduct extensive experiments with real-world online karaoke data. The results demonstrate the effectiveness of our method.

【Keywords】:

18. A Confidence-Based Approach for Balancing Fairness and Accuracy.

Paper Link】 【Pages】:144-152

【Authors】: Benjamin Fish ; Jeremy Kun ; Ádám Dániel Lelkes

【Abstract】: We study three classical machine learning algorithms in the context of algorithmic fairness: adaptive boosting, support vector machines, and logistic regression. Our goal is to maintain the high accuracy of these learning algorithms while reducing the degree to which they discriminate against individuals because of their membership in a protected group. Our first contribution is a method for achieving fairness by shifting the decision boundary for the protected group. The method is based on the theory of margins for boosting. Our method performs comparably to or outperforms previous algorithms in the fairness literature in terms of accuracy and low discrimination, while simultaneously allowing for a fast and transparent quantification of the trade-off between bias and error. Our second contribution addresses the shortcomings of the bias-error trade-off studied in most of the algorithmic fairness literature. We demonstrate that even hopelessly naive modifications of a biased algorithm, which cannot be reasonably said to be fair, can still achieve low bias and high accuracy. To help to distinguish between these naive algorithms and more sensible algorithms we propose a new measure of fairness, called resilience to random bias (RRB). We demonstrate that RRB distinguishes well between our naive and sensible fairness algorithms. RRB together with bias and accuracy provides a more complete picture of the fairness of an algorithm.

【Keywords】:

19. Differentially Private Significance Testing on Paired-Sample Data.

Paper Link】 【Pages】:153-161

【Authors】: Christine Task ; Chris Clifton

【Abstract】: Rigorous data mining results require measures of the statistical significance of the outcomes. The complexity of the data and models makes this a challenge; methods to protect privacy further complicate the issue. We demonstrate how to estimate statistical significance of results in the context of a social network analysis problem; the impact of the noise required to provide differential privacy is included in the significance measure. As a result, providing privacy does not complicate the use of the analysis. While demonstrated for social network analysis, the approach is general. The Wilcoxon signed-rank test used is appropriate for a wide variety of data with “before” and “after” measurements, and adapts well to differential privacy. We demonstrate on publicly available data with known privacy issues, showing that some apparently large differences are not significant, some small differences are, and that when the analysis is done using differential privacy, the same results can been achieved while protecting individual privacy.

【Keywords】:

20. A general framework to increase the robustness of model-based change point detection algorithms to outliers and noise.

Paper Link】 【Pages】:162-170

【Authors】: Xi C. Chen ; Yuanshun Yao ; Sichao Shi ; Snigdhansu Chatterjee ; Vipin Kumar ; James H. Faghmous

【Abstract】: The autonomous identification of time-steps where the behavior of a time-series significantly deviates from a predefined model, or time-series change point detection, is an active field of research with notable applications in finance, health, and advertising. One family of time-series change detection algorithms, referred to as “model-based methods”, although useful for many applications, performs poor when the data are noisy and have outliers. We introduce a new framework that enables existing model-based methods to be more robust to these data challenges. We demonstrate the effectiveness of our approach on remote sensing and mobile health data. Our method introduces two new concepts: (i) a random sampling procedure allows us to overcome outliers, and (ii) a matrix-based representation of anomaly scores provides a flexible and intuitive way to identify multiple types of changes and test their significance. We show that our method performs better than several baseline methods, including application-specific algorithms, and provide all data and open-source code.

【Keywords】:

21. LODES: Local Density Meets Spectral Outlier Detection.

Paper Link】 【Pages】:171-179

【Authors】: Saket Sathe ; Charu C. Aggarwal

【Abstract】: The problem of outlier detection has been widely studied in existing literature because of its numerous applications in fraud detection, medical diagnostics, fault detection, and intrusion detection. A large category of outlier analysis algorithms have been proposed, such as proximity-based methods and local density-based methods. These methods are effective in finding outliers distributed along linear manifolds. Spectral methods, however, are particularly well suited to finding outliers when the data is distributed along manifolds of arbitrary shape. In practice, the underlying manifolds may have varying density, as a result of which a direct use of spectral methods may not be effective. In this paper, we show how to combine spectral techniques with local density-based methods in order to discover interesting outliers. We present experimental results demonstrating the effectiveness of our approach with respect to well-known competing methods.

【Keywords】:

22. Routine Mining Based Anomaly Detection in Mobile Phone Data.

Paper Link】 【Pages】:180-188

【Authors】: Tian Qin ; Guojie Song ; Sizhen Du

【Abstract】: Previous works related to anomaly detection in mobile phone data rely heavily on manually selected features or statistics, which weakens the generalization of the model. In addition, traditional methods only allow anomaly to appear in certain fixed time interval with predefined duration. In this work, based on an unsupervised probabilistic topic model, we propose a Routine Mining Based Anomaly Detection (RMBAD) approach that learns the pattern of normal from naturally existing human routines, free of any manually extracted features. Taking a generative approach, the RMBAD model combines group anomaly detection and topic segmentation method, segmenting mobile subscriber's sequence of behaviors into explainable routines and report those unexplainable segments as anomalies simultaneously. Furthermore, The RMBAD model allows routines of different durations to coexist, thus achieving a more realistic modeling of human activity pattern, which ultimately leads to higher anomaly detection accuracy. Extensive experiments are conducted on both synthetic and real world mobile datasets, and the empirical results show that the RMBAD model can effectively discover hidden routines of human activity and identify those groups of behaviors that collectively appear anomalous.

【Keywords】:

23. A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations.

Paper Link】 【Pages】:189-197

【Authors】: Stephen Ranshous ; Steve Harenberg ; Kshitij Sharma ; Nagiza F. Samatova

【Abstract】: Dynamic graphs are a powerful way to model an evolving set of objects and their ongoing interactions. A broad spectrum of systems, such as information, communication, and social, are naturally represented by dynamic graphs. Outlier (or anomaly) detection in dynamic graphs can provide unique insights into the relationships of objects and identify novel or emerging relationships. To date, outlier detection in dynamic graphs has been studied in the context of graph streams, focusing on the analysis and comparison of entire graph objects. However, the volume and velocity of data are necessitating a transition from outlier detection in the context of graph streams to outlier detection in the context of edge streams–where the stream consists of individual graph edges instead of entire graph objects. In this paper, we propose the first approach for outlier detection in edge streams. We first describe a high-level model for outlier detection based on global and local structural properties of a stream. We then propose a novel application of the Count-Min sketch for approximating these properties, and prove probabilistic error bounds on our edge outlier scoring functions. Our sketch-based implementation provides a scalable solution, having constant time updates and constant space requirements. Experiments on synthetic and real-world datasets demonstrate our method's scalability, effectiveness for discovering outliers, and the effects of approximation.

【Keywords】:

24. R1STM: One-class Support Tensor Machine with Randomised Kernel.

Paper Link】 【Pages】:198-206

【Authors】: Sarah M. Erfani ; Mahsa Baktashmotlagh ; Sutharshan Rajasegarad ; Vinh Nguyen ; Christopher Leckie ; James Bailey ; Kotagiri Ramamohanarao

【Abstract】: Identifying unusual or anomalous patterns in an underlying dataset is an important but challenging task in many applications. The focus of the unsupervised anomaly detection literature has mostly been on vectorised data. However, many applications are more naturally described using higher-order tensor representations. Approaches that vectorise tensorial data can destroy the structural information encoded in the high-dimensional space, and lead to the problem of the curse of dimensionality. In this paper we present the first unsupervised tensorial anomaly detection method, along with a randomised version of our method. Our anomaly detection method, the One-class Support Tensor Machine (1STM), is a generalisation of conventional one-class Support Vector Machines to higher-order spaces. 1STM preserves the multiway structure of tensor data, while achieving significant improvement in accuracy and efficiency over conventional vectorised methods. We then leverage the theory of nonlinear random projections to propose the Randomised 1STM (R1STM). Our empirical analysis on several real and synthetic datasets shows that our R1STM algorithm delivers comparable or better accuracy to a state-of-the-art deep learning method and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.

【Keywords】:

25. Scalable Anomaly Ranking of Attributed Neighborhoods.

Paper Link】 【Pages】:207-215

【Authors】: Bryan Perozzi ; Leman Akoglu

【Abstract】: Given a graph with node attributes, what neighborhoods are anomalous? To answer this question, one needs a quality score that utilizes both structure and attributes. Popular existing measures either quantify the structure only and ignore the attributes (e.g., conductance), or only consider the connectedness of the nodes inside the neighborhood and ignore the cross-edges at the boundary (e.g., density). In this work we propose normality, a new quality measure for attributed neighborhoods. Normality utilizes structure and attributes together to quantify both internal consistency and external separability. It exhibits two key advantages over other measures: (1) It allows many boundary-edges as long as they can be “exonerated”; i.e., either (i) are expected under a null model, and/or (ii) the boundary nodes do not exhibit the subset of attributes shared by the neighborhood members. Existing measures, in contrast, penalize boundary edges irrespectively. (2) Normality can be efficiently maximized to automatically infer the shared attribute subspace (and respective weights) that characterize a neighborhood. This efficient optimization allows us to process graphs with millions of attributes. We capitalize on our measure to present a novel approach for Anomaly Mining of Entity Neighborhoods (AMEN). Experiments on real-world attributed graphs illustrate the effectiveness of our measure at anomaly detection, outperforming popular approaches including conductance, density, OddBall, and SODA. In addition to anomaly detection, our qualitative analysis demonstrates the utility of normality as a powerful tool to contrast the correlation between structure and attributes across different graphs.

【Keywords】:

26. Linear and Kernel Classification: When to Use Which?

Paper Link】 【Pages】:216-224

【Authors】: Hsin-Yuan Huang ; Chih-Jen Lin

【Abstract】: Kernel methods are known to be a state-of-the-art classification technique. Nevertheless, the training and prediction cost is expensive for large data. On the other hand, linear classifiers can easily scale up, but are inferior to kernel classifiers in terms of predictability. Recent research has shown that for some data sets (e.g., document data), linear is as good as kernel classifiers. In such cases, the training of a kernel classifier is a waste of both time and memory. In this work, we investigate the important issue of efficiently and automatically deciding whether kernel classifiers perform strictly better than linear for a given data set. Our proposed method is based on cheaply constructing a classifier that exhibits nonlinearity and can be automatically trained. Then we make a decision by comparing the performance of our constructed classifier with the linear classifier. We propose two methods: the first one trains the degree-2 feature expansion by a linear-classification method, while the second dissects the feature space into several regions and trains a linear classifier for each region. The design considerations of our methods are very different from past works for speeding up the kernel training. They still aim at obtaining accuracy close to the kernel classifier, but ours would like to give a quick and accurate decision without worrying about accuracy. Empirically our methods can efficiently make correct indications for a wide variety of data sets. Our proposed process can thus be a useful component for automatic machine learning.

【Keywords】:

27. Pattern Aided Classification.

Paper Link】 【Pages】:225-233

【Authors】: Guozhu Dong ; Vahid Taslimitehrani

【Abstract】: This paper makes several contributions to research on classification. First, it introduces a new style of classifiers, namely pattern aided classifiers (PXC), each defined by several pattern and group-specific-classifier pairs. A PXC uses patterns as conditions and it applies a group-specific classifier only to data instances satisfying its associated pattern. Second, it introduces a new classification algorithm, called Contrast Pattern Aided Classification (CPXC), for learning accurate PXCs. Experiments over multiple benchmark datasets confirm that CPXC often builds significantly more accurate classifiers than traditional classification algorithms. Third, it introduces the technique of opportunity-guided boosting and the concept of conditional classifier ensembles, and it provides insight on why certain datasets are very challenging to traditional classification algorithms.

【Keywords】:

28. TerseSVM : A Scalable Approach for Learning Compact Models in Large-scale Classification.

Paper Link】 【Pages】:234-242

【Authors】: Rohit Babbar ; Krikamol Muandet ; Bernhard Schölkopf

【Abstract】: For large-scale multi-class classification problems, consisting of tens of thousand target categories, recent works have emphasized the need to store billions of parameters. For instance, the classical l2-norm regularization employed by a state-of-the-art method results in the model size of 17GB for a training set whose size is only 129MB. To the contrary, by using a mixed-norm regularization approach, we show that around 99.5% of the stored parameters is dispensable noise. Using this strategy, we can extract the information relevant for classification, which is constituted in remaining 0.5% of the parameters, and hence demonstrate drastic reduction in model sizes. Furthermore, the proposed method leads to improvement in generalization performance compared to state-of-the-art methods, especially for under-represented categories. Lastly, our method enjoys easy parallelization, and scales well to tens of thousand target categories.

【Keywords】:

29. Discriminative Training of Structured Dictionaries via Block Orthogonal Matching Pursuit.

Paper Link】 【Pages】:243-251

【Authors】: Wenling Shang ; Kihyuk Sohn ; Honglak Lee ; Anna Gilbert

【Abstract】: It is well established that high-level representations learned via sparse coding are effective for many machine learning applications such as denoising and classification. In addition to being reconstructive, sparse representations that are discriminative and invariant can further help with such applications. In order to achieve these desired properties, this paper proposes a new framework that discriminatively trains structured dictionaries via block orthogonal matching pursuit. Specifically, the dictionary atoms are assumed to be organized into blocks. Distinct classes correspond to distinct blocks of dictionary atoms; however, our algorithm can handle the case where multiple classes share blocks. We provide theoretical justification and empirical evaluation of our method.

【Keywords】:

30. A Unified View of Localized Kernel Learning.

Paper Link】 【Pages】:252-260

【Authors】: John Moeller ; Sarathkrishna Swaminathan ; Suresh Venkatasubramanian

【Abstract】: Multiple Kernel Learning, or MKL, extends (kernelized) SVM by attempting to learn not only a classifier/regressor but also the best kernel for the training task, usually from a combination of existing kernel functions. Most MKL methods seek the combined kernel that performs best over every training example, sacrificing performance in some areas to seek a global optimum. Localized kernel learning (LKL) overcomes this limitation by allowing the training algorithm to match a component kernel to the examples that can exploit it best. Several approaches to the localized kernel learning problem have been explored in the last several years. We unify many of these approaches under one simple system and design a new algorithm with improved performance. We also develop enhanced versions of existing algorithms, with an eye on scalability and performance.

【Keywords】:

31. Binary Classifier Calibration Using an Ensemble of Linear Trend Estimation.

Paper Link】 【Pages】:261-269

【Authors】: Mahdi Pakdaman Naeini ; Gregory F. Cooper

【Abstract】: Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of linear trend estimation (ELiTE). ELiTE utilizes the recently proposed ℓ1 trend filtering signal approximation method [22] to find the mapping from uncalibrated classification scores to the calibrated probability estimates. ELiTE is designed to address the key limitations of the histogram binning-based calibration methods which are (1) the use of a piecewise constant form of the calibration mapping using bins, and (2) the assumption of independence of predicted probabilities for the instances that are located in different bins. The method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus, it can be applied with many existing classification models. We demonstrate the performance of ELiTE on real datasets for commonly used binary classification models. Experimental results show that the method outperforms several common binary-classifier calibration methods. In particular, ELiTE commonly performs statistically significantly better than the other methods, and never worse. Moreover, it is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is practically O(N log N) time, where N is the number of samples.

【Keywords】:

32. Power Simultaneous Spectral Data Embedding and Clustering.

Paper Link】 【Pages】:270-278

【Authors】: Kais Allab ; Lazhar Labiod ; Mohamed Nadif

【Abstract】: Spectral clustering methods use the Laplacian eigenvalues and eigenvectors to obtain a low-dimensional embedding that can be trivially clustered. For that purpose, spectral clustering is often based on a tandem approach where the two steps: affinity matrix eigendecomposition and k-means clustering, are performed separately. The potential flaw of such common practice is that the obtained relaxed continuous spectral solution can severely deviate from the true discrete clustering solution. Given the high computational cost of such spectral clustering methods, this paper provides the so-called PSDEC framework. PSDEC performs simultaneously the eigendecomposition of the affinity matrix and clustering tasks, and uses the Power method to speed up the unified process convergence. In PSDEC, the selected top eigenvectors of the Laplacian matrix can be of help in detecting a cluster structure of objects and providing simpler and more interpretable solutions. We show that by doing so, our method can learn low-dimensional representations that are better suited to clustering, outperforming not only spectral clustering algorithms but also some NMF variants.

【Keywords】:

33. Process Trace Clustering: A Heterogeneous Information Network Approach.

Paper Link】 【Pages】:279-287

【Authors】: Phuong Nguyen ; Aleksander Slominski ; Vinod Muthusamy ; Vatche Ishakian ; Klara Nahrstedt

【Abstract】: Process mining is the task of extracting information from event logs, such as ones generated from workflow management or enterprise resource planning systems, in order to discover models of the underlying processes, organizations, and products. As the event logs often contain a variety of process executions, the discovered models can be complex and difficult to comprehend. Trace clustering helps solve this problem by splitting the event logs into smaller subsets and applying process discovery algorithms on each subset, resulting in per-subset discovered processes that are less complex and more accurate. However, the state-of-the-art clustering techniques are limited: the similarity measures are not process-aware and they do not scale well to high-dimensional event logs. In this paper, we propose a conceptualization of process's event logs as a heterogeneous information network, in order to capture the rich semantic meaning, and thereby derive better process-specific features. In addition, we propose SeqPathSim, a meta path-based similarity measure that considers node sequences in the heterogeneous graph and results in better clustering. We also introduce a new dimension reduction method that combines event similarity with regularization by process model structure to deal with event logs of high dimensionality. The experimental results show that our proposed approach outperforms state-of-the-art trace clustering approaches in both accuracy and structural complexity metrics.

【Keywords】:

34. Lagrangian Constrained Clustering.

Paper Link】 【Pages】:288-296

【Authors】: Mohadeseh Ganji ; James Bailey ; Peter J. Stuckey

【Abstract】: Incorporating background knowledge in clustering problems has attracted wide interest. This knowledge can be represented as pairwise instance-level constraints. Existing techniques approach satisfaction of such constraints from a soft (discretionary) perspective, yet there exist scenarios for constrained clustering where satisfying as many constraints as possible. We present a new Lagrangian Constrained Clustering framework (LCC) for clustering in the presence of pairwise constraints which gives high priority to satisfying constraints. LCC is an iterative optimization procedure which incorporates dynamic penalties for violated constraints. Experiments show that LCC can outperform existing constrained clustering algorithms in scenarios which satisfying as many constraints as possible.

【Keywords】:

35. Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering.

Paper Link】 【Pages】:297-305

【Authors】: Yangyang Hou ; Joyce Jiyoung Whang ; David F. Gleich ; Inderjit S. Dhillon

【Abstract】: Clustering is one of the most fundamental and important tasks in data mining. Traditional clustering algorithms, such as K-means, assign every data point to exactly one cluster. However, in real-world datasets, the clusters may overlap with each other. Furthermore, often, there are outliers that should not belong to any cluster. We recently proposed the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address both issues in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. A practical alternative is to use a low-rank factorization of the solution matrix in the convex formulation. The resulting optimization problem is non-convex, and we can locally optimize the objective function using an augmented Lagrangian method. In this paper, we consider two fast multiplier methods to accelerate the convergence of an augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers (ADMM). For the proximal augmented Lagrangian or proximal method of multipliers, we show a convergence result for the non-convex case with bound-constrained subproblems. These methods are up to 13 times faster—with no change in quality—compared with a standard augmented Lagrangian method on problems with over 10,000 variables and bring runtimes down from over an hour to around 5 minutes.

【Keywords】:

36. Stochastic Co-clustering for Document-Term Data.

Paper Link】 【Pages】:306-314

【Authors】: Aghiles Salah ; Nicoleta Rogovschi ; Mohamed Nadif

【Abstract】: Co-clustering is more useful than one-sided clustering when dealing with high dimensional sparse data. We propose to address the aim of document clustering with a generative model-based co-clustering approach. To this end, we rely on a particular mixture of von Mises-Fisher distributions and propose a new parsimonious model allowing to reveal a block diagonal structure as well as a good partitioning of documents and terms. Then, by setting the estimate of the model parameters under the maximum likelihood (ML) approach, we derive three novel co-clustering algorithms: a soft one and two stochastic variants. Empirical results on numerous simulated and real-world datasets, demonstrate the advantages of our approach to model and co-cluster high dimensional sparse data.

【Keywords】:

37. On finding the maximum edge biclique in a bipartite graph: a subspace clustering approach.

Paper Link】 【Pages】:315-323

【Authors】: Eran Shaham ; Honghai Yu ; Xiao-Li Li

【Abstract】: Bipartite graphs have been proven useful in modeling a wide range of relationship networks. Finding the maximum edge biclique within a bipartite graph is a well-known problem in graph theory and data mining, with numerous real-world applications across different domains. We propose a probabilistic algorithm for finding the maximum edge biclique using a Monte Carlo subspace clustering approach. Extensive experimentation with both artificial and real-world datasets shows that the algorithm is significantly better than the state-of-the-art technique. We prove that there are solid theoretical reasons for the algorithm's efficacy that manifest in a polynomial complexity of time and space.

【Keywords】:

38. Geometric methods to accelerate k-means algorithms.

Paper Link】 【Pages】:324-332

【Authors】: Petr Rysavý ; Greg Hamerly

【Abstract】: The k-means algorithm is popular for data clustering applications. Most implementations use Lloyd's algorithm, which does many unnecessary distance calculations. Several accelerated algorithms (Elkan's, Hamerly's, heap, etc.) have recently been developed which produce exactly the same answer as Lloyd's, only faster. They avoid redundant work using the triangle inequality paired with a set of lower and upper bounds on point-centroid distances. In this paper we propose several novel methods that allow those accelerated algorithms to perform even better, giving up to eight times further speedup. Our methods give tighter lower bound updates, efficiently skip centroids that cannot possibly be close to a set of points, keep extra information about upper bounds to help the heap algorithm avoid more distance computations, and decrease the number of distance calculations that are done in the first iteration.

【Keywords】:

39. Pivot-based k-means Algorithm for Numerous-class Data Sets.

Paper Link】 【Pages】:333-341

【Authors】: Takashi Hattori ; Kazuo Aoyama ; Kazumi Saito ; Tetsuo Ikeda ; Eri Kobayashi

【Abstract】: This paper presents an accelerated k-means clustering algorithm suitable for a large-scale and numerous-class data set. The proposed iterative algorithm avoids unnecessary exact distance calculations, especially in the early and the last stage in its convergence process, and retains the same result as the standard algorithm. This is efficiently performed by two distinct components. One uses the lower bounds on exact distances between objects and centroids for the acceleration in the early stage. The lower bound is calculated by the triangle inequality with newly introduced fixed points called pivots. The other component for the last stage skips the distance calculation between an invariant centroid and an object satisfying the following condition. The cluster to which the object is assigned remains unchanged, i.e., its centroid is also invariant. For much further speed-up, we can incorporate an existing algorithm into the proposed algorithm as an extension, which often performs a complementary role in the middle stage. The experimental results we obtained on large-scale and high-dimensional image data sets demonstrate that given a large k value, the proposed algorithm outperforms existing algorithms in terms of both the reduction rate of distance calculations and elapsed time.

【Keywords】:

40. k-Means for Streaming and Distributed Big Sparse Data.

Paper Link】 【Pages】:342-350

【Authors】: Artem Barger ; Dan Feldman

【Abstract】: We provide the first streaming algorithm for computing a provable approximation to the k-means of sparse Big Data. Here, sparse Big Data is a stream of n vectors in ℝd, where each vector has O(1) non-zeroes entries and possibly d ≥ n. E.g., adjacency matrix of a graph, web-links, social network, document-terms, or image-features matrices. Our streaming algorithm stores at most log n · kO(1) input points in memory. If the stream is distributed among M machines, the running time reduces by a factor of M, while communicating a total of M · kO(1) (sparse) input points between the machines. Our main contribution is a deterministic algorithm for computing a sparse (k, ∊)-coreset, which is a weighted subset of kO(1) input points that approximates the sum of squared distances from the n input points to every set of k centers, up to (1 ± ∊) factor, for any given constant ε > 0. This is the first such coreset of size independent of both d and n. Our experimental results show how our algorithm can bs used to boost the performance of any given k-means heuristics, even in the off-line setting. Open access to our implementation is also provided.

【Keywords】:

41. Haliteds: Fast and Scalable Subspace Clustering for Multidimensional Data Streams.

Paper Link】 【Pages】:351-359

【Authors】: Afonso Expedito Da Silva ; Lucas L. Sanches ; Antonio C. Fraideinberze ; Robson L. F. Cordeiro

【Abstract】: Given a data stream with many attributes and high frequency of events, how to cluster similar events? Can it be done in real time? For example, how to cluster decades of frequent measurements of tens of climatic attributes to aid real time alert systems in forecasting extreme climatic events, such as floods and hurricanes? The task of clustering data with many attributes is known as subspace clustering. Today, there exists a need for algorithms of this type well-suited to process multidimensional data streams, for which real time processing is highly desirable. This paper proposes the new algorithm Haliteds – a fast, scalable and highly accurate subspace clustering algorithm for multidimensional data streams. It improves upon an existing technique that was originally designed to process static (not streams) data. Our main contributions are: (1) Analysis of Data Streams: the new algorithm takes advantage of the knowledge obtained from clustering past data to easy clustering data in the present. This fact allows our Haliteds to be considerably faster than its base algorithm, yet obtaining the same accuracy of results; (2) Real Time Processing: as opposed to the state-of-the-art, Haliteds is fast and scalable, making it feasible to analyze streams with many attributes and high frequency of events in real time; (3) Experiments: we ran experiments using synthetic data and a real multidimensional stream with almost one century of climatic data. Our Haliteds was up to 217 times faster than 5 representative works, i.e., its base algorithm plus 4 others from the state-of-the-art, always presenting highly accurate results.

【Keywords】:

42. Online Clustering of Multivariate Time-series.

Paper Link】 【Pages】:360-368

【Authors】: Masud Moshtaghi ; Christopher Leckie ; James C. Bezdek

【Abstract】: The intrinsic nature of streaming data requires algorithms that are capable of fast data analysis to extract knowledge. Most current unsupervised data analysis techniques rely on the implementation of known batch techniques over a sliding window, which can hinder their utility for the analysis of evolving structure in applications involving large streams of data. This research presents a novel data clustering algorithm, which exploits the correlation between data points in time to cluster the data, while maintaining a set of decision boundaries to identify noisy or anomalous data. We illustrate the proposed algorithm for online clustering with numerical results on both real-life and simulated datasets, which demonstrate the efficiency and accuracy of our approach compared to existing methods.

【Keywords】:

43. Learning A Task-Specific Deep Architecture For Clustering.

Paper Link】 【Pages】:369-377

【Authors】: Zhangyang Wang ; Shiyu Chang ; Jiayu Zhou ; Meng Wang ; Thomas S. Huang

【Abstract】: While sparse coding-based clustering methods have shown to be successful, their bottlenecks in both efficiency and scalability limit the practical usage. In recent years, deep learning has been proved to be a highly effective, efficient and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a carefully crafted deep model benefiting from both. A feed-forward network structure, named TAGnet, is constructed based on a graph-regularized sparse coding algorithm. It is then trained with task-specific loss functions from end to end. We discover that connecting deep learning to sparse coding benefits not only the model performance, but also its initialization and interpretation. Moreover, by introducing auxiliary clustering tasks to the intermediate feature hierarchy, we formulate DTAGnet and obtain a further performance boost. Extensive experiments demonstrate that the proposed model gains remarkable margins over several state-of-the-art methods.

【Keywords】:

44. Kernelized Matrix Factorization for Collaborative Filtering.

Paper Link】 【Pages】:378-386

【Authors】: Xinyue Liu ; Charu Aggarwal ; Yu-Feng Li ; Xiangnan Kong ; Xinyuan Sun ; Saket Sathe

【Abstract】: Matrix factorization (MF) methods have shown great promise in collaborative filtering (CF). Conventional MF methods usually assume that the correlated data is distributed on a linear hyperplane, which is not always the case. Kernel methods are used widely in SVMs to classify linearly non-separable data, as well as in PCA to discover the non-linear embeddings of data. In this paper, we present a novel method to kernelize matrix factorization for collaborative filtering, which is equivalent to performing the low-rank matrix factorization in a possibly much higher dimensional space that is implicitly defined by the kernel function. Inspired by the success of multiple kernel learning (MKL) methods, we also explore the approach of learning multiple kernels from the rating matrix to further improve the accuracy of prediction. Since the right choice of kernel is usually unknown, our proposed multiple kernel matrix factorization method helps to select effective kernel functions from the candidates. Through extensive experiments on real-world datasets, we show that our proposed method captures the nonlinear correlations among data, which results in improved prediction accuracy compared to the state-of-art CF models.

【Keywords】:

45. Robust Unsupervised Feature Selection on Networked Data.

Paper Link】 【Pages】:387-395

【Authors】: Jundong Li ; Xia Hu ; Liang Wu ; Huan Liu

【Abstract】: Feature selection has shown its effectiveness to prepare high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption is invalid in networked data since instances are not only associated with high dimensional features but also inherently interconnected with each other. In addition, obtaining label information for networked data is time consuming and labor intensive. Without label information to direct feature selection, it is difficult to assess the feature relevance. In contrast to the scarce label information, link information in networks are abundant and could help select relevant features. However, most networked data has a lot of noisy links, resulting in the feature selection algorithms to be less effective. To address the above mentioned issues, we propose a robust unsupervised feature selection framework NetFS for networked data, which embeds the latent representation learning into feature selection. Therefore, content information is able to help mitigate the negative effects from noisy links in learning latent representations, while good latent representations in turn can contribute to extract more meaningful features. In other words, both phases could cooperate and boost each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.

【Keywords】:

46. Euclidean Co-Embedding of Ordinal Data for Multi-Type Visualization.

Paper Link】 【Pages】:396-404

【Authors】: Dung D. Le ; Hady Wirawan Lauw

【Abstract】: Embedding deals with reducing the high-dimensional representation of data into a low-dimensional representation. Previous work mostly focuses on preserving similarities among objects. Here, not only do we explicitly recognize multiple types of objects, but we also focus on the ordinal relationships across types. Collaborative Ordinal Embedding or COE is based on generative modelling of ordinal triples. Experiments show that COE outperforms the baselines on objective metrics, revealing its capacity for information preservation for ordinal data.

【Keywords】:

Paper Link】 【Pages】:405-413

【Authors】: Morteza Haghir Chehreghani

【Abstract】: We study Minimax distance measures for K-nearest neighbor search and classification. Recently, the use of this distance measure is shown to improve the K-nearest neighbor classification results. We consider the computational aspects of this problem and propose an efficient and general-purpose algorithm for computing Minimax neighbors which requires a significantly lower runtime and is applicable with any arbitrary distance measure. We study the computational optimality of our approach and its connection to the Prim's algorithm, and then, generalize our analysis to computing one-to-all Minimax distances. In the following, we investigate in detail the edges selected by Minimax distances and thereby explore the ability of Minimax distances in detecting outlier objects. We evaluate the performance of our methods on a variety of real-world datasets, e.g. text documents and images.

【Keywords】:

48. Nonlinear Joint Unsupervised Feature Selection.

Paper Link】 【Pages】:414-422

【Authors】: Xiaokai Wei ; Bokai Cao ; Philip S. Yu

【Abstract】: In the era of big data, one is often confronted with the problem of high dimensional data for many machine learning or data mining tasks. Feature selection, as a dimension reduction technique, is useful for alleviating the curse of dimensionality while preserving interpretability. In this paper, we focus on unsupervised feature selection, as class labels are usually expensive to obtain. Unsupervised feature selection is typically more challenging than its supervised counterpart due to the lack of guidance from class labels. Recently, regression-based methods with L2,1 norms have gained much popularity as they are able to evaluate features jointly which, however, consider only linear correlations between features and pseudo-labels. In this paper, we propose a novel nonlinear joint unsupervised feature selection method based on kernel alignment. The aim is to find a succinct set of features that best aligns with the original features in the kernel space. It can evaluate features jointly in a nonlinear manner and provides a good ‘0/1’ approximation for the selection indicator vector. We formulate it as a constrained optimization problem and develop a Spectral Projected Gradient (SPG) method to solve the optimization problem. Experimental results on several real-world datasets demonstrate that our proposed method outperforms the state-of-the-art approaches significantly.

【Keywords】:

49. A Framework to Adjust Dependency Measure Estimates for Chance.

Paper Link】 【Pages】:423-431

【Authors】: Simone Romano ; Nguyen Xuan Vinh ; James Bailey ; Karin Verspoor

【Abstract】: Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example: non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC); and categorical variables that are dependent to the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, the interpretability of their quantification and the accuracy when ranking dependencies become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample size, and they are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple and applicable to any dependency measure, are helpful in improving interpretability when quantifying dependency and in improving accuracy on the task of ranking dependencies. In particular, we demonstrate that our approach enhances the interpretability of MIC when used as a proxy for the amount of noise between variables, and to gain accuracy when ranking variables during the splitting procedure in random forests.

【Keywords】:

50. Risk Prediction with Electronic Health Records: A Deep Learning Approach.

Paper Link】 【Pages】:432-440

【Authors】: Yu Cheng ; Fei Wang ; Ping Zhang ; Jianying Hu

【Abstract】: The recent years have witnessed a surge of interests in data analytics with patient Electronic Health Records (EHR). Data-driven healthcare, which aims at effective utilization of big medical data, representing the collective learning in treating hundreds of millions of patients, to provide the best and most personalized care, is believed to be one of the most promising directions for transforming healthcare. EHR is one of the major carriers for make this data-driven healthcare revolution successful. There are many challenges on working directly with EHR, such as temporality, sparsity, noisiness, bias, etc. Thus effective feature extraction, or phenotyping from patient EHRs is a key step before any further applications. In this paper, we propose a deep learning approach for phenotyping from patient EHRs. We first represent the EHRs for every patient as a temporal matrix with time on one dimension and event on the other dimension. Then we build a four-layer convolutional neural network model for extracting phenotypes and perform prediction. The first layer is composed of those EHR matrices. The second layer is a one-side convolution layer that can extract phenotypes from the first layer. The third layer is a max pooling layer introducing sparsity on the detected phenotypes, so that only those significant phenotypes will remain. The fourth layer is a fully connected softmax prediction layer. In order to incorporate the temporal smoothness of the patient EHR, we also investigated three different temporal fusion mechanisms in the model: early fusion, late fusion and slow fusion. Finally the proposed model is validated on a real world EHR data warehouse under the specific scenario of predictive modeling of chronic diseases.

【Keywords】:

51. Predicting the Popularity of News Articles.

Paper Link】 【Pages】:441-449

【Authors】: Yaser Keneshloo ; Shuguang Wang ; Eui-Hong Sam Han ; Naren Ramakrishnan

【Abstract】: Consuming news articles is an integral part of our daily lives and news agencies such as The Washington Post (WP) expend tremendous effort in providing high quality reading experiences for their readers. Journalists and editors are faced with the task of determining which articles will become popular so that they can efficiently allocate resources to support a better reading experience. The reasons behind the popularity of news articles are typically varied, and might involve contemporariness, writing quality, and other latent factors. In this paper, we cast the problem of popularity prediction problem as regression, engineer several classes of features (metadata, contextual or content-based, temporal, and social), and build models for forecasting popularity. The system presented here is deployed in a real setting at The Washington Post; we demonstrate that it is able to accurately predict article popularity with an R2 ≈ 0.8 using features harvested within 30 minutes of publication time.

【Keywords】:

52. Uncovering Latent Behaviors in Ant Colonies.

Paper Link】 【Pages】:450-458

【Authors】: Mohamed Kafsi ; Raphaël Braunschweig ; Danielle Mersch ; Matthias Grossglauser ; Laurent Keller ; Patrick Thiran

【Abstract】: Many biological systems exhibit collective behaviors that strengthen their adaptability to their environment, compared to more solitary species. Describing these behaviors is challenging yet necessary in order to understand these biological systems. We propose a probabilistic model that enables us to uncover the collective behaviors observed in a colony of ants. This model is based on the assumption that the behavior of an individual ant is a time-dependent mixture of latent behaviors that are specific to the whole colony. We apply this model to a large-scale dataset obtained by observing the mobility of nearly 1000 Camponotus fellah ants from six different colonies. Our results indicate that a colony typically exhibits three classes of behaviors, each characterized by a specific spatial distribution and a level of activity. Moreover, these spatial distributions, which are uncovered automatically by our model, match well with the ground truth as manually annotated by domain experts. We further explore the evolution of the behavior of individual ants and show that it is well captured by a second order Markov chain that encodes the fact that the future behavior of an ant depends not only on its current behavior but also on its preceding one.

【Keywords】:

53. The Impact of Community Safety on House Ranking.

Paper Link】 【Pages】:459-467

【Authors】: Zijun Yao ; Yanjie Fu ; Bin Liu ; Hui Xiong

【Abstract】: It is well recognized that community safety which affects people's right to live without fear of crime has considerable impacts on housing investments. Housing investors can make more informed decisions if they are fully aware of safety related factors. To this end, we develop a safety-aware house ranking method by incorporating community safety into house assessment. Specifically, we first propose a novel framework to infer community safety level by mining community crime evidences from rich spatio-temporal historical crime data. Then we develop a ranking model which fuses multiply community safety features to rank house value based on the degree of community safety. Finally, we conduct a comprehensive evaluation of the proposed method with real-world crime and house data. The experimental results show that the proposed method substantially outperforms the baseline methods for house ranking.

【Keywords】:

54. iPath: Forecasting the Pathway to Impact.

Paper Link】 【Pages】:468-476

【Authors】: Liangyue Li ; Hanghang Tong ; Jie Tang ; Wei Fan

【Abstract】: Forecasting the success of scientific work has been attracting extensive research attention in the recent years. It is often of key importance to foresee the pathway to impact for scholarly entities for (1) tracking research frontier, (2) invoking an early intervention and (3) proactively allocating research resources. Many recent progresses have been seen in modeling the long-term scientific impact for point prediction. However, challenges still remain when it comes to forecasting the impact pathway. In this paper, we propose a novel predictive model to collectively achieve a set of design objectives to address these challenges, including prediction consistency and parameter smoothness. Extensive empirical evaluations on real scholarly data validate the effectiveness of the proposed model.

【Keywords】:

55. Cost-Sensitive Batch Mode Active Learning: Designing Astronomical Observation by Optimizing Telescope Time and Telescope Choice.

Paper Link】 【Pages】:477-485

【Authors】: Xide Xia ; Pavlos Protopapas ; Finale Doshi-Velez

【Abstract】: Astronomers and telescope operators must make decisions about what to observe given limited telescope time. To optimize this decision-making process, we present a batch, cost-sensitive, active learning approach that exploits structure in the unlabeled dataset, accounts for label uncertainty, and minimizes annotation costs. We first cluster the unlabeled instances in feature space. We next introduce an uncertainty-reducing selection criterion that encourages the batch of selected instances to span multiple clusters, in addition to taking into account annotation cost. Finally, we extend this criterion to incorporate the fact that nearby astronomical objects may be observed at the same time. On two large astronomical data sets, our approach balances the trade-offs among FOV, aperture, and time cost and, therefore, helps astronomers design effective experiments.

【Keywords】:

56. A Fast Kernel for Attributed Graphs.

Paper Link】 【Pages】:486-494

【Authors】: Yu Su ; Fangqiu Han ; Richard E. Harang ; Xifeng Yan

【Abstract】: As a fundamental technique for graph analysis, graph kernels have been successfully applied to a wide range of problems. Unfortunately, the high computational complexity of existing graph kernels is limiting their further applications to larger-scale graph datasets. In this paper, we propose a fast graph kernel, the descriptor matching (DM) kernel, for graphs with both categorical and numerical attributes. The computation time of the DM kernel is linear with respect to graph size. On graphs with n nodes and m edges, the kernel computation for two graphs can be done in O(n+m) time. Although there are other linear-time graph kernels, most of them are restricted to graphs with only categorical attributes; their efficiency mainly comes from the sparseness of the feature space resulted from the mutually orthogonal categorical attributes. Extensive experiments on both synthetic and real-world graph datasets show promising performance of DM in both accuracy and efficiency: On graphs with both categorical and numerical attributes, DM is orders of magnitude faster than several state-of-the-art graph kernels, while being much more accurate than the only graph kernel that is more efficient.

【Keywords】:

57. BIRDNEST: Bayesian Inference for Ratings-Fraud Detection.

Paper Link】 【Pages】:495-503

【Authors】: Bryan Hooi ; Neil Shah ; Alex Beutel ; Stephan Günnemann ; Leman Akoglu ; Mohit Kumar ; Disha Makhija ; Christos Faloutsos

【Abstract】: Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including 1) they occur in short bursts of time; 2) fraudulent user accounts have skewed rating distributions. However, these may both be true in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these 2 approaches in a principled manner, allowing successful detection even when one of these signs is not present. To combine these 2 approaches, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated and all identified as fraudulent by domain experts at Flipkart.

【Keywords】:

58. Unstable Communities in Network Ensembles.

Paper Link】 【Pages】:504-512

【Authors】: Ahsanur Rahman ; Steve T. K. Jan ; Hyunju Kim ; B. Aditya Prakash ; T. M. Murali

【Abstract】: Ensembles of graphs arise in several natural applications. Many techniques exist to compute frequent, dense subgraphs in these ensembles. In contrast, in this paper, we propose to discover maximally variable regions of the graphs, i.e., sets of nodes that induce very different subgraphs across the ensemble. We first develop two intuitive and novel definitions of such node sets, which we then show can be efficiently enumerated using a level-wise algorithm. Finally, using extensive experiments on multiple real datasets, we show how these sets capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.

【Keywords】:

59. CAMLP: Confidence-Aware Modulated Label Propagation.

Paper Link】 【Pages】:513-521

【Authors】: Yuto Yamaguchi ; Christos Faloutsos ; Hiroyuki Kitagawa

【Abstract】: How can we tell if Alice is a talkative person or a silent person? In this paper, we focus on the node classification problem on networked data such as social networks and the web. There are two open challenges with this problem: (1) we want to handle various kinds of label correlations in real-world networks such as homophily (i.e., love of the same) and heterophily (i.e., love of the different), and (2) we want to exploit the confidence of the inference results to enhance the accuracy. There is no algorithm that solves these two challenges at the same time. We tackle with these two challenges by proposing CAMLP, a novel node classification algorithm. Our contributions are three-fold: (a) Novel algorithm; our algorithm is confidence-aware and is applicable to both homophily and heterophily networks, (b) Theory; we give theoretical analyses of our algorithm, and (c) Practice; we perform extensive experiments on 5 different network datasets including homophily and heterophily networks. Our experiments show that the proposed algorithm improves the precision of major competitors not only on heterophily networks, but also on homophily networks.

【Keywords】:

60. Query-Driven Maximum Quasi-Clique Search.

Paper Link】 【Pages】:522-530

【Authors】: Pei Lee ; Laks V. S. Lakshmanan

【Abstract】: Quasi-cliques are an elegant way to model dense subgraphs, with each node adjacent to at least a fraction λ ∊ (0, 1] of other nodes in the subgraph. In this paper, we focus on a new graph mining problem, the query-driven maximum quasi-clique (QMQ) search, which aims to find the largest λ-quasi-clique containing a given query node set S. This problem has many applications and is proved to be NP-Hard and inapproximable. To solve the problem efficiently in practice, we propose the notion of core tree to organize dense subgraphs recursively, which reduces the search space and effectively helps find the solution within a few tree traversals. To optimize a solution to a better solution, we introduce three refinement operations: Add, Remove and Swap. We propose two iterative maximization algorithms, DIM and SUM, to approach QMQ by deterministic and stochastic means respectively. With extensive experiments on three real datasets, we demonstrate that our algorithms significantly outperform several baselines in running time and the quality.

【Keywords】:

61. Distributed Representations of Expertise.

Paper Link】 【Pages】:531-539

【Authors】: Fangqiu Han ; Shulong Tan ; Huan Sun ; Mudhakar Srivatsa ; Deng Cai ; Xifeng Yan

【Abstract】: Collaborative networks are common in real life, where domain experts work together to solve tasks issued by customers. How to model the proficiency of experts is critical for us to understand and optimize collaborative networks. Traditional expertise models, such as topic model based methods, cannot capture two aspects of human expertise simultaneously: Specialization (what area an expert is good at?) and Proficiency Level (to what degree?). In this paper, we propose new models to overcome this problem. We embed all historical task data in a lower dimension space and learn vector representations of expertise based on both solved and unsolved tasks. Specifically, in our first model, we assume that each expert will only handle tasks whose difficulty level just matches his/her proficiency level, while experts in the second model accept tasks whose levels are equal to or lower than his/her proficiency level. Experiments on real world datasets show that both models outperform topic model based approaches and standard classifiers such as logistic regression and support vector machine in terms of prediction accuracy. The learnt vector representations can be used to compare expertise in a large organization and optimize expert allocation.

【Keywords】:

62. Temporal Kernel Descriptors for Learning with Time-sensitive Patterns.

Paper Link】 【Pages】:540-548

【Authors】: Doyen Sahoo ; Abhishek Sharma ; Steven C. H. Hoi ; Peilin Zhao

【Abstract】: Detecting temporal patterns is one of the most prevalent challenges while mining data. Often, timestamps or information about when certain instances or events occurred can provide us with critical information to recognize temporal patterns. Unfortunately, most existing techniques are not able to fully extract useful temporal information based on the time (especially at different resolutions of time). They miss out on 3 crucial factors: (i) they do not distinguish between timestamp features (which have cyclical or periodic properties) and ordinary features; (ii) they are not able to detect patterns exhibited at different resolutions of time (e.g. different patterns at the annual level, and at the monthly level); and (iii) they are not able to relate different features (e.g. multimodal features) of instances with different temporal properties (e.g. while predicting stock prices, stock fundamentals may have annual patterns, and at the same time factors like peer stock prices and global markets may exhibit daily patterns). To solve these issues, we offer a novel multiple-kernel learning view and develop Temporal Kernel Descriptors which utilize Kernel functions to comprehensively detect temporal patterns by deriving relationship of instances with the time features. We automatically learn the optimal kernel function, and hence the optimal temporal similarity between two instances. We formulate the optimization as a Multiple Kernel Learning (MKL) problem. We empirically evaluate its performance by solving the optimization using Online MKL.

【Keywords】:

63. Modelling Recurrent Events for Improving Online Change Detection.

Paper Link】 【Pages】:549-557

【Authors】: Alexandr V. Maslov ; Mykola Pechenizkiy ; Indre Zliobaite ; Tommi Kärkkäinen

【Abstract】: The task of online change point detection in sensor data streams is often complicated due to presence of noise that can be mistaken for real changes and therefore affecting performance of change detectors. Most of the existing change detection methods assume that changes are independent from each other and occur at random in time. In this paper we study how performance of detectors can be improved in case of recurrent changes. We analytically demonstrate under which conditions and for how long recurrence information is useful for improving the detection accuracy. We propose a simple computationally efficient message passing procedure for calculating a predictive probability distribution of change occurrence in the future. We demonstrate two straightforward ways to apply the proposed procedure to existing change detection algorithms. Our experimental analysis illustrates the effectiveness of these approaches in improving the performance of a baseline online change detector by incorporating recurrence information.

【Keywords】:

64. MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance.

Paper Link】 【Pages】:558-566

【Authors】: Jingbo Shang ; Jian Peng ; Jiawei Han

【Abstract】: Consecutive pattern mining aiming at finding sequential patterns substrings, is a special case of frequent pattern mining and has been played a crucial role in many real world applications, especially in biological sequence analysis, time series analysis, and network log mining. Approximations, including insertions, deletions, and substitutions, between strings are widely used in biological sequence comparisons. However, most existing string pattern mining methods only consider hamming distance without insertions/deletions (indels). Little attention has been paid to the general approximate consecutive frequent pattern mining under edit distance, potentially due to the high computational complexity, particularly on DNA sequences with billions of base pairs. In this paper, we introduce an efficient solution to this problem. We first formulate the Maximal Approximate Consecutive Frequent Pattern Mining (MACFP) problem that identifies substring patterns under edit distance in a long query sequence. Then, we propose a novel algorithm with linear time complexity to check whether the support of a substring pattern is above a predefined threshold in the query sequence, thus greatly reducing the computational complexity of MACFP. With this fast decision algorithm, we can efficiently solve the original pattern discovery problem with several indexing and searching techniques. Comprehensive experiments on sequence pattern analysis and a study on cancer genomics application demonstrate the effectiveness and efficiency of our algorithm, compared to several existing methods.

【Keywords】:

65. DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework.

Paper Link】 【Pages】:567-575

【Authors】: Jingbo Shang ; Wenzhu Tong ; Jian Peng ; Jiawei Han

【Abstract】: Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.

【Keywords】:

66. Fast Lossless Frequent Itemset Mining in Data Streams using Crucial Patterns.

Paper Link】 【Pages】:576-584

【Authors】: Ariyam Das ; Carlo Zaniolo

【Abstract】: We study the problem of mining exact frequent itemsets from data streams. Since the number of frequent patterns is often quite large, concise representations that save resources by avoiding redundancy are critical for an efficient lossless extraction of frequent patterns. In this paper, we introduce the novel concept of crucial patterns, and formally prove them to be an effective subset of closed frequent itemsets that assures lossless extraction. Extensive experiments confirm that crucial patterns provide a significantly better lossless compression for the frequent itemsets than other condensed representations. Lastly, we propose our new Crucial Pattern Mining (CPM) algorithm for data streams that includes the significant optimization strategies described in the paper. The performance study show that CPM consistently outperforms other state-of-the-art methods by orders of magnitude, on both synthetic and real-world data sets.

【Keywords】:

67. Flexibly Mining Better Subgroups.

Paper Link】 【Pages】:585-593

【Authors】: Hoang Vu Nguyen ; Jilles Vreeken

【Abstract】: In subgroup discovery, perhaps the most crucial task is to discover high-quality one-dimensional subgroups, and refinements of these. For nominal attributes, finding such binary features is relatively straightforward, as we can consider individual attribute values as such. For numerical attributes, the task is more challenging as individual numeric values are not reliable statistics. Instead, we can consider combinations of adjacent values, i.e. bins. Existing binning strategies, however, are not tailored for subgroup discovery. That is, the bins they construct do not necessarily facilitate the discovery of high-quality subgroups, therewith potentially degrading the mining result. To address this, we introduce FLEXI. In short, we propose to use an optimal binning strategy for finding high-quality binary features for both numeric and ordinal attributes. We instantiate FLEXI with various quality measures and show how to achieve efficiency accordingly. Experiments on both synthetic and real-world data sets show that FLEXI outperforms state of the art with up to 25 times improvement in subgroup quality.

【Keywords】:

68. Deterministic Column Sampling for Low-Rank Matrix Approximation: Nyström vs. Incomplete Cholesky Decomposition.

Paper Link】 【Pages】:594-602

【Authors】: Raajen Patel ; Tom Goldstein ; Eva L. Dyer ; Azalia Mirhoseini ; Richard G. Baraniuk

【Abstract】: Kernel matrices that encode the distance (or similarity) between data points are widely used throughout the computational sciences for classification, clustering, and dimensionality reduction. For large datasets, the cost of computing and factorizing such matrices becomes intractable. Thus instead of operating on the entire matrix, approximate methods such as the Nyström method and the incomplete Cholesky decomposition (ICD) generate a low rank matrix factorization using only a subset of the matrix columns (or rows). Here, we present an adaptive column sampling strategy for the Nyström method that we dub Accelerated Sequential Incoherence Selection (oASIS). This sampling strategy reveals a missing link between Nyström methods and ICD: we demonstrate that ICD is actually a special case of the Nyström method with the oASIS adaptive sampling rule. Numerical experiments and theoretical results suggest that oASIS achieves performance comparable to state-of-the-art greedy Nyström methods but with shorter runtimes and less memory consumption.

【Keywords】:

Paper Link】 【Pages】:603-611

【Authors】: Michael B. Hynes ; Hans De Sterck

【Abstract】: In large-scale unconstrained optimization algorithms such as limited memory BFGS (LBFGS), a common sub-problem is a line search minimizing the loss function along a descent direction. Commonly used line searches iteratively find an approximate solution for which the Wolfe conditions are satisfied, typically requiring multiple function and gradient evaluations per line search, which is expensive in parallel due to communication requirements. In this paper we propose a new line search approach for cases where the loss function is analytic, as in least squares regression, logistic regression, or low rank matrix factorization. We approximate the loss function by a truncated Taylor polynomial, whose coefficients may be computed efficiently in parallel with less communication than evaluating the gradient, after which this polynomial may be minimized with high accuracy in a neighbourhood of the expansion point. The expansion may be repeated iteratively in a line search invocation until the expansion point and minimum are sufficiently accurate. Our Polynomial Expansion Line Search (PELS) was implemented in the Apache Spark framework and used to accelerate the training of a logistic regression model on binary classification datasets from the LIBSVM repository with LBFGS and the Nonlinear Conjugate Gradient (NCG) method. In large-scale numerical experiments in parallel on a 16-node cluster with 256 cores using the url, kdd-a, and kdd-b datasets, the PELS approach produced significant convergence improvements compared to the use of classical Wolfe approximate line searches. For example, to reach the final training label prediction accuracies, LBFGS using PELS had speedup factors of 1.8–2 over LBFGS using a Wolfe approximate line search, measured by both the number of iterations and the time required, due to the better accuracy of step sizes computed in the line search. PELS has the potential to significantly accelerate widely-used parallel large-scale regression and factorization computations, and is applicable to important classes of continuous optimization problems with smooth loss functions.

【Keywords】:

70. Structured Regression on Multilayer Networks.

Paper Link】 【Pages】:612-620

【Authors】: Athanasia Polychronopoulou ; Zoran Obradovic

【Abstract】: Until recently, research in social network analysis was focused on single layer networks, where only one type of links among nodes is considered. This approach does not consider the variety of interactions that exist among nodes, resulting in the loss of a large amount of information. In the last few years there is an advanced interest in multilayer networks analysis, where multiple types of nodal connections are considered jointly. In most approaches however the contributions of the various interactions are averaged, resulting again in the loss of information. In this work we present a structured regression model for node attribute prediction in multilayer networks. Our Gaussian Conditional Random Fields model is designed to maximize the information gained from the use of data with multiple layers of graphical structure. Our model accommodates graphs with layers that share the same set of nodes allowing for missing nodes and unobserved connections. At the same time it models the evolution of such networks over time without requiring the addition of a new layer. We present evidence that this model outperforms the traditionally used one and that it offers predictive accuracy that increases as the number of layers used grows, on both synthetic data and challenging real world applications such as predicting citation count and sepsis hospitalization admission rate at all hospitals in California.

【Keywords】:

Paper Link】 【Pages】:621-629

【Authors】: Chenguang Wang ; Yizhou Sun ; Yanglei Song ; Jiawei Han ; Yangqiu Song ; Lidan Wang ; Ming Zhang

【Abstract】: Recent studies have demonstrated the power of modeling real world data as heterogeneous information networks (HINs) consisting of multiple types of entities and relations. Unfortunately, most of such studies (e.g., similarity search) confine discussions on the networks with only a few entity and relationship types, such as DBLP. In the real world, however, the network schema can be rather complex, such as Freebase. In such HINs with rich schema, it is often too much burden to ask users to provide explicit guidance in selecting relations for similarity search. In this paper, we study the problem of relation similarity search in schema-rich HINs. Under our problem setting, users are only asked to provide some simple relation instance examples (e.g., 〈Barack Obama, John Kerry〉 and 〈George W. Bush, Condoleezza Rice〉) as a query, and we automatically detect the latent semantic relation (LSR) implied by the query (e.g., “president vs. secretary-of-state”). Such LSR will help to find other similar relation instances (e.g., 〈Bill Clinton, Madeleine Albright〉). In order to solve the problem, we first define a new meta-path-based relation similarity measure, RelSim, to measure the similarity between relation instances in schema-rich HINs. Then given a query, we propose an optimization model to efficiently learn LSR implied in the query through linear programming, and perform fast relation similarity search using RelSim based on the learned LSR. The experiments on real world datasets derived from Freebase demonstrate the effectiveness and efficiency of our approach.

【Keywords】:

72. Collective Opinion Spam Detection using Active Inference.

Paper Link】 【Pages】:630-638

【Authors】: Shebuti Rayana ; Leman Akoglu

【Abstract】: Opinion spam has become a widespread problem in the online review world, where paid or biased reviewers write fake reviews to elevate or relegate a product (or business) to mislead the consumers for profit or fame. In recent years, opinion spam detection has attracted a lot of attention from both the business and research communities. However, the problem still remains challenging as human labeling is expensive and hence labeled data is scarce, which is needed for supervised learning and evaluation. There exist recent works (e.g., FraudEagle [2], SpEagle [19]) which address the spam detection problem as an unsupervised network inference task on the review network. These methods are also able to incorporate labels (if available), and have been shown to achieve improved performance under the semi-supervised inference setting, in which the labels of a random sample of nodes are consumed. In this work, we address the problem of active inference for opinion spam detection. Active inference is the process of carefully selecting a subset of instances (nodes) whose labels are obtained from an oracle to be used during the (network) inference. Our goal is to employ a label acquisition strategy that selects a given number of nodes (a.k.a. the budget) wisely, as opposed to randomly, so as to improve detection performance significantly over the random selection. Our key insight is to select nodes that (i) exhibit high uncertainty, (ii) reside in a dense region, and (iii) are close-by to other uncertain nodes in the network. Based on this insight, we design a utility measure, called Expected UnCertainty Reach (EUCR), and pick the node with the highest EUCR score at every step iteratively. Experiments on two large real-world datasets from Yelp.com show that our method significantly outperforms random sampling as well as other state-of-the-art active inference approaches.

【Keywords】:

73. Discovery of Precursors to Adverse Events using Time Series Data.

Paper Link】 【Pages】:639-647

【Authors】: Vijay Manikandan Janakiraman ; Bryan L. Matthews ; Nikunj C. Oza

【Abstract】: We develop an algorithm for automatic discovery of precursors in time series data (ADOPT). In a time series setting, a precursor may be considered as any event that precedes and increases the likelihood of an adverse event. In a multivariate time series data, there are exponential number of events which makes a brute force search intractable. ADOPT works by breaking down the problem into two steps - (1) inferring a model of the nominal time series (data without adverse event) by considering the nominal data to be generated by a hidden expert and (2) using the expert's model as a benchmark to evaluate the adverse time series to identify suboptimal events as precursors. For step (1), we use a Markov Decision Process (MDP) framework where value functions and Bellman's optimality are used to infer the expert's actions. For step (2), we define a precursor score to evaluate a given instant of a time series by comparing its utility with that of the expert. Thus, the search for precursors is transformed to a search for sub-optimal action sequences in ADOPT. As an application case study, we use ADOPT to discover precursors to go-around events in commercial flights using real aviation data.

【Keywords】:

74. Effective Crowd Expertise Modeling via Cross Domain Sparsity and Uncertainty Reduction.

Paper Link】 【Pages】:648-656

【Authors】: Sihong Xie ; Qingbo Hu ; Weixiang Shao ; Jingyuan Zhang ; Jing Gao ; Wei Fan ; Philip S. Yu

【Abstract】: Characterizations of crowd expertise is vital to online applications where the crowd plays a central role, such as StackExchange for question-answering and LinkedIn as a workforce market. With accurately estimated worker expertise, new jobs can be assigned to the right workers more effectively and efficiently. Most existing methods solely rely on the sparse worker-job interactions, leading to poorly estimated expertise that does not generalize well to a large amount of unseen jobs. Though transfer learning can utilize external domains to mitigate the sparsity, the auxiliary domains can themselves suffer from incomplete information, leading to inferior performance. There is a lack of principled framework to handle the sparse and incomplete data to achieve better expertise modeling. Based on multitask learning, we propose a framework that uses the knowledge learned from one domain to gradually resolve the data sparsity or incompleteness problem in the other alternatively. Experimental results on several question-answering datasets demonstrate the effectiveness and convergence of the iterative framework.

【Keywords】:

75. GSpartan: a Geospatio-Temporal Multi-task Learning Framework for Multi-location Prediction.

Paper Link】 【Pages】:657-665

【Authors】: Jianpeng Xu ; Pang-Ning Tan ; Lifeng Luo ; Jiayu Zhou

【Abstract】: This paper presents a novel geospatio-temporal prediction framework called GSpartan to simultaneously build local regression models at multiple locations. The framework assumes that the local models share a common, low-rank representation, which makes them amenable to multi-task learning. GSpartan learns a set of base models to capture the spatio-temporal variabilities of the data and represents each local model as a linear combination of the base models. A graph Laplacian regularization is used to enforce constraints on the local models based on their spatial autocorrelation. We also introduce sparsity-inducing norms to perform feature selection for the base models and model selection for the local models. Experimental results using historical climate data from 37 weather stations showed that, on average, GSpartan outperforms single-task learning and other existing multi-task learning methods in more than 65% of the stations, which increases to 81% when there are fewer training examples.

【Keywords】:

76. Learning Correlative and Personalized Structure for Online Multi-Task Classification.

Paper Link】 【Pages】:666-674

【Authors】: Peng Yang ; Guangxia Li ; Peilin Zhao ; Xiaoli Li ; Sujatha Das Gollapalli

【Abstract】: Multi-Task Learning (MTL) can enhance the classifier's generalization performance by learning multiple related tasks simultaneously. Conventional MTL works under the offline or batch learning setting and suffers from the expensive training cost together with the poor scalability. To address such inefficiency issues, online learning technique has been applied to solve MTL problems. However, most existing algorithms for online MTL constrain task relatedness into a presumed structure via a single weight matrix, a strict restriction that does not always hold in practice. In this paper, we propose a general online MTL framework that overcomes this restriction by decomposing the weight matrix into two components: the first component captures the correlative structure among tasks in a low-rank subspace, and the second component identifies the personalized patterns for the outlier tasks. A projected gradient scheme is devised to learn such components adaptively. Theoretical analysis shows that the proposed algorithm can achieve a sub-linear regret with respect to the best linear model in hindsight. Experimental investigation on a number of real-world datasets also verifies the efficacy of our approach.

【Keywords】:

77. Online Sparse Passive Aggressive Learning with Kernels.

Paper Link】 【Pages】:675-683

【Authors】: Jing Lu ; Peilin Zhao ; Steven C. H. Hoi

【Abstract】: Conventional online kernel methods often yield an unbounded large number of support vectors, making them inefficient and non-scalable for large-scale applications. Recent studies on bounded kernel-based online learning have attempted to overcome this shortcoming. Although they can bound the number of support vectors at each iteration, most of them fail to bound the number of support vectors for the final output solution which is often obtained by averaging the series of solutions over all the iterations. In this paper, we propose a novel kernel-based online learning method, Sparse Passive Aggressive learning (SPA), which can output a final solution with a bounded number of support vectors. The key idea of our method is to explore an efficient stochastic sampling strategy, which turns an example into a new support vector with some probability that depends on the loss suffered by the example. We theoretically prove that the proposed SPA algorithm achieves an optimal regret bound in expectation, and empirically show that the new algorithm outperforms various bounded kernel-based online learning algorithms.

【Keywords】:

78. ADMM for Training Sparse Structural SVMs with Augmented ℓ1 Regularizers.

Paper Link】 【Pages】:684-692

【Authors】: Balamurugan Palanisamy ; Anusha Posinasetty ; Shirish Krishnaj Shevade

【Abstract】: Structural Support Vector Machine (Structural SVM) is a powerful tool for classification problems involving structured outputs. This paper proposes a fast Alternating Direction Method of Multipliers (ADMM) for structural SVM with augmented ℓ1 regularizers. The designed ADMM alternately solves a sequence of three problems, one of which uses a fast sequential dual optimization method [3] developed for training ℓ2 regularized structural SVM. The other two problems have easy-to-compute closed-form solutions. The algorithm is simple to implement and extensive empirical experiments show that the proposed ADMM is faster than several competing methods on a number of benchmark sequence labeling datasets. In addition to showing the convergence of the proposed ADMM, this paper is the first to prove the global non-asymptotic convergence of the sequential dual optimization method to solve a sub-problem of the ADMM algorithm.

【Keywords】:

79. Structural Orthogonal Procrustes Regression for Face Recognition with Pose Variations and Misalignment.

Paper Link】 【Pages】:693-701

【Authors】: Ying Tai ; Jian Yang ; Fanlong Zhang ; Yigong Zhang ; Lei Luo ; Jianjun Qian

【Abstract】: Regression based method is a hot topic in the face recognition community and has achieved interesting results when dealing with well-aligned frontal face images. However, most of the existing regression analysis based methods are sensitive to pose variations. In this paper, we firstly introduce the orthogonal Procrustes problem (OPP), which is simple but effective, as a model to handle pose variations in two-dimensional face images. OPP seeks an optimal transformation between two images to correct the pose from one to the other. We integrate OPP into the regression model and propose the structural orthogonal Procrustes regression (SOPR) using the nuclear norm constraint on the error term to keep image's structural information. Moreover, a subject-wise strategy is adopted to address the problem that the gallery images may span over different poses. The proposed model is solved by an efficient iteratively reweighted algorithm and experimental results on popular face databases demonstrate the effectiveness of our method.

【Keywords】:

80. Capricorn: An Algorithm for Subtropical Matrix Factorization.

Paper Link】 【Pages】:702-710

【Authors】: Sanjar Karaev ; Pauli Miettinen

【Abstract】: Max-times algebra, sometimes known as subtropical algebra, is a semi-ring over the nonnegative real numbers where the addition operation is the max function and the multiplication is the standard one. Factorizing a nonnegative matrix over the max-times algebra, instead of the standard (nonnegative) one, allows us to find structures and regularities that cannot be easily expressed in the standard algebra. In this paper we study the problem of decomposing matrices over the max-times algebra; in particular, we present our algorithm, Capricorn, to find such decompositions. Experimental evaluation shows that Capricorn finds the max-times structure from data reliably, and furthermore, also identifies areas where it cannot find any structure.

【Keywords】:

81. Automatic Unsupervised Tensor Mining with Quality Assessment.

Paper Link】 【Pages】:711-719

【Authors】: Evangelos E. Papalexakis

【Abstract】: Tensor decomposition has been very popular in unsupervised modelling and multi-aspect data mining. In an exploratory setting, where no labels or ground truth are available how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure in the interpretation of our results? In this paper, we introduce AutoTen, a novel automatic unsupervised tensor mining algorithm with minimal user intervention, which leverages and improves upon heuristics that assess the result quality. We extensively evaluate AutoTen's performance on synthetic data, outperforming existing baselines on this very hard problem. Finally, we apply AUTOTEN to a variety of real datasets, providing insights and discoveries.

【Keywords】:

82. Rank Selection for Non-negative Matrix Factorization with Normalized Maximum Likelihood Coding.

Paper Link】 【Pages】:720-728

【Authors】: Yu Ito ; Shinichi Oeda ; Kenji Yamanishi

【Abstract】: Non-negative matrix factorization (NMF) is one of the most important technologies in data mining. This is the task of factorizing a matrix into the product of two non-negative low rank matrices. In most of works on NMF, the rank is predetermined in ad hoc. This paper addresses the issue of how we can select the best rank from given data. The problem is that the conventional statistical model selection criteria such as AIC, MDL etc. cannot straightforwardly be applied to this issue because the regularity conditions for the criteria are not fulfilled. We overcome this problem to propose a novel methodology for rank selection. The key ideas are to 1) use the technique of latent variable completion to make the model regular and 2) then to apply the normalized maximum likelihood coding to rank selection for the regular model. We further propose a novel method for rank change detection when rank changes over time. We demonstrate the effectiveness of our methods for rank selection and rank change detection through synthetic data and real data sets.

【Keywords】:

83. Sparse Hybrid Variational-Gibbs Algorithm for Latent Dirichlet Allocation.

Paper Link】 【Pages】:729-737

【Authors】: Ximing Li ; Jihong OuYang ; Xiaotang Zhou

【Abstract】: Topic modeling algorithms such as the latent Dirichlet allocation (LDA) play an important role in machine learning research. Fitting LDA using Gibbs sampler-related algorithms involves a sampling process over K topics. We can use the sparsity in LDA to accelerate this expensive topic sampling process even for very large K values. However, LDA gradually loses sparsity as the number of documents increases. Motivated by the goal of fast LDA inference with large numbers of both topics and documents, in this paper we propose the novel sparse hybrid variational-Gibbs (SHVG) algorithm. The SHVG algorithm divides the topic sampling probability into a sparse term that scales linearly with the number of per-document instantiated topics Kd, and a dense term that uses the Alias method to reduce the time cost to constant O(1) time. This will lead to a significant improvement on efficiency. Using stochastic optimization techniques, we further develop an online version of SHVG for streaming documents. Experimental results on corpora with a wide range of sizes demonstrate the efficiency and effectiveness of the proposed SHVG algorithm.

【Keywords】:

84. Scaling Lifted Probabilistic Inference and Learning Via Graph Databases.

Paper Link】 【Pages】:738-746

【Authors】: Mayukh Das ; Yuqing Wu ; Tushar Khot ; Kristian Kersting ; Sriraam Natarajan

【Abstract】: Over the past decade, exploiting relations and symmetries within probabilistic models has been proven to be surprisingly effective at solving large scale data mining problems. One of the key operations inside these lifted approaches is counting - be it for parameter/structure learning or for efficient inference. Typically, however, they just count exploiting the logical structure using adhoc operators. This paper investigates whether ‘Compilation to Graph Databases’ could be a practical technique for scaling lifted probabilistic inference and learning methods. We demonstrate that the proposed approach achieves reasonable speed-ups for both inference and learning, without sacrificing performance.

【Keywords】:

85. Estimating Posterior Ratio for Classification: Transfer Learning from Probabilistic Perspective.

Paper Link】 【Pages】:747-755

【Authors】: Song Liu ; Kenji Fukumizu

【Abstract】: Transfer learning assumes classifiers of similar tasks share certain parameter structures. Unfortunately, modern classifiers use sophisticated feature representations with huge parameter spaces which lead to costly transfer. Under the impression that changes from one classifier to another should be “simple”, an efficient transfer learning criterion that only learns the “differences” is proposed in this paper. We train a posterior ratio which turns out to minimize the upper-bound of the target learning risk. The model of posterior ratio does not have to share the same parameter space with the source classifier at all so it can be easily modelled and efficiently trained. The resulting classifier therefore is obtained by simply multiplying the existing probabilistic-classifier with the learned posterior ratio.

【Keywords】:

86. Constrained Group Testing to Predict Binding Response of Candidate Compounds.

Paper Link】 【Pages】:756-764

【Authors】: Paul Quint ; Stephen D. Scott ; N. V. Vinodchandran ; Brad Worley

【Abstract】: We study the problem of identifying reactive compound(s) in a solution as efficiently as possible, with the goal of minimizing the number of chemical tests required to make exact identification. The area of group testing is appropriate for this problem, except that most group testing approaches assume that arbitrary tests can be performed, which is not the case in our application. To address this, we introduce a new model called mask-based constrained group testing, develop a randomized algorithm for it, and prove that under the right conditions, the algorithm is guaranteed w.h.p. to efficiently identify the active compounds of a solution with a small number of tests. We also show that our algorithm performs very well empirically on synthetic and real data.

【Keywords】:

87. Regularized Parametric Regression for High-dimensional Survival Analysis.

Paper Link】 【Pages】:765-773

【Authors】: Yan Li ; Kevin S. Xu ; Chandan K. Reddy

【Abstract】: Survival analysis aims to predict the occurrence of specific events of interest at future time points. The presence of incomplete observations due to censoring brings unique challenges in this domain and differentiates survival analysis techniques from other standard regression methods. In many applications where the distribution of the survival times can be explicitly modeled, parametric survival regression is a better alternative to the commonly used Cox proportional hazards model for this problem of censored regression. However, parametric survival regression suffers from model overfitting in high-dimensional scenarios. In this paper, we propose a unified model for regularized parametric survival regression for an arbitrary survival distribution. We employ a generalized linear model to approximate the negative log-likelihood and use the elastic net as a sparsity-inducing penalty to effectively deal with high-dimensional data. The proposed model is then formulated as a penalized iteratively reweighted least squares and solved using a cyclical coordinate descent-based method. We demonstrate the performance of our proposed model on various high-dimensional real-world microarray gene expression benchmark datasets. Our experimental results indicate that the proposed model produces more accurate estimates compared to the other competing state-of-the-art methods.

【Keywords】:

88. Copula-HDP-HMM: Non-parametric Modeling of Temporal Multivariate Data for I/O Efficient Bulk Cache Preloading.

Paper Link】 【Pages】:774-782

【Authors】: Lavanya Sita Tekumalla ; Chiranjib Bhattacharyya

【Abstract】: Caching is an important determinant of storage system performance. Bulk cache preloading is the process of preloading large batches of relevant data into cache, minutes or hours in advance of actual requests by the application. We address bulk preloading by analyzing high-level spatio-temporal motifs from raw and noisy I/O traces by aggregating the trace into a temporal sequence of correlated count vectors. Such temporal multivariate data from trace aggregation arise from a diverse set of workloads leading to diverse data distributions with complex spatio-temporal dependencies. Motivated by this, we propose the Copula-HDP-HMM, a new Bayesian non-parametric modeling technique based on Gaussian Copula, suitable for temporal multivariate data with arbitrary marginals, avoiding limiting assumptions on the marginal distributions. We are not aware of prior work on copula based extensions of Bayesian non-parametric modeling algorithms for discrete data. Inference with copulas is hard when data is not continuous. We propose inference based on extended rank likelihood that circumvents specifying marginals, making our inference suitable for count data and even data with a combination of discrete and continuous marginals, enabling the use of Bayesian nonparametric modeling, for several data types, without assumptions on marginals. Finally, we propose HULK, a strategy for I/O efficient bulk cache preloading using our Copula-HDP-HMM model to leverage high-level spatio-temporal motifs in Block I/O traces. In experiments on benchmark traces, we show near perfect hitrate of 0.95 using HULK, a tremendous improvement over baseline using Multivariate Poisson, with only a fourth of I/O overhead.

【Keywords】:

89. On Skewed Multi-dimensional Distributions: the FusionRP Model, Algorithms, and Discoveries.

Paper Link】 【Pages】:783-791

【Authors】: Venkata Krishna Pillutla ; Zhanpeng Fang ; Pravallika Devineni ; Christos Faloutsos ; Danai Koutra ; Jie Tang

【Abstract】: How do we model and find outliers in Twitter data? Given the number of retweets of each person on a social network, what is their expected number of comments? Real-life data are often very skewed, exhibiting power-law-like behavior. For such skewed multidimensional discrete data, the existing models are not general enough to capture various realistic scenarios, and need to be discretized as they often model continuous quantities. We propose FusionRP, short for Fusion Restaurant Process, a simple and intuitive model for skewed multi-dimensional discrete distributions, such as number of retweets vs. comments in Twitter-like data. Our model is discrete by design, has provably asymptotic log-logistic sum of marginals, is general enough to capture varied relationships, and most importantly, fits real data very well. We give an effective and scalable maximum-likelihood based fitting approach that is linear in the number of unique observed values and the input dimension. We test FusionRP on a twitter-like social network with 2.2M users, a phone call network with 1.9M call records, game data with 45M users and Facebook data with 2.5M posts. Our results show that FusionRP significantly outperforms several alternative methods and can detect outliers, such as bot-like behaviors in the Facebook data.

【Keywords】:

90. Universal Dependency Analysis.

Paper Link】 【Pages】:792-800

【Authors】: Hoang Vu Nguyen ; Panagiotis Mandros ; Jilles Vreeken

【Abstract】: Most data is multi-dimensional. Discovering whether any subset of dimensions, or subspaces, shows dependence is a core task in data mining. To do so, we require a measure that quantifies how dependent a subspace is. For practical use, such a measure should be universal in the sense that it captures correlation in subspaces of any dimensionality and allows to meaningfully compare scores across different subspaces, regardless how many dimensions they have and what specific statistical properties their dimensions possess. Further, it would be nice if the measure can non-parametrically and efficiently capture both linear and non-linear correlations. In this paper, we propose UDS, a multivariate dependence measure that fulfils all of these desiderata. In short, we define UDS based on cumulative entropy and propose a principled normalisation scheme to bring its scores across different subspaces to the same domain, enabling universal dependence assessment. UDS is purely non-parametric as we make no assumption on data distributions nor types of correlation. To compute it on empirical data, we introduce an efficient and non-parametric method. Extensive experiments show that UDS outperforms state of the art.

【Keywords】:

91. High Dimensional Structured Estimation with Noisy Designs.

Paper Link】 【Pages】:801-809

【Authors】: Amir Asiaee T. ; Soumyadeep Chatterjee ; Arindam Banerjee

【Abstract】: Structured estimation methods, such as LASSO, have received considerable attention in recent years and substantial progress has been made in extending such methods to general norms and non-Gaussian design matrices. In real world problems, however, covariates are usually corrupted with noise and there have been efforts to generalize structured estimation method for noisy covariate setting. In this paper we first show that without any information about the noise in covariates, currently established techniques of bounding statistical error of estimation fail to provide consistency guarantees. However, when information about noise covariance is available or can be estimated, then we prove consistency guarantees for any norm regularizer, which is a more general result than the state of the art. Next, we investigate empirical performance of structured estimation, specifically LASSO, when covariates are noisy and empirically show that LASSO is not consistent or stable in the presence of additive noise. However, prediction performance improves quite substantially when the noise covariance is available for incorporating in the estimator.

【Keywords】:

92. Learning Linear Dynamical Systems from Multivariate Time Series: A Matrix Factorization Based Framework.

Paper Link】 【Pages】:810-818

【Authors】: Zitao Liu ; Milos Hauskrecht

【Abstract】: The linear dynamical system (LDS) model is arguably the most commonly used time series model for real-world engineering and financial applications due to its relative simplicity, mathematically predictable behavior, and the fact that exact inference and predictions for the model can be done efficiently. In this work, we propose a new generalized LDS framework, gLDS, for learning LDS models from a collection of multivariate time series (MTS) data based on matrix factorization, which is different from traditional EM learning and spectral learning algorithms. In gLDS, each MTS sequence is factorized as a product of a shared emission matrix and a sequence-specific (hidden) state dynamics, where an individual hidden state sequence is represented with the help of a shared transition matrix. One advantage of our generalized formulation is that various types of constraints can be easily incorporated into the learning process. Furthermore, we propose a novel temporal smoothing regularization approach for learning the LDS model, which stabilizes the model, its learning algorithm and predictions it makes. Experiments on several real-world MTS data show that (1) regular LDS models learned from gLDS are able to achieve better time series predictive performance than other LDS learning algorithms; (2) constraints can be directly integrated into the learning process to achieve special properties such as stability, low-rankness; and (3) the proposed temporal smoothing regularization encourages more stable and accurate predictions.

【Keywords】:

93. Spatio-Temporal Tensor Analysis for Whole-Brain fMRI Classification.

Paper Link】 【Pages】:819-827

【Authors】: Guixiang Ma ; Lifang He ; Chun-Ta Lu ; Philip S. Yu ; Linlin Shen ; Ann B. Ragin

【Abstract】: Owing to prominence as a research and diagnostic tool in human brain mapping, whole-brain fMRI image analysis has been the focus of intense investigation. Conventionally, input fMRI brain images are converted into vectors or matrices and adapted in kernel based classifiers. fMRI data, however, are inherently coupled with sophisticated spatio-temporal tensor structure (i.e., 3D space × time). Valuable structural information will be lost if the tensors are converted into vectors. Furthermore, time series fMRI data are noisy, involving time shift and low temporal resolution. To address these analytic challenges, more compact and discriminative representations for kernel modeling are needed. In this paper, we propose a novel spatio-temporal tensor kernel (STTK) approach for whole-brain fMRI image analysis. Specifically, we design a volumetric time series extraction approach to model the temporal data, and propose a spatio-temporal tensor based factorization for feature extraction. We further leverage the tensor structure to encode prior knowledge in the kernel. Extensive experiments using real-world datasets demonstrate that our proposed approach effectively boosts the fMRI classification performance in diverse brain disorders (i.e., Alzheimer's disease, ADHD and HIV).

【Keywords】:

94. Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series.

Paper Link】 【Pages】:828-836

【Authors】: Hoang Vu Nguyen ; Jilles Vreeken

【Abstract】: Change detection in multivariate time series has applications in many domains, including health care and network monitoring. A common approach to detect changes is to compare the divergence between the distributions of a reference window and a test window. When the number of dimensions is very large, however, such a naïve approach has both quality and efficiency issues: to ensure robustness the window size needs to be large, which not only leads to missed alarms but also increases runtime. To this end, we propose Light, a linear-time algorithm for robustly detecting non-linear changes in massively high dimensional time series. Importantly, Light provides high flexibility in choosing the window size, allowing the domain expert to fit the level of details required. To do such, we 1) perform scalable pca to reduce dimensionality, 2) perform scalable factorisation of the joint distribution, and 3) scalably compute divergences between these lower dimensional distributions. Extensive empirical evaluation on both synthetic and real-world data show that Light outperforms state of the art with up to 100% improvement in both quality and efficiency.

【Keywords】:

95. Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation.

Paper Link】 【Pages】:837-845

【Authors】: Diego Furtado Silva ; Gustavo E. A. P. A. Batista

【Abstract】: Dynamic Time Warping (DTW) is certainly the most relevant distance for time series analysis. However, its quadratic time complexity may hamper its use, mainly in the analysis of large time series data. All the recent advances in speeding up the exact DTW calculation are confined to similarity search. However, there is a significant number of important algorithms including clustering and classification that require the pairwise distance matrix for all time series objects. The only techniques available to deal with this issue are constraint bands and DTW approximations. In this paper, we propose the first exact approach for speeding up the all-pairwise DTW matrix calculation. Our method is exact and may be applied in conjunction with constraint bands. We demonstrate that our algorithm reduces the runtime in approximately 50% on average and up to one order of magnitude in some datasets.

【Keywords】:

96. Joint Learning of Representation and Structure for Sparse Regression on Graphs.

Paper Link】 【Pages】:846-854

【Authors】: Chao Han ; Shanshan Zhang ; Mohamed F. Ghalwash ; Slobodan Vucetic ; Zoran Obradovic

【Abstract】: In many applications, including climate science, power systems, and remote sensing, multiple input variables are observed for each output variable and the output variables are dependent. Several methods have been proposed to improve prediction by learning the conditional distribution of the output variables. However, when the relationship between the raw features and the outputs is nonlinear, the existing methods cannot capture both the nonlinearity and the underlying structure well. In this study, we propose a structured model containing hidden variables, which are nonlinear functions of inputs and which are linearly related with the output variables. The parameters modeling the relationships between the input and hidden variables, between the hidden and output variables, as well as among the output variables are learned simultaneously. To demonstrate the effectiveness of our proposed method, we conducted extensive experiments on eight synthetic datasets and three real-world challenging datasets: forecasting wind power, forecasting solar energy, and forecasting precipitation over U.S. The proposed method was more accurate than state-of-the-art structured regression methods.

【Keywords】:

97. Infusing Geo-Recency Mixture Models for Effective Location Prediction in LBSN.

Paper Link】 【Pages】:855-863

【Authors】: Roland Assam ; Subramanyam Sathyanarayana ; Thomas Seidl

【Abstract】: An overwhelming volume of geo-social data is generated daily. This data could be analyzed and used to predict or expose valuable vivid patterns that would be imperative to society. However, prediction on sparse check-in data in Location-Based Social Networks (LBSN) is still challenging. In this paper, we propose a novel technique that utilizes matrix factorization to predict the top-k future locations for check-in data. Specifically, we introduce a new approach to capture the unique mobility behaviors of users by crafting Geo-Recency based Gaussian mixture models. Then, we determine users or friends with similar check-in preferences by computing the Cross Likelihood Ratio (CLR) similarity from each user's Geo-Recency Gaussian model. These CLR scores are leveraged during matrix factorization to learn and predict future locations. In addition, we present a new technique that employs these Geo-Recency Gaussian mixture models to effectively quantify social influence. We conduct numerous experiments on real datasets and compare our empirical results to that of state-of-the-art works. Our experiments reveal that our future location prediction technique performs better than that of state-of-the-art works.

【Keywords】:

98. Back Matter.

Paper Link】 【Pages】:864-867

【Authors】:

【Abstract】: Backmatter includes author index

【Keywords】: