17. ICDM 2017:New Orleans, LA, USA

2017 IEEE International Conference on Data Mining, ICDM 2017, New Orleans, LA, USA, November 18-21, 2017. IEEE Computer Society 【DBLP Link】

Paper Num: 155 || Session Num: 2

Regular Papers 72
Short Papers 83

Regular Papers 72

1. Split Miner: Discovering Accurate and Simple Business Process Models from Event Logs.

【Paper Link】【Pages】:1-10

【Authors】: Adriano Augusto ; Raffaele Conforti ; Marlon Dumas ; Marcello La Rosa

【Abstract】: The problem of automated discovery of process models from event logs has been intensively researched in the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large and spaghetti-like models; and (ii) they produce models that either poorly fit the event log (low fitness) or highly generalize it (low precision). Striking a tradeoff between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method that produces simple process models with low branching complexity and consistently high and balanced fitness, precision and generalization, while achieving execution times 2-6 times faster than state-of-the-art methods on a set of 12 real-life logs. Further, our approach guarantees deadlock-freedom for cyclic process models and soundness for acyclic. Our proposal combines a novel approach to filter the directly-follows graph induced by an event log, with an approach to identify combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph.

【Keywords】: Automated Process Discovery; Process Mining; Directly-follows Graph; Event Log; BPMN

2. A Deep Transfer Learning Approach for Improved Post-Traumatic Stress Disorder Diagnosis.

【Paper Link】【Pages】:11-20

【Authors】: Debrup Banerjee ; Kazi Islam ; Gang Mei ; Lemin Xiao ; Guangfan Zhang ; Roger Xu ; Shuiwang Ji ; Jiang Li

【Abstract】: Post-traumatic stress disorder (PTSD) is a traumatic-stressor related disorder developed by exposure to a traumatic or adverse environmental event that caused serious harm or injury. Structured interview is the only widely accepted clinical practice for PTSD diagnosis but suffers from several limitations including the stigma associated with the disease. Diagnosis of PTSD patients by analyzing speech signals has been investigated as an alternative since recent years, where speech signals are processed to extract frequency features and these features are then fed into a classification model for PTSD diagnosis. In this paper, we developed a deep belief network (DBN) model combined with a transfer learning (TL) strategy for PTSD diagnosis. We computed three categories of speech features and utilized the DBN model to fuse these features. The TL strategy was utilized to transfer knowledge learned from a large speech recognition database, TIMIT, for PTSD detection where PTSD patient data is difficult to collect. We evaluated the proposed methods on two PTSD speech databases, each of which consists of audio recordings from 26 patients. We compared the proposed methods with other popular methods and showed that the state-of-the-art support vector machine (SVM) classifier only achieved an accuracy of 57.68%, and TL strategy boosted the performance of the DBN from 61.53% to 74.99%. Altogether, our method provides a pragmatic and promising tool for PTSD diagnosis.

【Keywords】: Post-traumatic stress disorder; Deep learning; Transfer learning; Deep belief network

3. Many Heads are Better than One: Local Community Detection by the Multi-walker Chain.

【Paper Link】【Pages】:21-30

【Authors】: Yuchen Bian ; Jingchao Ni ; Wei Cheng ; Xiang Zhang

【Abstract】: Local community detection (or local clustering) is of fundamental importance in large network analysis. Random walk based methods have been routinely used in this task. Most existing random walk methods are based on the single-walker model. However, without any guidance, a single-walker may not be adequate to effectively capture the local cluster. In this paper, we study a multi-walker chain (MWC) model, which allows multiple walkers to explore the network. Each walker is influenced (or pulled back) by all other walkers when deciding the next steps. This helps the walkers to stay as a group and within the cluster. We introduce two measures based on the mean and standard deviation of the visiting probabilities of the walkers. These measures not only can accurately identify the local cluster, but also help detect the cluster center and boundary, which cannot be achieved by the existing single-walker methods. We provide rigorous theoretical foundation for MWC, and devise efficient algorithms to compute it. Extensive experimental results on a variety of real-world networks demonstrate that MWC outperforms the state-of-the-art local community detection methods by a large margin.

【Keywords】: Local community detection; Multi-Walker Chain

4. Knowledge Guided Short-Text Classification for Healthcare Applications.

【Paper Link】【Pages】:31-40

【Authors】: Shilei Cao ; Buyue Qian ; Changchang Yin ; Xiaoyu Li ; Jishang Wei ; Qinghua Zheng ; Ian Davidson

【Abstract】: The need for short-text classification arises in many text mining applications particularly health care applications. In such applications shorter texts mean linguistic ambiguity limits the semantic expression, which in turns would make typical methods fail to capture the exact semantics of the scarce words. This is particularly true in health care domains when the text contains domain-specific or infrequently appearing words, whose embedding can not be easily learned due to the lack of training data. Deep neural network has shown great potentials in boost the performance of such problems according to its strength on representation capacity. In this paper, we propose a bidirectional long short-term memory (BI-LSTM) recurrent network to address the short-text classification problem that can be used in two settings. Firstly when a knowledge dictionary is available we adopt the well-known attention mechanism to guide the training of network using the domain knowledge in the dictionary. Secondly, to address the cases when domain knowledge dictionary is not available, we present a multi-task model to jointly learn the domain knowledge dictionary and do the text classification task simultaneously. We apply our method to a real-world interactive healthcare system and an extensively public available ATIS dataset. The results show that our model can positively grasp the key point of the text and significantly outperforms many state-of-the-art baselines.

【Keywords】: Dictionaries; Diseases; Neural networks; Valves; Semantics

5. A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks.

【Paper Link】【Pages】:41-50

【Authors】: Feng Chen ; Baojian Zhou ; Adil Alim ; Liang Zhao

【Abstract】: Detection of interesting (e.g., coherent or anomalous) clusters has been studied extensively on plain or univariate networks, with various applications. Recently, algorithms have been extended to networks with multiple attributes for each node in the real-world. In a multi-attributed network, often, a cluster of nodes is only interesting for a subset (subspace) of attributes, andthis type of clusters is called subspace clusters. However, in the current literature, few methods are capable of detecting subspace clusters, which involves concurrent feature selection and network cluster detection. These relevant methods are mostly heuristic-driven and customized for specific application scenarios. In this work, we present a generic and theoretical framework for detection of interesting subspace clusters in large multi-attributed networks. Specifically, we propose a subspace graph-structured matching pursuit algorithm, namely, SG-Pursuit, to address a broad class of such problems for different scorefunctions (e.g., coherence or anomalous functions) and topology constraints (e.g., connected subgraphs and dense subgraphs). We prove that our algorithm 1) runs in nearly-linear time on the network size and the total number of attributes and 2) enjoys rigorous guarantees (geometrical convergence rate and tight error bound) analogous to those of the state-of-the-art algorithms for sparse feature selection problems and subgraph detection problems. As a case study, we specialize SG-Pursuit to optimizea number of well-known score functions for two typical tasks, including detection of coherent dense and anomalous connected subspace clusters in real-world networks. Empirical evidence demonstrates that our proposed generic algorithm SG-Pursuit is superior over state-of-the-art methods that are designed specifically for these two tasks.

【Keywords】: Subspace cluster detection; Dense subgraph mining; Anomalous pattern detection; Attributed networks

6. Revisiting Spectral Graph Clustering with Generative Community Models.

【Paper Link】【Pages】:51-60

【Authors】: Pin-Yu Chen ; Lingfei Wu

【Abstract】: The methodology of community detection can be divided into two principles: imposing a network model on a given graph, or optimizing a designed objective function. The former provides guarantees on theoretical detectability but falls short when the graph is inconsistent with the underlying model. The latter is model-free but fails to provide quality assurance for the detected communities. In this paper, we propose a novel unified framework to combine the advantages of these two principles. The presented method, SGC-GEN, not only considers the detection error caused by the corresponding model mismatch to a given graph, but also yields a theoretical guarantee on community detectability by analyzing Spectral Graph Clustering (SGC) under GENerative community models (GCMs). SGC-GEN incorporates the predictability on correct community detection with a measure of community fitness to GCMs. It resembles the formulation of supervised learning problems by enabling various community detection loss functions and model mismatch metrics. We further establish a theoretical condition for correct community detection using the normalized graph Laplacian matrix under a GCM, which provides a novel data-driven loss function for SGC-GEN. In addition, we present an effective algorithm to implement SGC-GEN, and show that the computational complexity of SGC-GEN is comparable to the baseline methods. Our experiments on 18 real-world datasets demonstrate that SGC-GEN possesses superior and robust performance compared to 6 baseline methods under 7 representative clustering metrics.

【Keywords】: community detection; generative models; spectral algorithms

7. Improving I/O Complexity of Triangle Enumeration.

【Paper Link】【Pages】:61-70

【Authors】: Yi Cui ; Di Xiao ; Daren B. H. Cline ; Dmitri Loguinov

【Abstract】: In the age of big data, many graph algorithms are now required to operate in external memory and deliver performance that does not significantly degrade with the scale of the problem. One particular area that frequently deals with graphs larger than RAM is triangle listing, where the algorithms must carefully piece together edges from multiple partitions to detect cycles. In recent literature, two competing proposals (i.e., Pagh and PCF) have emerged; however, neither one is universally better than the other. Since little is known about the I/O cost of PCF or how these methods compare to each other, we undertake an investigation into the properties of these algorithms, model their I/O cost, understand their shortcomings, and shed light on the conditions under which each method defeats the other. This insight leads us to develop a novel framework we call Trigon that surpasses the I/O performance of both previous techniques in all graphs and under all RAM conditions.

【Keywords】: graph mining; algorithms; external memory

8. TensorCast: Forecasting with Context Using Coupled Tensors (Best Paper Award).

【Paper Link】【Pages】:71-80

【Authors】: Miguel Ramos de Araujo ; Pedro Manuel Pinto Ribeiro ; Christos Faloutsos

【Abstract】: Given an heterogeneous social network, can we forecast its future? Can we predict who will start using a given hashtag on twitter? Can we leverage side information, such as who retweets or follows whom, to improve our membership forecasts? We present TensorCast, a novel method that forecasts time-evolving networks more accurately than current state of the art methods by incorporating multiple data sources in coupled tensors. TensorCast is (a) scalable, being linearithmic on the number of connections; (b) effective, achieving over 20% improved precision on top-1000 forecasts of community members; (c) general, being applicable to data sources with different structure. We run our method on multiple real-world networks, including DBLP and a Twitter temporal network with over 310 million non-zeros, where we predict the evolution of the activity of the use of political hashtags.

【Keywords】: Coupled Tensor Factorization; Tensor Forecasting; Link Prediction; Contextual Recommendation

9. Situation Aware Multi-task Learning for Traffic Prediction.

【Paper Link】【Pages】:81-90

【Authors】: Dingxiong Deng ; Cyrus Shahabi ; Ugur Demiryurek ; Linhong Zhu

【Abstract】: Due to the recent vast availability of transportation traffic data, major research efforts have been devoted to traffic prediction, which is useful in many applications such as urban planning, traffic management and navigations systems. Current prediction methods that independently train a model per traffic sensor cannot accurately predict traffic in every situation (e.g., rush hours, constructions and accidents) because there may not exist sufficient training samples per sensor for all situations. To address this shortcoming, our core idea is to explore the commonalities of prediction tasks across multiple sensors who behave similarly in a specific traffic situation. Instead of building a model independently per sensor, we propose a Multi-Task Learning (MTL) framework that aims to first automatically identify the traffic situations and then simultaneously build one forecasting model for similar-behaving sensors per traffic situation. The key innovation here is that instead of the straightforward application of MTL where each "task" corresponds to a sensor, we relate each MTL's "task" to a traffic situation. Specifically, we first identify these traffic situations by running clustering algorithms on all sensors' data. Subsequently, to enforce the commonalities under each identified situation, we use the group Lasso regularization in MTL to select a common set of features for the prediction tasks, and we adapt efficient FISTA algorithm with guaranteed convergence rate. We evaluated our methods with a large volume of real-world traffic sensor data; our results show that by incorporating traffic situations, our proposed MTL framework performs consistently better than naively applying MTL per sensor. Moreover, our holistic approach, under different traffic situations, outperforms all the best traffic prediction approaches for a given situation by up to 18% and 30% in short and long term predictions, respectively.

【Keywords】: traffic prediction; traffic situation clustering; situation-aware multi-task learning; long term traffic forecasting

10. Large Scale Kernel Methods for Online AUC Maximization.

【Paper Link】【Pages】:91-100

【Authors】: Yi Ding ; Chenghao Liu ; Peilin Zhao ; Steven C. H. Hoi

【Abstract】: Learning to optimize AUC performance for classifying label imbalanced data in online scenarios has been extensively studied in recent years. Most of the existing work has attempted to address the problem directly in the original feature space, which may not suitable for non-linearly separable datasets. To solve this issue, some kernel-based learning methods are proposed for non-linearly separable datasets. However, such kernel approaches have been shown to be inefficient and failed to scale well on large scale datasets in practice. Taking this cue, in this work, we explore the use of scalable kernel-based learning techniques as surrogates to existing approaches: random Fourier features and Nyström method, for tackling the problem and bring insights to the differences between the two methods based on their online performance. In contrast to the conventional kernel-based learning methods which suffer from high computational complexity of the kernel matrix, our proposed approaches elevate this issue with linear features that approximate the kernel function/matrix. Specifically, two different surrogate kernel-based learning models are presented for addressing the online AUC maximization task: (i) the Fourier Online AUC Maximization (FOAM) algorithm that samples the basis functions from a data-independent distribution to approximate the kernel functions; and (ii) the Nyström Online AUC Maximization (NOAM) algorithm that samples a subset of instances from the training data to approximate the kernel matrix by a low rank matrix. Another novelty of the present work is the proposed mini-batch Online Gradient Descent method for model updating to control the noise and reduce the variance of gradients. We provide theoretical analyses for the two proposed algorithms. Empirical studies on commonly used large scale datasets show that the proposed algorithms outperformed existing state-of-the-art methods in terms of both AUC performance and computational efficiency.

【Keywords】: Kernel; Computational modeling; Approximation algorithms; Machine learning algorithms; Support vector machines; Learning systems

11. A Hyperplane-Based Algorithm for Semi-Supervised Dimension Reduction.

【Paper Link】【Pages】:101-110

【Authors】: Huang Fang ; Minhao Cheng ; Cho-Jui Hsieh

【Abstract】: We consider the semi-supervised dimension reduction problem: given a high dimensional dataset with a small number of labeled data and huge number of unlabeled data, the goal is to find the low-dimensional embedding that yields good classification results. Most of the previous algorithms for this task are linkage-based algorithms. They try to enforce the must-link and cannot-link constraints in dimension reduction, leading to a nearest neighbor classifier in low dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method-the main objective is to learn the low-dimensional features that can both approximate the original data and form a good separating hyperplane. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm can scale to problems with millions of features and can easily incorporate non-negative constraints in order to learn interpretable non-negative features. Experiments on real world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-art linkage-based methods when very few labels are available.

【Keywords】: Semi-supervised learning; Dimension reduction; optimization

12. IterativE Grammar-Based Framework for Discovering Variable-Length Time Series Motifs.

【Paper Link】【Pages】:111-116

【Authors】: Yifeng Gao ; Jessica Lin ; Huzefa Rangwala

【Abstract】: In recent years, finding repetitive similar patterns in time series has become a popular problem. These patterns are called time series motifs. Recent studies show that using grammar compression algorithms to find repeating patterns from the symbolized time series holds promise in discovering approximate motifs with variable length. However, grammar compression algorithms are traditionally designed for string compression. Therefore, existing work on grammar induction has not fully utilized much available information that can be used to enhance the performance of the algorithms. In this work, an iterative framework based on grammar induction is proposed. In each iteration, a revision operator called Noise Reduction Operator is applied to revise the symbolized time series string based on the rules returned from a base grammar induction algorithm. In our experiments, we show that the proposed work can find motifs of the same quality, with much faster running time compared to the state-of-the-art variable-length exact motif discovery algorithm in real world time series data.

【Keywords】: Grammar; Approximation algorithms; Time series analysis; Algorithm design and analysis; Noise reduction; Data mining

13. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels.

【Paper Link】【Pages】:117-126

【Authors】: Shaghayegh Gharghabi ; Yifei Ding ; Chin-Chia Michael Yeh ; Kaveh Kamgar ; Liudmila Ulanova ; Eamonn J. Keogh

【Abstract】: Unsupervised semantic segmentation in the time series domain is a much-studied problem due to its potential to detect unexpected regularities and regimes in poorly understood data. However, the current techniques have several shortcomings, which have limited the adoption of time series semantic segmentation beyond academic settings for three primary reasons. First, most methods require setting/learning many parameters and thus may have problems generalizing to novel situations. Second, most methods implicitly assume that all the data is segmentable, and have difficulty when that assumption is unwarranted. Finally, most research efforts have been confined to the batch case, but online segmentation is clearly more useful and actionable. To address these issues, we present an algorithm which is domain agnostic, has only one easily determined parameter, and can handle data streaming at a high rate. In this context, we test our algorithm on the largest and most diverse collection of time series datasets ever considered, and demonstrate our algorithm's superiority over current solutions. Furthermore, we are the first to show that semantic segmentation may be possible at superhuman performance levels.

【Keywords】: Time Series; Semantic Segmentation; Online Algorithms

14. Overlapping Community Detection via Constrained PARAFAC: A Divide and Conquer Approach.

【Paper Link】【Pages】:127-136

【Authors】: Fatemeh Sheikholeslami ; Georgios B. Giannakis

【Abstract】: The task of community detection over complex networks is of paramount importance in a multitude of applications. The present work puts forward a top-to-bottom community identification approach, termed DC-EgoTen, in which an egonet-tensor (EgoTen) based algorithm is developed in a divide-and-conquer (DC) fashion for breaking the network into smaller subgraphs, out of which the underlying communities progressively emerge. In particular, each step of DC-EgoTen forms a multi-dimensional egonet-based representation of the graph, whose induced structure enables casting the task of overlapping community identification as a constrained PARAFAC decomposition. Thanks to the higher representational capacity of tensors, the novel egonet-based representation improves the quality of detected communities by capturing multi-hop connectivity patterns of the network. In addition, the top-to-bottom approach ensures successive refinement of identified communities, so that the desired resolution is achieved. Synthetic as well as real-world tests corroborate the effectiveness of DC-EgoTen.

【Keywords】: Community detection; overlapping communities; egonet subgraphs; tensor decomposition; constrained PARAFAC

15. Scalable Algorithms for Locally Low-Rank Matrix Modeling.

【Paper Link】【Pages】:137-146

【Authors】: Qilong Gu ; Joshua D. Trzasko ; Arindam Banerjee

【Abstract】: We consider the problem of modeling data matrices with locally low rank (LLR) structure, a generalization of the popular low rank structure widely used in a variety of real world application domains ranging from medical imaging to recommendation systems. While LLR modeling has been found to be promising in real world application domains, limited progress has been made on the design of scalable algorithms for such structures. In this paper, we consider a convex relaxation of LLR structure, and propose an efficient algorithm based on dual projected gradient descent (D-PGD) for computing the proximal operator. While the original problem is non-smooth, so that primal (sub)gradient algorithms will be slow, we show that the proposed D-PGD algorithm has geometrical convergence rate. We present several practical ways to further speed up the computations, including acceleration and approximate SVD computations. With experiments on both synthetic and real data from MRI (magnetic resonance imaging) denoising, we illustrate the superior performance of the proposed D-PGD algorithm compared to several baselines.

【Keywords】: Convergence; Noise reduction; Computational modeling; Acceleration; Approximation algorithms; Algorithm design and analysis

16. A Self-Adaptive Sliding Window Based Topic Model for Non-uniform Texts.

【Paper Link】【Pages】:147-156

【Authors】: Jin He ; Lei Li ; Xindong Wu

【Abstract】: The contents generated from different data sources are usually non-uniform, such as long texts produced by news websites and short texts produced by social media. Uncovering topics over large-scale non-uniform texts becomes an important task for analyzing network data. However, the existing methods may fail to recognize the difference between long texts and short texts. To address this problem, we propose a novel topic modeling method for non-uniform text topic modeling referred to as self-adaptive sliding window based topic model (SSWTM). Specifically, in all kinds of texts, relevant words have a closer distance to each other than irrelevant words. Based on this assumption, SSWTM extracts relevant words by using a selfadaptive sliding window and models on the whole corpus. The self-adaptive sliding window can filter noisy information and change the size of a window according to different text contents. Experimental results on short texts from Twitter and long texts from Chinese news articles demonstrate that our method can discover more coherent topics for non-uniform texts compared with state-of-the-art methods.

【Keywords】: Adaptation models; Probabilistic logic; Analytical models; Solid modeling; Data mining; Computational modeling

17. Kernel Conditional Clustering.

【Paper Link】【Pages】:157-166

【Authors】: Xiao He ; Thomas Gumbsch ; Damian Roqueiro ; Karsten M. Borgwardt

【Abstract】: Clustering results are often affected by covariates that are independent of the clusters one would like to discover. Traditionally, Alternative Clustering algorithms can be used to solve such a problem. However, these suffer from at least one of the following problems: i) continuous covariates or non-linearly separable clusters cannot be handled; ii) assumptions are made about the distribution of the data; iii) one or more hyper-parameters need to be set. Here we propose a novel algorithm, named Kernel Conditional Clustering (KCC), whose objective is derived from a kernel based conditional dependence measure. KCC is parameter-light and makes no assumptions about the cluster structure, the covariates, or the distribution of the data. On both simulated and real-world datasets, the proposed KCC algorithm detects the ground truth cluster structures more accurately than state-of-the-art alternative clustering methods.

【Keywords】: Conditional Clustering; Conditional Dependence Measure; Alternative Clustering; Kernel

【Paper Link】【Pages】:167-176

【Authors】: Ji Hu ; Zidong Yang ; Yuanchao Shu ; Peng Cheng ; Jiming Chen

【Abstract】: Rapid development of bike-sharing systems has brought people enormous convenience during the past decade. On the other hand, high transport flexibility comes with dynamic distribution of shared bikes, leading to an unbalanced bike usage and growing maintenance cost. In this paper, we consider to rebalance bicycle utilization by means of directing users to different stations. For the first time, we devise a trip advisor that recommends bike check-in and check-out stations with joint consideration of service quality and bicycle utilization. From historical data, we firstly identify that biased bike usage is rooted from circumscribed bicycle circulation among few active stations. Therefore, with defined station activeness, we optimize the bike circulation by leading users to shift bikes between highly active stations and inactive ones. We extensively evaluate the performance of our design through real-world datasets. Evaluation results show that the percentage of frequent used bikes decreases by 33.6% on usage number and 28.6% on usage time.

【Keywords】: Bicycles; Maintenance engineering; Urban areas; Meters; Pollution; Legged locomotion; Probabilistic logic

【Paper Link】【Pages】:177-186

【Authors】: Tsuyoshi Idé ; Dzung T. Phan ; Jayant Kalagnanam

【Abstract】: This paper proposes a new framework for anomaly detection when collectively monitoring many complex systems. The prerequisite for condition-based monitoring in industrial applications is the capability of (1) capturing multiple operational states, (2) managing many similar but different assets, and (3) providing insights into the internal relationship of the variables. To meet these criteria, we propose a multi-task learning approach based on a sparse mixture of sparse Gaussian graphical models (GGMs). Unlike existing fused- and group-lasso-based approaches, each task is represented by a sparse mixture of sparse GGMs, and can handle multi-modalities. We develop a variational inference algorithm combined with a novel sparse mixture weight selection algorithm. To handle issues in the conventional automatic relevance determination (ARD) approach, we propose a new ℓ₀-regularized formulation that has guaranteed sparsity in mixture weights. We show that our framework eliminates well-known issues of numerical instability in the iterative procedure of mixture model learning. We also show better performance in anomaly detection tasks on real-world data sets. To the best of our knowledge, this is the first proposal of multi-task GGM learning allowing multi-modal distributions.

【Keywords】: Anomaly detection; Mathematical model; Monitoring; Bayes methods; Data models; Electronic mail

20. Exploratory Analysis of Graph Data by Leveraging Domain Knowledge.

【Paper Link】【Pages】:187-196

【Authors】: Di Jin ; Danai Koutra

【Abstract】: Given the soaring amount of data being generated daily, graph mining tasks are becoming increasingly challenging, leading to tremendous demand for summarization techniques. Feature selection is a representative approach that simplifies a dataset by choosing features that are relevant to a specific task, such as classification, prediction, and anomaly detection. Although it can be viewed as a way to summarize a graph in terms of a few features, it is not well-defined for exploratory analysis, and it operates on a set of observations jointly rather than conditionally (i.e., feature selection from many graphs vs. selection for an input graph conditioned on other graphs). In this work, we introduce EAGLE (Exploratory Analysis of Graphs with domain knowLEdge), a novel method that creates interpretable, feature-based, and domain-specific graph summaries in a fully automatic way. That is, the same graph in different domains-e.g., social science and neuroscience-will be described via different EAGLE summaries, which automatically leverage the domain knowledge and expectations. We propose an optimization formulation that seeks to find an interpretable summary with the most representative features for the input graph so that it is: diverse, concise, domain-specific, and efficient. Extensive experiments on synthetic and real-world datasets with up to ~1M edges and ~400 features demonstrate the effectiveness and efficiency of EAGLE and its benefits over existing methods. We also show how our method can be applied to various graph mining tasks, such as classification and exploratory analysis.

【Keywords】: Summarization; Domain knowledge; Feature selection

21. Efficiently Discovering Locally Exceptional Yet Globally Representative Subgroups.

【Paper Link】【Pages】:197-206

【Authors】: Janis Kalofolias ; Mario Boley ; Jilles Vreeken

【Abstract】: Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.

【Keywords】: Subgroup discovery; Branch-and-bound; Fairness

22. Visually-Aware Fashion Recommendation and Design with Generative Image Models.

【Paper Link】【Pages】:207-216

【Authors】: Wang-Cheng Kang ; Chen Fang ; Zhaowen Wang ; Julian McAuley

【Abstract】: Building effective recommender systems for domains like fashion is challenging due to the high level of subjectivity and the semantic complexity of the features involved (i.e., fashion styles). Recent work has shown that approaches to 'visual' recommendation (e.g. clothing, art, etc.) can be made more accurate by incorporating visual signals directly into the recommendation objective, using 'off-the-shelf' feature representations derived from deep networks. Here, we seek to extend this contribution by showing that recommendation performance can be significantly improved by learning 'fashion aware' image representations directly, i.e., by training the image representation (from the pixel level) and the recommender system jointly; this contribution is related to recent work using Siamese CNNs, though we are able to show improvements over state-of-the-art recommendation techniques such as BPR and variants that make use of pretrained visual features. Furthermore, we show that our model can be used generatively, i.e., given a user and a product category, we can generate new images (i.e., clothing items) that are most consistent with their personal taste. This represents a first step towards building systems that go beyond recommending existing items from a product corpus, but which can be used to suggest styles and aid the design of new products.

【Keywords】: Recommender systems; Visualization; Training; Business process re-engineering; Gallium nitride; Clothing

23. AutoLearn - Automated Feature Generation and Selection.

【Paper Link】【Pages】:217-226

【Authors】: Ambika Kaul ; Saket Maheshwary ; Vikram Pudi

【Abstract】: In recent years, the importance of feature engineering has been confirmed by the exceptional performance of deep learning techniques, that automate this task for some applications. For others, feature engineering requires substantial manual effort in designing and selecting features and is often tedious and non-scalable. We present AutoLearn, a regression-based feature learning algorithm. Being data-driven, it requires no domain knowledge and is hence generic. Such a representation is learnt by mining pairwise feature associations, identifying the linear or non-linear relationship between each pair, applying regression and selecting those relationships that are stable and improve the prediction performance. Our experimental evaluation on 18 UC Irvine and 7 Gene expression datasets, across different domains, provides evidence that the features learnt through our model can improve the overall prediction accuracy by 13.28%, compared to original feature space and 5.87% over other top performing models, across 8 different classifiers without using any domain knowledge.

【Keywords】: Feature Generation; Feature Selection; Classification

24. Collective Entity Resolution in Familial Networks.

【Paper Link】【Pages】:227-236

【Authors】: Pigi Kouki ; Jay Pujara ; Christopher Marcum ; Laura M. Koehly ; Lise Getoor

【Abstract】: Entity resolution in settings with rich relational structure often introduces complex dependencies between co-references. Exploiting these dependencies is challenging - it requires seamlessly combining statistical, relational, and logical dependencies. One task of particular interest is entity resolution in familial networks. In this setting, multiple partial representations of a family tree are provided, from the perspective of different family members, and the challenge is to reconstruct a family tree from these multiple, noisy, partial views. This reconstruction is crucial for applications such as understanding genetic inheritance, tracking disease contagion, and performing census surveys. Here, we design a model that incorporates statistical signals, such as name similarity, relational information, such as sibling overlap, and logical constraints, such as transitivity and bijective matching, in a collective model. We show how to integrate these features using probabilistic soft logic, a scalable probabilistic programming framework. In experiments on real-world data, our model significantly outperforms state-of-the-art classifiers that use relational features but are incapable of collective reasoning.

【Keywords】: Probabilistic logic; Programming; History; Electronic mail; Image color analysis; Diabetes

25. Scalable and Adaptive Algorithms for the Triangle Interdiction Problem on Billion-Scale Networks.

【Paper Link】【Pages】:237-246

【Authors】: Alan Kuhnle ; Victoria G. Crawford ; My T. Thai

【Abstract】: Motivated by the relevance of clustering or transitivity to a variety of network applications, we study the Triangle Interdiction Problem (TIP), which is to find a minimum-size set of edges that intersects all triangles of a network. As existing approximation algorithms for this NP-hard problem either do not scale well to massive networks or have poor solution quality, we formulate two algorithms, TARL and DART, with worst-case guarantees 5/2 and 3 with respect to optimal, respectively. Furthermore, DART is able to efficiently maintain its worst-case guarantee under dynamic edge insertion and removal to the network. In our comprehensive experimental evaluation, we demonstrate that DART is able to run on networks with billions of triangles within 2 hours and is able to dynamically update its solution in microseconds.

【Keywords】: Triangle Interdiction; Cycle Transversal; Scalable approximation

26. Online Learning of Acyclic Conditional Preference Networks from Noisy Data.

【Paper Link】【Pages】:247-256

【Authors】: Fabien Labernia ; Bruno Zanuttini ; Brice Mayag ; Florian Yger ; Jamal Atif

【Abstract】: We deal with online learning of acyclic Conditional Preference networks (CP-nets) from data streams, possibly corrupted with noise. We introduce a new, efficient algorithm relying on (i) information-theoretic measures defined over the induced preference rules, which allow us to deal with corrupted data in a principled way, and on (ii) the Hoeffding bound to define an asymptotically optimal decision criterion for selecting the best conditioned variable to update the learned network. This is the first algorithm dealing with online learning of CP-nets in the presence of noise. We provide a thorough theoretical analysis of the algorithm, and demonstrate its effectiveness through an empirical evaluation on synthetic and on real datasets.

【Keywords】: preference learning; conditional preferences networks; graphical learning; online learning; Hoeffding bound; noisy preferences

27. GoGP: Fast Online Regression with Gaussian Processes.

【Paper Link】【Pages】:257-266

【Authors】: Trung Le ; Khanh Nguyen ; Vu Nguyen ; Tu Dinh Nguyen ; Dinh Q. Phung

【Abstract】: One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.

【Keywords】: Online learning; Gaussian Process regression

28. HiMuV: Hierarchical Framework for Modeling Multi-modality Multi-resolution Data.

【Paper Link】【Pages】:267-276

【Authors】: Jianbo Li ; Jingrui He ; Yada Zhu

【Abstract】: Many real-world applications are characterized by temporal data collected from multiple modalities, each sampled with a different resolution. Examples include manufacturing processes and financial market prediction. In these applications, an interesting observation is that within the same modality, we often have data from multiple views, thus naturally forming a 2-level hierarchy: with the multiple modalities on the top, and the multiple views at the bottom. For example, in aluminum smelting processes, the multiple modalities include power, noise, alumina feed, etc; and within the same modality such as power, the different views correspond to various voltage, current and resistance control signals and measured responses. For such applications, we aim to address the following challenge, i.e., how can we integrate such multi-modality multi-resolution data to effectively predict the targets of interest, such as bath temperature in aluminum smelting cell and the volatility in financial market. In this paper, for the first time, we simultaneously model the hierarchical data structure and the multi-resolution property via a novel framework named HiMuV. Different from existing work based on multiple views on a single level or a single resolution, the proposed framework is based on the key assumption that the information from different modalities is complementary, whereas the information within the same modality (across different views) is redundant in terms of predicting the targets of interest. Therefore, we introduce an optimization framework where the objective function contains both the prediction loss and a novel regularizer enforcing the consistency among different views within the same modality. To solve this optimization framework, we propose an iterative algorithm based on randomized block coordinate descent. Experimental results on synthetic data, benchmark data, and various real data sets from aluminum smelting processes, and stock market prediction demonstrate the effectiveness and efficiency of the proposed algorithm.

【Keywords】: multi-modality; multi-resolution; time series; regression; aluminum smelting

29. Linear Time Complexity Time Series Classification with Bag-of-Pattern-Features.

【Paper Link】【Pages】:277-286

【Authors】: Xiaosheng Li ; Jessica Lin

【Abstract】: Time series classification has attracted much attention due to the ubiquity of time series. With the advance of technologies, the volume of available time series data becomes huge and the content is changing rapidly. This requires time series data mining methods to have low computational complexities. In this paper, we propose a parameter-free time series classification method that has a linear time complexity. The approach is evaluated on all the 85 datasets in the well-known UCR time series classification archive. The results show that the new method achieves better overall classification accuracy performance than the widely used benchmark, i.e. 1-nearest neighbor with dynamic time warping, while consuming orders of magnitude less running time. The proposed method is also applied on a large real-world bird sounds dataset to verify its effectiveness.

【Keywords】: Time series analysis; Time complexity; Data mining; Time measurement; Birds; Transforms; Windows

30. An Analysis of Boosted Linear Classifiers on Noisy Data with Applications to Multiple-Instance Learning.

【Paper Link】【Pages】:287-296

【Authors】: Rui Liu ; Soumya Ray

【Abstract】: An interesting observation about the well-known AdaBoost algorithm is that, though theory suggests it should overfit when applied to noisy data, experiments indicate it often does not do so in practice. In this paper, we study the behavior of AdaBoost on datasets with one-sided uniform class noise using linear classifiers as the base learner. We show analytically that, under some ideal conditions, this approach will not overfit, and can in fact recover a zero-error concept with respect to the true, uncorrupted instance labels. We also analytically show that AdaBoost increases the margins of predictions over boosting iterations, as has been previously suggested in the literature. We then compare the empirical behavior of AdaBoost using real world datasets with one-sided noise derived from multiple-instance data. Although our assumptions may not hold in a practical setting, our experiments show that standard AdaBoost still performs well, as suggested by our analysis, and often outperforms baseline variations in the literature that explicitly try to account for noise.

【Keywords】: AdaBoost; Noise; Multiple-Instance Learning

31. BiCycle: Item Recommendation with Life Cycles.

【Paper Link】【Pages】:297-306

【Authors】: Xinyue Liu ; Yuanfang Song ; Charu C. Aggarwal ; Yao Zhang ; Xiangnan Kong

【Abstract】: Recommender systems have attracted much attention in last decades, which can help the users explore new items in many applications. As a popular technique in recommender systems, item recommendation works by recommending items to users based on their historical interactions. Conventional item recommendation methods usually assume that users and items are stationary, which is not always the case in real-world applications. Many time-aware item recommendation models have been proposed to take the temporal effects into the considerations based on the absolute time stamps associated with observed interactions. We show that using absolute time to model temporal effects can be limited in some circumstances. In this work, we propose to model the temporal dynamics of both users and items in item recommendation based on their life cycles. This problem is very challenging to solve since the users and items can co-evolve in their life cycles and the sparseness of the data become more severe when we consider the life cycles of both users and items. A novel time-aware item recommendation model called BiCycle is proposed to address these challenges. BiCycle is designed based on two important observations: 1) correlated users or items usually share similar patterns in the similar stages of their life cycles. 2) user preferences and item characters can evolve gradually over different stages of their life cycles. Extensive experiments conducted on three real-world datasets demonstrate the proposed approach can significantly improve the performance of recommendation tasks by considering the inner life cycles of both users and items.

【Keywords】: Item Recommendation; Collaborative Filtering; Life Cycle; Evolution Inference; Data Mining

32. Telling Cause from Effect Using MDL-Based Local and Global Regression.

【Paper Link】【Pages】:307-316

【Authors】: Alexander Marx ; Jilles Vreeken

【Abstract】: We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer X causes Y in case it is shorter to describe Y as a function of X than the inverse direction. In addition, we introduce Slope, an efficient linear-time algorithm that through thorough empirical evaluation on both synthetic and real world data we show outperforms the state of the art by a wide margin.

【Keywords】: Kolmogorov Complexity; MDL; Causal Inference; Regression; Hypercompression

33. Distributing Frank-Wolfe via Map-Reduce.

【Paper Link】【Pages】:317-326

【Authors】: Armin Moharrer ; Stratis Ioannidis

【Abstract】: Large-scale optimization problems abound in data mining and machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank-Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: we solve problems with 10 million variables using 350 cores in 44 minutes; the same operation takes 133 hours when executed serially.

【Keywords】: Frank-Wolfe; Distributed Algorithms; Convex Optimization; Spark

34. Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs.

【Paper Link】【Pages】:327-336

【Authors】: Christopher Morris ; Kristian Kersting ; Petra Mutzel

【Abstract】: Most state-of-the-art graph kernels only take local graph properties into account, i.e., the kernel is computed with regard to properties of the neighborhood of vertices or other small substructures. On the other hand, kernels that do take global graph properties into account may not scale well to large graph databases. Here we propose to start exploring the space between local and global graph kernels, so called glocalized graph kernels, striking the balance between both worlds. Specifically, we introduce a novel graph kernel based on the k-dimensional Weisfeiler-Lehman algorithm. Unfortunately, the k-dimensional Weisfeiler-Lehman algorithm scales exponentially in k. Consequently, we devise a stochastic version of the kernel with provable approximation guarantees using conditional Rademacher averages. On bounded-degree graphs, it can even be computed in constant time. We support our theoretical results with experiments on several graph classification benchmarks, showing that our kernels often outperform the state-of-the-art in terms of classification accuracies.

【Keywords】: Graph Kernel; Graph Classification; Sampling

35. Importance Sketching of Influence Dynamics in Billion-Scale Networks.

【Paper Link】【Pages】:337-346

【Authors】: Hung T. Nguyen ; Tri P. Nguyen ; NhatHai Phan ; Thang N. Dinh

【Abstract】: The blooming availability of traces for social, biological, and communication networks opens up unprecedented opportunities in analyzing diffusion processes in networks. However, the sheer sizes of the nowadays networks raise serious challenges in computational efficiency and scalability. In this paper, we propose a new hyper-graph sketching framework for influence dynamics in networks. The core of our sketching framework, called SKIS, is an efficient importance sampling algorithm that returns only non-singular reverse cascades in the network. Comparing to previously developed sketches like RIS and SKIM, our sketch significantly enhances estimation quality while substantially reducing processing time and memory-footprint. Further, we present general strategies of using SKIS to enhance existing algorithms for influence estimation and influence maximization which are motivated by practical applications like viral marketing. Using SKIS, wedesign high-quality influence oracles for seed sets with average estimation error up to 10x times smaller than those using RIS and 6x times smaller than SKIMs. In addition, our influence maximization using SKIS substantially improves the quality of solutions for greedy algorithms. It achieves up to 10x times speed-up and 4x memory reduction for the fastest RIS-based DSSA algorithm, while maintaining the same theoretical guarantees.

【Keywords】: Importance Sampling; Influence Dynamics; Billion-scale Networks

36. Bayesian Optimization in Weakly Specified Search Space.

【Paper Link】【Pages】:347-356

【Authors】: Vu Nguyen ; Sunil Gupta ; Santu Rana ; Cheng Li ; Svetha Venkatesh

【Abstract】: Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and more generally for the efficient global optimization of expensive black-box functions. Systems implementing BO has successfully solved difficult problems in automatic design choices and machine learning hyper-parameters tunings. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insights into the behavior of these algorithms. Still, these established techniques always require a user-defined space to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as a prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization - if a space is too large, we can miss the optimum with a limited budget, on the other hand, if a space is too small, it may not contain the optimum point that we want to get. The unknown search space problem is intractable to solve in practice. Therefore, in this paper, we narrow down to consider specifically the setting of "weakly specified" search space for Bayesian optimization. By weakly specified space, we mean that the pre-defined space is placed at a sufficiently good region so that the optimization can expand and reach to the optimum. However, this pre-defined space need not include the global optimum. We tackle this problem by proposing the filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. Wedevelop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and tworeal-world applications which demonstrate the benefits of our proposed approach.

【Keywords】: Bayesian optimization; weakly specified space; hyper-parameter tuning; experimental design

37. Relational Mixture of Experts: Explainable Demographics Prediction with Behavioral Data.

【Paper Link】【Pages】:357-366

【Authors】: Masafumi Oyamada ; Shinji Nakadai

【Abstract】: Given a collection of basic customer demographics (e.g., age and gender) andtheir behavioral data (e.g., item purchase histories), how can we predictsensitive demographics (e.g., income and occupation) that not every customermakes available?This demographics prediction problem is modeled as a classification task inwhich a customer's sensitive demographic y is predicted from his featurevector x. So far, two lines of work have tried to produce a"good" feature vector x from the customer's behavioraldata: (1) application-specific feature engineering using behavioral data and (2) representation learning (such as singular value decomposition or neuralembedding) on behavioral data. Although these approaches successfullyimprove the predictive performance, (1) designing a good feature requiresdomain experts to make a great effort and (2) features obtained fromrepresentation learning are hard to interpret. To overcome these problems, we present a Relational Infinite SupportVector Machine (R-iSVM), a mixture-of-experts model that can leveragebehavioral data. Instead of augmenting the feature vectors of customers, R-iSVM uses behavioral data to find out behaviorally similar customerclusters and constructs a local prediction model at each customer cluster. In doing so, R-iSVM successfully improves the predictive performance withoutrequiring application-specific feature designing and hard-to-interpretrepresentations. Experimental results on three real-world datasets demonstrate the predictiveperformance and interpretability of R-iSVM. Furthermore, R-iSVM can co-existwith previous demographics prediction methods to further improve theirpredictive performance.

【Keywords】: demographics prediction; entity-relationship data; Bayesian nonparametrics; kernel machine

38. Unsupervised Feature Learning with Discriminative Encoder.

【Paper Link】【Pages】:367-376

【Authors】: Gaurav Pandey ; Ambedkar Dukkipati

【Abstract】: In recent years, deep discriminative models have achieved extraordinary performance on supervised learning tasks, significantly outperforming their generative counterparts. However, their success relies on the presence of a large amount of labeled data. How can one use the same discriminative models for learning useful features in the absence of labels? We address this question in this paper, by jointly modeling the distribution of data and latent features in a manner that explicitly assigns zero probability to unobserved data. Rather than maximizing the marginal probability of observed data, we maximize the joint probability of the data and the latent features using a two step EM-like procedure. To prevent the model from overfitting to our initial selection of latent features, we use adversarial regularization. Depending on the task, we allow the latent features to be one-hot or real-valued vectors, and define a suitable prior on the features. For instance, one-hot features correspond to class labels, and are directly used for unsupervised and semi-supervised classification task, whereas real-valued feature vectors are fed as input to simple classifiers for auxiliary supervised discrimination tasks. The proposed model, which we dub dicriminative encoder (or DisCoder), is flexible in the type of latent features that it can capture. The proposed model achieves state-of-the-art performance on several challenging tasks. Qualitative visualization of the latent features shows that the features learnt by the DisCoder are indeed meaningful.

【Keywords】: unsupervised learning; semi-supervised learning; discriminative model

39. Learning Doubly Stochastic Affinity Matrix via Davis-Kahan Theorem.

【Paper Link】【Pages】:377-384

【Authors】: Jiwoong Park ; Taejeong Kim

【Abstract】: Building an ideal graph which reveals the exact intrinsic structure of the data is critical in graph-based clustering. There have been a lot of efforts to construct an affinity matrix satisfying such a need in terms of a similarity measure. A recent approach attracting attention is on using doubly stochastic normalization of the affinity matrix to improve the clustering performance. In this paper, we propose a novel method to build a high-quality affinity matrix via incorporating Davis-Kahan theorem of matrix perturbation theory in the doubly stochastic normalization problem. We interpret the goal of the doubly stochastic normalization problem as minimizing the relative distance between the eigenspaces of the corresponding matrices. Also, for the doubly stochastic normalization problem we include an additional constraint that each eigenvalue be on the unit interval to fully conform to the spectral graph theory. Experiments on our framework present superior performance over various datasets.

【Keywords】: Eigenvalues and eigenfunctions; Laplace equations; Symmetric matrices; Clustering algorithms; Perturbation methods; Cost function; Kernel

40. Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning.

【Paper Link】【Pages】:385-394

【Authors】: NhatHai Phan ; Xintao Wu ; Han Hu ; Dejing Dou

【Abstract】: In this paper, we focus on developing a novel mechanism to preserve differential privacy in deep neural networks, such that: (1) The privacy budget consumption is totally independent of the number of training steps; (2) It has the ability to adaptively inject noise into features based on the contribution of each to the output; and (3) It could be applied in a variety of different deep neural networks. To achieve this, we figure out a way to perturb affine transformations of neurons, and loss functions used in deep neural networks. In addition, our mechanism intentionally adds "more noise" into features which are "less relevant" to the model output, and vice-versa. Our theoretical analysis further derives the sensitivities and error bounds of our mechanism. Rigorous experiments conducted on MNIST and CIFAR-10 datasets show that our mechanism is highly effective and outperforms existing solutions.

【Keywords】: Deep Learning; Differential Privacy; Laplace Mechanism

41. A Short-Term Rainfall Prediction Model Using Multi-task Convolutional Neural Networks.

【Paper Link】【Pages】:395-404

【Authors】: Minghui Qiu ; Peilin Zhao ; Ke Zhang ; Jun Huang ; Xing Shi ; Xiaoguang Wang ; Wei Chu

【Abstract】: Precipitation prediction, such as short-term rainfall prediction, is a very important problem in the field of meteorological service. In practice, most of recent studies focus on leveraging radar data or satellite images to make predictions. However, there is another scenario where a set of weather features are collected by various sensors at multiple observation sites. The observations of a site are sometimes incomplete but provide important clues for weather prediction at nearby sites, which are not fully exploited in existing work yet. To solve this problem, we propose a multi-task convolutional neural network model to automatically extract features from the time series measured at observation sites and leverage the correlation between the multiple sites for weather prediction via multi-tasking. To the best of our knowledge, this is the first attempt to use multi-task learning and deep learning techniques to predict short-term rainfall amount based on multi-site features. Specifically, we formulate the learning task as an end-to-end multi-site neural network model which allows to leverage the learned knowledge from one site to other correlated sites, and model the correlations between different sites. Extensive experiments show that the learned site correlations are insightful and the proposed model significantly outperforms a broad set of baseline models including the European Centre for Medium-range Weather Forecasts system (ECMWF).

【Keywords】: Multi-Task Convolutional Neural Networks; Rainfall Prediction; Multi-site Features

42. Scalable Hashing-Based Network Discovery.

【Paper Link】【Pages】:405-414

【Authors】: Tara Safavi ; Chandra Sripada ; Danai Koutra

【Abstract】: Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. In these domains, networks are often constructed out of multiple time series by computing measures of association or similarity between pairs of series. The nodes in a discovered graph correspond to time series, which are linked via edges weighted by the association scores of their endpoints. After graph construction, the network may be thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to avoid the costly step of building a fully-connected graph before sparsification, we propose a fast network discovery approach based on probabilistic hashing of randomly selected time series subsequences. Evaluation on real data shows that our methods construct graphs nearly 15 times as fast as baseline methods, while achieving both network structure and accuracy comparable to baselines in task-based evaluation.

【Keywords】: Time series analysis; Correlation; Time measurement; Neuroscience; Market research

43. Benchmark Generator for Dynamic Overlapping Communities in Networks.

【Paper Link】【Pages】:415-424

【Authors】: Neha Sengupta ; Michael Hamann ; Dorothea Wagner

【Abstract】: We describe a dynamic graph generator with overlapping communities that is capable of simulating community scale events while at the same time maintaining crucial graph properties. Such a benchmark generator is useful to measure and compare the responsiveness and efficiency of dynamic community detection algorithms. Since the generator allows the user to tune multiple parameters, it can also be used to test the robustness of a community detection algorithm across a spectrum of inputs. In an experimental evaluation, we demonstrate the generator's performance and show that graph properties are indeed maintained over time. Further, we show that standard community detection algorithms are able to find the generated community structure. To the best of our knowledge, this is the first time that all of the above have been combined into one benchmark generator, and this work constitutes an important building block for the development of efficient and reliable dynamic, overlapping community detection algorithms.

【Keywords】: Clustering; Social networks

44. Accurate Detection of Automatically Spun Content via Stylometric Analysis.

【Paper Link】【Pages】:425-434

【Authors】: Usman Shahid ; Shehroze Farooqi ; Raza Ahmad ; Zubair Shafiq ; Padmini Srinivasan ; Fareed Zaffar

【Abstract】: Spammers use automated content spinning techniques to evade plagiarism detection by search engines. Text spinners help spammers in evading plagiarism detectors by automatically restructuring sentences and replacing words or phrases with their synonyms. Prior work on spun content detection relies on the knowledge about the dictionary used by the text spinning software. In this work, we propose an approach to detect spun content and its seed without needing the text spinner's dictionary. Our key idea is that text spinners introduce stylometric artifacts that can be leveraged for detecting spun documents. We implement and evaluate our proposed approach on a corpus of spun documents that are generated using a popular text spinning software. The results show that our approach can not only accurately detect whether a document is spun but also identify its source (or seed) document - all without needing the dictionary used by the text spinner.

【Keywords】: text spinning; stylometry; spam

45. STExNMF: Spatio-Temporally Exclusive Topic Discovery for Anomalous Event Detection.

【Paper Link】【Pages】:435-444

【Authors】: Dear Sungbok Shin ; Minsuk Choi ; Jinho Choi ; Scott Langevin ; Christopher Bethune ; Philippe Horne ; Nathan Kronenfeld ; Ramakrishnan Kannan ; Barry L. Drake ; Haesun Park ; Jaegul Choo

【Abstract】: Understanding newly emerging events or topics associated with a particular region of a given day can provide deep insight on the critical events occurring in highly evolving metropolitan cities. We propose herein a novel topic modeling approach on text documents with spatio-temporal information (e.g., when and where a document was published) such as location-based social media data to discover prevalent topics or newly emerging events with respect to an area and a time point. We consider a map view composed of regular grids or tiles with each showing topic keywords from documents of the corresponding region. To this end, we present a tilebased spatio-temporally exclusive topic modeling approach called STExNMF, based on a novel nonnegative matrix factorization (NMF) technique. STExNMF mainly works based on the two following stages: (1) first running a standard NMF of each tile to obtain general topics of the tile and (2) running a spatiotemporally exclusive NMF on a weighted residual matrix. These topics likely reveal information on newly emerging events or topics of interest within a region. We demonstrate the advantages of our approach using the geo-tagged Twitter data of New York City. We also provide quantitative comparisons in terms of the topic quality, spatio-temporal exclusiveness, topic variation, and qualitative evaluations of our method using several usage scenarios. In addition, we present a fast topic modeling technique of our model by leveraging parallel computing.

【Keywords】: Topic modeling; social network analysis; matrix factorization; event detection; anomaly detection

46. A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election.

【Paper Link】【Pages】:445-454

【Authors】: Tao Sun ; Daniel Sheldon ; Brendan OConnor

【Abstract】: Ecological inference (EI) is a classical problem from political science to model voting behavior of individuals given only aggregate election results. Flaxman et al. recently formulated EI as machine learning problem using distribution regression, and applied it to analyze US presidential elections. However, distribution regression unnecessarily aggregates individual-level covariates available from census microdata, and ignores known structure of the aggregation mechanism. We instead formulate the problem as learning with label proportions (LLP), and develop a new, probabilistic, LLP method to solve it. Our model is the straightforward one where individual votes are latent variables. We use cardinality potentials to efficiently perform exact inference over latent variables during learning, and introduce a novel message-passing algorithm to extend cardinality potentials to multivariate probability models for use within multiclass LLP problems. We show experimentally that LLP outperforms distribution regression for predicting individual-level attributes, and that our method is as good as or better than existing state-of-the-art LLP methods.

【Keywords】: Ecological Inference; Learning with Label Proportions; cardinality potentials

47. Edge-Based Wedge Sampling to Estimate Triangle Counts in Very Large Graphs.

【Paper Link】【Pages】:455-464

【Authors】: Duru Türkoglu ; Ata Turk

【Abstract】: The number of triangles in a graph is useful to deduce a plethora of important features of the network that the graph is modeling. However, finding the exact value of this number is computationally expensive. Hence, a number of approximation algorithms based on random sampling of edges, or wedges (adjacent edge pairs) have been proposed for estimating this value. We argue that for large sparse graphs with power-law degree distribution, random edge sampling requires sampling large number of edges before providing enough information for accurate estimation, and existing wedge sampling methods lead to biased samplings, which in turn lead to less accurate estimations. In this paper, we propose a hybrid algorithm between edge and wedge sampling that addresses the deficiencies of both approaches. We start with uniform edge sampling and then extend each selected edge to form a wedge that is more informative for estimating the overall triangle count. The core estimate we make is the number of triangles each sampled edge in the first phase participates in. This approach provides accurate approximations with very small sampling ratios, outperforming the state-of-the-art up to 8 times in sample size while providing estimations with 95% confidence.

【Keywords】: Triangle count estimation; Random edge sampling; Random wedge sampling

【Paper Link】【Pages】:465-474

【Authors】: Binghui Wang ; Neil Zhenqiang Gong ; Hao Fu

【Abstract】: Detecting fraudulent users in online social networks is a fundamental and urgent research problem as adversaries can use them to perform various malicious activities. Global social structure based methods, which are known as guilt-by-association, have been shown to be promising at detecting fraudulent users. However, existing guilt-by-association methods either assume symmetric (i.e., undirected) social links, which oversimplifies the asymmetric (i.e., directed) social structure of real-world online social networks, or only leverage labeled fraudulent users or labeled normal users (but not both) in the training dataset, which limits detection accuracies. In this work, we propose GANG, a guilt-by-association method on directed graphs, to detect fraudulent users in OSNs. GANG is based on a novel pairwise Markov Random Field that we design to capture the unique characteristics of the fraudulent-user-detection problem in directed OSNs. In the basic version of GANG, given a training dataset, we leverage Loopy Belief Propagation (LBP) to estimate the posterior probability distribution for each user and uses it to predict a user's label. However, the basic version is not scalable enough and not guaranteed to converge because it relies on LBP. Therefore, we further optimize GANG and our optimized version can be represented as a concise matrix form, with which we are able to derive conditions for convergence. We compare GANG with various existing guilt-by-association methods on a large-scale Twitter dataset and a large-scale Sina Weibo dataset with labeled fraudulent and normal users. Our results demonstrate that GANG substantially outperforms existing methods, and that the optimized version of GANG is significantly more efficient than the basic version.

【Keywords】: Image edge detection; Twitter; Training; Random variables; Markov random fields; Robustness

49. Topological Recurrent Neural Network for Diffusion Prediction.

【Paper Link】【Pages】:475-484

【Authors】: Jia Wang ; Vincent W. Zheng ; Zemin Liu ; Kevin Chen-Chuan Chang

【Abstract】: In this paper, we study the problem of using representation learning to assist information diffusion prediction on graphs. In particular, we aim at estimating the probability of an inactive node to be activated next in a cascade. Despite the success of recent deep learning methods for diffusion, we find that they often underexplore the cascade structure. We consider a cascade as not merely a sequence of nodes ordered by their activation time stamps; instead, it has a richer structure indicating the diffusion process over the data graph. As a result, we introduce a new data model, namely diffusion topologies, to fully describe the cascade structure. We find it challenging to model diffusion topologies, which are dynamic directed acyclic graphs (DAGs), with the existing neural networks. Therefore, we propose a novel topological recurrent neural network, namely Topo-LSTM, for modeling dynamic DAGs. We customize Topo-LSTM for the diffusion prediction task, and show it improves the state-of-the-art baselines, by 20.1%-56.6% (MAP) relatively, across multiple real-world data sets.

【Keywords】: Topology; Receivers; Integrated circuit modeling; Network topology; Predictive models; Machine learning; Data models

50. Multi-task Survival Analysis.

【Paper Link】【Pages】:485-494

【Authors】: Lu Wang ; Yan Li ; Jiayu Zhou ; Dongxiao Zhu ; Jieping Ye

【Abstract】: Collecting labeling information of time-to-event analysis is naturally very time consuming, i.e., one has to wait for the occurrence of the event of interest, which may not always be observed for every instance. By taking advantage of censored instances, survival analysis methods internally consider more samples than standard regression methods, which partially alleviates this data insufficiency problem. Whereas most existing survival analysis models merely focus on a single survival prediction task, when there are multiple related survival prediction tasks, we may benefit from the tasks relatedness. Simultaneously learning multiple related tasks, multi-task learning (MTL) provides a paradigm to alleviate data insufficiency by bridging data from all tasks and improves generalization performance of all tasks involved. Even though MTL has been extensively studied, there is no existing work investigating MTL for survival analysis. In this paper, we propose a novel multi-task survival analysis framework that takes advantage of both censored instances and task relatedness. Specifically, based on two common used task relatedness assumptions, i.e., low-rank assumption and cluster structure assumption, we formulate two concrete models, COX-TRACE and COX-cCMTL, under the proposed framework, respectively. We develop efficient algorithms and demonstrate the performance of the proposed multi-task survival analysis models on the The Cancer Genome Atlas (TCGA) dataset. Our results show that the proposed approaches can significantly improve the prediction performance in survival analysis and can also discover some inherent relationships among different cancer types.

【Keywords】: Survival analysis; Multi-task learning; regularization; Cox model

51. Tracking Hit-and-Run Vehicle with Sparse Video Surveillance Cameras and Mobile Taxicabs.

【Paper Link】【Pages】:495-504

【Authors】: Yang Wang ; Wuji Chen ; Wei Zheng ; He Huang ; Wen Zhang ; Hengchang Liu

【Abstract】: Due to the sparse distribution of road video surveillance cameras, precise trajectory tracking for hit-and-run vehicles remains a challenging task. Previous research on vehicle trajectory recovery mostly focuses on recovering trajectory with low-sampling-rate GPS coordinates by retrieving road traffic flow patterns from collected GPS information. However, to the best of our knowledge, none of them considered using on-road taxicabs as mobile video surveillance cameras as well as the time-varying characteristics of vehicle traveling and road traffic flow patterns, therefore not suitable for recovering trajectories of hit-and-run vehicles. With this insight, we model the travel time-cost of a road segment during various time periods precisely with LNDs (Logarithmic Normal Distributions), then use LSNDs (Log Skew Normal Distributions) to approximate the time-cost of an urban trip during various time periods. We propose a novel approach to calculate possible location and time distribution of the hit-and-run vehicle in parallel, select the optimal taxicab to verify the distribution by uploading and checking video clips of this taxicab, finally refine the restoring trajectory in a parallel and recursive manner. We evaluate our solution on real-world taxicab and road surveillance system datasets. Experimental results demonstrate that our approach outperforms alternative solutions in terms of accuracy ratio of vehicle tracking.

【Keywords】: Roads; Trajectory; Cameras; Video surveillance; Mobile communication; Global Positioning System; Streaming media

52. Discovering Truths from Distributed Data.

【Paper Link】【Pages】:505-514

【Authors】: Yaqing Wang ; Fenglong Ma ; Lu Su ; Jing Gao

【Abstract】: In the big data era, the information about the same object collected from multiple sources is inevitably conflicting. The task of identifying true information (i.e., the truths) among conflicting data is referred to as truth discovery, which incorporates the estimation of source reliability degrees into the aggregation of multi-source data. However, in many real-world applications, large-scale data are distributed across multiple servers. Traditional truth discovery approaches cannot handle this scenario due to the constraints of communication overhead and privacy concern. Another limitation of most existing work is that they ignore the differences among objects, i.e., they treat all the objects equally. This limitation would be exacerbated in distributed environments where significant differences exist among the objects. To tackle the aforementioned issues, in this paper, we propose a novel distributed truth discovery framework (DTD), which can effectively and efficiently aggregate conflicting data stored across distributed servers, with the differences among the objects as well as the importance level of each server being considered. The proposed framework consists of two steps: the local truth computation step conducted by each local server and the central truth estimation step taking place in the central server. Specifically, we introduce the uncertainty values to model the differences among objects, and propose a new uncertainty-based truth discovery method (UbTD) for calculating the true information of objects in each local server. The outputs of the local truth computation step include the estimated local truths and the variances of objects, which are the input information of the central truth estimation step. To infer the final true information in the central server, we propose a new algorithm to aggregate the outputs of all the local servers with the quality of different local servers taken into account. The proposed distributed truth discovery framework can infer object truths without delivering any raw data to the central server, and thus can reduce communication overhead as well as preserve data privacy. Experimental results on three real world datasets show that the proposed DTD framework can efficiently estimate object truths with accuracy guarantee, and the proposed UbTD algorithm significantly outperforms the state-of-the-art batch truth discovery approaches.

【Keywords】: Truth discovery; distributed system; uncertainty estimation

53. Local Bayes Risk Minimization Based Stopping Strategy for Hierarchical Classification.

【Paper Link】【Pages】:515-524

【Authors】: Yu Wang ; Qinghua Hu ; Yucan Zhou ; Hong Zhao ; Yuhua Qian ; Jiye Liang

【Abstract】: In large-scale data classification tasks, it is becoming more and more challenging in finding a true class from a huge amount of candidate categories. Fortunately, a hierarchical structure usually exists in these massive categories. The task of utilizing this structure for effective classification is called hierarchical classification. It usually follows a top-down fashion which predicts a sample from the root node with a coarse-grained category to a leaf node with a fine-grained category. However, misclassification is inevitable if the information is insufficient or large uncertainty exists in the prediction process. In this scenario, we can design a stopping strategy to stop the sample at an internal node with a coarser category, instead of predicting a wrong leaf node. Several studies address the problem by improving performance in terms of hierarchical accuracy and informative prediction. However, all of these researches ignore an important issue: when predicting a sample at the current node, the error is inclined to occur if large uncertainty exists in the next lower level children nodes. In this paper, we integrate this uncertainty into a risk problem: when predicting a sample at a decision node, it will take precipitance risk in predicting the sample to a children node in the next lower level on one hand, and take conservative risk in stopping at the current node on the other. We address the risk problem by designing a Local Bayes Risk Minimization (LBRM) framework, which divides the prediction process into recursively deciding to stop or to go down at each decision node by balancing these two risks in a top-down fashion. Rather than setting a global loss function in the traditional Bayes risk framework, we replace it with different uncertainty in the two risks for each decision node. The uncertainty on the precipitance risk and the conservative risk are measured by information entropy on children nodes and information gain from the current node to children nodes, respectively. We propose a Weighted Tree Induced Error (WTIE) to obtain the predictions of minimum risk with different emphasis on the two risks. Experimental results on various datasets show the effectiveness of the proposed LBRM algorithm.

【Keywords】: Hierarchical Classification; Stopping Strategy; Local Bayes Risk Minimization; Uncertainty

54. AWDA: An Adaptive Wishart Discriminant Analysis.

【Paper Link】【Pages】:525-534

【Authors】: Haoyi Xiong ; Wei Cheng ; Wenqing Hu ; Jiang Bian ; Zhishan Guo

【Abstract】: Linear Discriminant Analysis (LDA) is widely-used for supervised dimension reduction and linear classification. Classical LDA, however, suffers from the ill-posed estimation problem on data with high dimension and low sample size (HDLSS). To cope with this problem, in this paper, we propose an Adaptive Wishart Discriminant Analysis (AWDA) for classification, that makes predictions in an ensemble way. Comparing to existing approaches, AWDA has two advantages: 1) leveraging theWishart distribution, AWDA ensembles multiple LDA classifiers parameterized by the sampled covariance matrices via a Bayesian Voting Scheme, which theoretically improves the robustness of classification, compared to LDA classifiers using a single (probably ill-posed) covariance matrix estimator; 2) AWDA updates the weights for voting optimally to adapt the local information of each new input data, so as to enable the nonlinear classification. Theoretical analysis indicates that AWDA guarantees a close approximation to the optimal Bayesian inference and thus achieves robust performance on high dimensional data. Extensive experiments on real-world datasets show that our approach outperforms state-of-the-art algorithms by a large margin.

【Keywords】: Data Mining; Classification; Linear Discriminant Analysis; Bayesian Inference and Wishart Distribution

55. Generating Medical Hypotheses Based on Evolutionary Medical Concepts.

【Paper Link】【Pages】:535-544

【Authors】: Guangxu Xun ; Kishlay Jha ; Vishrawas Gopalakrishnan ; Yaliang Li ; Aidong Zhang

【Abstract】: Literature based discovery (LBD) is a task that aims to uncover hidden associations between non-interacting scientific concepts by rationally connecting independent nuggets of information. Broadly, prior approaches to LBD include use of: a) distributional statistics and explicit representation, b) graph-theoretic measures, and c) supervised machine learning methods to find associations. However, purely distributional approaches may not necessarily entail semantically meaningful association and graph-theoretic approaches suffer from scalability issues. While supervised machine learning based approaches have the potential to elucidate associations, the training data required is too expensive to generate. In this paper we propose a novel dynamic Medical Subject Heading (MeSH) embedding model which is able to model the evolutionary behavior of medical concepts to uncover latent associations between them. The proposed model allows us to learn the evolutionary trajectories of MeSH embeddings and detect informative terms. Hence, based on the dynamic MeSH embeddings, meaningful medical hypotheses can be efficiently generated. To evaluate the efficacy of the proposed model, we perform both qualitative and quantitative evaluation. The results demonstrate that leveraging the evolutionary features of MeSH concepts is an effective way for predicting novel associations.

【Keywords】: Semantics; Diseases; Oils; Manuals; Fish; Unified modeling language

56. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift.

【Paper Link】【Pages】:545-554

【Authors】: Dingqi Yang ; Bin Li ; Laura Rettig ; Philippe Cudré-Mauroux

【Abstract】: Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, the concept-drift issue in the data streams also impairs the accurate assessment of the similarity. In this paper, we propose to overcome the above challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch is designed to incrementally maintain a set of compact and fixed-size sketches of streaming histograms to approximate similarity between the histograms, with the special consideration of gradually forgetting the outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method is able to efficiently approximate similarity for streaming histograms and quickly adapt to concept drift. Compared to full streaming histograms gradually forgetting the outdated histogram elements, HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) with only a modest loss in accuracy (about 3.5%).

【Keywords】: Similarity-Preserving Sketching; Histograms; Streaming Data; Concept Drift; Consistent Weighted Sampling

57. Mining Customer Valuations to Optimize Product Bundling Strategy.

【Paper Link】【Pages】:555-564

【Authors】: Li Ye ; Hong Xie ; Weijie Wu ; John C. S. Lui

【Abstract】: Product bundling is widely adopted for information goods and online services because it can increase profit for companies. For example, cable companies often bundle Internet access and video streaming services together. However, it is challenging to obtain an optimal bundling strategy, not only because it is computationally expensive, but also that customers' private information (e.g., valuations for products) is needed for the decision, and we need to infer it from accessible datasets. As customers' purchasing data are getting richer due to the popularity of online shopping, doors are open for us to infer this information. This paper aims to address: (1) How to infer customers' valuations from the purchasing data? (2) How to determine the optimal product bundle to maximize the profit? We first formulate a profit maximization framework to select the optimal bundle set. We show that finding the optimal bundle set is NPhard. We then identify key factors that impact the profitability of product bundling. These findings give us insights to develop a computationally efficient algorithm to approximate the optimal product bundle with a provable performance guarantee. To obtain the input of the bundling algorithm, we infer the distribution of customers' valuations from their purchasing data, based on which we run our bundling algorithm and conduct experiments on an Amazon co-purchasing dataset. We extensively evaluate the accuracy of our inference and the bundling algorithm. Our results reveal conditions under which bundling is highly profitable and provide insights to guide the deployment of product bundling.

【Keywords】: Bundling; Consumer Valuations; Economics; Data Mining; Approximation Algorithm

58. Matrix Profile VI: Meaningful Multidimensional Motif Discovery.

【Paper Link】【Pages】:565-574

【Authors】: Chin-Chia Michael Yeh ; Nickolas Kavantzas ; Eamonn J. Keogh

【Abstract】: Time series motifs are approximately repeating patterns in real-valued time series data. They are useful for exploratory data mining and are often used as inputs for various time series clustering, classification, segmentation, rule discovery, and visualization algorithms. Since the introduction of the first motif discovery algorithm for univariate time series in 2002, multiple efforts have been made to generalize motifs to the multidimensional case. In this work, we show that these efforts, which typically attempt to find motifs on all dimensions, will not produce meaningful motifs except in the most contrived situations. We explain this finding and introduce mSTAMP, an algorithm that allows meaningful discovery of multidimensional motifs. Beyond producing objectively and subjectively meaningful results, our algorithm has a host of additional advantages, including being much faster, requiring fewer parameters and supporting streaming data. We demonstrate the utility of our mSTAMP-based motif discovery framework on domains as diverse as audio processing, industry, and sports analytics.

【Keywords】: Time Series; Motif Discovery; Multidimensional Data

59. Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation.

【Paper Link】【Pages】:575-584

【Authors】: Changchang Yin ; Buyue Qian ; Shilei Cao ; Xiaoyu Li ; Jishang Wei ; Qinghua Zheng ; Ian Davidson

【Abstract】: Active learning aims to reduce manual labeling efforts by proactively selecting the most informative unlabeled instances to query. In real-world scenarios, it's often more practical to query a batch of instances rather than a single one at each iteration. To achieve this we need to keep not only the informativeness of the instances but also their diversity. Many heuristic methods have been proposed to tackle batch mode active learning problems, however, they suffer from two limitations which if addressed would significantly improve the query strategy. Firstly, the similarity amongst instances is simply calculated using the feature vectors rather than being jointly learned with the classification model. This weakens the accuracy of the diversity measurement. Secondly, these methods usually exploit the decision boundary by querying the data points close to it. However, this can be inefficient when the labeled set is too small to reveal the true boundary. In this paper, we address both limitations by proposing a deep neural network based algorithm. In the training phase, a pairwise deep network is not only trained to perform classification, but also to project data points into another space, where the similarity can be more precisely measured. In the query selection phase, the learner selects a set of instances that are maximally uncertain and minimally redundant (exploitation), as well as are most diverse from the labeled instances (exploration). We evaluate the effectiveness of the proposed method on a variety of classification tasks: MNIST classification, opinion polarity detection, and heart failure prediction. Our method outperforms the baselines with both higher classification accuracy and faster convergence rate.

【Keywords】: Neural networks; Labeling; Uncertainty; Clustering algorithms; Redundancy; Support vector machines; Electronic mail

60. SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behavior Prediction.

【Paper Link】【Pages】:585-594

【Authors】: Hongzhi Yin ; Hongxu Chen ; Xiaoshuai Sun ; Hao Wang ; Yang Wang ; Quoc Viet Hung Nguyen

【Abstract】: With the rapid rise of various e-commerce and social network platforms, users are generating large amounts of heterogeneous behavior data, such as purchasehistory, adding-to-favorite, adding-to-cart and click activities, and this kind of user behavior data is usually binary, only reflecting a user's action or inaction (i.e., implicit feedback data). Tensor factorization is a promising means of modeling heterogeneous user behaviors by distinguishing different behavior types. However, ambiguity arises in the interpretation of the unobserved user behavior records that mix both real negative examples and potential positive examples. Existing tensor factorization models either ignore unobserved examples or treat all of them as negative examples, leading to either poor prediction performance or huge computation cost. In addition, the distribution of positive examples w.r.t. behavior types is heavily skewed. Existing tensor factorization models would bias towards the type of behaviors with a large number of positive examples. In this paper, we propose a scalable probabilistic tensor factorization model (SPTF) for heterogeneous behavior data and develop a novel negative sampling technique to optimize SPTF by leveraging both observed and unobserved examples with much lower computational costs and higher modeling accuracy. To overcome the issue of the heavy skewness of the behavior data distribution, we propose a novel adaptive ranking-based positive sampling approach to speed up the model convergence and improve the prediction accuracy for sparse behavior types. Our proposed model optimization techniques enable SPTF to be scalable to large-scale behavior datasets. Extensive experiments have been conducted on a large-scale e-commerce dataset, and the experimental results show the superiority of our proposed SPTF model in terms of prediction accuracy and scalability.

【Keywords】: Predictive models; Tensile stress; Data models; Adaptation models; Computational modeling; Probabilistic logic; Mathematical model

61. Supervised Belief Propagation: Scalable Supervised Inference on Attributed Networks.

【Paper Link】【Pages】:595-604

【Authors】: Jaemin Yoo ; Saehan Jo ; U. Kang

【Abstract】: Given an undirected network where some of the nodes are labeled, how can we classify the unlabeled nodes with high accuracy? Loopy Belief Propagation (LBP) is an inference algorithm widely used for this purpose with various applications including fraud detection, malware detection, web classification, and recommendation. However, previous methods based on LBP have problems in modeling complex structures of attributed networks because they manually and heuristically select the most important parameter, the propagation strength. In this paper, we propose Supervised Belief Propagation (SBP), a scalable and novel inference algorithm which automatically learns the optimal propagation strength by supervised learning. SBP is generally applicable to attributed networks including weighted and signed networks. Through extensive experiments, we demonstrate that SBP generalizes previous LBP-based methods and outperforms previous LBP and RWR based methods in real-world networks.

【Keywords】: Belief propagation; Node classification; Social network

【Paper Link】【Pages】:605-614

【Authors】: Jiawei Zhang ; Congying Xia ; Chenwei Zhang ; Limeng Cui ; Yanjie Fu ; Philip S. Yu

【Abstract】: Network embedding aims at projecting the network data into a low-dimensional feature space, where the nodes are represented as a unique feature vector and network structure can be effectively preserved. In recent years, more and more online application service sites can be represented as massive and complex networks, which are extremely challenging for traditional machine learning algorithms to deal with. Effective embedding of the complex network data into low-dimension feature representation can both save data storage space and enable traditional machine learning algorithms applicable to handle the network data. Network embedding performance will degrade greatly if the networks are of a sparse structure, like the emerging networks with few connections. In this paper, we propose to learn the embedding representation for a target emerging network based on the broad learning setting, where the emerging network is aligned with other external mature networks at the same time. To solve the problem, a new embedding framework, namely "Deep alIgned autoencoder based eMbEdding" (DIME), is introduced in this paper. DIME handles the diverse link and attribute in a unified analytic based on broad learning, and introduces the multiple aligned attributed heterogeneous social network concept to model the network structure. A set of meta paths are introduced in the paper, which define various kinds of connections among users via the heterogeneous link and attribute information. The closeness among users in the networks are defined as the meta proximity scores, which will be fed into DIME to learn the embedding vectors of users in the emerging network. Extensive experiments have been done on real-world aligned social networks, which have demonstrated the effectiveness of DIME in learning the emerging network embedding vectors.

【Keywords】: Network Embedding; Multiple Aligned Social Networks; Heterogeneous Information Network; Broad Learning

63. Data-Driven Immunization.

【Paper Link】【Pages】:615-624

【Authors】: Yao Zhang ; Arvind Ramanathan ; Anil Vullikanti ; Laura L. Pullum ; B. Aditya Prakash

【Abstract】: Given a contact network and coarse-grained diagnostic information like electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies to control an epidemic? Immunization is an important problem in multiple areas especially epidemiology and public health. However, most existing studies focus on developing pre-emptive strategies assuming prior epidemiological models. In practice, disease spread is usually complicated, hence assuming an underlying model may deviate from true spreading patterns, leading to possibly inaccurate interventions. Additionally, the abundance of health care surveillance data (like eHRC) makes it possible to study data-driven strategies without too many restrictive assumptions. Hence, such an approach can help public-health experts take more practical decisions. In this paper, we take into account propagation log and contact networks for controlling propagation. We formulate the novel and challenging Data-Driven Immunization problem without assuming classical epidemiological models. To solve it, we first propose an efficient sampling approach to align surveillance data with contact networks, then develop an efficient algorithm with the provably approximate guarantee for immunization. Finally, we show the effectiveness and scalability of our methods via extensive experiments on multiple datasets, and conduct case studies on nation-wide real medical surveillance data.

【Keywords】: Vaccines; Immune system; Surveillance; Approximation algorithms; Diseases; Resource management; Sociology

64. Online and Distributed Robust Regressions Under Adversarial Data Corruption.

【Paper Link】【Pages】:625-634

【Authors】: Xuchao Zhang ; Liang Zhao ; Arnold P. Boedihardjo ; Chang-Tien Lu

【Abstract】: In today's era of big data, robust least-squares regression becomes a more challenging problem when considering the adversarial corruption along with explosive growth of datasets. Traditional robust methods can handle the noise but suffer from several challenges when applied in huge dataset including 1) computational infeasibility of handling an entire dataset at once, 2) existence of heterogeneously distributed corruption, and 3) difficulty in corruption estimation when data cannot be entirely loaded. This paper proposes online and distributed robust regression approaches, both of which can concurrently address all the above challenges. Specifically, the distributed algorithm optimizes the regression coefficients of each data block via heuristic hard thresholding and combines all the estimates in a distributed robust consolidation. Furthermore, an online version of the distributed algorithm is proposed to incrementally update the existing estimates with new incoming data. We also prove that our algorithms benefit from strong robustness guarantees in terms of regression coefficient recovery with a constant upper bound on the error of state-of-the-art batch methods. Extensive experiments on synthetic and real datasets demonstrate that our approaches are superior to those of existing methods in effectiveness, with competitive efficiency.

【Keywords】: Robust Regression; Online Algorithm; Distributed Algorithm

65. MetaLDA: A Topic Model that Efficiently Incorporates Meta Information.

【Paper Link】【Pages】:635-644

【Authors】: He Zhao ; Lan Du ; Wray L. Buntine ; Gang Liu

【Abstract】: Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data argumentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta information. Extensive experiments on several real world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster.

【Keywords】: topic models; meta information; short texts

【Paper Link】【Pages】:645-654

【Authors】: Huan Zhao ; Quanming Yao ; James T. Kwok ; Dik Lun Lee

【Abstract】: Matrix Factorization (MF) is a very popular method for recommendation systems. It assumes that the underneath rating matrix is low-rank. However, this assumption can be too restrictive to capture complex relationships and interactions among users and items. Recently, Local LOw-Rank Matrix Approximation (LLORMA) has been shown to be very successful in addressing this issue. It just assumes the rating matrix is composed of a number of low-rank submatrices constructed from subsets of similar users and items. Although LLORMA outperforms MF, how to construct such submatrices remains a big problem. Motivated by the availability of rich social connections in today's recommendation systems, we propose a novel framework, i.e., Social LOcal low-rank Matrix Approximation (SLOMA), to address this problem. To the best of our knowledge, SLOMA is the first work to incorporate social connections into the local low-rank framework. Furthermore, we enhance SLOMA by applying social regularization to submatrices factorization, denoted as SLOMA++. Therefore, the proposed model can benefit from both social recommendation and the local low-rank assumption. Experimental results from two real-world datasets, Yelp and Douban, demonstrate the superiority of the proposed models over LLORMA and MF.

【Keywords】: Recommendation system; Collaborative Filtering; Matrix factorization; Local low-rank; Social network

67. Exploiting Hierarchical Structures for POI Recommendation.

【Paper Link】【Pages】:655-664

【Authors】: Pengpeng Zhao ; Xiefeng Xu ; Yanchi Liu ; Ziting Zhou ; Kai Zheng ; Victor S. Sheng ; Hui Xiong

【Abstract】: With the rapid development of location-based social networks, Point-of-Interest (POI) recommendation has played an important role in helping people discover attractive locations. However, existing POI recommendation methods assume a flat structure of POIs, which are better described in a hierarchical structure in reality. Furthermore, we discover that both users' content and spatial preferences exhibit hierarchical structures. To this end, in this paper, we propose a hierarchical geographical matrix factorization model (HGMF) to utilize the hierarchical structures of both users and POIs for POI recommendation. Specifically, we first describe the POI influence degrees over regions with two-dimensional normal distribution, and learn the influence areas of different layers of POIs as the input of HGMF. Then, we perform matrix factorization on user content preference matrix, user spatial preference matrix, and POIs characteristic matrix jointly with the modeling of implicit hierarchical structures. Moreover, a two-step optimization method is proposed to learn the implicit hierarchical structure and find the solution of HGMF efficiently. Finally, we evaluate HGMF on two large-scale real-world location-based social networks datasets. Our experimental results demonstrate that it outperforms the state-of-the-art methods in terms of precision and recall.

【Keywords】: Gaussian distribution; Social network services; Optimization methods; Matrix decomposition; Data mining; Computer science; Recommender systems

68. AnySCAN: An Efficient Anytime Framework with Active Learning for Large-Scale Network Clustering.

【Paper Link】【Pages】:665-674

【Authors】: Weizhong Zhao ; Gang Chen ; Xiaowei Xu

【Abstract】: Network clustering is an essential approach to finding latent clusters in real-world networks. As the scale of real-world networks becomes increasingly larger, the existing network clustering algorithms fail to discover meaningful clusters efficiently. In this paper, we propose a framework called AnySCAN, which applies anytime theory to the structural clustering algorithm for networks (SCAN). Moreover, an active learning strategy is proposed to advance the refining procedure in AnySCAN framework. AnySCAN with the active learning strategy is able to find the exactly same clustering result on large-scale networks as the original SCAN in a significantly more efficient manner. Extensive experiments on real-world and synthetic networks demonstrate that our proposed method outperforms existing network clustering approaches.

【Keywords】: Clustering algorithms; Rocks; Electronic mail; Data mining; Machine learning algorithms; Data structures; Conferences

69. SCED: A General Framework for Sparse Tensor Decomposition with Constraints and Elementwise Dynamic Learning.

【Paper Link】【Pages】:675-684

【Authors】: Shuo Zhou ; Sarah M. Erfani ; James Bailey

【Abstract】: CANDECOMP/PARAFAC Decomposition (CPD) is one of the most popular tensor decomposition methods that has been extensively studied and widely applied. In recent years, sparse tensors that contain a huge portion of zeros but a limited number of non-zeros have attracted increasing interest. Existing techniques are not directly applicable to sparse tensors, since they mainly target dense ones and usually have poor efficiency. Additionally, specific issues also arise for sparse tensors, depending on different data sources and applications: the role of zero entries can be different; incorporating constraints like non-negativity and sparseness might be necessary; the ability to learn on-the-fly is a must for dynamic scenarios that new data keeps arriving at high velocity. However, state-of-art algorithms only partially address the above issues. To fill this gap, we propose a general framework for finding the CPD of sparse tensors. Modeling the sparse tensor decomposition problem by a generalized weighted CPD formulation and solving it efficiently, our proposed method is also flexible to handle constraints and dynamic data streams. Through experiments on both synthetic and real-world datasets, for the static case, our method demonstrates significant improvements in terms of effectiveness, efficiency and scalability. Moreover, under the dynamic setting, our method speeds up current technology by hundreds to thousands times, without sacrificing decomposition quality.

【Keywords】: Tensile stress; Heuristic algorithms; Loading; Matrix decomposition; Optimization; Sparse matrices; Data mining

70. A Randomized Approach for Crowdsourcing in the Presence of Multiple Views.

【Paper Link】【Pages】:685-694

【Authors】: Yao Zhou ; Jingrui He

【Abstract】: Driven by the dramatic growth of data both in terms of the size and sources, learning from heterogeneous data is emerging as an important research direction for many real applications. One of the biggest challenges of this type of problem is how to meaningfully integrate heterogeneous data to considerably improve the generality and quality of the learning model. In this paper, we first present a unified learning framework that aims to leverage the structural information from two types of data heterogeneity: view heterogeneity (as in multi-view learning) and worker heterogeneity (as in crowdsourcing). The objective follows the principles of view consistency and worker consensus by minimizing the loss term with a regularized prediction tensor. We then propose to relax and solve the optimization framework with an iterative updating method. We also prove that the gradient of the most time-consuming updating block is separable with respect to the workers, which leads to a randomized algorithm with faster speed and better convergence. Finally, we compare the proposed method with several state-of-the-arts and demonstrate its effectiveness on various data sets.

【Keywords】: Heterogeneous Learning; Crowdsourcing; Multi-view Learning; Optimization

71. Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining (Best Student Paper Award).

【Paper Link】【Pages】:695-704

【Authors】: Yan Zhu ; Makoto Imamura ; Daniel Nikovski ; Eamonn J. Keogh

【Abstract】: Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern that preceded it, but the first and last patterns are arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain "hit, hot, dot, dog" from a paragraph. The first and last words have nothing in common, yet they are connected by a chain of words with a small mutual difference. Time series chains can capture the evolution of systems, and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce a robust definition of time series chains, and a scalable algorithm that allows us to discover them in massive datasets.

【Keywords】: Time Series; Motifs; Prognostics; Link Analysis

72. Spatio-Temporal Neural Networks for Space-Time Series Forecasting and Relations Discovery.

【Paper Link】【Pages】:705-714

【Authors】: Ali Ziat ; Edouard Delasalles ; Ludovic Denoyer ; Patrick Gallinari

【Abstract】: We introduce a dynamical spatio-temporal model formalized as a recurrent neural network for forecasting time series of spatial processes, i.e. series of observations sharing temporal and spatial dependencies. The model learns these dependencies through a structured latent dynamical component, while a decoder predicts the observations from the latent representations. We consider several variants of this model, corresponding to different prior hypothesis about the spatial relations between the series. The model is evaluated and compared to state-of-the-art baselines, on a variety of forecasting problems representative of different application areas: epidemiology, geo-spatial statistics and car-traffic prediction. Besides these evaluations, we also describe experiments showing the ability of this approach to extract relevant spatial relations.

【Keywords】: Predictive models; Forecasting; Biological system modeling; Data models; Time series analysis; Computational modeling; Hidden Markov models

Short Papers 83

73. Differentially Private Mixture of Generative Neural Networks.

【Paper Link】【Pages】:715-720

【Authors】: Gergely Ács ; Luca Melis ; Claude Castelluccia ; Emiliano De Cristofaro

【Abstract】: Generative models are used in an increasing number of applications that rely on large amounts of contextually rich information about individuals. Owing to possible privacy violations, however, publishing or sharing generative models is not always viable. In this paper, we introduce a novel solution for privately releasing generative models and entire high-dimensional datasets produced by these models. We model the generator distribution of the training data by a mixture of k generative neural networks. These are trained together and collectively learn the generator distribution of a dataset. Data is divided into k clusters, using a novel differentially private kernel k-means, then each cluster is given to separate generative neural networks, such as Restricted Boltzmann Machines or Variational Autoencoders, which are trained only on their own cluster using differentially private gradient descent. We evaluate our approach using the MNIST dataset and a large Call Detail Records dataset, showing that it produces realistic synthetic samples, which can also be used to accurately compute arbitrary number of counting queries.

【Keywords】: differential privacy; generative neural networks; Restricted Boltzmann Machines; Variational Autoencoders; kernel k-means; clustering

74. A Probabilistic Geographical Aspect-Opinion Model for Geo-Tagged Microblogs.

【Paper Link】【Pages】:721-726

【Authors】: Aman Ahuja ; Wei Wei ; Wei Lu ; Kathleen M. Carley ; Chandan K. Reddy

【Abstract】: Due to the rapid increase in the number of users owning location-based devices, there is a considerable amount of geo-tagged data available on social media websites, such as Twitter and Facebook. This geo-tagged data can be useful in a variety of ways to extract location-specific information, as well as to comprehend the variation of information across different geographical regions. A lot of techniques have been proposed for extracting location-based information from social media, but none of these techniques aim to utilize an important characteristic of this data, which is the presence of aspects and their opinions, expressed by the users on these platforms. In this paper, we propose Geographic Aspect Opinion model (GASPOP), a probabilistic model that jointly discovers the variation of aspect and opinion, that correspond to different topics across various geographical regions from geo-tagged social media data. It incorporates the syntactic features of text in the generative process to differentiate aspect and opinion words from general background words. The user-based modeling of topics, also enables it to determine the interest distribution of various users. Furthermore, our model can be used to predict the location of different tweets based on their text. We evaluated our model on Twitter data, and our experimental results show that GASPOP can jointly discover latent aspect and opinion words for different topics across latent geographical regions. Moreover, a quantitative analysis of GASPOP using widely used evaluation metrics shows that it outperforms the state-of-the-art methods.

【Keywords】: Microblogs; probabilistic models; aspect mining; opinion mining; topic modeling

75. Aspect Sentiment Model for Micro Reviews.

【Paper Link】【Pages】:727-732

【Authors】: Reinald Kim Amplayo ; Seung-won Hwang

【Abstract】: This paper aims at an aspect sentiment model for aspect-based sentiment analysis (ABSA) focused on micro reviews. This task is important in order to understand short reviews majority of the users write, while existing topic models are targeted for expert-level long reviews with sufficient co-occurrence patterns to observe. Current methods on aggregating micro reviews using metadata information may not be effective as well due to metadata absence, topical heterogeneity, and cold start problems. To this end, we propose a model called Micro Aspect Sentiment Model (MicroASM). MicroASM is based on the observation that short reviews 1) are viewed with sentiment-aspect word pairs as building blocks of information, and 2) can be clustered into larger reviews. When compared to the current state-of-the-art aspect sentiment models, experiments show that our model provides better performance on aspect-level tasks such as aspect term extraction and document-level tasks such as sentiment classification.

【Keywords】: Mathematical model; Data models; Atmospheric modeling; Analytical models; Motion pictures; Tagging

76. Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions.

【Paper Link】【Pages】:733-738

【Authors】: Ehsan Mohammady Ardehaly ; Aron Culotta

【Abstract】: Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference using U.S. Census, national and state political polls, and Cook partisan voting index as population level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization to support weights for samples inside bags, which is applicable in this setting where bags are arranged hierarchically (e.g., county-level bags are nested inside of state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities which are typically not feasible using traditional polling methods.

【Keywords】: Learning from Label Proportions; LLP; Data Mining; NLP; Machine Learning; Label Regularization

77. Hierarchical Multinomial-Dirichlet Model for the Estimation of Conditional Probability Tables.

【Paper Link】【Pages】:739-744

【Authors】: Laura Azzimonti ; Giorgio Corani ; Marco Zaffalon

【Abstract】: We present a novel approach for estimating conditional probability tables, based on a joint, rather than independent, estimate of the conditional distributions belonging to the same table. We derive exact analytical expressions for the estimators and we analyse their properties both analytically and via simulation. We then apply this method to the estimation of parameters in a Bayesian network. Given the structure of the network, the proposed approach better estimates the joint distribution and significantly improves the classification performance with respect to traditional approaches.

【Keywords】: hierarchical Bayesian model; Bayesian networks; conditional probability estimation

78. Multi-party Sparse Discriminant Learning.

【Paper Link】【Pages】:745-750

【Authors】: Jiang Bian ; Haoyi Xiong ; Wei Cheng ; Wenqing Hu ; Zhishan Guo ; Yanjie Fu

【Abstract】: Sparse Discriminant Analysis (SDA) has been widely used to improve the performance of classical Fisher's Linear Discriminant Analysis in supervised metric learning, feature selection and classification. With the increasing needs of distributed data collection, storage and processing, enabling the Sparse Discriminant Learning to embrace the Multi-Party distributed computing environments becomes an emerging research topic. This paper proposes a novel Multi-Party SDA algorithm, which can learn SDA models effectively without sharing any raw dataand basic statistics among machines. The proposed algorithm 1) leverages the direct estimation of SDA [1] to derive a distributed loss function for the discriminant learning, 2) parameterizes the distributed loss function with local/global estimates through bootstrapping, and 3) approximates a global estimation of linear discriminant projection vector by optimizing the "distributed bootstrapping loss function" with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm can compete with the centralized SDA with similar performance, and significantly outperforms the most recent distributed SDA [2] in terms of accuracy and F1-score.

【Keywords】: Multi-Party Statistical Learning; Sparse Discriminant Analysis; Bayesian Asymptotic Efficiency

79. MDL for Causal Inference on Discrete Data.

【Paper Link】【Pages】:751-756

【Authors】: Kailash Budhathoki ; Jilles Vreeken

【Abstract】: The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. This notion is very powerful as it can detect any causal dependency that can be explained by a physical process. However, due to the halting problem, it is also not computable. In this paper we propose an computable instantiation that provably maintains the key aspects of the ideal. We propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, the score degrades gracefully, and we are still maximally able to detect dependencies between the marginal and the conditional distribution. As a proof of concept, we propose CISC, a linear-time algorithm for causal inference by stochastic complexity, for pairs of univariate discrete variables. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.

【Keywords】: Causal Inference; MDL; Discrete Data

80. Efficient Mining of Subsample-Stable Graph Patterns.

【Paper Link】【Pages】:757-762

【Authors】: Aleksey Buzmakov ; Sergei O. Kuznetsov ; Amedeo Napoli

【Abstract】: A scalable method for mining graph patterns stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic according to definitions known so far. We study a broader notion of antimonotonicity for graph patterns, so that measures of subsample stability become antimonotonic. Then we propose gSOFIA for mining the most subsample-stable graph patterns. The experiments on numerous graph datasets show that gSOFIA is very efficient for discovering subsample-stable graph patterns.

【Keywords】: graph mining; subsample stability; Formal Concept Analysis

81. Multi-level Feedback Web Links Selection Problem: Learning and Optimization.

【Paper Link】【Pages】:763-768

【Authors】: Kechao Cai ; Kun Chen ; Longbo Huang ; John C. S. Lui

【Abstract】: Selecting the right web links for a website is important because appropriate links not only can provide high attractiveness but can also increase the website's revenue. In this work, we first show that web links have an intrinsic multi-level feedback structure. For example, consider a 2-level feedback web link: the 1st level feedback provides the Click-Through Rate (CTR) and the 2nd level feedback provides the potential revenue, which collectively produce the compound 2-level revenue. We consider the context-free links selection problem of selecting links for a homepage so as to maximize the total compound 2-level revenue while keeping the total 1st level feedback above a preset threshold. We further generalize the problem to links with n (n ≥ 2)-level feedback structure. The key challenge is that the links' multi-level feedback structures are unobservable unless the links are selected on the homepage. To our best knowledge, we are the first to model the links selection problem as a constrained multi-armed bandit problem and design an effective links selection algorithm by learning the links' multi-level structure with provable sub-linear regret and violation bounds. We uncover the multi-level feedback structures of web links in two real-world datasets. We also conduct extensive experiments on the datasets to compare our proposed LExp algorithm with two state-of-the-art context-free bandit algorithms and demonstrate that LExp algorithm is the most effective in links selection while satisfying the constraint.

【Keywords】: link selection; multi-level feedback; constrained multi-armed bandit; learning; LExp

82. HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks.

【Paper Link】【Pages】:769-774

【Authors】: Bokai Cao ; Mia Mao ; Siim Viidu ; Philip S. Yu

【Abstract】: On electronic game platforms, different payment transactions have different levels of risk. Risk is generally higher for digital goods in e-commerce. However, it differs based on product and its popularity, the offer type (packaged game, virtual currency to a game or subscription service), storefront and geography. Existing fraud policies and models make decisions independently for each transaction based on transaction attributes, payment velocities, user characteristics, and other relevant information. However, suspicious transactions may still evade detection and hence we propose a broad learning approach leveraging a graph based perspective to uncover relationships among suspicious transactions, i.e., inter-transaction dependency. Our focus is to detect suspicious transactions by capturing common fraudulent behaviors that would not be considered suspicious when being considered in isolation. In this paper, we present HitFraud that leverages heterogeneous information networks for collective fraud detection by exploring correlated and fast evolving fraudulent behaviors. First, a heterogeneous information network is designed to link entities of interest in the transaction database via different semantics. Then, graph based features are efficiently discovered from the network exploiting the concept of meta-paths, and decisions on frauds are made collectively on test instances. Experiments on real-world payment transaction data from Electronic Arts demonstrate that the prediction performance is effectively boosted by HitFraud with fast convergence.

【Keywords】: collective fraud detection; inter-transaction dependency; heterogeneous information network

83. Domain Adaptation for Online ECG Monitoring.

【Paper Link】【Pages】:775-780

【Authors】: Diego Carrera ; Beatrice Rossi ; Pasqualina Fragneto ; Giacomo Boracchi

【Abstract】: Successful ECG monitoring algorithms often rely on learned models to describe the heartbeats morphology. Unfortunately, when the heart rate increases the heartbeats get transformed, and a model that can properly describe the heartbeats of a specific user in resting conditions might not be appropriate for monitoring the same user during everyday activities. We model heartbeats by dictionaries yielding sparse representations and propose a novel domain-adaptation solution which transforms user-specific dictionaries according to the heart rate. In particular, we learn suitable linear transformations from a large dataset containing ECG tracings, and we show that these transformations can successfully adapt dictionaries when the heart rate changes. Remarkably, the same transformations can be used for multiple users and different sensing apparatus. We investigate the implications of our findings in ECG monitoring by wearable devices, and present an efficient implementation of an anomaly-detection algorithm leveraging such transformations.

【Keywords】: Heart beat; Dictionaries; Electrocardiography; Monitoring; Adaptation models; Training

84. EC3: Combining Clustering and Classification for Ensemble Learning.

【Paper Link】【Pages】:781-786

【Authors】: Tanmoy Chakraborty

【Abstract】: We propose EC3, a novel algorithm that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using a convex optimization function. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis comparing EC3 and iEC3 with 12 baseline methods on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover our methods are faster, more resilient to noise and class imbalance than the best baseline method.

【Keywords】: Ensemble algorithm; clustering; classification; imbalanced data

85. Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records.

【Paper Link】【Pages】:787-792

【Authors】: Zhengping Che ; Yu Cheng ; Shuangfei Zhai ; Zhaonan Sun ; Yan Liu

【Abstract】: The rapid growth of Electronic Health Records (EHRs), as well as the accompanied opportunities in Data-Driven Healthcare (DDH), has been attracting widespread interests and attentions. Recent progress in the design and applications of deep learning methods has shown promising results and is forcing massive changes in healthcare academia and industry, but most of these methods rely on massive labeled data. In this work, we propose a general deep learning framework which is able to boost risk prediction performance with limited EHR data. Our model takes a modified generative adversarial network namely ehrGAN, which can provide plausible labeled EHR data by mimicking real patient records, to augment the training dataset in a semi-supervised learning manner. We use this generative model together with a convolutional neural network (CNN) based prediction model to improve the onset prediction performance. Experiments on two real healthcare datasets demonstrate that our proposed framework produces realistic data samples and achieves significant improvements on classification tasks with the generated data over several stat-of-the-art baselines.

【Keywords】: electronic health record; generative adversarial network; health care; deep learning

86. Clustering by Shift.

【Paper Link】【Pages】:793-798

【Authors】: Morteza Haghir Chehreghani

【Abstract】: In order to yield a more balanced partitioning, we investigate the use of additive regularizations for the Min Cut cost function, instead of normalization. In particular, we study the case where the regularization term is the sum of the squared size of the clusters, which then leads to shifting (adaptively) the pairwise similarities. We study the connection of such a model with Correlation Clustering and then propose an efficient local search optimization algorithm to solve the new clustering problem. Finally, we demonstrate the superior performance of our method by extensive experiments on different datasets.

【Keywords】: Clustering; Regularization; Shift; Efficiency; Local search

87. Efficient Computation of Pairwise Minimax Distance Measures.

【Paper Link】【Pages】:799-804

【Authors】: Morteza Haghir Chehreghani

【Abstract】: We study efficient computation of Minimax distances measures, which enable to capture the correct structures via taking the transitive relations into account. We analyze in detail two settings, the dense graphs and the sparse graphs. In particular, we show that an adapted variant of the Kruskal's algorithm is the most efficient approach for computing pairwise Minimax distances. However, for dense graphs we require a preprocessing step based on the Prim's algorithm, in order to reduce the set of candidate edges to be investigated. For each case, we study the correctness, efficiency and computational optimality of our approach. We perform numerical experiments on several datasets to validate the superior performance of our methods.

【Keywords】: Distance measure; Minimax distances; Efficiency; Dense graphs; Sparse Graphs

88. Warehouse Site Selection for Online Retailers in Inter-Connected Warehouse Networks.

【Paper Link】【Pages】:805-810

【Authors】: Can Chen ; Junming Liu ; Qiao Li ; Yijun Wang ; Hui Xiong ; Shanshan Wu

【Abstract】: Supply chain management aims at delivering goods in the shortest time at the lowest possible price while ensuring the best possible quality and is now vital to the success of the online retail business. Executing effective warehouse site selection has been one of the key challenges in the development of a successful supply chain system. While some effective strategies for warehouse site selection have been identified by the domain experts based on their experiences, the emergence of new ways of collecting fine-grained supply chain data has enabled a new paradigm for warehouse site selection. Indeed, in this paper, we provide a data-smart approach for addressing the connected capacitated warehouse location problem (CCWL), which searches for the minimum total transportation cost of the warehouse network including supplier-warehouses shipping cost, warehouse-customer delivering cost and the cost of warehouse-warehouse inter-transportation. Specifically, we first design a sales distribution prediction model and evaluate the importance of customer logistic service utilities on online market sales demand for online retailers. Then, we propose the E&M algorithm to optimize warehouse locations continuously with much less computation cost. Moreover, the computation cost is further reduced through delivery demand based Hierarchical Clustering which reduces the problem size by grouping delivering cities with close locations. Finally, we validate the proposed method on real-world e-Commerce supply chain data and the selection effect of new warehouses is evaluated in terms of sales improvement with faster delivery and more effective inventory management.

【Keywords】: Site Selection; Demand Prediction; Clustering; E-commerce

89. Learning Multiple Similarities of Users and Items in Recommender Systems.

【Paper Link】【Pages】:811-816

【Authors】: Huiyuan Chen ; Jing Li

【Abstract】: In addition to the sparse user-item (U-I) matrix, an increasing number of current recommender systems seek to improve performance by exploiting extra heterogeneous data sources (e.g., online social networks). Such rich side sources can provide very useful information about users' personal behaviors and items' properties, therefore can significantly benefit recommender systems. Most existing work can only incorporate a single side source of users or items. In many real-life applications, however, there exist multiple side sources for both users and items, which may be constructed from multiple domains. To incorporate multiple sources, we propose a flexible and robust algorithm called MSUI (Learning Multiple Similarities of Users and Items). The key idea of MSUI is to simultaneously capture the associations between latent factors in the U-I matrix and multiple side sources of users and items through joint nonnegative matrix factorization. MSUI has several advantages over existing methods. First, it seamlessly integrates information from the U-I matrix and information from multiple side sources. Second, by incorporating multiple sources that are usually in complementary, it's more robust to noise contained in each source. Finally, it provides a unified framework that can be easily extended to incorporate additional data sources.

【Keywords】: Recommender Systems; Multi-View Learning; Matrix Factorization

90. Learning to Fuse Music Genres with Generative Adversarial Dual Learning.

【Paper Link】【Pages】:817-822

【Authors】: Zhiqian Chen ; Chih-Wei Wu ; Yen-Cheng Lu ; Alexander Lerch ; Chang-Tien Lu

【Abstract】: FusionGAN is a novel genre fusion framework for music generation that integrates the strengths of generative adversarial networks and dual learning. In particular, the proposed method offers a dual learning extension that can effectively integrate the styles of the given domains. To efficiently quantify the difference among diverse domains and avoid the vanishing gradient issue, FusionGAN provides a Wasserstein based metric to approximate the distance between the target domain and the existing domains. Adopting the Wasserstein distance, a new domain is created by combining the patterns of the existing domains using adversarial learning. Experimental results on public music datasets demonstrated that our approach could effectively merge two genres.

【Keywords】: Gallium nitride; Generators; Convergence; Creativity; Linear programming; Machine learning; Music

91. Data Prefetching for Large Tiered Storage Systems.

【Paper Link】【Pages】:823-828

【Authors】: Giovanni Cherubini ; Yusik Kim ; Mark A. Lantz ; Vinodh Venkatesan

【Abstract】: In multi-tier storage systems with large amounts of data, most of the data is stored on inexpensive slower tiers such as cloud or tape to achieve cost savings. This also implies that retrieving the data from the slower storage tiers incurs high latency. Therefore, it would be beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art access prediction methods typically record access history of individual files, data objects, or data segments. However, in systems with large amounts of infrequently accessed (or cold) data, file-level access history is often unavailable for much of the data due to the low frequency of access. In this paper, we extract information from file metadata to predict file accesses in a storage system. The proposed method relies on the hypothesis that users and applications access data stored in the system in a given context and that the context and, therefore, the set of files that are likely to be accessed can be identified by detecting access patterns in file metadata. As an application, we consider the LOFAR radio telescope's long term archive, where the access patterns are learned based on a rich set of metadata, and these patterns are then used to make predictions as to likely future accesses by the astronomers.

【Keywords】: access prediction; caching; machine learning; archive; LOFAR

92. Audio-Visual Sentiment Analysis for Learning Emotional Arcs in Movies.

【Paper Link】【Pages】:829-834

【Authors】: Eric Chu ; Deb Roy

【Abstract】: Stories can have tremendous power - not only useful for entertainment, they can activate our interests and mobilize our actions. The degree to which a story resonates with its audience may be in part reflected in the emotional journey it takes the audience upon. In this paper, we use machine learning methods to construct emotional arcs in movies, calculate families of arcs, and demonstrate the ability for certain arcs to predict audience engagement. The system is applied to Hollywood films and high quality shorts found on the web. We begin by using deep convolutional neural networks for audio and visual sentiment analysis. These models are trained on both new and existing large-scale datasets, after which they can be used to compute separate audio and visual emotional arcs. We then crowdsource annotations for 30-second video clips extracted from highs and lows in the arcs in order to assess the micro-level precision of the system, with precision measured in terms of agreement in polarity between the system's predictions and annotators' ratings. These annotations are also used to combine the audio and visual predictions. Next, we look at macro-level characterizations of movies by investigating whether there exist 'universal shapes' of emotional arcs. In particular, we develop a clustering approach to discover distinct classes of emotional arcs. Finally, we show on a sample corpus of short web videos that certain emotional arcs are statistically significant predictors of the number of comments a video receives. These results suggest that the emotional arcs learned by our approach successfully represent macroscopic aspects of a video story that drive audience engagement. Such machine understanding could be used to predict audience reactions to video stories, ultimately improving our ability as storytellers to communicate with each other.

【Keywords】: visual sentiment analysis; audio sentiment analysis; multimodal; emotions; emotional arcs; stories; video

93. Robust Estimation of Gaussian Copula Causal Structure from Mixed Data with Missing Values.

【Paper Link】【Pages】:835-840

【Authors】: Ruifei Cui ; Perry Groot ; Tom Heskes

【Abstract】: We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the 'Rank PC' algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. The resulting approach works when the data are missing completely at random (MCAR). Then, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data under missingness at random (MAR). These samples are translated into an average correlation matrix, and an effective sample size, resulting in the 'Copula PC' algorithm for incomplete data. Simulation study shows that: 1) the usage of the effective sample size significantly improves the performance of 'Rank PC' and 'Copula PC'; 2) 'Copula PC' estimates a more accurate correlation matrix and causal structure than 'Rank PC' under MCAR and, even more so, under MAR. Also, we illustrate our methods on a real-world data set about gene expression.

【Keywords】: Correlation; Algorithm design and analysis; Data models; Mathematical model; Standards; Markov processes; Computational modeling

94. Network Clocks: Detecting the Temporal Scale of Information Diffusion.

【Paper Link】【Pages】:841-846

【Authors】: Daniel J. DiTursi ; Gregorios A. Katsios ; Petko Bogdanov

【Abstract】: Information diffusion models typically assume a discrete timeline in which an information token spreads in the network. Since users in real-world networks vary significantly in their intensity and periods of activity, our objective in this work is to answer: How to determine a temporal scale that best agrees with the observed information propagation within a network? A key limitation of existing approaches is that they aggregate the timeline into fixed-size windows, which may not fit all network nodes' activity periods. We propose the notion of a heterogeneous network clock: a mapping of events to discrete timestamps that best explains their occurrence according to a given cascade propagation model. We focus on the widely-adopted independent cascade (IC) model and formalize the optimal clock as the one that maximizes the likelihood of all observed cascades. The single optimal clock (OC) problem can be solved exactly in polynomial time. However, we prove that learning multiple optimal clocks (kOC), corresponding to temporal patterns of groups of network nodes, is NP-hard. We propose scalable solutions that run in almost linear time in the total number of cascade activations and discuss approximation guarantees for each variant. Our algorithms and their detected clocks enable improved cascade size classification (up to 8% F1 lift) and improved missing cascade data inference (0.15 better recall). We also demonstrate that the network clocks exhibit consistency within the type of content diffusing in the network and are robust with respect to the propagation probability parameters of the IC model.

【Keywords】: Information diffusion; network clocks; cascades; social network analysis

95. Local Community Detection in Dynamic Networks.

【Paper Link】【Pages】:847-852

【Authors】: Daniel J. DiTursi ; Gaurav Ghosh ; Petko Bogdanov

【Abstract】: Given a time-evolving network, how can we detect communities over periods of high internal and low external interactions? To address this question we generalize traditional local community detection in graphs to the setting of dynamic networks. Adopting existing static-network approaches in an “aggregated” graph of all temporal interactions is not appropriate for the problem as dynamic communities may be short-lived and thus lost when mixing interactions over long periods. Hence, dynamic community mining requires the detection of both the community nodes and an optimal time interval in which they are actively interacting. We propose a filter-and-verify framework for dynamic community detection. To scale to long intervals of graph evolution, we employ novel spectral bounds for dynamic community conductance and employ them to filter suboptimal periods in near-linear time. We also design a time-and-graph-aware locality sensitive hashing family to effectively spot promising community cores. Our method PHASR discovers communities of consistently higher quality (2 to 67 times better) than those of baselines. At the same time, our bounds allow for pruning between 55% and 95% of the search space, resulting in significant savings in running time compared to exhaustive alternatives for even modest time intervals of graph evolution.

【Keywords】: community detection; dynamic graphs; conductance; spectral methods

96. Online Nearest Neighbor Search in Binary Space.

【Paper Link】【Pages】:853-858

【Authors】: Sepehr Eghbali ; Mohammad Hassan Zokaei Ashtiani ; Ladan Tahvildari

【Abstract】: We revisit the K Nearest Neighbors (KNN) problem in large binary datasets which is of major importance in several applied areas. The goal is to find the K nearest items in a dataset to a query point where both the query and the items lie in the Hamming cube. The problem is addressed in its online setting, that is, data items are inserted sequentially into the dataset. To accommodate efficient similarity search and fast insertion of new items, we propose a data structure that partitions the feature space based on the Hamming weights of the binary codes and their substrings. Empirical evaluations on a large-scale dataset show significant speedup over the linear scan baseline.

【Keywords】: Nearest neighbor search; Hamming distance; Similarity search; Hamming weight; Tree search

97. Finding Streams in Knowledge Graphs to Support Fact Checking.

【Paper Link】【Pages】:859-864

【Authors】: Prashant Shiralkar ; Alessandro Flammini ; Filippo Menczer ; Giovanni Luca Ciampaglia

【Abstract】: The volume of information generated online makes it impossible to manually fact-check all claims. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. We present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a triple. We view a knowledge graph of background information about real-world entities as a flow network, and show that computational fact checking then amounts to finding a "knowledge stream" connecting the subject and object of the triple. Evaluation on a range of real-world and hand-crafted datasets of facts reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful patterns and surface relevant facts that may help a human fact checker.

【Keywords】: Knowledge Stream; Fact Checking; Knowledge Graph; Network Flow; Minimum Cost Maximum Flow

98. Generating Synthetic Time Series to Augment Sparse Datasets.

【Paper Link】【Pages】:865-870

【Authors】: Germain Forestier ; François Petitjean ; Hoang Anh Dau ; Geoffrey I. Webb ; Eamonn J. Keogh

【Abstract】: In machine learning, data augmentation is the process of creating synthetic examples in order to augment a dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which they are embedded is induced by Dynamic Time Warping (DTW). The main idea of our approach is to average a set of time series and use the average time series as a new synthetic example. The proposed methods rely on an extension of DTW Barycentric Averaging (DBA), the averaging technique that is specifically developed for DTW. In this paper, we extend DBA to be able to calculate a weighted average of time series under DTW. In this case, instead of each time series contributing equally to the final average, some can contribute more than others. This extension allows us to generate an infinite number of new examples from any set of given time series. To this end, we propose three methods that choose the weights associated to the time series of the dataset. We carry out experiments on the 85 datasets of the UCR archive and demonstrate that our method is particularly useful when the number of available examples is limited (e.g. 2 to 6 examples per class) using a 1-NN DTW classifier. Furthermore, we show that augmenting full datasets is beneficial in most cases, as we observed an increase of accuracy on 56 datasets, no effect on 7 and a slight decrease on only 22.

【Keywords】: time series classification; dynamic time warping; data augmentation

99. Efficient and Invariant Convolutional Neural Networks for Dense Prediction.

【Paper Link】【Pages】:871-876

【Authors】: Hongyang Gao ; Shuiwang Ji

【Abstract】: Convolutional neural networks have shown great success on feature extraction from raw input data such as images. Although convolutional neural networks are invariant to translations on the inputs, they are not invariant to other transformations, including rotation and flip. Recent attempts have been made to incorporate more invariance in image recognition applications, but they are not applicable to dense prediction tasks, such as image segmentation. In this paper, we propose a set of methods based on kernel rotation and flip to enable rotation and flip invariance in convolutional neural networks. The kernel rotation can be achieved on kernels of 3 × 3, while kernel flip can be applied on kernels of any size. By rotating in eight or four angles, the convolutional layers could produce the corresponding number of feature maps based on eight or four different kernels. By using flip, the convolution layer can produce three feature maps. By combining produced feature maps using maxout, the resource requirement could be significantly reduced while still retain the invariance properties. Experimental results demonstrate that the proposed methods can achieve various invariance at reasonable resource requirements in terms of both memory and time.

【Keywords】: Kernel; Training; Image segmentation; Convolution

100. Spectral Lens: Explainable Diagnostics, Tools and Discoveries in Directed, Weighted Graphs.

【Paper Link】【Pages】:877-882

【Authors】: Sebastian Goebl ; Srijan Kumar ; Christos Faloutsos

【Abstract】: How can we quickly explain large-scale directed and weighted graphs? We present Spectral Lens (SL) to analyze a variety of real-world networks with both negative and positive edge weights, like DBLP relationships, email communications and Bitcoin trust votings. SL offers value on three levels: (a) Diagnostics: Spectral Lens combines spectral properties from Singular Value Decomposition to create an SL-Dictionary (SLD) to enhance understanding of any directed, weighted graph. (b) Tools: the SL-Algorithm (SLA) automatically extracts the top groups of nodes with similar connectivity, finds groups of shared connectivity and detects suspicious behaviour in a graph. (c) Discoveries: Experiments on several real-world networks illustrate the effectiveness of SLA. Observations from synthetic and real-world networks reveal relations between spectral and graph properties. We show that SLA is highly scalable and linear on the size of a graph. Analyzing a graph with over 2 million edges takes less than 5 minutes. Overall, SL provides an easy-to-use tool for practitioners to explain a weighted and directed graph quickly, to understand its connectivity and identify regular and anomalous behaviors.

【Keywords】: Graph Mining; Singular Value Decomposition

101. High-Dimensional Dependency Structure Learning for Physical Processes.

【Paper Link】【Pages】:883-888

【Authors】: Jamal Golmohammadi ; Imme Ebert-Uphoff ; Sijie He ; Yi Deng ; Arindam Banerjee

【Abstract】: In this paper, we consider the use of structure learning methods for probabilistic graphical models to identify statistical dependencies in high-dimensional physical processes. Such processes are often synthetically characterized using PDEs (partial differential equations) and are observed in a variety of natural phenomena. In this paper, we present ACLIME-ADMM, an efficient two-step algorithm for adaptive structure learning, which decides a suitable edge specific threshold in a data-driven statistically rigorous manner. Both steps of our algorithm use (inexact) ADMM to solve suitable linear programs, and all iterations can be done in closed form in an efficient block parallel manner. We compare ACLIME-ADMM with baselines on both synthetic data simulated by PDEs that model advection-diffusion processes, and real data of daily global geopotential heights to study information flow in the atmosphere. ACLIME-ADMM is shown to be efficient, stable, and competitive, usually better than the baselines especially on difficult problems. On real data, ACLIME-ADMM recovers the underlying structure of global atmospheric circulation, including switches in wind directions at the equator and tropics entirely from the data.

【Keywords】: structure learning; high-dimensional physical process; PC stable; ACLIME-ADMM; geoscience

102. Learning with Inadequate and Incorrect Supervision.

【Paper Link】【Pages】:889-894

【Authors】: Chen Gong ; Hengmin Zhang ; Jian Yang ; Dacheng Tao

【Abstract】: Practically, we are often in the dilemma that the labeled data at hand are inadequate to train a reliable classifier, and more seriously, some of these labeled data may be mistakenly labeled due to the various human factors. Therefore, this paper proposes a novel semi-supervised learning paradigm that can handle both label insufficiency and label inaccuracy. To address label insufficiency, we use a graph to bridge the data points so that the label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges. To address label inaccuracy, Graph Trend Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out the initial noisy labels. GTF penalizes the l_0 norm of label difference between connected examples in the graph and exhibits better local adaptivity than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the correct labels by emphasizing the leading eigenvectors of Laplacian matrix associated with small eigenvalues, as these eigenvectors reflect real label smoothness and carry rich class separation cues. We term our algorithm as "Semi-supervised learning under Inadequate and Incorrect Supervision" (SIIS). Thorough experimental results on image classification, text categorization, and speech recognition demonstrate that our SIIS is effective in label error correction, leading to superior performance to the state-of-the-art methods in the presence of label noise and label scarcity.

【Keywords】: semi-supervised learning; label noise; graph trend filtering; smooth eigenbase pursuit

103. Market Basket Prediction Using User-Centric Temporal Annotated Recurring Sequences.

【Paper Link】【Pages】:895-900

【Authors】: Riccardo Guidotti ; Giulio Rossetti ; Luca Pappalardo ; Fosca Giannotti ; Dino Pedreschi

【Abstract】: Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence, sequentuality, periodicity and recurrency of the purchased items. To this aim, we define a pattern named Temporal Annotated Recurring Sequence (TARS). We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. A deep experimentation shows that TARS can explain the customers' purchase behavior, and that TBP outperforms the state-of-the-art competitors.

【Keywords】: Next Basket Prediction; Temporal Sequences; Temporal Patterns; Market Basket Analysis

104. Tensor Based Relations Ranking for Multi-relational Collective Classification.

【Paper Link】【Pages】:901-906

【Authors】: Chao Han ; Qingyao Wu ; Michael K. Ng ; Jiezhang Cao ; Mingkui Tan ; Jian Chen

【Abstract】: In this paper, we study relations ranking and object classification for multi-relational data where objects are interconnected by multiple relations. The relations among objects should be exploited for achieving a good classification. While most existing approaches exploit either by directly counting the number of connections among objects or by learning the weight of each relation from labeled data only. In this paper, we propose an algorithm, TensorRRCC, which is able to determine the ranking of relations and the labels of objects simultaneously. Our basic idea is that highly ranked relations within a class should play more important roles in object classification, and class membership information is important for determining a ranking quality over the relations w.r.t. a specific learning task. TensorRRCC implements the idea by modeling a Markov chain on transition probability graphs from connection and feature information with both labeled and unlabeled objects and propagates the ranking scores of relations and relevant classes of objects. An iterative progress is proposed to solve a set of tensor equations to obtain the stationary distribution of relations and objects. We compared our algorithm with current collective classification algorithms on two real-world data sets and the experimental results show the superiority of our method.

【Keywords】: relations ranking; classification; tensor; multi-relational data

105. Sub-Gibbs Sampling: A New Strategy for Inferring LDA.

【Paper Link】【Pages】:907-912

【Authors】: Chuan Hu ; Huiping Cao ; Qixu Gong

【Abstract】: Latent Dirichlet Allocation (LDA) has been widely used in text mining to discover topics from documents. One major approach to learn LDA is Gibbs sampling. The basic Collapsed Gibbs Sampling (CGS) algorithm requires O(NZ) computations to learn an LDA model with Z topics from a corpus containing N tokens. Existing approaches that improve the complexity of CGS focus on reducing the factor Z. In this work, we propose a novel and general Sub-Gibbs Sampling (SGS) strategy to improve the Gibbs-Sampling computation by reducing the sample space. This new strategy targets at reducing the factor N by sampling only a subset of the whole corpus. The design of the SGS strategy is based on two properties that we observe: (i) topic distributions of tokens are skewed and (ii) a subset of documents can approximately represent the semantics of the whole corpus. We prove that the SGS strategy can achieve comparable effectiveness (with bounded errors) and significantly reduce the complexity of existing Gibbs sampling algorithms. Extensive experiments on large real-world data sets show that the proposed SGS strategy is much faster than several state-of-the-art fast Gibbs sampling algorithms and the proposed SGS strategy can learn comparable LDA models as other Gibbs sampling algorithms.

【Keywords】: Topic Models; Gibbs Sampling; Sub-sampling

106. Behind Distribution Shift: Mining Driving Forces of Changes and Causal Arrows.

【Paper Link】【Pages】:913-918

【Authors】: Biwei Huang ; Kun Zhang ; Jiji Zhang ; Ruben Sanchez-Romero ; Clark Glymour ; Bernhard Schölkopf

【Abstract】: We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies over time or across data sets and whose qualitative structure is known, we aim to extract from data a low-dimensional and interpretable representation of the main components of the changes. For this purpose we develop a novel kernel embedding of nonstationary conditional distributions that does not rely on sliding windows. Second, the embedding also leads to a measure of dependence between the changes of causal modules that can be used to determine the directions of many causal arrows. We demonstrate the power of our methods with experiments on both synthetic and real data.

【Keywords】: Causal discovery; Data distribution shift; Nonstationary driving force; Kernel mean embedding

107. A Self-Paced Category-Aware Approach for Unsupervised Adaptation Networks.

【Paper Link】【Pages】:919-924

【Authors】: Wenzhen Huang ; Peipei Yang ; Kaiqi Huang

【Abstract】: The success of deep neural networks usually relies on a large number of labeled training samples, which unfortunately are not easy to obtain in practice. Unsupervised domain adaptation focuses on the problem where there is no labeled data in the target domain. In this paper, we propose a novel deep unsupervised domain adaptation method that learns transferable features. Different from most existing methods, it attempts to learn a better domain-invariant feature representation by performing a category-wise adaptation to match the conditional distributions of samples with respect to each category. A self-paced learning strategy is used to bring the awareness of label information gradually, which makes the category-wise adaptation feasible even if the labels are unavailable in target domain. Then, we give detailed theoretical analysis to explain how the better performance is obtained. The experimental results show that our method outperforms the current state of the arts on standard domain adaptation datasets.

【Keywords】: Unsupervised Domain Adaptation; Transfer Learning

108. CRAD: Clustering with Robust Autocuts and Depth.

【Paper Link】【Pages】:925-930

【Authors】: Xin Huang ; Yulia R. Gel

【Abstract】: We develop a new density-based clustering algorithm named CRAD which is based on a new neighbor searching function with a robust data depth as the dissimilarity measure. Our experiments prove that the new CRAD is highly competitive at detecting clusters with varying densities, compared with the existing algorithms such as DBSCAN, OPTICS and DBCA. Furthermore, a new effective parameter selection procedure is developed to select the optimal underlying parameter in the real-world clustering, when the ground truth is unknown. Lastly, we suggest a new clustering framework that extends CRAD from spatial data clustering to time series clustering without a-priori knowledge of the true number of clusters. The performance of CRAD is evaluated through extensive experimental studies.

【Keywords】: clustering; space-time processes; data depth

【Paper Link】【Pages】:931-936

【Authors】: Bhushan Kulkarni ; Sumit Agarwal ; Abir De ; Sourangshu Bhattacharya ; Niloy Ganguly

【Abstract】: Online Social Networks (OSNs) have emerged as a global media for forming and shaping opinions on a broad spectrum of topics like politics, e-commerce, sports, etc. So, research on understanding and predicting opinion dynamics in OSNs, especially using a tractable linear model, has abound in literature. However, these linear models are too simple to uncover the actual complex dynamics of opinion flow in social networks. In this paper, we propose SLANT+, a novel nonlinear generative model for opinion dynamics, by extending our earlier linear opinion model SLANT [7]. To design this model, we rely on a network-guided recurrent neural network architecture which learns a proper temporal representation of the messages as well as the underlying network. Furthermore, we probe various signals from the real life datasets and offer a conceptually interpretable nonlinear function that not only provides concrete clues of the opinion exchange process, but also captures the coupled dynamics of message timings and opinion flow. As a result, with five real-life datasets crawled from Twitter, our proposal gives significant accuracy boost over six state-of-the-art baselines.

【Keywords】: Social networks; Opinion dynamics; Temporal representations

110. Effective Large-Scale Online Influence Maximization.

【Paper Link】【Pages】:937-942

【Authors】: Paul Lagrée ; Olivier Cappé ; Bogdan Cautis ; Silviu Maniu

【Abstract】: In this paper, we study a highly generic version of influence maximization (IM), one of optimizing influence campaigns by sequentially selecting "spread seeds" from a set of candidates, a small subset of the node population, under the hypothesis that, in a given campaign, previously activated nodes remain "persistently" active throughout and thus do not yield further rewards. We call this problem online influence maximization with persistence. We introduce an estimator on the candidates' missing mass - the expected number of nodes that can still be reached from a given seed candidate - and justify its strength to rapidly estimate the desired value. We then describe a novel algorithm, GT-UCB, relying on upper confidence bounds on the missing mass. We show that our approach leads to high-quality spreads on classic IM datasets, even though it makes almost no assumptions on the diffusion medium. Importantly, it is orders of magnitude faster than state-of-the-art IM methods.

【Keywords】: Influence maximization; information diffusion; online social networks; online learning; multi-armed bandits

111. Identifying Media Bias by Analyzing Reported Speech.

【Paper Link】【Pages】:943-948

【Authors】: Konstantina Lazaridou ; Ralf Krestel ; Felix Naumann

【Abstract】: Media analysis can reveal interesting patterns in the way newspapers report the news and how these patterns evolve over time. One example pattern is the quoting choices that media make, which could be used as bias indicators. Media slant can be expressed both with the choice of reporting an event, e.g. a person's statement, but also with the words used to describe the event. Thus, automatic discovery of systematic quoting patterns in the news could illustrate to the readers the media' beliefs, such as political preferences. In this paper, we aim to discover political media bias by demonstrating systematic patterns of reporting speech in two major British newspapers. To this end, we analyze news articles from 2000 to 2015. By taking into account different kinds of bias, such as selection, coverage and framing bias, we show that the quoting patterns of newspapers are predictable.

【Keywords】: media bias; political bias; news analysis; reported speech; classification

112. Fast Compressive Spectral Clustering.

【Paper Link】【Pages】:949-954

【Authors】: Ting Li ; Yiming Zhang ; Dongsheng Li ; Xinwang Liu ; Yuxing Peng

【Abstract】: Compressive spectral clustering (CSC) efficiently leverages graph filter and random sampling techniques to speed up clustering process. However, we find that CSC algorithm suffers from two main problems: i) The direct use of the dichotomy and eigencount techniques for estimating laplacian matrix's k-th eigenvalue is expensive. ii) The computation of polynomial approximation repeats in each iteration for every cluster in the interpolation process, which occupies most of the computation time of CSC. To address these problems, we propose a new approach called FCSC for fast compressive spectral clustering. FCSC addresses the first problem by assuming that the eigenvalues approximately satisfy local uniform distribution, and addresses the second problem by recalculating the pairwise similarity between nodes with low-dimensional representation to reconstruct denoised laplacian matrix. The time complexity of reconstruction is linear with the number of non-zeros in laplacian matrix. As experimentally demonstrated on artificial and real-world datasets, our approach significantly reduces the computation time while preserving high clustering accuracy comparable to previous designs, verifying the effectiveness of FCSC.

【Keywords】: Eigenvalues and eigenfunctions; Matrix decomposition; Laplace equations; Interpolation; Time complexity; Acceleration; Estimation

113. A Broad Learning Approach for Context-Aware Mobile Application Recommendation.

【Paper Link】【Pages】:955-960

【Authors】: Tingting Liang ; Lifang He ; Chun-Ta Lu ; Liang Chen ; Philip S. Yu ; Jian Wu

【Abstract】: With the rapid development of mobile apps, the availability of a large number of mobile apps in application stores brings challenges to locate appropriate apps for users. Providing accurate mobile app recommendation for users becomes an imperative task. Conventional approaches mainly focus on learning users' preferences and app features to predict the user-app ratings. However, most of them did not consider the interactions among the context information of apps. To address this issue, we propose a broad learning approach for Context-Aware app recommendation with Tensor Analysis (CATA). Specifically, we utilize a tensor-based framework to effectively integrate app category information and multi-view features on users and apps, respectively, to facilitate the performance of rating prediction. The multidimensional structure is employed to capture the hidden relationships among the app categories and the multiview features. We develop an efficient factorization method which applies Tucker decomposition to learn the full-order interactions among the app categories and features. Furthermore, we employ a group ℓ₁-norm regularization to learn the group-wise feature importance of each view with respect to each app category. Experiments on a real-world mobile app dataset demonstrate the effectiveness of the proposed method.

【Keywords】: Tensile stress; Mobile communication; Computer science; Navigation; Mobile applications; Buildings

114. Recover Fine-Grained Spatial Data from Coarse Aggregation.

【Paper Link】【Pages】:961-966

【Authors】: Bang Liu ; Borislav Mavrin ; Linglong Kong ; Di Niu

【Abstract】: In this paper, we study a new type of spatial sparse recovery problem, that is to infer the fine-grained spatial distribution of certain density data in a region only based on the aggregate observations recorded for each of its subregions. One typical example of this spatial sparse recovery problem is to infer spatial distribution of cellphone activities based on aggregate mobile traffic volumes observed at sparsely scattered base stations. We propose a novel Constrained Spatial Smoothing (CSS) approach, which exploits the local continuity that exists in many types of spatial data to perform sparse recovery via finite-element methods, while enforcing the aggregated observation constraints through an innovative use of the ADMM algorithm. We also improve the approach to further utilize additional geographical attributes. Extensive evaluations based on a large dataset of phone call records and a demographical dataset from the city of Milan show that our approach significantly outperforms various state-of-the-art approaches, including Spatial Spline Regression (SSR).

【Keywords】: Spatial Sparse Recovery; Constrained Spatial Smoothing; Spatial Spline Regression; Alternating Direction Method of Multipliers

115. Multi-view Graph Embedding with Hub Detection for Brain Network Analysis.

【Paper Link】【Pages】:967-972

【Authors】: Guixiang Ma ; Chun-Ta Lu ; Lifang He ; Philip S. Yu ; Ann B. Ragin

【Abstract】: Multi-view graph embedding and hub detection have both become widely studied problems in the area of graph learning. Both graph embedding and hub detection relate to the node clustering structure of graphs. The multi-view graph embedding usually implies the node clustering structure of the graph based on the multiple views, while hubs are the boundary-spanning nodes across different node clusters in the graph and thus may potentially influence the clustering structure of the graph. However, none of the existing works considered joint learning the multi-view embeddings and the hubs from multi-view graph data. In this paper, we propose to incorporate the hub detection task into the multi-view graph embedding framework so that the two tasks could benefit from each other. Specifically, we propose an auto-weighted framework of Multi-view Graph Embedding with Hub Detection (MVGE-HD) for brain network analysis. The MVGE-HD framework learns a unified graph embedding across all the views while reducing the potential influence of the hubs on blurring the boundaries between node clusters in the graph, thus leading to a clear and discriminative node clustering structure for the graph. We apply MVGE-HD on two real multi-view brain network datasets (i.e., HIV and Bipolar). The experimental results demonstrate the superior performance of the proposed framework in brain network analysis for clinical investigation and application.

【Keywords】: Linear programming; Neuroscience; Diffusion tensor imaging; Correlation

【Paper Link】【Pages】:973-978

【Authors】: Xiao Ma ; Xiaofeng Gao ; Guihai Chen

【Abstract】: In recent years, predicting future hot events in online social networks is becoming increasingly meaningful in marketing, advertisement, and recommendation systems to support companies' strategy making. Currently, most prediction models require long-term observations over the event or depend a lot on other features which are expensive to extract. However, at the early stage of an event, the temporal features of hot events and non-hot events are not distinctive yet. Besides, given the small amount of available data, high noise and complex network structure, those state-of-art models are unable to give an accurate prediction at the very early stage of an event. Hence, we propose two Bayesian perspective models to handle this dilemma. We first mathematically define the hot event prediction problem and introduce the general early stage event prediction framework, then model the five selected features into several continuous distributions, and present two Semi-Naive Bayes Classifier based prediction models, BEEP and SimBEEP, which is the simplified version of BEEP. Extensive experiments on real dataset have demonstrated that our model significantly outperforms the baseline methods.

【Keywords】: Acceleration; Conferences; Data mining; Silicon

117. Automatic Classification of Music Genre Using Masked Conditional Neural Networks.

【Paper Link】【Pages】:979-984

【Authors】: Fady Medhat ; David Chesmore ; John Robinson

【Abstract】: Neural network based architectures used for sound recognition are usually adapted from other application domains such as image recognition, which may not harness the time-frequency representation of a signal. The ConditionaL Neural Networks (CLNN) and its extension the Masked ConditionaL Neural Networks (MCLNN) are designed for multidimensional temporal signal recognition. The CLNN is trained over a window of frames to preserve the inter-frame relation, and the MCLNN enforces a systematic sparseness over the network's links that mimics a filterbank-like behavior. The masking operation induces the network to learn in frequency bands, which decreases the network susceptibility to frequency-shifts in time-frequency representations. Additionally, the mask allows an exploration of a range of feature combinations concurrently analogous to the manual handcrafting of the optimum collection of features for a recognition task. MCLNN have achieved competitive performance on the Ballroom music dataset compared to several hand-crafted attempts and outperformed models based on state-of-the-art Convolutional Neural Networks.

【Keywords】: Restricted Boltzmann Machine; RBM; Conditional RBM; CRBM; Deep Belief Net; DBN; Conditional Neural Network; CLNN; Masked Conditional Neural Network; MCLNN; Music Information Retrieval; MIR

118. Theoretically and Empirically High Quality Estimation of Closeness Centrality.

【Paper Link】【Pages】:985-990

【Authors】: Shogo Murai

【Abstract】: In the field of network analysis, centrality values of graph nodes, which represents the importance of nodes, have been widely studied. In this paper, we focus on one of the most basic centrality measures: closeness centrality. Since the exact computation of closeness centrality for all nodes of a network is prohibitively costly for massive networks, algorithms for estimating closeness centrality have been studied. In previous works, theoretical bounds on relative error have been improved. However, for complex networks such as social networks, empirical estimation qualities have hardly been improved since the sampling-based algorithm was proposed. In this paper, we propose simple and highly scalable algorithms for estimating closeness centrality of undirected networks. Our algorithms have theoretically and empirically better estimation quality than previous ones. As a result, our algorithms achieve strong quality guarantees and experimentally small relative errors at the same time. Also, our algorithms can be extended to strongly connected directed networks. Moreover, we can apply our algorithms to weighted centrality, in which nodes may have different weight, with slight modification.

【Keywords】: Estimation; Complex networks; Algorithm design and analysis; Approximation algorithms; Random variables; Social network services; Harmonic analysis

119. Efficient Computation of Multiple Density-Based Clustering Hierarchies.

【Paper Link】【Pages】:991-996

【Authors】: Antônio Cavalcante Araújo Neto ; Jörg Sander ; Ricardo J. G. B. Campello ; Mario A. Nascimento

【Abstract】: HDBSCAN, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN hierarchies for a range of mpts by replacing the graph used by HDBSCAN with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.

【Keywords】: Clustering; HDBSCAN*; Hierarchical Clustering; Relative Neighborhood Graph

120. Reductions for Frequency-Based Data Mining Problems.

【Paper Link】【Pages】:997-1002

【Authors】: Stefan Neumann ; Pauli Miettinen

【Abstract】: Computational complexity of various maximal pattern mining problems, including maximal frequent items, maximal frequent subgraphs in labelled graphs, and maximal subsequences with no repetitions, is studied, and the complexities are found to be equivalent under novel constrained reductions. The results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014].

【Keywords】: Data mining; Computational complexity; Itemsets; Algorithm design and analysis

121. Ranking Causal Anomalies by Modeling Local Propagations on Networked Systems.

【Paper Link】【Pages】:1003-1008

【Authors】: Jingchao Ni ; Wei Cheng ; Kai Zhang ; Dongjin Song ; Tan Yan ; Haifeng Chen ; Xiang Zhang

【Abstract】: Complex systems are prevalent in many fields such as finance, security and industry. A fundamental problem in system management is to perform diagnosis in case of system failure such that the causal anomalies, i.e., root causes, can be identified for system debugging and repair. Recently, invariant network has proven a powerful tool in characterizing complex system behaviors. In an invariant network, a node represents a system component, and an edge indicates a stable interaction between two components. Recent approaches have shown that by modeling fault propagation in the invariant network, causal anomalies can be effectively discovered. Despite their success, the existing methods have a major limitation: they typically assume there is only a single and global fault propagation in the entire network. However, in real-world large-scale complex systems, it's more common for multiple fault propagations to grow simultaneously and locally within different node clusters and jointly define the system failure status. Inspired by this key observation, we propose a two-phase framework to identify and rank causal anomalies. In the first phase, a probabilistic clustering is performed to uncover impaired node clusters in the invariant network. Then, in the second phase, a low-rank network diffusion model is designed to backtrack causal anomalies in different impaired clusters. Extensive experimental results on real-life datasets demonstrate the effectiveness of our method.

【Keywords】: Invariant network; causal anomaly detection; matrix factorization

122. Automated Medical Diagnosis by Ranking Clusters Across the Symptom-Disease Network.

【Paper Link】【Pages】:1009-1014

【Authors】: Jingchao Ni ; Hongliang Fei ; Wei Fan ; Xiang Zhang

【Abstract】: The rapid growth of medical recording data has increased the demand for automated analysis. An important problem in recent medical research is automated medical diagnosis, which is to infer likely diseases for the observed symptoms. Existing approaches typically perform the inference on a sparse bipartite graph with two sets of nodes representing diseases and symptoms, respectively. By using this graph, existing methods basically assume no direct dependency exists between diseases (or symptoms), which may not be true in practice. To address this limitation, we propose to integrate two domain networks encoding similarities between diseases and those between symptoms to avoid information loss as well as to alleviate the sparsity problem of the bipartite graph. Another limitation of the existing methods is that they usually output a ranked list of diseases mixed from very different etiologies which greatly limits their practical usefulness. An ideal method should allow a clustered structure in the disease ranking list so that both similar and different diseases can be easily identified. Therefore, we formulate automated diagnosis as a novel cross-domain cluster ranking problem, which identifies and ranks the disease clusters simultaneously in the symptom-disease network. Our formulation employs a joint learning scheme in which the dual procedures of cluster finding and cluster ranking are coupled and mutually reinforced. Experimental results on real-world datasets demonstrate the effectiveness of our method.

【Keywords】: Network clustering; cluster ranking; medical diagnosis; matrix factorization

123. LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine.

【Paper Link】【Pages】:1015-1020

【Authors】: Lin Ning ; Randall Pittman ; Xipeng Shen

【Abstract】: Restricted Boltzmann Machine (RBM) is the building block of Deep Belief Nets and other deep learning tools. Fast learning and prediction are both essential for practical usage of RBM-based machine learning techniques. This paper proposes Lean Contrastive Divergence (LCD), a modified Contrastive Divergence (CD) algorithm, to accelerate RBM learning and prediction without changing the results. LCD avoids most of the required computations with two optimization techniques. The first is called bounds-based filtering, which, through triangle inequality, replaces expensive calculations of many vector dot products with fast bounds calculations. The second is delta product, which effectively detects and avoids many repeated calculations in the core operation of RBM, Gibbs Sampling. The optimizations are applicable to both the standard contrastive divergence learning algorithm and its variations. Results show that the optimizations can produce several-fold (up to 3X for training and 5.3X for prediction) speedups.

【Keywords】: Optimization; Training; Liquid crystal displays; Machine learning; Prediction algorithms; Probability; Computer science

【Paper Link】【Pages】:1021-1026

【Authors】: Tianyi Pan ; Alan Kuhnle ; Xiang Li ; My T. Thai

【Abstract】: Online Social Networks (OSNs) are effective platforms for viral marketing. Due to their importance, viral marketing related problems in OSNs have been extensively studied in the past decade. However, none of the existing works can cope with the situation that the propagation rate dynamically increases for popular topics, as they all assume known propagation rates. In this paper, to better describe realistic information propagation in OSNs, we propose a novel model, Dynamic Influence Propagation (DIP), that allows propagation rate to change during the diffusion. We then define a new research problem: Threshold Activation Problem under DIP (TAP-DIP) to study the impact of DIP. TAP-DIP adds extra complexity on the already #P-hard TAP problem. Despite it hardness, we are able to approximate TAP-DIP with O(log|V|) ratio. Sitting in the core of our algorithm are the Lipschitz optimization technique and a novel solution to the general version of TAP, the Multi-TAP problem. Using various real OSN datasets, we experimentally demonstrate the impact of DIP and that our solution not only generates high-quality seed sets when being aware of the rate increase, but also is scalable.

【Keywords】: Electronics packaging; Integrated circuit modeling; Approximation algorithms; Twitter; Heuristic algorithms; Algorithm design and analysis; Probability density function

125. Domain Specific Feature Transfer for Hybrid Domain Adaptation.

【Paper Link】【Pages】:1027-1032

【Authors】: Pengfei Wei ; Yiping Ke ; Chi-Keong Goh

【Abstract】: Heterogeneous domain adaptation needs supplementary information to link up domains. However, this supplementary information is unavailable in many real cases. In this paper, a new problem setting called hybrid domain adaptation is investigated. It is a special case of heterogeneous domain adaptation in which different domains share some common features, but also have their own domain specific features. In this case, it can be efficiently solved without any supplementary information by using the common features to link up the domains in adaptation. We propose a domain specific feature transfer (DSFT) method, which can link up different domains using the common features and simultaneously reduce domain divergences. Specifically, we first learn the translations between the common features and the domain specific features. Then we cross-use the learned translations to transfer the domain specific features of one domain to another domain. Finally, we compose a homogeneous space in which the domain divergences are minimized. Extensive experiments verify the effectiveness of our proposed method.

【Keywords】: knowledge transfer; domain specific feature; hybrid domain adaptation

126. Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis.

【Paper Link】【Pages】:1033-1038

【Authors】: Soujanya Poria ; Erik Cambria ; Devamanyu Hazarika ; Navonil Majumder ; Amir Zadeh ; Louis-Philippe Morency

【Abstract】: Multimodal sentiment analysis involves identifying sentiment in videos and is a developing field of research. Unlike current works, which model utterances individually, we propose a recurrent model that is able to capture contextual information among utterances. In this paper, we also introduce attentionbased networks for improving both context learning and dynamic feature fusion. Our model shows 6-8% improvement over the state of the art on a benchmark dataset.

【Keywords】: Feature extraction; Videos; Context modeling; Sentiment analysis; Visualization; Social network services; Fuses

127. Statistical Link Label Modeling for Sign Prediction: Smoothing Sparsity by Joining Local and Global Information.

【Paper Link】【Pages】:1039-1044

【Authors】: Amin Javari ; Hongxiang Qiu ; Elham Barzegaran ; Mahdi Jalili ; Kevin Chen-Chuan Chang

【Abstract】: One of the major issues in signed networks is to use network structure to predict the missing sign of an edge. In this paper, we introduce a novel probabilistic approach for the sign prediction problem. The main characteristic of the proposed models is their ability to adapt to the sparsity level of an input network. Building a model that has an ability to adapt to the sparsity of the data has not yet been considered in the previous related works. We suggest that there exists a dilemma between local and global structures and attempt to build sparsity adaptive models by resolving this dilemma. To this end, we propose probabilistic prediction models based on local and global structures and integrate them based on the concept of smoothing. The model relies more on the global structures when the sparsity increases, whereas it gives more weights to the information obtained from local structures for low levels of the sparsity. The proposed model is assessed on three real-world signed networks, and the experiments reveal its consistent superiority over the state of the art methods. As compared to the previous methods, the proposed model not only better handles the sparsity problem, but also has lower computational complexity and can be updated using real-time data streams.

【Keywords】: Link label prediction; Signed networks; Smoothing; Local structure; Global structure

128. Novel Exact and Approximate Algorithms for the Closest Pair Problem.

【Paper Link】【Pages】:1045-1050

【Authors】: Sanguthevar Rajasekaran ; Subrata Saha ; Xingyu Cai

【Abstract】: The closest pair problem (CPP) is an important problem that has numerous applications in clustering, graph partitioning, image processing, patterns identification, intrusion detection, etc. Numerous algorithms have been presented for solving the CPP. For instance, on n points there exists an O(n log n) time algorithm for CPP (when the dimension is a constant). There also exist randomized algorithms with an expected linear run time. However these algorithms do not perform well in practice. The algorithms that are employed in practice have a worst case quadratic run time. One of the best performing algorithms for the CPP is MK (originally designed for solving the time series motif finding problem). In this paper we present an elegant exact algorithm called MPR for the CPP that performs better than MK. Also, we present approximation algorithms for the CPP that are faster than MK by up to a factor of more than 40, while maintaining a very good accuracy.

【Keywords】: closest pair problem; exact algorithms; approximate algorithms; randomized algorithms; time series motif mining

129. Informing the Use of Hyperparameter Optimization Through Metalearning.

【Paper Link】【Pages】:1051-1056

【Authors】: Samantha Sanders ; Christophe G. Giraud-Carrier

【Abstract】: One of the challenges of data mining is finding hyperparameters for a learning algorithm that will produce the best model for a given dataset. Hyperparameter optimization automates this process, but it can still take significant time. It has been found that hyperparameter optimization does not always result in induced models with significant improvement over default values, yet no systematic analysis of the role of hyperparameter optimization in machine learning has been conducted. We use metalearning to inform the decision of whether to optimize hyperparameters based on expected performance improvement and computational cost.

【Keywords】: Metalearning; hyperparameter optimization

130. Kernel-Based Feature Extraction for Collaborative Filtering.

【Paper Link】【Pages】:1057-1062

【Authors】: Saket Sathe ; Charu C. Aggarwal ; Xiangnan Kong ; Xinyue Liu

【Abstract】: Singular value decomposition (SVD) has been used widely in the literature to recover the missing entries of a matrix. The basic principle in such methods is to assume that the correlated data is distributed with a low-rank structure. The knowledge of the low-rank structure is then used to predict the missing entries. SVD is based on the assumption that the data (user ratings) are distributed on a linear hyperplane. This is not always the case, and the data could often be distributed on a nonlinear hyperplane. Therefore, in this paper, we explore the methodology of kernel feature extraction to complement off-the-shelf methods for improving their accuracy. The extracted features can be used to enhance a variety of existing methods such as biased matrix factorization and SVD++. We present experimental results illustrating the effectiveness of using this approach.

【Keywords】: Kernel; Feature extraction; Principal component analysis; Robustness; Manifolds; Distributed databases; Computational modeling

131. Reputation-Based Ranking Systems and Their Resistance to Bribery.

【Paper Link】【Pages】:1063-1068

【Authors】: João Saúde ; Guilherme Ramos ; Carlos Caleiro ; Soummya Kar

【Abstract】: We study bribery resistance properties in two classes of reputation-based ranking systems, where the rankings are computed by weighting the rates given by users with their reputations. In the first class, the rankings are the result of the aggregation of all the ratings, and all users are provided with the same ranking for each item. In the second class, there is a first step that clusters users by their rating pattern similarities, and then the rankings are computed cluster-wise. Hence, for each item, there is a different ranking for distinct clusters. We study the setting where the seller of each item can bribe users to rate the item, if they did not rate it before, or to increase their previous rating on the item. We model bribing strategies under these ranking scenarios and explore under which conditions it is profitable to bribe a user, presenting, in several cases, the optimal bribing strategies. By computing dedicated rankings to each cluster, we show that bribing, in general, is not as profitable as in the simpler without clustering. Finally, we illustrate our results with experiments using real data.

【Keywords】: Reputation-based Ranking Systems; Bribery; Data Mining

132. The Many Faces of Link Fraud.

【Paper Link】【Pages】:1069-1074

【Authors】: Neil Shah ; Hemank Lamba ; Alex Beutel ; Christos Faloutsos

【Abstract】: Most past work on social network link fraud detection tries to separate genuine users from fraudsters, implicitly assuming that there is only one type of fraudulent behavior. But is this assumption true? And, in either case, what are the characteristics of such fraudulent behaviors? In this work, we set up honeypots ("dummy" social network accounts), and buy fake followers (after careful IRB approval). We report the signs of such behaviors including oddities in local network connectivity, account attributes, and similarities and differences across fraud providers. Most valuably, we discover and characterize several types of fraud behaviors. We discuss how to leverage our insights in practice by engineering strongly performing entropy-based features and demonstrating high classification accuracy. Our contributions are (a) observations: we analyze our honeypot fraudster ecosystem and give surprising insights into the multifaceted behaviors of these fraudster types, and (b) features: we propose novel features that give strong (>0.95 precision/recall) discriminative power on ground-truth Twitter data.

【Keywords】: link fraud; anomaly detection; social network; social media; honeypots

133. Synchronization-Inspired Co-Clustering and Its Application to Gene Expression Data.

【Paper Link】【Pages】:1075-1080

【Authors】: Junming Shao ; Chongming Gao ; Wei Zeng ; Jingkuan Song ; Qinli Yang

【Abstract】: In this paper, we propose a new synchronization-inspired co-clustering algorithm by dynamic simulation, called CoSync, which aims to discover biologically relevant subgroups embedding in a given gene expression data matrix. The basic idea is to view a gene expression data matrix as a dynamical system, and the weighted two-sided interactions are imposed on each element of the matrix from both aspects of genes and conditions, resulting in the values of all element in a co-cluster synchronizing together. Experiments show that our algorithm allows uncovering high-quality co-clusterings embedded in gene expression data sets and has its superiority over many state-of-the-art algorithms.

【Keywords】: co-clustering; gene expression data; synchronization

134. Exploring Common and Distinct Structural Connectivity Patterns Between Schizophrenia and Major Depression via Cluster-Driven Nonnegative Matrix Factorization.

【Paper Link】【Pages】:1081-1086

【Authors】: Junming Shao ; Zhongjing Yu ; Peiyan Li ; Wei Han ; Christian Sorg ; Qinli Yang

【Abstract】: In this paper, we introduce a novel method to discover common and distinct structural connectivity patterns between SZP and MDD via a Cluster-Driven Nonnegative Matrix Factorization (called CD-NMF). Specifically, CD-NMF is applied to decompose the joint structural connectivity map into common and distinct parts, and each part is further factorized into two sub-matrices (i.e. common/distinct basis matrix and common/distinct encoding matrix) correspondingly. By imposing the clustering constraints on common and distinct encoding matrices, the discriminative patterns as well as the common patterns between the two disorders are extracted simultaneously. Experimental results demonstrate that CD-NMF allows finding the common and distinct structural patterns effectively. More importantly, the derived distinct patterns, show powerful ability to discriminate the patients of schizophrenia and major depressive disorder.

【Keywords】: nonnegative matrix factorization; structural connectivity; biomarker

135. WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams.

【Paper Link】【Pages】:1087-1092

【Authors】: Kijung Shin

【Abstract】: If we cannot store all edges in a graph stream, which edges should we store to estimate the triangle count accurately?Counting triangles (i.e., cycles of length three) is a fundamental graph problem with many applications in social network analysis, web mining, anomaly detection, etc. Recently, much effort has been made to accurately estimate global and local triangle counts in streaming settings with limited space. Although existing methods use sampling techniques without considering temporal dependencies in edges, we observe temporal locality in real dynamic graphs. That is, future edges are more likely to form triangles with recent edges than with older edges. In this work, we propose a single-pass streaming algorithm called Waiting-Room Sampling (WRS) for global and local triangle counting. WRS exploits the temporal locality by always storing the most recent edges, which future edges are more likely to form triangles with, in the waiting room, while it uses reservoir sampling for the remaining edges. Our theoretical and empirical analyses show that WRS is: (a) Fast and 'any time': runs in linear time, always maintaining and updating estimates while new edges arrive, (b) Effective: yields up to 47% smaller estimation error than its best competitors, and (c) Theoretically sound: gives unbiased estimates with small variances under the temporal locality.

【Keywords】: triangle counting; graph stream; edge sampling

136. Collaborative Inference of Coexisting Information Diffusions.

【Paper Link】【Pages】:1093-1098

【Authors】: Yanchao Sun ; Cong Qian ; Ning Yang ; Philip S. Yu

【Abstract】: The purpose of diffusion history inference is to reconstruct the missing traces of information diffusion according to incomplete observations. Existing methods, however, often focus only on single diffusion trace, while in a real-world social network, there often coexist multiple information diffusions. In this paper, we propose a novel approach called Collaborative Inference Model (CIM) for the problem of the inference of coexisting information diffusions. CIM can holistically model multiple information diffusions without any prior assumption of diffusion models, and collaboratively infer the histories of the coexisting information diffusions via low-rank approximation with a fusion of heterogeneous constraints generated from additional data sources. We also propose an optimized algorithm called Time Window based Parallel Decomposition Algorithm (TWPDA) to speed up the inference without compromise on the accuracy. Extensive experiments are conducted on real-world datasets to verify the effectiveness and efficiency of CIM and TWPDA.

【Keywords】: Social network; Information diffusion; Sparse tensor approximation

137. Epidemic Forecasting Framework Combining Agent-Based Models and Smart Beam Particle Filtering.

【Paper Link】【Pages】:1099-1104

【Authors】: Farzaneh Tabataba ; Bryan L. Lewis ; Milad Hosseinipour ; Foroogh S. Tabataba ; Srinivasan Venkatramanan ; Jiangzhuo Chen ; Dave Higdon ; Madhav Marathe

【Abstract】: Over the past decades, numerous techniques have been developed to forecast the temporal evolution of epidemic outbreaks. This paper proposes an approach that combines high resolution agent-based models using realistic social contact networks for simulating epidemic evolution with a particle filter based method for assimilation based forecasting. Agent-based modeling using realistic social contact networks provides two key advantages: (i) they capture the causal processes underlying the epidemic and hence are useful to understand the role of interventions on the course of the epidemics - typically time series models cannot capture this and as a result often do not perform well in such situations; (ii) they provide detailed forecast information - this allows us to produce forecast at high levels of temporal, spatial and social granularity. We also propose a new variation of particle filter technique called beam search particle filtering. The modification allows us to more efficiently search the parameter space which is necessitated by the fact that agent-based techniques are computationally expensive. We illustrate our methodology on the synthetic dataset of Ebola provided as a part of the NSF/NIH Ebola forecasting challenge. Our results show the efficacy of the proposed approach and suggest that agent-based causal models can be combined with filtering techniques to yield a new class of assimilation models for infectious disease forecasting.

【Keywords】: Epidemic forecasting; Particle Filter; Data assimilation; Ebola disease modeling

138. Autoregressive Tensor Factorization for Spatio-Temporal Predictions.

【Paper Link】【Pages】:1105-1110

【Authors】: Koh Takeuchi ; Hisashi Kashima ; Naonori Ueda

【Abstract】: Analysis of spatio-temporal data is a common research topic that requires the interpolations of unknown locations and the predictions of feature observations by utilizing information about where and when the data were observed. One of the most difficult problems is to make predictions of unknown locations. Tensor factorization methods are popular in this field because of their capability of handling multiple types of spatio-temporal data, dealing with missing values, and providing computationally efficient parameter estimation procedures. However, unlike traditional approaches such as spatial autoregressive models, the existing tensor factorization methods have not tried to learn spatial autocorrelations. These methods employ previously inferred spatial dependencies, often resulting in poor performances on the problem of making interpolations and predictions of unknown locations. In this paper, we propose a new tensor factorization method that estimates low-rank latent factors by simultaneously learning the spatial and temporal autocorrelations. We introduce new spatial autoregressive regularizers based on existing spatial autoregressive models and provide an efficient estimation procedure. With experiments on publicly available traffic transporting data, we demonstrate that our proposed method significantly improves the predictive performances in our problems in comparison to the existing state-of-the-art spatio-temporal analysis methods.

【Keywords】: Tensile stress; Biological system modeling; Computational modeling; Correlation; Interpolation; Reactive power; Matrix decomposition

139. Time-Aware Latent Hierarchical Model for Predicting House Prices.

【Paper Link】【Pages】:1111-1116

【Authors】: Fei Tan ; Chaoran Cheng ; Zhi Wei

【Abstract】: It is widely acknowledged that the value of a house is the mixture of a large number of characteristics. House price prediction thus presents a unique set of challenges in practice. While a large body of works are dedicated to this task, their performance and applications have been limited by the shortage of long time span of transaction data, the absence of real-world settings and the insufficiency of housing features. To this end, a time-aware latent hierarchical model is introduced to capture underlying spatiotemporal interactions behind the evolution of house prices. The hierarchical perspective obviates the need for historical transaction data of exactly same houses when temporal effects are considered. The proposed framework is examined on a large-scale dataset of the property transaction in Beijing. The whole experimental procedure strictly complies with the real-world scenario. The empirical evaluation results demonstrate the outperformance of our approach over alternative competitive methods.

【Keywords】: Predictive models; Optimization; Data models; Economics; Education; Indexes

140. Crowdsourced Correlation Clustering with Relative Distance Comparisons.

【Paper Link】【Pages】:1117-1122

【Authors】: Antti Ukkonen

【Abstract】: Crowdsourced, or human computation based clustering algorithms usually rely on relative distance comparisons, as these are easier to elicit from human workers than absolute distance information. We build upon existing work on correlation clustering, a well-known non-parametric approach to clustering, and present a novel clustering algorithm for human computation. We first define a novel variant of correlation clustering that is based on relative distance comparisons, and briefly outline an approximation algorithm for this problem. As a second contribution, we propose a more practical algorithm, which we empirically compare against existing methods from literature. Experiments with synthetic data suggest that our approach can outperform more complex methods. Also, our method efficiently finds good and intuitive clusterings from real relative distance comparison data.

【Keywords】: Clustering algorithms; Approximation algorithms; Correlation; Algorithm design and analysis; Optimized production technology; Search problems

141. Multimodal Content Analysis for Effective Advertisements on YouTube.

【Paper Link】【Pages】:1123-1128

【Authors】: Nikhita Vedula ; Wei Sun ; Hyunhwan Lee ; Harsh Gupta ; Mitsunori Ogihara ; Joseph Johnson ; Gang Ren ; Srinivasan Parthasarathy

【Abstract】: The recent advancement of web-scale digital advertising saw a paradigm shift from the conventional focus of digital advertisement distribution towards integrating digital processes and methodologies and forming a seamless workflow of advertisement design, production, distribution, and effectiveness monitoring. In this work, we implemented a computational framework for the predictive analysis of the content-based features extracted from advertisement video files and various effectiveness metrics to aid the design and production processes of commercial advertisements. Our proposed predictive analysis framework extracts multi-dimensional temporal patterns from the content of advertisement videos using multimedia signal processing and natural language processing tools. The pattern analysis part employs an architecture of cross modality feature learning where data streams from different feature dimensions are employed to train separate neural network models and then these models are fused together to learn a shared representation. Subsequently, a neural network model trained on this joint representation is utilized as a classifier for predicting advertisement effectiveness. Based on the predictive patterns identified between the content features and the effectiveness metrics of advertisements, we have elicited a useful set of auditory, visual and textual patterns that is strongly correlated with the proposed effectiveness metrics while can be readily implemented in the design and production processes of commercial advertisements. We validate our approach using subjective ratings from a dedicated user study, the text sentiment strength of online viewer comments, and a viewer opinion metric of the likes/views ratio of each advertisement from YouTube video-sharing website.

【Keywords】: Feature extraction; Measurement; Multimedia communication; Production; Streaming media; Neural networks; Visualization

142. Balanced Distribution Adaptation for Transfer Learning.

【Paper Link】【Pages】:1129-1134

【Authors】: Jindong Wang ; Yiqiang Chen ; Shuji Hao ; Wenjie Feng ; Zhiqi Shen

【Abstract】: Transfer learning has achieved promising results by leveraging knowledge from the source domain to annotate the target domain which has few or none labels. Existing methods often seek to minimize the distribution divergence between domains, such as the marginal distribution, the conditional distribution or both. However, these two distances are often treated equally in existing algorithms, which will result in poor performance in real applications. Moreover, existing methods usually assume that the dataset is balanced, which also limits their performances on imbalanced tasks that are quite common in real problems. To tackle the distribution adaptation problem, in this paper, we propose a novel transfer learning approach, named as Balanced Distribution Adaptation (BDA), which can adaptively leverage the importance of the marginal and conditional distribution discrepancies, and several existing methods can be treated as special cases of BDA. Based on BDA, we also propose a novel Weighted Balanced Distribution Adaptation (W-BDA) algorithm to tackle the class imbalance issue in transfer learning. W-BDA not only considers the distribution adaptation between domains but also adaptively changes the weight of each class. To evaluate the proposed methods, we conduct extensive experiments on several transfer learning tasks, which demonstrate the effectiveness of our proposed algorithms over several state-of-the-art methods.

【Keywords】: Transfer learning; domain adaptation; distribution adaptation; class imbalance

143. DPiSAX: Massively Distributed Partitioned iSAX.

【Paper Link】【Pages】:1135-1140

【Authors】: Djamel Edine Yagoubi ; Reza Akbarinia ; Florent Masseglia ; Themis Palpanas

【Abstract】: Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high performance similarity query processing, have became topics of high interest. For many applications across diverse domains though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real world data, illustrate that our index creation algorithm works on 1 billion time series in less than 2 hours, while the state of the art centralized algorithms need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism.

【Keywords】: Time series analysis; Query processing; Indexing; Sparks; Partitioning algorithms; Data mining

144. An Influence-Receptivity Model for Topic Based Information Cascades.

【Paper Link】【Pages】:1141-1146

【Authors】: Ming Yu ; Varun Gupta ; Mladen Kolar

【Abstract】: We consider the problem of estimating the latent structure of a social network based on observational data on information diffusion processes, or cascades. Here for a given cascade, we only observe the time a node/agent is infected but not the source of infection. Existing literature has focused on estimating network diffusion matrix without any underlying assumptions on the structure of the network. We propose a novel model for inferring network diffusion matrix based on the intuition that an information datum is more likely to propagate among two nodes if they are interested in similar topics, which are common with the information. In particular, our model endows each node with an influence vector (how authoritative they are on each topic) and a receptivity vector (how susceptible they are on each topic). We show how this node-topic structure can be estimated from observed cascades. The estimated model can be used to build recommendation system based on the receptivity vectors, as well as for marketing based on the influence vectors.

【Keywords】: Heuristic algorithms; Media; Diseases; Information technology; Inference algorithms; Immune system; Business

145. Risk Control of Best Arm Identification in Multi-armed Bandits via Successive Rejects.

【Paper Link】【Pages】:1147-1152

【Authors】: Xiaotian Yu ; Irwin King ; Michael R. Lyu

【Abstract】: Best arm identification in stochastic Multi-Armed Bandits (MAB) has become an essential variant in the research line of bandits for decision-making problems. In previous work, the best arm usually refers to an arm with the highest expected payoff in a given decision-arm set. However, in many practical scenarios, it would be more important and desirable to incorporate the risk of an arm into the best decision. In this paper, motivated by practical applications with risk via bandits, we investigate the problem of Risk Control of Best Arm Identification (RCBAI) in stochastic MAB. Based on the technique of Successive Rejects (SR), we show that the error resulting from the mean-variance estimation is sub-Gamma by setting mild assumptions on stochastic payoffs of arms. Besides, we develop an algorithm named as RCMAB. SR, and derive an upper bound for the probability of error for RCBAI in stochastic MAB. We demonstrate the superiority of the RCMAB. SR algorithm in synthetic datasets, and then apply the RCMAB. SR algorithm in financial data for yearly investments to show its superiority for practical applications.

【Keywords】: Multi-armed bandits; Risk control; Successive rejects; Sub-Gamma noise

146. Multi-level Multi-task Learning for Modeling Cross-Scale Interactions in Nested Geospatial Data.

【Paper Link】【Pages】:1153-1158

【Authors】: Shuai Yuan ; Jiayu Zhou ; Pang-Ning Tan ; C. Emi Fergus ; Tyler Wagner ; Patricia A. Soranno

【Abstract】: Predictive modeling of nested geospatial data is a challenging problem as the models must take into account potential interactions among variables defined at different spatial scales. These cross-scale interactions, as they are commonly known, are particularly important to understand relationships among ecological properties at macroscales. In this paper, we present a novel, multi-level multi-task learning framework for modeling nested geospatial data in the lake ecology domain. Specifically, we consider region-specific models to predict lake water quality from multi-scaled factors. Our framework enables distinct models to be developed for each region using both its local and regional information. The framework also allows information to be shared among the region-specific models through their common set of latent factors. Such information sharing helps to create more robust models especially for regions with limited or no training data. In addition, the framework can automatically determine cross-scale interactions between the regional variables and the local variables that are nested within them. Our experimental results show that the proposed framework outperforms all the baseline methods in at least 64% of the regions for 3 out of 4 lake water quality datasets evaluated in this study. Furthermore, the latent factors can be clustered to obtain a new set of regions that is more aligned with the response variables than the original regions that were defined a priori from the ecology domain.

【Keywords】: Geospatial analysis; Lakes; Data models; Predictive models; Mathematical model; Biological system modeling; Linear programming

147. Wave2Vec: Learning Deep Representations for Biosignals.

【Paper Link】【Pages】:1159-1164

【Authors】: Ye Yuan ; Guangxu Xun ; Qiuling Suo ; Kebin Jia ; Aidong Zhang

【Abstract】: Time series data mining has gained increasing attention in health domain. Recently, researchers attempt to employ Natural Language Processing (NLP) to health data mining, in order to learn proper representations of discrete medical concepts from Electronic Health Records (EHRs). However, existing models do not take continuous physiological records into account, which are naturally existed in EHRs. The major challenges for this task are to model non-obvious representations from observed high dimensional biosignals, and to interpret the learned features. To address these issues, we propose Wave2Vec, an end-to-end deep learning model, to bridge the gap between biosignal processing and language modeling. Wave2Vec jointly learns both inherent and embedding representations of biosignals at the same time. To evaluate the performance of our model in clinical task, we carry out experiments on two real world benchmark biosignal datasets. Experimental results show that the proposed Wave2Vec model outperforms the six feature leaning baselines in biosignal processing.

【Keywords】: deep learning; representation learning; context learning; biosignals

148. Recurrent Encoder-Decoder Networks for Time-Varying Dense Prediction.

【Paper Link】【Pages】:1165-1170

【Authors】: Tao Zeng ; Bian Wu ; Jiayu Zhou ; Ian Davidson ; Shuiwang Ji

【Abstract】: Dense prediction is concerned with predicting a label for each of the input units, such as pixels of an image. Accurate dense prediction for time-varying inputs finds applications in a variety of domains, such as video analysis and medical imaging. Such tasks need to preserve both spatial and temporal structures that are consistent with the inputs. Despite the success of deep learning methods in a wide range of artificial intelligence tasks, time-varying dense prediction is still a less explored domain. Here, we proposed a general encoder-decoder network architecture that aims to addressing time-varying dense prediction problems. Given that there are both intra-image spatial structure information and temporal context information to be processed simultaneously in such tasks, we integrated fully convolutional networks (FCNs) with recurrent neural networks (RNNs) to build a recurrent encoder-decoder network. The proposed network is capable of jointly processing two types of information. Specifically, we developed convolutional RNN (CRNN) to allow dense sequence processing. More importantly, we designed CRNN bottleneck modules for alleviating the excessive computational cost incurred by carrying out multiple convolutions in the CRNN layer. This novel design is shown to be a critical innovation in building very flexible and efficient deep models for time-varying dense prediction. Altogether, the proposed model handles time-varying information with the CRNN layers and spatial structure information with the FCN architectures. The multiple heterogeneous modules can be integrated into the same network, which can be trained end-to-end to perform time-varying dense prediction. Experimental results showed that our model is able to capture both high-resolution spatial information and relatively low-resolution temporal information as compared to other existing models.

【Keywords】: Dense prediction; time-varying data; convolutional neural networks; recurrent neural networks; bottleneck module

149. A Practically Competitive and Provably Consistent Algorithm for Uplift Modeling.

【Paper Link】【Pages】:1171-1176

【Authors】: Yan Zhao ; Xiao Fang ; David Simchi-Levi

【Abstract】: Randomized experiments have been critical tools of decision making for decades. However, subjects can show significant heterogeneity in response to treatments in many important applications. Therefore it is not enough to simply know which treatment is optimal for the entire population. What we need is a model that correctly customize treatment assignment base on subject characteristics. The problem of constructing such models from randomized experiments data is known as Uplift Modeling in the literature. Many algorithms have been proposed for uplift modeling and some have generated promising results on various data sets. Yet little is known about the theoretical properties of these algorithms. In this paper, we propose a new tree-based ensemble algorithm for uplift modeling. Experiments show that our algorithm can achieve competitive results on both synthetic and industry-provided data. In addition, by properly tuning the "node size" parameter, our algorithm is proved to be consistent under mild regularity conditions. This is the first consistent algorithm for uplift modeling that we are aware of.

【Keywords】: Uplift modeling; personalized treatment learning

150. Incorporating Spatio-Temporal Smoothness for Air Quality Inference.

【Paper Link】【Pages】:1177-1182

【Authors】: Xiangyu Zhao ; Tong Xu ; Yanjie Fu ; Enhong Chen ; Hao Guo

【Abstract】: It is well recognized that air quality inference is of great importance for environmental protection. However, due to the limited monitoring stations and various impact factors, e.g., meteorology, traffic volume and human mobility, inference of air quality index (AQI) could be a difficult task. Recently, with the development of new ways for collecting and integrating urban, mobile, and public service data, there is a potential to leverage spatial relatedness and temporal dependencies for better AQI estimation. To that end, in this paper, we exploit a novel spatio-temporal multi-task learning strategy and develop an enhanced framework for AQI inference. Specifically, both time dependence within a single monitoring station, and spatial relatedness across all the stations will be captured, and then well trained with effective optimization to support AQI inference tasks. As air-quality related features from cross-domain data have been extracted and quantified, comprehensive experiments based on real-world datasets validate the effectiveness of our proposed framework with significant margin compared with several state-of-the-art baselines, which support the hypothesis that our spatio-temporal multi-task learning framework could better predict and interpret AQI fluctuation.

【Keywords】: Urban Computing; AQI Prediction; Multi-task Learning

151. An Automatic Approach for Transit Advertising in Public Transportation Systems.

【Paper Link】【Pages】:1183-1188

【Authors】: Chen Zhang ; Hao Wang ; Hui Xiong

【Abstract】: Transit advertising provides frequent exposure to a large number of residents in different urban regions. However, traditional methods are generally manual and qualitatively based on a rough estimation, such as the number of passengers taken by a bus or functional regions covered. How to accurately put an advertisement on appropriate transportations becomes an important task for potential business value. In this paper, our goal is to recommend top-k bus routes for an ad, which can maximize advertising effectiveness, by mining multiple data sources, such as Smart Card Transaction (SCT) data, geographic data and point of interests (POIs) data. We propose a framework that captures the mobility patterns of passengers and the characteristics of bus stations so as to evaluate the influence of an ad for the passengers quantitatively. Specifically, we first extract the passenger trajectories to model passenger mobility from SCT data and then model each bus station over three characteristics (i.e., hub degree, topic distribution, district region) by geographic data and nearby POIs. According to three characteristics of bus stations, we formulate three application scenarios and design the corresponding methods to make top-k bus routes recommendation. Extensive experiments on real-world public transportation data validate the effectiveness of our proposed approach.

【Keywords】: Transit advertising; passenger mobility; Optimal route selection

152. iNEAT: Incomplete Network Alignment.

【Paper Link】【Pages】:1189-1194

【Authors】: Si Zhang ; Hanghang Tong ; Jie Tang ; Jiejun Xu ; Wei Fan

【Abstract】: Network alignment and network completion are two fundamental cornerstones behind many high-impact graph mining applications. The state-of-the-arts have been addressing these tasks in parallel. In this paper, we argue that network alignment and completion are inherently complementary with each other, and hence propose to jointly address them so that the two tasks can benefit from each other. We formulate it from the optimization perspective, and propose an effective algorithm iNEAT to solve it. The proposed method offers two distinctive advantages. First (Alignment accuracy), our method benefits from higher-quality input networks while mitigates the effect of incorrectly inferred links introduced by the completion task itself. Second (Alignment efficiency), thanks to the low-rank structure of the complete networks and alignment matrix, the alignment can be significantly accelerated. The extensive experiments demonstrate the performance of our algorithm.

【Keywords】: Incomplete network alignment; network completion; low-rank

153. GaDei: On Scale-Up Training as a Service for Deep Learning.

【Paper Link】【Pages】:1195-1200

【Authors】: Wei Zhang ; Minwei Feng ; Yunhui Zheng ; Yufei Ren ; Yandong Wang ; Ji Liu ; Peng Liu ; Bing Xiang ; Li Zhang ; Bowen Zhou ; Fei Wang

【Abstract】: Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. TaaS must satisfy a wide range of customers who have no experience and/or resources to tune DL hyper-parameters (e.g., mini-batch size and learning rate), and meticulous tuning for each user's dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that are applicable to all users. Unfortunately, few research papers have studied how to design a system for TaaS workloads. By evaluating the IBM Watson Natural Language Classfier (NLC) workloads, the most popular IBM cognitive service used by thousands of enterprise-level clients globally, we provide empirical evidence that only the conservative hyper-parameter setup (e.g., small mini-batch size) can guarantee acceptable model accuracy for a wide range of customers. Unfortunately, smaller mini-batch size requires higher communication bandwidth in a parameter-server based DL training system. In this paper, we characterize the exceedingly high communication bandwidth requirement of TaaS using representative industrial deep learning workloads. We then present GaDei, a highly optimized shared-memory based scale-up parameter server design. We evaluate GaDei using both commercial benchmarks and public benchmarks and demonstrate that GaDei significantly outperforms the state-of-the-art parameter-server based implementation while maintaining the required accuracy. GaDei achieves near-best-possible runtime performance, constrained only by the hardware limitation. Furthermore, to the best of our knowledge, GaDei is the only scale-up DL system that provides fault-tolerance.

【Keywords】: Training as a Service; Deep Learning; System

154. Scalable Constrained Spectral Clustering via the Randomized Projected Power Method.

【Paper Link】【Pages】:1201-1206

【Authors】: Weifeng Zhi ; Buyue Qian ; Ian Davidson

【Abstract】: Constrained spectral clustering is an important area with many applications. However, most previous work has only been applied to relatively small data sets: graphs with thousands of points. This prevents this work from being applied to the large data sets found in application domains such as medical imaging and document data. Recent work on constrained and unconstrained spectral clustering has explored scalability of these methods via data approximations such as the Nystrom method which requires the selection of landmarks. However, compressing a graph may lead to undesirable results and poses the additional problem of how to chose landmarks. Instead in this paper, we propose a fast and scalable numerical algorithmic solution for the constrained clustering problem. We show the convergence and stability of our approach by proving its rate of convergence and demonstrate the effectiveness of our algorithm with empirical results on several real data sets. Our approach achieved comparable accuracy as popular constrained spectral clustering algorithms but taking several hundred times less time.

【Keywords】: Clustering; Spectral; Constraints; Constrained Clustering

155. New Class Adaptation Via Instance Generation in One-Pass Class Incremental Learning.

【Paper Link】【Pages】:1207-1212

【Authors】: Yue Zhu ; Kai Ming Ting ; Zhi-Hua Zhou

【Abstract】: One pass learning updates a model with only a single scan of the dataset, without storing historical data. Previous studies focus on classification tasks with a fixed class set, and will perform poorly in an open dynamic environment when new classes emerge in a data stream. The performance degrades because the classifier needs to receive a sufficient number of instances from new classes to establish a good model. This can take a long period of time. In order to reduce this period to deal with any-time prediction task, we introduce a framework to handle emerging new classes called One-Pass Class Incremental Learning (OPCIL). The central issue in OPCIL is: how to effectively adapt a classifier of existing classes to incorporate emerging new classes. We call it the new class adaptation issue, and propose a new approach to address it, which requires only one new class instance. The key is to generate pseudo instances which are optimized to satisfy properties that produce a good discriminative classifier. We provide the necessary propertiesand optimization procedures required to address this issue. Experiments validate the effectiveness of this approach.

【Keywords】: New class adaptation; instance generation; one-pass learning; class incremental learning