18th ICDM 2018:Singapore

IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018. IEEE Computer Society 【DBLP Link

Paper Num: 195 || Session Num: 0

1. On Big Wisdom.

Paper Link】 【Pages】:1-2

【Authors】: Minghui Wu ; Xindong Wu

【Abstract】: We define Big Wisdom with a HAO/BIBLE framework, which integrates human intelligence (HI), artificial intelligence (AI) and organizational/business intelligence (O/BI) with Bigdata analytics in large environments, for industrial intelligence in organizational activities. Big Wisdom starts with Bigdata, discovers Big Knowledge, and facilitates human and machine synergism for complex problem solving. When the HAO/BIBLE framework is applied to a regular (non-Bigdata) environment, it becomes the well-known PEAS agent structure, and when the knowledge graph in HAO/BIBLE relies on domain expertise (rather than Big Knowledge), HAO/BIBLE serves as an expert system.

【Keywords】: Expert systems; Problem-solving; Artificial intelligence

2. Landscape of Practical Blockchain Systems and their Applications.

Paper Link】 【Pages】:3-4

【Authors】: C. Mohan

【Abstract】: Provides an abstract of the keynote presentation and may include a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.

【Keywords】:

3. Automatic Optical Coherence Tomography Imaging Analysis for Retinal Disease Screening Using Machine Learning Techniques.

Paper Link】 【Pages】:5

【Authors】: Ramamohanarao Kotagiri

【Abstract】: Provides an abstract of the keynote presentation and may include a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.

【Keywords】:

4. Blockchain Data Analytics.

Paper Link】 【Pages】:6

【Authors】: Cuneyt Gurcan Akcora ; Murat Kantarcioglu ; Yulia R. Gel

【Abstract】: Over the last couple of years, Bitcoin cryptocurrency and the Blockchain technology that forms the basis of Bitcoin have witnessed an unprecedented attention. Designed to facilitate a secure distributed platform without central regulation, Blockchain is heralded as a novel paradigm that will be as powerful as Big Data, Cloud Computing, and Machine Learning. The Blockchain technology garners an ever increasing interest of researchers in various domains that benefit from scalable cooperation among trust-less parties. As Blockchain data analytics further proliferates, a need to glean successful approaches and to disseminate them among a diverse body of data scientists became a critical task. As an inter-disciplinary team of researchers, our aim is to fill this vital role. In this tutorial, we offer a holistic view on Blockchain Data Analytics. Starting with the core components of Blockchain, we will discuss the state of art in Blockchain data analytics for privacy, security, finance, and management domains. We will share tutorial notes and further reading pointers on the tutorial website blockchaintutorial.github.io.

【Keywords】: blockchain; bitcoin; ethereum; machine learning

5. Discourse Processing and Its Applications in Text Mining.

Paper Link】 【Pages】:7

【Authors】: Shafiq R. Joty ; Giuseppe Carenini ; Raymond T. Ng ; Gabriel Murray

【Abstract】: Discourse processing is a suite of Natural Language Processing (NLP) tasks to uncover linguistic structures from texts at several levels, which can support many text mining applications. This involves identifying the topic structure, the coherence structure, the coreference structure, and the conversation structure for conversational discourse. Taken together, these structures can inform text summarization, essay scoring, sentiment analysis, machine translation, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis – monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. It then covers traditional machine learning methods along with the most recent works using deep learning, and compare their performances on benchmark datasets. For each discourse structure we describe, we show its applications in downstream text mining tasks. Methods and metrics for evaluation are discussed in detail. We conclude the tutorial with an interactive discussion of future challenges and opportunities.

【Keywords】: Discourse processing; coherence; conversation structure; text summarization; information extraction; Discourse Parsing

6. Which Outlier Detector Should I use?

Paper Link】 【Pages】:8

【Authors】: Kai Ming Ting ; Sunil Aryal ; Takashi Washio

【Abstract】: This tutorial has four aims: (1) Providing the current comparative works on different outlier detectors, and analysing the strengths and weaknesses of these works and their recommendations. (2) Presenting non-obvious applications of outlier detectors. This provides examples of how outlier detectors are used in areas which are not normally considered to be the domains of outlier detection. (3) Inviting the research community to explore future research directions, in terms of both comparative study and outlier detection in general. (4) Giving an advice on the factors to consider when choosing an outlier detector, and strengths and weaknesses of some "top" recommended algorithms based on the current understanding in the literature.

【Keywords】: Anomaly Detection tutorial

7. On Multi-query Local Community Detection.

Paper Link】 【Pages】:9-18

【Authors】: Yuchen Bian ; Yaowei Yan ; Wei Cheng ; Wei Wang ; Dongsheng Luo ; Xiang Zhang

【Abstract】: Local community detection, which aims to find a target community containing a set of query nodes, has recently drawn intense research interest. The existing local community detection methods usually assume all query nodes are from the same community and only find a single target community. This is a strict requirement and does not allow much flexibility. In many real-world applications, however, we may not have any prior knowledge about the community memberships of the query nodes, and different query nodes may be from different communities. To address this limitation of the existing methods, we propose a novel memory-based random walk method, MRW, that can simultaneously identify multiple target local communities to which the query nodes belong. In MRW, each query node is associated with a random walker. Different from commonly used memoryless random walk models, MRW records the entire visiting history of each walker. The visiting histories of walkers can help unravel whether they are from the same community or not. Intuitively, walkers with similar visiting histories are more likely to be in the same community. Moreover, MRW allows walkers with similar visiting histories to reinforce each other so that they can better capture the community structure instead of being biased to the query nodes. We provide rigorous theoretical foundation for the proposed method and develop efficient algorithms to identify multiple target local communities simultaneously. Comprehensive experimental evaluations on a variety of real-world datasets demonstrate the effectiveness and efficiency of the proposed method.

【Keywords】: multiple queries; local community detection; memory based random walk

8. Realization of Random Forest for Real-Time Evaluation through Tree Framing.

Paper Link】 【Pages】:19-28

【Authors】: Sebastian Buschjäger ; Kuan-Hsun Chen ; Jian-Jia Chen ; Katharina Morik

【Abstract】: The optimization of learning has always been of particular concern for big data analytics. However, the ongoing integration of machine learning models into everyday life also demand the evaluation to be extremely fast and in real-time. Moreover, in the Internet of Things, the computing facilities that run the learned model are restricted. Hence, the implementation of the model application must take the characteristics of the executing platform into account Although there exist some heuristics that optimize the code, principled approaches for fast execution of learned models are rare. In this paper, we introduce a method that optimizes the execution of Decision Trees (DT). Decision Trees form the basis of many ensemble methods, such as Random Forests (RF) or Extremely Randomized Trees (ET). For these methods to work best, trees should be as large as possible. This challenges the data and the instruction cache of modern CPUs and thus demand a more careful memory layout. Based on a probabilistic view of decision tree execution, we optimize the two most common implementation schemes of decision trees. We discuss the advantages and disadvantages of both implementations and present a theoretically well-founded memory layout which maximizes locality during execution in both cases. The method is applied to three computer architectures, namely ARM (RISC), PPC (Extended RISC) and Intel (CISC) and is automatically adopted to the specific architecture by a code generator. We perform over 1800 experiments on several real-world data sets and report an average speed-up of 2 to 4 across all three architectures by using the proposed memory layout. Moreover, we find that our implementation outperforms sklearn, which was used to train the models by a factor of 1500.

【Keywords】: random forest, decision trees, caching, computer architecture

9. Social Recommendation with Missing Not at Random Data.

Paper Link】 【Pages】:29-38

【Authors】: Jiawei Chen ; Can Wang ; Martin Ester ; Qihao Shi ; Yan Feng ; Chun Chen

【Abstract】: With the explosive growth of online social networks, many social recommendation methods have been proposed and demonstrated that social information has potential to improve the recommendation performance. However, existing social recommendation methods always assume that the data is missing at random (MAR) but this is rarely the case. In fact, by analysing two real-world social recommendation datasets, we observed the following interesting phenomena: (1) users tend to consume and rate the items that they like and the items that have been consumed by their friends. (2) When the items have been consumed by more friends, the average values of the observed ratings will become smaller, not larger as assumed by the existing models. To model these phenomena, we integrate the missing not at random (MNAR) assumption in social recommendation and propose a new social recommendation method SPMF-MNAR, which models the observation process of rating data based on user's preference and social influence. Extensive experiments conducted on large real-world datasets validate that SPMF-MNAR achieves better performance than existing social recommendation methods and the non-social methods based on MNAR assumption.

【Keywords】: MNAR; Social recommendation; Graphic model

10. Prerequisite-Driven Deep Knowledge Tracing.

Paper Link】 【Pages】:39-48

【Authors】: Penghe Chen ; Yu Lu ; Vincent W. Zheng ; Yang Pian

【Abstract】: Knowledge tracing serves as the key technique in the computer supported education environment (e.g., intelligent tutoring systems) to model student's knowledge states. While the Bayesian knowledge tracing and deep knowledge tracing models have been developed, the sparseness of student's exercise data still limits knowledge tracing's performance and applications. In order to address this issue, we advocate for and propose to incorporate the knowledge structure information, especially the prerequisite relations between pedagogical concepts, into the knowledge tracing model. Specifically, by considering how students master pedagogical concepts and their prerequisites, we model prerequisite concept pairs as ordering pairs. With a proper mathematical formulation, this property can be utilized as constraints in designing knowledge tracing model. As a result, the obtained model can have a better performance on student concept mastery prediction. In order to evaluate this model, we test it on five different real world datasets, and the experimental results show that the proposed model achieves a significant performance improvement by comparing with three knowledge tracing models.

【Keywords】: Knowledge Tracing, Prerequisite Modeling, Deep Learning

11. TADA: Trend Alignment with Dual-Attention Multi-task Recurrent Neural Networks for Sales Prediction.

Paper Link】 【Pages】:49-58

【Authors】: Tong Chen ; Hongzhi Yin ; Hongxu Chen ; Lin Wu ; Hao Wang ; Xiaofang Zhou ; Xue Li

【Abstract】: As a common strategy in sales-supply chains, the prediction of sales volume offers precious information for companies to achieve a healthy balance between supply and demand. In practice, the sales prediction task is formulated as a time series prediction problem which aims to predict the future sales volume for different products with the observation of various influential factors (e.g., brand, season, discount, etc.) and corresponding historical sales records. However, with the development of contemporary commercial markets, the dynamic interaction between influential factors with different semantic meanings becomes more subtle, causing challenges in fully capturing dependencies among these variables. Besides, though seeking similar trends from the history benefits the accuracy for the prediction of upcoming sales, existing methods hardly suit sales prediction tasks because the trends in sales time series are more irregular and complex. Hence, we gain insights from the encoder-decoder recurrent neural network (RNN) structure, and propose a novel framework named TADA to carry out trend alignment with dualattention, multi-task RNNs for sales prediction. In TADA, we innovatively divide the influential factors into internal feature and external feature, which are jointly modelled by a multi-task RNN encoder. In the decoding stage, TADA utilizes two attention mechanisms to compensate for the unknown states of influential factors in the future and adaptively align the upcoming trend with relevant historical trends to ensure precise sales prediction. Experimental results on two real-world datasets comprehensively show the superiority of TADA in sales prediction tasks against other state-of-the-art competitors.

【Keywords】: Sales Prediction, Time Series Data, Deep Neural Network

12. Rational Neural Networks for Approximating Graph Convolution Operator on Jump Discontinuities.

Paper Link】 【Pages】:59-68

【Authors】: Zhiqian Chen ; Feng Chen ; Rongjie Lai ; Xuchao Zhang ; Chang-Tien Lu

【Abstract】: For node level graph encoding, a recent important state-of-art method is the graph convolutional networks (GCN), which nicely integrate local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation which results in oscillatory approximation at jump discontinuities; (2) Increasing the order of Chebyshev polynomial can reduce the oscillations issue, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree Ω(poly(1/ε)) to approximate a jump signal such as |x|, while rational function only needs O(poly log(1/ε)). However, it is non-trivial to apply rational approximation without increasing computational complexity due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovering. RatioanlNet is proposed to integrate rational function and neural networks. We show that the rational function of eigenvalues can be rewritten as a function of graph Laplacian, which can avoid multiplication by the eigenvector matrix. Focusing on the analysis of approximation on graph convolution operation, a graph signal regression task is formulated. Under graph signal regression task, its time complexity can be significantly reduced by graph Fourier transform. To overcome the local minimum problem of neural networks model, a relaxed Remez algorithm is utilized to initialize the weight parameters. Convergence rate of RatioanlNet and polynomial based methods on a jump signal is analyzed for a theoretical guarantee. The extensive experimental results demonstrated that our approach could effectively characterize the jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.

【Keywords】: deep learning; graph; approximation

13. Learning Community Structure with Variational Autoencoder.

Paper Link】 【Pages】:69-78

【Authors】: Jun Jin Choong ; Xin Liu ; Tsuyoshi Murata

【Abstract】: Discovering community structure in networks remains a fundamentally challenging task. From scientific domains such as biology, chemistry and physics to social networks the challenge of identifying community structures in different kinds of network is challenging since there is no universal definition of community structure. Furthermore, with the surge of social networks, content information has played a pivotal role in defining community structure, demanding techniques beyond its traditional approach. Recently, network representation learning have shown tremendous promise. Leveraging on recent advances in deep learning, one can exploit deep learning's superiority to a network problem. Most predominantly, successes in supervised and semi-supervised task has shown promising results in network representation learning tasks such as link prediction and graph classification. However, much has yet to be explored in the literature of community detection which is an unsupervised learning task. This paper proposes a deep generative model for community detection and network generation. Empowered with Bayesian deep learning, deep generative models are capable of exploiting non-linearities while giving insights in terms of uncertainty. Hence, this paper proposes Variational Graph Autoencoder for Community Detection (VGAECD). Extensive experiment shows that it is capable of outperforming existing state-of-the-art methods. The generalization of the proposed model also allows the model to be considered as a graph generator. Additionally, unlike traditional methods, the proposed model does not require a predefined community structure definition. Instead, it assumes the existence of latent similarity between nodes and allows the model to find these similarities through an automatic model selection process. Optionally, it is capable of exploiting feature-rich information of a network such as node content, further increasing its performance.

【Keywords】: community detection; variational autoencoder; generative model

14. Imbalanced Augmented Class Learning with Unlabeled Data by Label Confidence Propagation.

Paper Link】 【Pages】:79-88

【Authors】: Si-Yu Ding ; Xu-Ying Liu ; Min-Ling Zhang

【Abstract】: As a practical problem in open and dynamic environments, class-incremental learning has attracted much attention from many fields. Learning with augmented class (LAC) problem formulates one of the core difficulties of class-incremental learning: instances of augmented class need to be predicted with the restriction that only examples from seen classes are observed in training phase. LACU framework advances the study of LAC problem by exploiting unlabeled data, while it does not take into account an important practical problem widely-existing in real-world applications of LAC - imbalanced class distributions among seen classes, which will further increase the learning difficulties of LAC problem. We propose a novel approach Label Confidence Propagation (LCP) to tackle the problem of imbalanced augmented class learning with unlabeled data. LCP enlarges the labeled training data set by estimating class labels for unlabeled data, to meet the challenge of lacking supervision information of augmented classes via identifying some of their instances, and to alleviate the damage of class-imbalance via identifying more instances for each seen class. LCP firstly initializes label confidence, i.e., the posterior probability distributions of all classes (including augmented classes) for unlabeled data, then iteratively propagates label confidence to identify a valid label for each unlabeled instance to enlarge the labeled training data set. Finally, LCP predicts for unseen instances by linear neighborhood reconstruction to be robust to potential noise. Results on abundant experiments show that LCP is significantly superior to many state-of-the-art methods, and robust to high imbalance ratio and high open level. LCP can sufficiently unleash its strength especially when there are abundant unlabeled data available.

【Keywords】: augmented class learning; LACU framework; class imbalance; unlabeled data

15. Sequential Pattern Sampling with Norm Constraints.

Paper Link】 【Pages】:89-98

【Authors】: Lamine Diop ; Cheikh Talibouya Diop ; Arnaud Giacometti ; Dominique Haoyuan Li ; Arnaud Soulet

【Abstract】: In recent years, the field of pattern mining has shifted to user-centered methods. In such a context, it is necessary to have a tight coupling between the system and the user where mining techniques provide results at any time or within a short response time of only few seconds. Pattern sampling is a non-exhaustive method for instantly discovering relevant patterns that ensures a good interactivity while providing strong statistical guarantees due to its random nature. Curiously, such an approach investigated for itemsets and subgraphs has not yet been applied to sequential patterns, which are useful for a wide range of mining tasks and application fields. In this paper, we propose the first method for sequential pattern sampling. In addition to address sequential data, the originality of our approach is to introduce a constraint on the norm to control the length of the drawn patterns and to avoid the pitfall of the "long tail" where the rarest patterns flood the user. We propose a new constrained two-step random procedure, named CSSampling, that randomly draws sequential patterns according to frequency with an interval constraint on the norm. We demonstrate that this method performs an exact sampling. Moreover, despite the use of rejection sampling, the experimental study shows that CSSampling remains efficient and the constraint helps to draw general patterns of the "head". We also illustrate how to benefit from these sampled patterns to instantly build an associative classifier dedicated to sequences. This classification approach rivals state of the art proposals showing the interest of constrained sequential pattern sampling.

【Keywords】: Pattern Mining, Pattern Sampling, Sequential Data

16. Probabilistic Streaming Tensor Decomposition.

Paper Link】 【Pages】:99-108

【Authors】: Yishuai Du ; Yimin Zheng ; Kuang-chih Lee ; Shandian Zhe

【Abstract】: Tensor decomposition is a fundamental tool for multiway data analysis. While most decomposition algorithms operate a collection of static data and perform batch processes, many applications produce data in a streaming manner — every time a subset of entries are generated, and previously seen entries cannot be revisited. In such scenarios, traditional decomposition approaches will be inappropriate, because they cannot provide timely updates when new data come in, and they need to access the whole dataset many times for batch optimization. To address this issue, we propose POST, a PrObabilistic Streaming Tensor decomposition algorithm, which enables real-time updates and predictions upon receiving new tensor entries, and supports dynamic growth of all the modes. Compared with the state-of-the-art streaming decomposition approach MAST, POST is more flexible in that it can handle arbitrary orders of streaming entries, and hence is more widely applicable. In addition, as a Bayesian inference algorithm, POST can quantify the uncertainty of the latent embeddings via their posterior distributions, and the confidence levels of the missing entry value predictions. On several real-world datasets, POST exhibits better or comparable predictive performance than MAST and other static decomposition algorithms.

【Keywords】: tensor data; streaming decomposition; posterior inference

17. Hierarchical Hybrid Feature Model for Top-N Context-Aware Recommendation.

Paper Link】 【Pages】:109-116

【Authors】: Yingpeng Du ; Hongzhi Liu ; Zhonghai Wu ; Xing Zhang

【Abstract】: Precise prediction of users' behavior is critical for users' satisfaction and platforms' benefit. A user's behavior heavily depends on the user's general preference and contextual information (current location, weather etc.). In this paper, we propose a succinct hierarchical framework named Hierarchical Hybrid Feature Model (HHFM). It combines users' general taste and diverse contextual information into a hybrid feature representation to profile users' dynamic preference w.r.t context. Meanwhile, we propose an n-way concatenation pooling strategy to capture the non-linear and complex inherent structures of real-world data, which were ignored by most existing methods like Factorization Machines. Conceptually, our model subsumes several existing methods when choosing proper concatenation and pooling strategies. Extensive experiments show our model consistently outperforms state-of-the-art methods on three real-world data sets.

【Keywords】: Context-aware recommendation, Feature interactions, Hierarchical framework, Pooling strategy

18. Deep Semantic Correlation Learning Based Hashing for Multimedia Cross-Modal Retrieval.

Paper Link】 【Pages】:117-126

【Authors】: Xiaolong Gong ; Linpeng Huang ; Fuwei Wang

【Abstract】: For many large-scale multimedia datasets and web contents, the nearest neighbor search methods based on the hashing strategy for cross-modal retrieval have attracted considerable attention due to its fast query speed and low storage cost. Most existing hashing methods try to map different modalities to Hamming embedding in a supervised way where the semantic information comes from a large manual label matrix and each sample in different modalities is usually encoded by a sparse label vector. However, previous studies didn't address the semantic correlation learning challenges and couldn't make the best use of the prior semantic information. Therefore, they cannot preserve the accurate semantic similarities and often degrade the performance of hashing function learning. To fill this gap, we firstly proposed a novel Deep Semantic Correlation learning based Hashing framework (DSCH) that generates unified hash codes in an end-to-end deep learning architecture for cross-modal retrieval task. The major contribution in this work is to effectively automatically construct the semantic correlation between data representation and demonstrate how to utilize correlation information to generate hash codes for new samples. In particular, DSCH integrates latent semantic embedding with a unified hash embedding to strengthen the similarity information among multiple modalities. Furthermore, additional graph regularization is employed in our framework, to capture the correspondences from the inter-modal and intra-modal. Our model simultaneously learns the semantic correlation and the unified hash codes, which enhances the effectiveness of cross-modal retrieval task. Experimental results show the superior accuracy of our proposed approach to several state-of-the-art cross-modality methods on two large datasets.

【Keywords】: Deep hashing; Cross-modal Retrieval; Multi modal Embedding; Semantic Correlation

19. Learning Sequential Behavior Representations for Fraud Detection.

Paper Link】 【Pages】:127-136

【Authors】: Jia Guo ; Guannan Liu ; Yuan Zuo ; Junjie Wu

【Abstract】: Fraud detection is usually regarded as finding a needle in haystack, which is a challenging task because fraudulences are buried in massive normal behaviors. Indeed, a fraudulent incident usually takes place in consecutive time steps to gain illegal benefits, which provides unique clues to probing frauds by considering a complete behavioral sequence, rather than detecting frauds from a snapshot of behaviors. Also, fraudulent behaviors may entail different parties, such that the interaction pattern between sources and targets can help distinguish frauds from normal behaviors. Therefore, in this paper, we model the attributed behavioral sequences generated from consecutive behaviors, in order to capture the sequential patterns, while those deviate from the pattern can be regarded as fraudulence. Considering the characteristics of behavioral sequence, we propose a novel model, HAInt-LSTM, by augmenting traditional LSTM with a modified forget gate where interval time between consecutive time steps are considered. Meanwhile, we employ a self-historical attention mechanism to allow for long-time dependencies, which can help identify repeated or cyclical appearances. In addition, we encode the source information as an interaction module to enhance the learning of behavioral sequences. To validate the effectiveness of the learned sequential behavior representations, we experiment on real-world telecommunication dataset under both supervised and unsupervised scenarios. Experimental results show that the learned representations can better identify fraudulent behaviors, and also show a clear cut with normal sequences in the lower dimensional embedding space through visualization. Last but not least, we visualize the weights of attention mechanism to provide rational interpretation of human behavioral periodicity.

【Keywords】: Fraud Detection, Behavioral Sequence, LSTM, Attention

20. Defending Against Adversarial Samples Without Security through Obscurity.

Paper Link】 【Pages】:137-146

【Authors】: Wenbo Guo ; Qinglong Wang ; Kaixuan Zhang ; Alexander G. Ororbia II ; Sui Huang ; Xue Liu ; C. Lee Giles ; Lin Lin ; Xinyu Xing

【Abstract】: It has been recently shown that deep neural networks (DNNs) are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data-points that change "fool" the original DNN model, forcing it to misclassify previously correctly classified samples with high confidence. Many believe addressing this flaw is essential for DNNs to be used in critical applications such as cyber security. Previous work has shown that learning algorithms that enhance the robustness of DNN models all use the tactic of "security through obscurity". This means that security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we investigate by examining how previous research dealt with this and propose a generic approach to enhance a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if we reveal the underlying learning algorithm. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on datasets publicly available, such as a large scale malware dataset and MNIST and IMDB datasets. Our results indicate that our approach typically provides superior classification performance and robustness to attacks compared with state-of-art solutions.

【Keywords】: Adversarial deep learning, security through obscurity, data transformation, malware detection

21. EDLT: Enabling Deep Learning for Generic Data Classification.

Paper Link】 【Pages】:147-156

【Authors】: Huimei Han ; Xingquan Zhu ; Ying Li

【Abstract】: This paper proposes to enable deep learning for generic machine learning tasks. Our goal is to allow deep learning to be applied to data which are already represented in instancefeature tabular format for a better classification accuracy. Because deep learning relies on spatial/temporal correlation to learn new feature representation, our theme is to convert each instance of the original dataset into a synthetic matrix format to take the full advantage of the feature learning power of deep learning methods. To maximize the correlation of the matrix, we use 0/1 optimization to reorder features such that the ones with strong correlations are adjacent to each other. By using a two dimensional feature reordering, we are able to create a synthetic matrix, as an image, to represent each instance. Because the synthetic image preserves the original feature values and data correlation, existing deep learning algorithms, such as convolutional neural networks (CNN), can be applied to learn effective features for classification. Our experiments on 20 generic datasets, using CNN as the deep learning classifier, confirm that enabling deep learning to generic datasets has clear performance gain, compared to generic machine learning methods. In addition, the proposed method consistently outperforms simple baselines of using CNN for generic dataset. As a result, our research allows deep learning to be broadly applied to generic datasets for learning and classification (Algorithm source code is available at http://github.com/hhmzwc/EDLT).

【Keywords】: Deep learning; feature learning; convolutional neural networks; classification

22. dpMood: Exploiting Local and Periodic Typing Dynamics for Personalized Mood Prediction.

Paper Link】 【Pages】:157-166

【Authors】: He Huang ; Bokai Cao ; Philip S. Yu ; Chang-Dong Wang ; Alex D. Leow

【Abstract】: Mood disorders are common and associated with significant morbidity and mortality. Early diagnosis has the potential to greatly alleviate the burden of mental illness and the ever increasing costs to families and society. Mobile devices provide us a promising opportunity to detect the users' mood in an unobtrusive manner. In this study, we use a custom keyboard which collects keystrokes' meta-data and accelerometer values. Based on the collected time series data in multiple modalities, we propose a deep personalized mood prediction approach, called dpMood, by integrating convolutional and recurrent deep architectures as well as exploring each individual's circadian rhythm. Experimental results not only demonstrate the feasibility and effectiveness of using smart-phone meta-data to predict the presence and severity of mood disturbances in bipolar subjects, but also show the potential of personalized medical treatment for mood disorders.

【Keywords】: typing dynamics; bipolar disorder; mood prediction; deep learning

23. Asynchronous Dual Free Stochastic Dual Coordinate Ascent for Distributed Data Mining.

Paper Link】 【Pages】:167-176

【Authors】: Zhouyuan Huo ; Xue Jiang ; Heng Huang

【Abstract】: The primal-dual distributed computational methods have broad large-scale data mining applications. Previous primal-dual distributed methods are not applicable when the dual formulation is not available, e.g. the sum-of-non-convex objectives. Moreover, these algorithms and theoretical analysis are based on the fundamental assumption that the computing speeds of multiple machines in a cluster are similar. However, the straggler problem is an unavoidable practical issue in the distributed system because of the existence of slow machines. Therefore, the total computational time of the distributed optimization methods is highly dependent on the slowest machine. In this paper, we address these two issues by proposing novel distributed asynchronous dual free stochastic dual coordinate ascent algorithm for distributed data mining. Our method does not need the dual formulation of the target problem in the computation. We tackle the straggler problem through asynchronous communication and the negative effect of slow machines is significantly alleviated. We also analyze the convergence rate of our method and prove the linear convergence rate even if the individual functions in objective are non-convex. Experiments on both convex and nonconvex loss functions are used to validate our statements.

【Keywords】: Asynchronous Distributed Data Mining; Stochastic Dual Coordinate Ascent; Big Data Mining

24. Representing Networks with 3D Shapes.

Paper Link】 【Pages】:177-186

【Authors】: Shengmin Jin ; Reza Zafarani

【Abstract】: There has been a surge of interest in machine learning in graphs, as graphs and networks are ubiquitous across the globe and within science and engineering: road networks, power grids, protein-protein interaction networks, scientific collaboration networks, social networks, to name a few. Recent machine learning research has focused on efficient and effective ways to represent graph structure. Existing graph representation methods such as network embedding techniques learn to map a node (or a graph) to a vector in a low-dimensional vector space. However, the mapped values are often difficult to interpret, lacking information on the structure of the network or its subgraphs. Instead of using a low-dimensional vector to represent a graph, we propose to represent a network with a 3-dimensional shape: the network shape. We introduce the first network shape, a Kronecker hull, which represents a network as a 3D convex polyhedron using stochastic Kronecker graphs. We present a linear time algorithm to build Kronecker hulls. Network shapes provide a compact representation of networks that is easy to visualize and interpret. They captures various properties of not only the network, but also its subgraphs. For instance, they can provide the distribution of subgraphs within a network, e.g., what proportion of subgraphs are structurally similar to the whole network? Using experiments on real-world networks, we show how network shapes can be used in various applications, from computing similarity between two graphs (using the overlap between network shapes of two networks) to graph compression, where a graph with millions of nodes can be represented with a convex hull with less than 40 boundary points.

【Keywords】: Network Shapes, Graph Representation, Kronecker Hulls, Network Convex Hull

25. Cross-Domain Labeled LDA for Cross-Domain Text Classification.

Paper Link】 【Pages】:187-196

【Authors】: Baoyu Jing ; Chenwei Lu ; Deqing Wang ; Fuzhen Zhuang ; Cheng Niu

【Abstract】: Cross-domain text classification aims at building a classifier for a target domain which leverages data from both source and target domain. One promising idea is to minimize the feature distribution differences of the two domains. Most existing studies explicitly minimize such differences by an exact alignment mechanism (aligning features by one-to-one feature alignment, projection matrix etc.). Such exact alignment, however, will restrict models' learning ability and will further impair models' performance on classification tasks when the semantic distributions of different domains are very different. To address this problem, we propose a novel group alignment which aligns the semantics at group level. In addition, to help the model learn better semantic groups and semantics within these groups, we also propose a partial supervision for model's learning in source domain. To this end, we embed the group alignment and a partial supervision into a cross-domain topic model, and propose a Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and Reuters dataset, extensive quantitative (classification, perplexity etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision.

【Keywords】: Cross Domain Text Classification; Topic Modeling; Group Alignment; Partial Supervision

26. Self-Attentive Sequential Recommendation.

Paper Link】 【Pages】:197-206

【Authors】: Wang-Cheng Kang ; Julian McAuley

【Abstract】: Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the 'context' of users' activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are 'relevant' from a user's action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations on attention weights also show how our model adaptively handles datasets with various density, and uncovers meaningful patterns in activity sequences.

【Keywords】: Sequential Recommendation; Collaborative Filtering

27. Explainable Time Series Tweaking via Irreversible and Reversible Temporal Transformations.

Paper Link】 【Pages】:207-216

【Authors】: Isak Karlsson ; Jonathan Rebane ; Panagiotis Papapetrou ; Aristides Gionis

【Abstract】: Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.

【Keywords】: time series classification; interpretability; explainability; time series tweaking

28. Utilizing In-store Sensors for Revisit Prediction.

Paper Link】 【Pages】:217-226

【Authors】: Sundong Kim ; Jae-Gil Lee

【Abstract】: Predicting revisit intention is very important for the retail industry. Converting first-time visitors to repeating customers is of prime importance for high profitability. However, revisit analyses for offline retail businesses have been conducted on a small scale in previous studies, mainly because their methodologies have mostly relied on manually collected data. With the help of noninvasive monitoring, analyzing a customer's behavior inside stores has become possible, and revisit statistics are available from the large portion of customers who turn on their Wi-Fi or Bluetooth devices. Using Wi-Fi fingerprinting data from ZOYI, we propose a systematic framework to predict the revisit intention of customers using only signals received from their mobile devices. Using data collected from seven flagship stores in downtown Seoul, we achieved 67-80% prediction accuracy for all customers and 64-72% prediction accuracy for first-time visitors. The performance improvement by considering customer mobility was 4.7-24.3%. Our framework showed a feasibility to predict revisits using customer mobility from Wi-Fi signals, that have not been considered in previous marketing studies. Toward this goal, we examine the effect of data collection period on the prediction performance and present the robustness of our model on missing customers. Finally, we discuss the difficulties of securing prediction accuracy with the features that look promising but turn out to be unsatisfactory.

【Keywords】: Predictive Analytics; Feature Engineering; Revisit Prediction; Data Mining; Marketing; Retail Analytics; Indoor Mobility

29. Fast Single-Class Classification and the Principle of Logit Separation.

Paper Link】 【Pages】:227-236

【Authors】: Gil Keren ; Sivan Sabato ; Björn W. Schuller

【Abstract】: We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class, where the class of interest can be different each time the classifier is applied. For instance, this is the case for real-time image search. We define the Single Logit Classification (SLC) task: training the network so that at test-time, it would be possible to accurately identify whether the example belongs to a given class in a computationally efficient manner, based only on the output logit for this class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing losses suitable for the SLC. We show that the cross-entropy loss function is not aligned with the Principle of Logit Separation. In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle. In total, we study seven loss functions. Our experiments show that indeed in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the SLC task compared to losses that are not aligned with it, and sometimes considerably more. Furthermore, we show that fast SLC does not cause any drop in binary classification accuracy, compared to standard classification in which all logits are computed, and yields a speedup which grows with the number of classes. For instance, we demonstrate a 10x speedup when the number of classes is 400,000.

【Keywords】: Neural networks; classification; extreme classification; loss functions

30. Summarizing Network Processes with Network-Constrained Boolean Matrix Factorization.

Paper Link】 【Pages】:237-246

【Authors】: Furkan Kocayusufoglu ; Minh X. Hoang ; Ambuj K. Singh

【Abstract】: Understanding and modeling complex network processes is an important task in many real-world applications. The first challenge is to discover patterns in such complex data. In this work, our goal is to summarize different processes in a network by a small yet interpretable set of network patterns, each of which represents a local community of connected nodes frequently participating in the same network processes. We formulate this problem as a Boolean Matrix Factorization with a network constraint, which we prove to be NP-hard. We then propose an efficient algorithm that incrementally adds the best patterns and achieve scalability with two further improvements. First, to decide which network processes contain which network patterns, we introduce two mapping algorithms with linear costs. Second, to systematically mine the exponential subgraph search space for good patterns, we devise two sampling algorithms based on Monte Carlo Markov Chain. Experimental results on both synthetic and real-world datasets show that our solutions are scalable and find network patterns that effectively summarize network processes.

【Keywords】: Boolean Matrix Factorization, Network constraint, Network Processes

31. SSDMV: Semi-Supervised Deep Social Spammer Detection by Multi-view Data Fusion.

Paper Link】 【Pages】:247-256

【Authors】: Chaozhuo Li ; Senzhang Wang ; Lifang He ; Philip S. Yu ; Yanbo Liang ; Zhoujun Li

【Abstract】: The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large number of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models are proposed by incorporating side information such as user profiles and posted tweets. However, these shallow models are not effective to deeply learn the desirable user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is that we aim to extensively learn the task-relevant discriminative representations for users to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures SSDMV to be able to both generate high quality features and make accurate predictions.Empirically, we evaluate SSDMV over two real social network datasets on three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.

【Keywords】: Social Spammer Detection; Deep Learning; Semi supervised Learning

32. Accelerating Experimental Design by Incorporating Experimenter Hunches.

Paper Link】 【Pages】:257-266

【Authors】: Cheng Li ; Santu Rana ; Sunil Gupta ; Vu Nguyen ; Svetha Venkatesh ; Alessandra Sutti ; David Rubin de Celis Leal ; Teo Slezak ; Murray Height ; Mazher Mohammed ; Ian Gibson

【Abstract】: Experimental design is a process of obtaining a product with target property via experimentation. Bayesian optimization offers a sample-efficient tool for experimental design when experiments are expensive. Often, expert experimenters have 'hunches' about the behavior of the experimental system, offering potentials to further improve the efficiency. In this paper, we consider per-variable monotonic trend in the underlying property that results in a unimodal trend in those variables for a target value optimization. For example, sweetness of a candy is monotonic to the sugar content. However, to obtain a target sweetness, the utility of the sugar content becomes a unimodal function, which peaks at the value giving the target sweetness and falls off both ways. In this paper, we propose a novel method to solve such problems that achieves two main objectives: a) the monotonicity information is used to the fullest extent possible, whilst ensuring that b) the convergence guarantee remains intact. This is achieved by a two-stage Gaussian process modeling, where the first stage uses the monotonicity trend to model the underlying property, and the second stage uses 'virtual' samples, sampled from the first, to model the target value optimization function. The process is made theoretically consistent by adding appropriate adjustment factor in the posterior computation, necessitated because of using the 'virtual' samples. The proposed method is evaluated through both simulations and real world experimental design problems of a) new short polymer fiber with the target length, and b) designing of a new three dimensional porous scaffolding with a target porosity. In all scenarios our method demonstrates faster convergence than the basic Bayesian optimization approach not using such 'hunches'.

【Keywords】: Bayesian optimization, monotonicity knowledge, prior knowledge, hyper-parameter tuning, experimental design

33. Concept Mining via Embedding.

Paper Link】 【Pages】:267-276

【Authors】: Keqian Li ; Hanwen Zha ; Yu Su ; Xifeng Yan

【Abstract】: In this work, we study the problem of concept mining, which serves as the first step in transforming unstructured text into structured information, and supports downstream analytical tasks such as information extraction, organization, recommendation and search. Previous work mainly relies on statistical signals, existing knowledge bases, or predefined linguistic patterns. In this work, we propose a novel approach that mines concepts based on their occurrence contexts, by learning embedding vector representations that summarize the context information for each possible candidates, and use these embeddings to evaluate the concept's global quality and their fitness to each local context. Experiments over several real-world corpora demonstrate the superior performance of our method. A publicly available implementation is provided at https://github.com/kleeeeea/ECON.

【Keywords】: concept mining; information extraction; embedding learning

34. A Semi-Supervised and Inductive Embedding Model for Churn Prediction of Large-Scale Mobile Games.

Paper Link】 【Pages】:277-286

【Authors】: Xi Liu ; Muhe Xie ; Xidao Wen ; Rui Chen ; Yong Ge ; Nick Duffield ; Na Wang

【Abstract】: Mobile gaming has emerged as a promising market with billion-dollar revenues. A variety of mobile game platforms and services have been developed around the world. One critical challenge for these platforms and services is to understand user churn behavior in mobile games. Accurate churn prediction will benefit many stakeholders such as game developers, advertisers, and platform operators. In this paper, we present the first large-scale churn prediction solution for mobile games. In view of the common limitations of the state-of-the-art methods built upon traditional machine learning models, we devise a novel semi-supervised and inductive embedding model that jointly learns the prediction function and the embedding function for user-app relationships. We model these two functions by deep neural networks with a unique edge embedding technique that is able to capture both contextual information and relationship dynamics. We also design a novel attributed random walk technique that takes into consideration both topological adjacency and attribute similarities. To evaluate the performance of our solution, we collect real-world data from the Samsung Game Launcher platform that includes tens of thousands of games and hundreds of millions of user-app interactions. The experimental results with this data demonstrate the superiority of our proposed model against existing state-of-the-art methods.

【Keywords】: churn prediction; representation learning; graph embedding; semi-supervised learning; mobile apps

35. Enhancing Very Fast Decision Trees with Local Split-Time Predictions.

Paper Link】 【Pages】:287-296

【Authors】: Viktor Losing ; Heiko Wersing ; Barbara Hammer

【Abstract】: An increasing number of industrial areas recognize the opportunities of Big Data, requiring highly efficient algorithms which enable real-time processing to reduce the burden of data storage and maintenance. Decision trees are extremely fast, highly accurate and easy to use in practice. Merging multiple decision trees to an ensemble leads to one of the most powerful machine learning methods. The Very Fast Decision Tree is the state-of-the-art incremental decision tree induction algorithm, capable of learning from massive data streams. It is successful due to its theoretical guarantees based on the Hoeffding bound as well as its competitive performance in terms of classification accuracy and time / space efficiency. In this paper, we increase the efficiency even further by replacing its global splitting scheme, which periodically tries to split every n_min examples. Instead, we utilize local statistics to predict the split-time, thus, avoiding unnecessary split-attempts, usually dominating the computational cost. Concretely, we use the class distributions of previous split-attempts to approximate the minimum number of examples until the Hoeffding bound is met. This cautious approach yields by design a low delay and reduces the number of split-attempts at the same time. We extensively evaluate our method using common stream-learning benchmarks also considering non-stationary environments. The experiments confirm a substantially reduced run-time without a loss in classification performance.

【Keywords】: Decision trees, data streams, supervised classification, concept drift

36. Collective Human Behavior in Cascading System: Discovery, Modeling and Applications.

Paper Link】 【Pages】:297-306

【Authors】: Yunfei Lu ; Linyun Yu ; Tianyang Zhang ; Chengxi Zang ; Peng Cui ; Chaoming Song ; Wenwu Zhu

【Abstract】: The collective behavior, describing spontaneously emerging social processes and events, is ubiquitous in both physical society and online social media. The knowledge of collective behavior is critical in understanding and predicting social movements, fads, riots and so on. However, detecting, quantifying and modeling the collective behavior in online social media at large scale are seldom unexplored. In this paper, we examine a real-world online social media with more than 1.7 million information spreading records, which explicitly document the detailed human behavior in this online information cascading system. We observe evident collective behavior in information cascading, and then propose metrics to quantify the collectivity. We find that previous information cascading models cannot capture the collective behavior in the real-world and thus never utilize it. Furthermore, we propose a generative framework with a latent user interest layer to capture the collective behavior in cascading system. Our framework achieves high accuracy in modeling the information cascades with respect to popularity, structure and collectivity. By leveraging the knowledge of collective behavior, our model shows the capability of making predictions without temporal features or early-stage information. Our framework can serve as a more generalized one in modeling cascading system, and, together with empirical discovery and applications, advance our understanding of human behavior.

【Keywords】: Collective Human Behavior; Information Cascades; Generative Framework

37. ResumeNet: A Learning-Based Framework for Automatic Resume Quality Assessment.

Paper Link】 【Pages】:307-316

【Authors】: Yong Luo ; Huaizheng Zhang ; Yongjie Wang ; Yonggang Wen ; Xinwen Zhang

【Abstract】: Recruitment of appropriate people for certain positions is critical for any companies or organizations. Manually screening to select appropriate candidates from large amounts of resumes can be exhausted and time-consuming. However, there is no public tool that can be directly used for automatic resume quality assessment (RQA). This motivates us to develop a method for automatic RQA. Since there is also no public dataset for model training and evaluation, we build a dataset for RQA by collecting around 10K resumes, which are provided by a private resume management company. By investigating the dataset, we identify some factors or features that could be useful to discriminate good resumes from bad ones, e.g., the consistency between different parts of a resume. Then a neural-network model is designed to predict the quality of each resume, where some text processing techniques are incorporated. To deal with the label deficiency issue in the dataset, we propose several variants of the model by either utilizing the pair/triplet-based loss, or introducing some semi-supervised learning technique to make use of the abundant unlabeled data. Both the presented baseline model and its variants are general and easy to implement. Various popular criteria including the receiver operating characteristic (ROC) curve, F-measure and ranking-based average precision (AP) are adopted for model evaluation. We compare the different variants with our baseline model. Since there is no public algorithm for RQA, we further compare our results with those obtained from a website that can score a resume. Experimental results in terms of different criteria demonstrate effectiveness of the proposed method. We foresee that our approach would transform the way of future human resources management.

【Keywords】: Resume quality assessment; dataset and features; neural network; text processing

38. Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms.

Paper Link】 【Pages】:317-326

【Authors】: Panagiotis Mandros ; Mario Boley ; Jilles Vreeken

【Abstract】: The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.

【Keywords】: knowledge discovery; approximate functional dependency; information theory; optimization; branch-and-bound

39. Tell me Something My Friends do not Know: Diversity Maximization in Social Networks.

Paper Link】 【Pages】:327-336

【Authors】: Antonis Matakos ; Aristides Gionis

【Abstract】: Social media have a great potential to improve information dissemination in our society, yet, they have been held accountable for a number of undesirable effects, such as polarization and filter bubbles. It is thus important to understand these negative phenomena and develop methods to combat them. In this paper we propose a novel approach to address the problem of breaking filter bubbles in social media. We do so by aiming to maximize the diversity of the information exposed to connected social-media users. We formulate the problem of maximizing the diversity of exposure as a quadratic-knapsack problem. We show that the proposed diversity-maximization problem is inapproximable, and thus, we resort to polynomial non-approximable algorithms, inspired by solutions developed for the quadratic knapsack problem, as well as scalable greedy heuristics. We complement our algorithms with instance-specific upper bounds, which are used to provide empirical approximation guarantees for the given problem instances. Our experimental evaluation shows that a proposed greedy algorithm followed by randomized local search is the algorithm of choice given its quality-vs.-efficiency trade-off.

【Keywords】: Diversity Maximization; Filter Bubble; Quadratic Knapsack

40. Intelligent Salary Benchmarking for Talent Recruitment: A Holistic Matrix Factorization Approach.

Paper Link】 【Pages】:337-346

【Authors】: Qingxin Meng ; Hengshu Zhu ; Keli Xiao ; Hui Xiong

【Abstract】: As a vital process to the success of an organization, salary benchmarking aims at identifying the right market rate for each job position. Traditional approaches for salary benchmarking heavily rely on the experiences from domain experts and limited market survey data, which have difficulties in handling the dynamic scenarios with the timely benchmarking requirement. To this end, in this paper, we propose a data-driven approach for intelligent salary benchmarking based on large-scale fine-grained online recruitment data. Specifically, we first construct a salary matrix based on the large-scale recruitment data and creatively formalize the salary benchmarking problem as a matrix completion task. Along this line, we develop a Holistic Salary Benchmarking Matrix Factorization (HSBMF) model for predicting the missing salary information in the salary matrix. Indeed, by integrating multiple confounding factors, such as company similarity, job similarity, and spatial-temporal similarity, HSBMF is able to provide a holistic and dynamic view for fine-grained salary benchmarking. Finally, extensive experiments on large-scale real-world data clearly validate the effectiveness of our approach for job salary benchmarking.

【Keywords】: Salary Benchmarking; Talent Recruitment; Matrix Factorization

41. Maximally Consistent Sampling and the Jaccard Index of Probability Distributions.

Paper Link】 【Pages】:347-356

【Authors】: Ryan Moulton ; Yunjiang Jiang

【Abstract】: We introduce simple, efficient algorithms for computing a MinHash of a probability distribution, suitable for both sparse and dense data, with equivalent running times to the state of the art for both cases. The collision probability of these algorithms is a new measure of the similarity of positive vectors which we investigate in detail. We describe the sense in which this collision probability is optimal for any Locality Sensitive Hash based on sampling. We argue that this similarity measure is more useful for probability distributions than the similarity pursued by other algorithms for weighted MinHash, and is the natural generalization of the Jaccard index.

【Keywords】: Locality Sensitive Hashing, Retrieval, MinHash, Jaccard index, Jensen-Shannon divergence

42. Apk2vec: Semi-Supervised Multi-view Representation Learning for Profiling Android Applications.

Paper Link】 【Pages】:357-366

【Authors】: Annamalai Narayanan ; Charlie Soh ; Lihui Chen ; Yang Liu ; Lipo Wang

【Abstract】: Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semisupervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for large-scale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.

【Keywords】: Representation Learning; Deep Learning; Graph Embedding; Malware Detection; App Recommendation; Android

43. Collaborative Translational Metric Learning.

Paper Link】 【Pages】:367-376

【Authors】: Chanyoung Park ; Donghyun Kim ; Xing Xie ; Hwanjo Yu

【Abstract】: Recently, matrix factorization–based recommendation methods have been criticized for the problem raised by the triangle inequality violation. Although several metric learning–based approaches have been proposed to overcome this issue, existing approaches typically project each user to a single point in the metric space, and thus do not suffice for properly modeling the intensity and the heterogeneity of user-item relationships in implicit feedback. In this paper, we propose TransCF to discover such latent user-item relationships embodied in implicit user-item interactions. Inspired by the translation mechanism popularized by knowledge graph embedding, we construct user-item specific translation vectors by employing the neighborhood information of users and items, and translate each user toward items according to the user's relationships with the items. Our proposed method outperforms several state-of-the-art methods for top-N recommendation on seven real-world data by up to 17% in terms of hit ratio. We also conduct extensive qualitative evaluations on the translation vectors learned by our proposed method to ascertain the benefit of adopting the translation mechanism for implicit feedback-based recommendations.

【Keywords】: Recommender system, Metric learning, Collaborative filtering

44. Privacy-Preserving Temporal Record Linkage.

Paper Link】 【Pages】:377-386

【Authors】: Thilina Ranbaduge ; Peter Christen

【Abstract】: Record linkage (RL) is the process of identifying matching records from different databases that refer to the same entity. It is common that the attribute values of records that belong to the same entity do evolve over time, for example people can change their surname or address. Therefore, to identify the records that refer to the same entity over time, RL should make use of temporal information such as the time-stamp of when a record was created and/or update last. However, if RL needs to be conducted on information about people, due to privacy and confidentiality concerns organizations are often not willing or allowed to share sensitive data in their databases, such as personal medical records, or location and financial details, with other organizations. This paper is the first to propose a privacy-preserving temporal record linkage (PPTRL) protocol that can link records across different databases while ensuring the privacy of the sensitive data in these databases. We propose a novel protocol based on Bloom filter encoding which incorporates the temporal information available in records during the linkage process. Our approach uses homomorphic encryption to securely calculate the probabilities of entities changing attribute values in their records over a period of time. Based on these probabilities we generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our technique and conduct an empirical study on large real databases containing several millions of records. The experimental results show that our approach can achieve better linkage quality compared to non-temporal PPRL while providing privacy to individuals in the databases that are being linked.

【Keywords】: Homomorphic encryption; Bloom filters; Temporal Linkage

45. SuperPart: Supervised Graph Partitioning for Record Linkage.

Paper Link】 【Pages】:387-396

【Authors】: Russell Reas ; Steve Ash ; Rob Barton ; Andrew Borthwick

【Abstract】: Identifying sets of items that are equivalent to one another is a problem common to many fields. Systems addressing this generally have at their core a function s(d_i, d_j) for computing the similarity between pairs of records d_i, d_j. The output of s() can be interpreted as a weighted graph where edges indicate the likelihood of two records matching. Partitioning this graph into equivalence classes is non-trivial due to the presence of inconsistencies and imperfections in s(). Numerous algorithmic approaches to the problem have been proposed, but (1) it is unclear which approach should be used on a given dataset; (2) the algorithms do not generally output a confidence in their decisions; and (3) require error-prone tuning to a particular notion of ground truth. We present SuperPart, a scalable, supervised learning approach to graph partitioning. We demonstrate that SuperPart yields competitive results on the problem of detecting equivalent records without manual selection of algorithms or an exhaustive search over hyperparameters. Also, we show the quality of SuperPart's confidence measures by reporting Area Under the Precision-Recall Curve metrics that exceed a baseline measure by 11%. Finally, to bolster additional research in this domain, we release three new datasets derived from real-world Amazon product data along with ground-truth partitionings.

【Keywords】: clustering; supervised learning; record linkage; data integration

46. Finding Events in Temporal Networks: Segmentation Meets Densest-Subgraph Discovery.

Paper Link】 【Pages】:397-406

【Authors】: Polina Rozenshtein ; Francesco Bonchi ; Aristides Gionis ; Mauro Sozio ; Nikolaj Tatti

【Abstract】: In this paper we study the problem of discovering a timeline of events in a temporal network. We model events as dense subgraphs that occur within intervals of network activity. We formulate the event-discovery task as an optimization problem, where we search for a partition of the network timeline into k non-overlapping intervals, such that the intervals span subgraphs with maximum total density. The output is a sequence of dense subgraphs along with corresponding time intervals, capturing the most interesting events during the network lifetime. A naive solution to our optimization problem has polynomial but prohibitively high running time complexity. We adapt existing recent work on dynamic densest-subgraph discovery and approximate dynamic programming to design a fast approximation algorithm. Next, to ensure richer structure, we adjust the problem formulation to encourage coverage of a larger set of nodes. This problem is NP-hard even for static graphs. However, on static graphs a simple greedy algorithm leads to approximate solution due to submodularity. We extended this greedy approach for the case of temporal networks. However, the approximation guarantee does not hold. Nevertheless, according to the experiments, the algorithm finds good quality solutions.

【Keywords】: approximate dynamic programming; dynamic densest subgraph; segmentation

47. DipTransformation: Enhancing the Structure of a Dataset and Thereby Improving Clustering.

Paper Link】 【Pages】:407-416

【Authors】: Benjamin Schelling ; Claudia Plant

【Abstract】: A data set might have a well-defined structure, but this does not necessarily lead to good clustering results. If the structure is hidden in an unfavourable scaling, clustering will usually fail. The aim of this work is to present a technique which enhances the data set by re-scaling and transforming its features and thus emphasizing and accentuating its structure. If the structure is sufficiently clear, clustering algorithms will perform far better. To show that our algorithm works well, we have conducted extensive experiments on several real-world data sets, where we improve clustering not only for k-means, which is our main focus, but also for other standard clustering algorithms.

【Keywords】: Clustering, Dip-Test, Dataset-Transformation

48. ProSecCo: Progressive Sequence Mining with Convergence Guarantees.

Paper Link】 【Pages】:417-426

【Authors】: Sacha Servan-Schreiber ; Matteo Riondato ; Emanuel Zgraggen

【Abstract】: We present PROSECCO, an algorithm for the progressive mining of frequent sequences from large transactional datasets: it processes the dataset in blocks and outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of PROSECCO on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and it can even be faster than existing state-of-the-art non-progressive algorithms.

【Keywords】: pattern mining; sequences; sampling; data exploration; statistical learning theory; interactivity

49. Local Low-Rank Hawkes Processes for Temporal User-Item Interactions.

Paper Link】 【Pages】:427-436

【Authors】: Jin Shang ; Mingxuan Sun

【Abstract】: Hawkes processes have become very popular in modeling multiple recurrent user-item interaction events that exhibit mutual-excitation properties in various domains. Generally, modeling the interaction sequence of each user-item pair as an independent Hawkes process is ineffective since the prediction accuracy of future event occurrences for users and items with few observed interactions is low. On the other hand, multivariate Hawkes processes (MHPs) can be used to handle multi-dimensional random processes where different dimensions are correlated with each other. However, an MHP either fails to describe the correct mutual influence between dimensions or become computational inhibitive in most real-world events involving a large collection of users and items. To tackle this challenge, we propose local low-rank Hawkes processes to model large-scale user-item interactions, which efficiently captures the correlations of Hawkes processes in different dimensions. In addition, we design an efficient convex optimization algorithm to estimate model parameters and present a parallel algorithm to further increase the computation efficiency. Extensive experiments on real-world datasets demonstrate the performance improvements of our model in comparison with the state of the art.

【Keywords】: Hawkes Process; Kernel Smoothing; Sequential Data

50. Multi-label Learning with Label Enhancement.

Paper Link】 【Pages】:437-446

【Authors】: Ruifeng Shao ; Ning Xu ; Xin Geng

【Abstract】: The task of multi-label learning is to predict a set of relevant labels for the unseen instance. Traditional multi-label learning algorithms treat each class label as a logical indicator of whether the corresponding label is relevant or irrelevant to the instance, i.e., +1 represents relevant to the instance and -1 represents irrelevant to the instance. Such label represented by -1 or +1 is called logical label. Logical label cannot reflect different label importance. However, for real-world multi-label learning problems, the importance of each possible label is generally different. For the real applications, it is difficult to obtain the label importance information directly. Thus we need a method to reconstruct the essential label importance from the logical multilabel data. To solve this problem, we assume that each multi-label instance is described by a vector of latent real-valued labels, which can reflect the importance of the corresponding labels. Such label is called numerical label. The process of reconstructing the numerical labels from the logical multi-label data via utilizing the logical label information and the topological structure in the feature space is called Label Enhancement. In this paper, we propose a novel multi-label learning framework called LEMLL, i.e., Label Enhanced Multi-Label Learning, which incorporates regression of the numerical labels and label enhancement into a unified framework. Extensive comparative studies validate that the performance of multi-label learning can be improved significantly with label enhancement and LEMLL can effectively reconstruct latent label importance information from logical multi-label data.

【Keywords】: multi-label learning; label importance; label enhancement

51. Synthetic Oversampling with the Majority Class: A New Perspective on Handling Extreme Imbalance.

Paper Link】 【Pages】:447-456

【Authors】: Shiven Sharma ; Colin Bellinger ; Bartosz Krawczyk ; Osmar R. Zaïane ; Nathalie Japkowicz

【Abstract】: The class imbalance problem is a pervasive issue in many real-world domains. Oversampling methods that inflate the rare class by generating synthetic data are amongst the most popular techniques for resolving class imbalance. However, they concentrate on the characteristics of the minority class and use them to guide the oversampling process. By completely overlooking the majority class, they lose a global view on the classification problem and, while alleviating the class imbalance, may negatively impact learnability by generating borderline or overlapping instances. This becomes even more critical when facing extreme class imbalance, where the minority class is strongly underrepresented and on its own does not contain enough information to conduct the oversampling process. We propose a novel method for synthetic oversampling that uses the rich information inherent in the majority class to synthesize minority class data. This is done by generating synthetic data that is at the same Mahalanbois distance from the majority class as the known minority instances. We evaluate over 26 benchmark datasets, and show that our method offers a distinct performance improvement over the existing state-of-the-art in oversampling techniques.

【Keywords】: class imbalance; synthetic oversampling; resampling; classification; Mahalanobis distance

52. GINA: Group Gender Identification Using Privacy-Sensitive Audio Data.

Paper Link】 【Pages】:457-466

【Authors】: Jiaxing Shen ; Oren Lederman ; Jiannong Cao ; Florian Berg ; Shaojie Tang ; Alex Pentland

【Abstract】: Group gender is essential in understanding social interaction and group dynamics. With the increasing privacy concerns of studying face-to-face communication in natural settings, many participants are not open to raw audio recording. Existing voice-based gender identification methods rely on acoustic characteristics caused by physiological differences and phonetic differences. However, these methods might become ineffective with privacy-sensitive audio for two main reasons. First, compared to raw audio, privacy-sensitive audio contains significantly fewer acoustic features. Moreover, natural settings generate various uncertainties in the audio data. In this paper, we make the first attempt to identify group gender using privacy-sensitive audio. Instead of extracting acoustic features from privacy-sensitive audio, we focus on conversational features including turn-taking behaviors and interruption patterns. However, conversational behaviors are unstable in gender identification as human behaviors are affected by many factors like emotion and environment. We utilize ensemble feature selection and a two-stage classification to improve the effectiveness and robustness of our approach. Ensemble feature selection could reduce the risk of choosing an unstable subset of features by aggregating the outputs of multiple feature selectors. In the first stage, we infer the gender composition (mixed-gender or same-gender) of a group which is used as an additional input feature for identifying group gender in the second stage. The estimated gender composition significantly improves the performance as it could partially account for the dynamics in conversational behaviors. According to the experimental evaluation of 100 people in 273 meetings, the proposed method outperforms baseline approaches and achieves an F1-score of 0.77 using linear SVM.

【Keywords】: gender detection; group gender identification; nonlinguistic audio analysis

53. Deep Headline Generation for Clickbait Detection.

Paper Link】 【Pages】:467-476

【Authors】: Kai Shu ; Suhang Wang ; Thai Le ; Dongwon Lee ; Huan Liu

【Abstract】: Clickbaits are catchy social posts or sensational headlines that attempt to lure readers to click. Clickbaits are pervasive on social media and can have significant negative impacts on both users and media ecosystems. For example, users may be misled to receive inaccurate information or fall into click-jacking attacks. Similarly, media platforms could lose readers' trust and revenues due to the prevalence of clickbaits. To computationally detect such clickbaits on social media using a supervised learning framework, one of the major obstacles is the lack of large-scale labeled training data, due to the high cost of labeling. With the recent advancements of deep generative models, to address this challenge, we propose to generate synthetic headlines with specific styles and explore their utilities to help improve clickbait detection. In particular, we propose to generate stylized headlines from original documents with style transfer. Furthermore, as it is non-trivial to generate stylized headlines due to several challenges such as the discrete nature of texts and the requirements of preserving semantic meaning of document while achieving style transfer, we propose a novel solution, named as Stylized Headline Generation (SHG), that can not only generate readable and realistic headlines to enlarge original training data, but also help improve the classification capacity of supervised learning. The experimental results on real-world datasets demonstrate the effectiveness of SHG in generating high-quality and high-utility headlines for clickbait detection.

【Keywords】: Data augmentation; Deep generative model; Clickbait detection

54. Multi-task Sparse Metric Learning for Monitoring Patient Similarity Progression.

Paper Link】 【Pages】:477-486

【Authors】: Qiuling Suo ; Weida Zhong ; Fenglong Ma ; Ye Yuan ; Mengdi Huai ; Aidong Zhang

【Abstract】: A clinically meaningful distance metric, which is learned from measuring patient similarity, plays an important role in clinical decision support applications. Several metric learning approaches have been proposed to measure patient similarity, but they are mostly designed for learning the metric at only one time point/interval. It leads to a problem that those approaches cannot reflect the similarity variations among patients with the progression of diseases. In order to capture similarity information from multiple future time points simultaneously, we formulate a multi-task metric learning approach to identify patient similarity. However, it is challenging to directly apply traditional multi-task metric learning methods to learn such similarities due to the high dimensional, complex and noisy nature of healthcare data. Besides, the disease labels often have clinical relationships, which should not be treated as independent. Unfortunately, traditional formulation of the loss function ignores the degree of labels' similarity. To tackle the aforementioned challenges, we propose mtTSML, a multi-task triplet constrained sparse metric learning method, to monitor the similarity progression of patient pairs. In the proposed model, the distance for each task can be regarded as the combination of a common part and a task-specific one in the transformed low-rank space. We then perform sparse feature selection for each individual task to select the most discriminative information. Moreover, we use triplet constraints to guarantee the margin between similar and less similar pairs according to the ordered information of disease severity levels (i.e. labels). The experimental results on two real-world healthcare datasets show that the proposed multi-task metric learning method significantly outperforms the state-of-the-art baselines, including both single-task and multi-task metric learning methods.

【Keywords】: Metric Learning; Patient Similarity; Multi-task Learning; Sparse Regularization

55. A Blended Deep Learning Approach for Predicting User Intended Actions.

Paper Link】 【Pages】:487-496

【Authors】: Fei Tan ; Zhi Wei ; Jun He ; Xiang Wu ; Bo Peng ; Haoran Liu ; Zhenyu Yan

【Abstract】: User intended actions are widely seen in many areas. Forecasting these actions and taking proactive measures to optimize business outcome is a crucial step towards sustaining the steady business growth. In this work, we focus on predicting attrition, which is one of typical user intended actions. Conventional attrition predictive modeling strategies suffer a few inherent drawbacks. To overcome these limitations, we propose a novel end-to-end learning scheme to keep track of the evolution of attrition patterns for the predictive modeling. It integrates user activity logs, dynamic and static user profiles based on multi-path learning. It exploits historical user records by establishing a decaying multi-snapshot technique. And finally it employs the precedent user intentions via guiding them to the subsequent learning procedure. As a result, it addresses all disadvantages of conventional methods. We evaluate our methodology on two public data repositories and one private user usage dataset provided by Adobe Creative Cloud. The extensive experiments demonstrate that it can offer the appealing performance in comparison with several existing approaches as rated by different popular metrics. Furthermore, we introduce an advanced interpretation and visualization strategy to effectively characterize the periodicity of user activity logs. It can help to pinpoint important factors that are critical to user attrition and retention and thus suggests actionable improvement targets for business practice. Our work will provide useful insights into the prediction and elucidation of other user intended actions as well.

【Keywords】: Customer Attrition; Predictive Modeling; Interpretation

56. Interactive Unknowns Recommendation in E-Learning Systems.

Paper Link】 【Pages】:497-506

【Authors】: Shan-Yun Teng ; Jundong Li ; Lo Pang-Yun Ting ; Kun-Ta Chuang ; Huan Liu

【Abstract】: The arise of E-learning systems has led to an anytime-anywhere-learning environment for everyone by providing various online courses and tests. However, due to the lack of teacher-student interaction, such ubiquitous learning is generally not as effective as offline classes. In traditional offline courses, teachers facilitate real-time interaction to teach students in accordance with personal aptitude from students' feedback in classes. Without the interruption of instructors, it is difficult for users to be aware of personal unknowns. In this paper, we address an important issue on the exploration of 'user unknowns' from an interactive question-answering process in E-learning systems. A novel interactive learning system, called CagMab, is devised to interactively recommend questions with a round-by-round strategy, which contributes to applications such as a conversational bot for self-evaluation. The flow enables users to discover their weakness and further helps them to progress. In fact, despite its importance, discovering personal unknowns remains a challenging problem in E-learning systems. Even though formulating the problem with the multi-armed bandit framework provides a solution, it often leads to suboptimal results for interactive unknowns recommendation as it simply relies on the contextual features of answered questions. Note that each question is associated with concepts and similar concepts are likely to be linked manually or systematically, which naturally forms the concept graphs. Mining the rich relationships among users, questions and concepts could be potentially helpful in providing better unknowns recommendation. To this end, in this paper, we develop a novel interactive learning framework by borrowing strengths from concept-aware graph embedding for learning user unknowns. Our experimental studies on real data show that the proposed framework can effectively discover user unknowns in an interactive fashion for the recommendation in E-learning systems.

【Keywords】: Unknowns Recommender System; Multi-Armed Bandit; Concept-Aware Graph Embedding; E-learning System

57. The Impact of Environmental Stressors on Human Trafficking.

Paper Link】 【Pages】:507-516

【Authors】: Sabina Tomkins ; Golnoosh Farnadi ; Brian Amanatullah ; Lise Getoor ; Steven Minton

【Abstract】: Severe environmental events have extreme effects on all segments of society, including criminal activity. Extreme weather events, such as tropical storms, fires, and floods create instability in communities, and can be exploited by criminal organizations. Here we investigate the potential impact of catastrophic storms on the criminal activity of human trafficking. We propose three theories of how these catastrophic storms might impact trafficking and provide evidence for each. Researching human trafficking is made difficult by its illicit nature and the obscurity of high-quality data. Here, we analyze online advertisements for services which can be collected at scale and provide insights into traffickers' behavior. To successfully combine relevant heterogenous sources of information, as well as spatial and temporal structure, we propose a collective, probabilistic approach. We implement this approach with Probabilistic Soft Logic, a probabilistic programming framework which can flexibly model relational structure and for which inference of future locations is highly efficient. Furthermore, this framework can be used to model hidden structure, such as latent links between locations. Our proposed approach can model and predict how traffickers move. In addition, we propose a model which learns connections between locations. This model is then adapted to have knowledge of environmental events, and we demonstrate that incorporating knowledge of environmental events can improve prediction of future locations. While we have validated our models on the impact of severe weather on human trafficking, we believe our models can be generalized to a variety of other settings in which environmental events impact human behavior.

【Keywords】: human trafficking; spatio temporal; environmental stressors; probabilistic programming

58. Multi-label Answer Aggregation Based on Joint Matrix Factorization.

Paper Link】 【Pages】:517-526

【Authors】: Jinzheng Tu ; Guoxian Yu ; Carlotta Domeniconi ; Jun Wang ; Guoqiang Xiao ; Maozu Guo

【Abstract】: Crowdsourcing is a useful and economic approach to data annotation. To obtain annotation of high quality, various aggregation approaches have been developed, which take into account different factors that impact the quality of aggregated answers. However, existing methods generally focus on single-label (multi-class and binary) tasks, and they ignore the inter-correlation between labels, and thus may have compromised quality. In this paper, we introduce a Multi-Label answer aggregation approach based on Joint Matrix Factorization (ML-JMF). ML-JMF selectively and jointly factorizes the sample-label association matrices collected from different annotators into products of individual and shared low-rank matrices. As such, it takes advantage of the robustness of low-rank matrix approximation to noise, and reduces the impact of unreliable annotators by assigning small (zero) weights to their annotation matrices. In addition, it takes advantage of the correlation among labels by leveraging the shared low-rank matrix, and of the similarity between annotators using the individual low-rank matrices to guide the factorization. ML-JMF pursues the low-rank matrices via a unified objective function, and introduces an iterative technique to optimize it. ML-JMF finally uses the optimized low-rank matrices and weights to infer the ground-truth labels. Our experimental results on multi-label datasets show that ML-JMF outperforms competitive methods in inferring ground truth labels. Our approach can identify unreliable annotators, and is robust against their misleading answers through the assignment of small (zero) weights to their annotation.

【Keywords】: Crowdsourcing, Multi-Label Learning, Joint Matrix Factorization, Spammers

59. Semi-Supervised Anomaly Detection with an Application to Water Analytics.

Paper Link】 【Pages】:527-536

【Authors】: Vincent Vercruyssen ; Wannes Meert ; Gust Verbruggen ; Koen Maes ; Ruben Baumer ; Jesse Davis

【Abstract】: Nowadays, all aspects of a production process are continuously monitored and visualized in a dashboard. Equipment is monitored using a variety of sensors, natural resource usage is tracked, and interventions are recorded. In this context, a common task is to identify anomalous behavior from the time series data generated by sensors. As manually analyzing such data is laborious and expensive, automated approaches have the potential to be much more efficient as well as cost effective. While anomaly detection could be posed as a supervised learning problem, typically this is not possible as few or no labeled examples of anomalous behavior are available and it is oftentimes infeasible or undesirable to collect them. Therefore, unsupervised approaches are commonly employed which typically identify anomalies as deviations from normal (i.e., common or frequent) behavior. However, in many real-world settings several types of normal behavior exist that occur less frequently than some anomalous behaviors. In this paper, we propose a novel constrained-clustering-based approach for anomaly detection that works in both an unsupervised and semi-supervised setting. Starting from an unlabeled data set, the approach is able to gradually incorporate expert-provided feedback to improve its performance. We evaluated our approach on real-world water monitoring time series data from supermarkets in collaboration with Colruyt Group, one of Belgiums largest retail companies. Empirically, we found that our approach outperforms the current detection system as well as several other baselines. Our system is currently deployed and used by the company to analyze water usage for 20 stores on a daily basis.

【Keywords】: anomaly detection; time series; water analytics; semi supervised machine learning

60. Incomplete Label Uncertainty Estimation for Petition Victory Prediction with Dynamic Features.

Paper Link】 【Pages】:537-546

【Authors】: Junxiang Wang ; Yuyang Gao ; Andreas Züfle ; Jingyuan Yang ; Liang Zhao

【Abstract】: It is important for decision-makers to effectively and proactively differentiate the significance of various public concerns, and address them with optimal strategy under the limited resources. Online Petition Platforms (OPPs) are replacing traditional social and market surveys for the advantages of low financial cost and high-fidelity social indicators. Despite benefits from OPPs, the raw information from millions of petition signers can easily overwhelm decision makers. In addition, spatio-temporal and semantic dissemination patterns increase the complexity of such OPP data. These two aspects show the necessity of a framework that learns from all available data, which is encoded by dynamic representation of features, to predict whether a petition will successfully lead to a change by decision makers. To build such framework, we need to overcome several challenges including: 1) missing values in dynamic features; 2) strong uncertainty in petition prediction; 3) unknown labels for ongoing petitions and 4) Scalability regarding increasing features and petitions. To address these difficulties simultaneously, we propose a novel chain-structure Multi-task Learning framework with Uncertainty Estimation (MLUE) to predict potentially victorious petitions, which facilitates the process of decision making. Specifically, we divide data into different Increasing Feature Blocks (IFBs) according to missing patterns. Besides, we propose a novel criterion to estimate uncertainty in order to label petitions as early as possible. To handle the challenge of scalability, we present an Expectation-Maximization (EM)-based algorithm to optimize the non-convex objective function accurately and efficiently. Various experiments on six petition datasets demonstrate that our MLUE outperformed other baselines by a large margin.

【Keywords】: petition victory prediction; unlabeled data; missing value; multi task learning; uncertainty estimation

61. Human-Centric Urban Transit Evaluation and Planning.

Paper Link】 【Pages】:547-556

【Authors】: Guojun Wu ; Yanhua Li ; Jie Bao ; Yu Zheng ; Jieping Ye ; Jun Luo

【Abstract】: Public transits, such as buses and subway lines, offer affordable ride-sharing services and reduce the road network traffic, thus have significant impacts in mitigating the urban traffic congestion problem. However, it is non-trivial to evaluate a new transit plan, such as a new bus route or a new subway line, of its future ridership prior to actual deployment, since the travel preferences of passengers along the planned routes may vary. In this paper, we make the first attempt to model passengers' preferences of making various transit choices using a Markov Decision Process (MDP). Moreover, we develop a novel inverse preference learning algorithm to infer the passengers' preferences and predict the future human behavior changes, e.g., ridership, of a new urban transit plan before its deployment. We validate our proposed framework using a unique real-world dataset (from Shenzhen, China) with three subway lines opened during the data time span. With the data collected from both before and after the transit plan deployments, Our evaluation results demonstrated that the proposed framework can predict the ridership with only 19.8% relative error, which is 23%-51% lower than other baseline approaches.

【Keywords】: Urban Computing; Inverse Reinforcement Learning; Human-Centric Transit Plan Evaluation

62. A United Approach to Learning Sparse Attributed Network Embedding.

Paper Link】 【Pages】:557-566

【Authors】: Hao Wang ; Enhong Chen ; Qi Liu ; Tong Xu ; Dongfang Du ; Wen Su ; Xiaopeng Zhang

【Abstract】: Recently, the Network Representation Learning (NRL) techniques, which target at learning the low-dimension vector representation of graph structures, have attracted wide attention due to the effectiveness on various social-oriented application. Though large efforts have been made on the joint analysis combining node attributes with the network structure, they may usually fail to summarize the weighted correlations within nodes and attributes, especially when the nodes suffer extremely sparse attributes. To that end, in this paper, we propose a novel Sparse Attributed Network Embedding (SANE) framework to learn the network structure and sparse attribute information simultaneously in a united approach. Specifically, we first embed the nodes and attributes into a low-dimensional vector space. Then we introduce the pairwise method to capture the interaction between nodes and sparse attributes, and aggregate the attribute information of neighbors to alleviate sparsity for obtaining a better vector representation of node embeddings, which will be used in following network representation learning task. Along this line, we maintain the network structure by maximizing the probability of predicting the center node according to surrounding context nodes. Different from previous work, we introduce an attention mechanism to adaptively weigh the strength of interactions between each context node and the center node, according to the node attribute similarity. Furthermore, we combine the attention network with CBOW model to learn the similarity of the network structure and node attributes simultaneously. Extensive experiments on public datasets have validated the effectiveness of our SANE model with significant margin compared with the state-of-the-art baselines, which demonstrates the potential of adaptively attribute analysis in network embedding.

【Keywords】: network embedding; attributed network; attention mechanism; sparse attributes

63. Deep Structure Learning for Fraud Detection.

Paper Link】 【Pages】:567-576

【Authors】: Haibo Wang ; Chuan Zhou ; Jia Wu ; Weizhen Dang ; Xingquan Zhu ; Jilong Wang

【Abstract】: Fraud detection is of great importance because fraudulent behaviors may mislead consumers or bring huge losses to enterprises. Due to the lockstep feature of fraudulent behaviors, fraud detection problem can be viewed as finding suspicious dense blocks in the attributed bipartite graph. In reality, existing attribute-based methods are not adversarially robust, because fraudsters can take some camouflage actions to cover their behavior attributes as normal. More importantly, existing structural information based methods only consider shallow topology structure, making their effectiveness sensitive to the density of suspicious blocks. In this paper, we propose a novel deep structure learning model named DeepFD to differentiate normal users and suspicious users. DeepFD can preserve the non-linear graph structure and user behavior information simultaneously. Experimental results on different types of datasets demonstrate that DeepFD outperforms the state-of-the-art baselines.

【Keywords】: Fraud Detection; Density Block; Graph Structure Learning; Behavior Similarity

64. ASTM: An Attentional Segmentation Based Topic Model for Short Texts.

Paper Link】 【Pages】:577-586

【Authors】: Jiamiao Wang ; Ling Chen ; Lu Qin ; Xindong Wu

【Abstract】: To address the data sparsity problem in short text understanding, various alternative topic models leveraging word embeddings as background knowledge have been developed recently. However, existing models combine auxiliary information and topic modeling in a straightforward way without considering human reading habits. In contrast, extensive studies have proven that it is full of potential in textual analysis by taking into account human attention. Therefore, we propose a novel model, Attentional Segmentation based Topic Model (ASTM), to integrate both word embeddings as supplementary information and an attention mechanism that segments short text documents into fragments of adjacent words receiving similar attention. Each segment is assigned to a topic and each document can have multiple topics. We evaluate the performance of our model on three real-world short text datasets. The experimental results demonstrate that our model outperforms the state-of-the-art in terms of both topic coherence and text classification.

【Keywords】: topic model, word embedding, topic embedding, short texts , human attention

65. A Reinforcement Learning Framework for Explainable Recommendation.

Paper Link】 【Pages】:587-596

【Authors】: Xiting Wang ; Yiru Chen ; Jie Yang ; Le Wu ; Zhengtao Wu ; Xing Xie

【Abstract】: Explainable recommendation, which provides explanations about why an item is recommended, has attracted increasing attention due to its ability in helping users make better decisions and increasing users' trust in the system. Existing explainable recommendation methods either ignore the working mechanism of the recommendation model or are designed for a specific recommendation model. Moreover, it is difficult for existing methods to ensure the presentation quality of the explanations (e.g., consistency). To solve these problems, we design a reinforcement learning framework for explainable recommendation. Our framework can explain any recommendation model (model-agnostic) and can flexibly control the explanation quality based on the application scenario. To demonstrate the effectiveness of our framework, we show how it can be used for generating sentence-level explanations. Specifically, we instantiate the explanation generator in the framework with a personalized-attention-based neural network. Offline experiments demonstrate that our method can well explain both collaborative filtering methods and deep-learning-based models. Evaluation with human subjects shows that the explanations generated by our method are significantly more useful than the explanations generated by the baselines.

【Keywords】: Explainable recommendation, reinforcement learning, personalized explanation, attention networks

66. Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction.

Paper Link】 【Pages】:597-606

【Authors】: Yanan Wang ; Qi Liu ; Chuan Qin ; Tong Xu ; Yijun Wang ; Enhong Chen ; Hui Xiong

【Abstract】: Keyphrases have been widely used in large document collections for providing a concise summary of document content. While significant efforts have been made on the task of automatic keyphrase extraction, existing methods have challenges in training a robust supervised model when there are insufficient labeled data in the resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned to allow the efficient transfer from the source domain to the target by utilizing adversarial training on the topic-based representation. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrase are extracted using a tagging method. Experiments on two realworld cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domain.

【Keywords】: Adversarial Network; Transfer Learning; Keyphrase Extraction

67. Bug Localization via Supervised Topic Modeling.

Paper Link】 【Pages】:607-616

【Authors】: Yaojing Wang ; Yuan Yao ; Hanghang Tong ; Xuan Huo ; Min Li ; Feng Xu ; Jian Lu

【Abstract】: Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

【Keywords】: Bug Localization; Bug Report; Supervised topic modeling

68. Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching.

Paper Link】 【Pages】:617-626

【Authors】: Zhaodong Wang ; Zhiwei Qin ; Xiaocheng Tang ; Jieping Ye ; Hongtu Zhu

【Abstract】: Ride dispatching is a central operation task on a ride-sharing platform to continuously match drivers to trip-requesting passengers. In this work, we model the ride dispatching problem as a Markov Decision Process and propose learning solutions based on deep Q-networks with action search to optimize the dispatching policy for drivers on ride-sharing platforms. We train and evaluate dispatching agents for this challenging decision task using real-world spatio-temporal trip data from the DiDi ride-sharing platform. A large-scale dispatching system typically supports many geographical locations with diverse demand-supply settings. To increase learning adaptability and efficiency, we propose a new transfer learning method Correlated Feature Progressive Transfer, along with two existing methods, enabling knowledge transfer in both spatial and temporal spaces. Through an extensive set of experiments, we demonstrate the learning and optimization capabilities of our deep reinforcement learning algorithms. We further show that dispatching policies learned by transferring knowledge from a source city to target cities or across temporal space within the same city significantly outperform those without transfer learning.

【Keywords】: Ride dispatching, Deep reinforcement learning, Transfer learning, Spatio-temporal mining

69. A Low Rank Weighted Graph Convolutional Approach to Weather Prediction.

Paper Link】 【Pages】:627-636

【Authors】: Tyler Wilson ; Pang-Ning Tan ; Lifeng Luo

【Abstract】: Weather forecasting is an important but challenging problem as one must contend with the inherent non-linearities and spatiotemporal autocorrelation present in the data. This paper presents a novel deep learning approach based on a coupled weighted graph convolutional LSTM (WGC-LSTM) to address these challenges. Specifically, our proposed approach uses an LSTM to capture the inherent temporal autocorrelation of the data and a graph convolution to model its spatial relationships. As the weather condition can be influenced by various spatial factors besides the distance between locations, e.g., topography, prevailing winds and jet streams, imposing a fixed graph structure based on the proximity between locations is insufficient to train a robust deep learning model. Instead, our proposed approach treats the adjacency matrix of the graph as a model parameter that can be learned from the training data. However, this introduces an additional O(|V|^2) parameters to be estimated, where V is the number of locations. With large graphs this may also lead to slower performance as well as susceptibility to overfitting. We propose a modified version of our approach that can address this difficulty by assuming that the adjacency matrix is either sparse or low rank. Experimental results using two real-world weather datasets show that WGC-LSTM outperforms all other baseline methods for the majority of the evaluated locations.

【Keywords】: graph convolution; long short-term memory; deep learning; weather prediction

70. Robust Cascade Reconstruction by Steiner Tree Sampling.

Paper Link】 【Pages】:637-646

【Authors】: Han Xiao ; Çigdem Aslay ; Aristides Gionis

【Abstract】: We consider a network where an infection has taken place and a subset of infected nodes has been partially observed. Our goal is to reconstruct the underlying cascade that is likely to have generated these observations. We reduce this cascade-reconstruction problem to computing the marginal probability that a node is infected given the partial observations, which is a #P-hard problem. To circumvent this issue, we resort to estimating infection probabilities by generating a sample of probable cascades, which span the nodes that have already been observed to be infected, and avoid the nodes that have been observed to be uninfected. The sampling problem corresponds to sampling directed Steiner trees with a given set of terminals, which is a problem of independent interest and has received limited attention in the literature. For the latter problem we propose two novel algorithms with provable guarantees on the sampling distribution of the returned Steiner trees. The resulting method improves over state-of-the-art approaches that often make explicit assumptions about the infection-propagation model, or require additional parameters. Our method provides a more robust approach to the cascade-reconstruction problem, which makes weaker assumptions about the infection model, requires fewer additional parameters, and can be used to estimate node infection probabilities. We experimentally validate the proposed reconstruction algorithm on real-world graphs with both synthetic and real cascades. We show that our method outperforms all other baseline strategies in most cases.

【Keywords】: infection cascades; epidemics; cascade reconstruction; Steiner tree sampling

71. Dr. Right!: Embedding-Based Adaptively-Weighted Mixture Multi-classification Model for Finding Right Doctors with Healthcare Experience Data.

Paper Link】 【Pages】:647-656

【Authors】: Xin Xu ; Yanjie Fu ; Haoyi Xiong ; Bo Jin ; Xiaolin Li ; Shuli Hu ; Minghao Yin

【Abstract】: Finding a right doctor with suitable expertise that meets one's health needs is important yet challenging. In this paper, we study the problem of finding high-rated doctors for a specific disease using imbalanced and heterogeneous healthcare experience rating data. We develop a data analytical framework, namely Dr. Right!, which incorporates the so-called network-textual embeddings, together with data-imbalance-aware mixture multi-classification models to rate doctors per specific disease. First, Dr. Right! collects the comments and rating records from patients for doctors on specific diseases from an online hospital and constructs a doctor-patient-disease network, where every edge weight is a pairwise average rating (experience score) among doctors, patients, and diseases. Then, Dr. Right! learns the embeddings of patient experiences from textual comments using the Word2Vec, as well as the embeddings of doctors and diseases from the doctor-patient-disease network via the Node2Vec. The two types of embeddings are fused to represent a doctor-patient pair. With the embedding representations of doctor-patient pairs, Dr. Right! learns an adaptively-weighted mixture multi-classification model to map a doctor-disease pair to an experience rating score, while addressing the challenges of data imbalance and group heterogeneity. Finally, extensive experimental results demonstrate the enhanced performances of Dr. Right! for predicting the disease-specific experience scores of doctors.

【Keywords】: Feature embedding; adaptively weighted learning; mixture multi classicifation; Healthcare; doctor disease pair; disease specific

72. Meta-Graph Based HIN Spectral Embedding: Methods, Analyses, and Insights.

Paper Link】 【Pages】:657-666

【Authors】: Carl Yang ; Yichen Feng ; Pan Li ; Yu Shi ; Jiawei Han

【Abstract】: Heterogeneous information network (HIN) has drawn significant research attention recently, due to its power of modeling multi-typed multi-relational data and facilitating various downstream applications. In this decade, many algorithms have been developed for HIN modeling, including traditional similarity measures and recent embedding techniques. Most algorithms on HIN leverage meta-graphs or meta-paths (special cases of meta-graphs) to capture various semantics. Given any arbitrary set of meta-graphs, existing algorithms either consider them as equally important or study their different importance through supervised learning. Their performance largely relies on prior knowledge and labeled data. While unsupervised embedding has shown to be a fundamental solution for various homogeneous network mining tasks, for HIN, it is a much harder problem due to such a presence of various meta-graphs. In this work, we propose to study the utility of different meta-graphs, as well as how to simultaneously leverage multiple meta-graphs for HIN embedding in an unsupervised manner. Motivated by prolific research on homogeneous networks, especially spectral graph theory, we firstly conduct a systematic empirical study on the spectrum and embedding quality of different meta-graphs on multiple HINs, which leads to an efficient method of meta-graph assessment. It also helps us to gain valuable insight into the higher-order organization of HINs and indicates a practical way of selecting useful embedding dimensions. Further, we explore the challenges of combining multiple meta-graphs to capture the multi-dimensional semantics in HIN through reasoning from mathematical geometry and arrive at an embedding compression method of autoencoder with l2,1-loss, which finds the most informative meta-graphs and embeddings in an end-to-end unsupervised manner. Finally, empirical analysis suggests a unified workflow to close the gap between our meta-graph assessment and combination methods. To the best of our knowledge, this is the first research effort to provide rich theoretical and empirical analyses on the utility of meta-graphs and their combinations, especially regarding HIN embedding. Extensive experimental comparisons with various state-of-the-art neural network based embedding methods on multiple real-world HINs demonstrate the effectiveness and efficiency of our framework in finding useful meta-graphs and generating high-quality HIN embeddings.

【Keywords】: heterogeneous data; spectral analysis; network embedding

73. Towards Interpretation of Recommender Systems with Sorted Explanation Paths.

Paper Link】 【Pages】:667-676

【Authors】: Fan Yang ; Ninghao Liu ; Suhang Wang ; Xia Hu

【Abstract】: Despite the wide application in recent years, most recommender systems are not capable of providing interpretations together with recommendation results, which impedes both deployers and customers from understanding or trusting the results. Recent advances in recommendation models, such as deep learning models, usually involve extracting latent representations of users and items. However, the representation space is not directly comprehensible since each dimension usually does not have any specific meaning. In addition, recommender systems incorporate various sources of information, such as user behaviors, item information, and other side content information. Properly organizing different types of information, as well as effectively selecting important information for interpretation, is challenging and has not been fully tackled by conventional interpretation methods. In this paper, we propose a post-hoc method called Sorted Explanation Paths (SEP) to interpret recommendation results. Specifically, we first build a unified heterogeneous information network to incorporate multiple types of objects and relations based on representations from the recommender system and information from the dataset. Then, we search for explanation paths between given recommendation pairs, and use the set of simple paths to construct semantic explanations. Next, three heuristic metrics, i.e., credibility, readability and diversity, are designed to measure the validity of each explanation path, and to sort all the paths comprehensively. The top-ranked explanation paths are selected as the final interpretation. After that, practical issues on computation and efficiency of the proposed SEP method are also handled by corresponding approaches. Finally, we conduct experiments on three real-world benchmark datasets, and demonstrate the applicability and effectiveness of the proposed SEP method.

【Keywords】: Recommender systems; Post-hoc interpretations; Heterogeneous information network

74. LEEM: Lean Elastic EM for Gaussian Mixture Model via Bounds-Based Filtering.

Paper Link】 【Pages】:677-686

【Authors】: Shuai Yang ; Xipeng Shen

【Abstract】: Gaussian Mixture Model (GMM) is widely used in characterizing complicated real-world data and has played a crucial role in many pattern recognition problems. GMM is usually trained by Expectation Maximization algorithm (EM) which is computationally intensive. Previous studies have proposed a family of variants of EM. By considering only the data points that are the most important to a model in a GMM when updating that model, they help reduce some GMM training time. They are named Elastic EM in this paper. This work proposes several novel optimizations to further accelerate Elastic EM. These optimizations detect and avoid unnecessary probability calculations through novel bounds-based filtering at E-step as well as a Delta optimization to the M-step. Together, they create Lean Elastic EM (LEEM), which brings multi-fold speedups on six datasets of various sizes and dimensions.

【Keywords】: Mixture Model, Acceleration, Expectation Maximization, Elastic EM

75. Collapsed Variational Inference for Nonparametric Bayesian Group Factor Analysis.

Paper Link】 【Pages】:687-696

【Authors】: Sikun Yang ; Heinz Koeppl

【Abstract】: Group factor analysis (GFA) methods have been widely used to infer the common structure and the group-specific signals from multiple related datasets in various fields including systems biology and neuroimaging. To date, most available GFA models require Gibbs sampling or slice sampling to perform inference, which prevents the practical application of GFA to large-scale data. In this paper we present an efficient collapsed variational inference (CVI) algorithm for the nonparametric Bayesian group factor analysis (NGFA) model built upon an hierarchical beta Bernoulli process. Our CVI algorithm proceeds by marginalizing out the group-specific beta process parameters, and then approximating the true posterior in the collapsed space using mean field methods. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our CVI algorithm for the NGFA compared with state-of-the-art GFA methods.

【Keywords】: group factor analysis; hierarchical beta Bernoulli processes; collapsed variational inference

76. DE-RNN: Forecasting the Probability Density Function of Nonlinear Time Series.

Paper Link】 【Pages】:697-706

【Authors】: Kyongmin Yeo ; Igor Melnyk ; Nam Nguyen ; Eun Kyung Lee

【Abstract】: Model-free identification of a nonlinear dynamical system from the noisy observations is of current interest due to its direct relevance to many applications in Industry 4.0. Making a prediction of such noisy time series constitutes a problem of learning the nonlinear time evolution of a probability distribution. Capability of most of the conventional time series models is limited when the underlying dynamics is nonlinear, multi-scale or when there is no prior knowledge at all on the system dynamics. We propose DE-RNN (Density Estimation Recurrent Neural Network) to learn the probability density function (PDF) of a stochastic process with an underlying nonlinear dynamics and compute the time evolution of the PDF for a probabilistic forecast. A Recurrent Neural Network (RNN)-based model is employed to learn a nonlinear operator for the temporal evolution of the stochastic process. We use a softmax layer for a numerical discretization of a smooth PDF, which transforms a function approximation problem to a classification task. A regularized cross-entropy method is introduced to impose a smoothness condition on the estimated probability distribution. A Monte Carlo procedure to compute the temporal evolution of the distribution for a multiple-step forecast is presented. It is shown that the proposed algorithm can learn the nonlinear multi-scale dynamics from the noisy observations and provides an effective tool to forecast time evolution of the underlying probability distribution. Evaluation of the algorithm on three synthetic and two real data sets shows advantage over the compared baselines, and a potential value to a wide range of problems in physics and engineering.

【Keywords】: Recurrent Neural Network; Time Series; Uncertainty Quantification; Forecast; Dynamical System

77. Online Dictionary Learning with Confidence.

Paper Link】 【Pages】:707-716

【Authors】: Shan You ; Chang Xu ; Chao Xu

【Abstract】: Online dictionary learning has received intensive attention in signal processing field with streaming or dynamic data. Different from classical online dictionary learning methods that treat all atoms equally, in this paper, we present a novel online dictionary learning with a confidence parameter introduced on each of atoms. The confidence indicates the scale of the update of atoms during online learning; frequently estimated atoms are thus supposed to be updated less aggressively than those of low usage frequency. This updating mechanism is beneficial for learning with dirty examples. As a result, the outliers would be prevented from influencing much on the frequently used atoms that have been well estimated through the clean examples. In detail, we employ variance of the atoms to measure the confidence on the quality of the learned atoms. And frequently-used atoms are supposed to have smaller variances (i.e. more confidence) than those of low usage frequency. Our algorithm consists of two main parts, namely, confidence-weighted sparse coding and dictionary update with confidence fine-tuning, each of which can be solved efficiently by our designed optimization methods. In addition, due to the introduced confidence, our algorithm does not have to depend on the widely-used ℓ_1 norm in order to realize the robustness against the outliers, which is usually computationally expensive in the training cost. Experimental results on synthetic and benchmark datasets demonstrate the imposed confidence on atoms in the dictionary can improve the performance of the learned dictionary. The quality of the obtained dictionary is comparable to that of state-of-the-art robust online dictionary learning methods, however, our method is much faster than those ℓ_1 norm based ones.

【Keywords】: dictionary learning; online learning; outliers; confidence

78. MuVAN: A Multi-view Attention Network for Multivariate Temporal Data.

Paper Link】 【Pages】:717-726

【Authors】: Ye Yuan ; Guangxu Xun ; Fenglong Ma ; Yaqing Wang ; Nan Du ; Kebin Jia ; Lu Su ; Aidong Zhang

【Abstract】: Recent advances in attention networks have gained enormous interest in time series data mining. Various attention mechanisms are proposed to soft-select relevant timestamps from temporal data by assigning learnable attention scores. However, many real-world tasks involve complex multivariate time series that continuously measure target from multiple views. Different views may provide information of different levels of quality varied over time, and thus should be assigned with different attention scores as well. Unfortunately, the existing attention-based architectures cannot be directly used to jointly learn the attention scores in both time and view domains, due to the data structure complexity. Towards this end, we propose a novel multi-view attention network, namely MuVAN, to learn fine-grained attentional representations from multivariate temporal data. MuVAN is a unified deep learning model that can jointly calculate the two-dimensional attention scores to estimate the quality of information contributed by each view within different timestamps. By constructing a hybrid focus procedure, we are able to bring more diversity to attention, in order to fully utilize the multi-view information. To evaluate the performance of our model, we carry out experiments on three real-world benchmark datasets. Experimental results show that the proposed MuVAN model outperforms the state-of-the-art deep representation approaches in different real-world tasks. Analytical results through a case study demonstrate that MuVAN can discover discriminative and meaningful attention scores across views over time, which improves the feature representation of multivariate temporal data.

【Keywords】: attention mechanism, representation learning, multivariate temporal data, deep learning

79. Adversarially Learned Anomaly Detection.

Paper Link】 【Pages】:727-736

【Authors】: Houssam Zenati ; Manon Romain ; Chuan-Sheng Foo ; Bruno Lecouat ; Vijay Chandrasekhar

【Abstract】: Anomaly detection is a significant and hence well-studied problem. However, developing effective anomaly detection methods for complex and high-dimensional data remains a challenge. As Generative Adversarial Networks (GANs) are able to model the complex high-dimensional distributions of real-world data, they offer a promising approach to address this challenge. In this work, we propose an anomaly detection method, Adversarially Learned Anomaly Detection (ALAD) based on bi-directional GANs, that derives adversarially learned features for the anomaly detection task. ALAD then uses reconstruction errors based on these adversarially learned features to determine if a data sample is anomalous. ALAD builds on recent advances to ensure data-space and latent-space cycle-consistencies and stabilize GAN training, which results in significantly improved anomaly detection performance. ALAD achieves state-of-the-art performance on a range of image and tabular datasets while being several hundred-fold faster at test time than the only published GAN-based method.

【Keywords】: Anomaly Detection; Unsupervised Learning; Deep Learning; Generative Adversarial Networks; Novelty Detection

80. SINE: Scalable Incomplete Network Embedding.

Paper Link】 【Pages】:737-746

【Authors】: Daokun Zhang ; Jie Yin ; Xingquan Zhu ; Chengqi Zhang

【Abstract】: Attributed network embedding aims to learn low-dimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Different from existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate negative effects of missing information on representation learning. A stochastic gradient descent based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features. The source code of this paper is available at https://github.com/daokunzhang/SINE.

【Keywords】: network embeddding; incomplete network; large scale

81. Image-Enhanced Multi-level Sentence Representation Net for Natural Language Inference.

Paper Link】 【Pages】:747-756

【Authors】: Kun Zhang ; Guangyi Lv ; Le Wu ; Enhong Chen ; Qi Liu ; Han Wu ; Fangzhao Wu

【Abstract】: Natural Language Inference (NLI) task requires an agent to determine the semantic relation between a premise sentence (p) and a hypothesis sentence (h), which demands sufficient understanding about sentences from lexical knowledge to global semantic. Due to the issues such as polysemy, ambiguity, as well as fuzziness of sentences, fully understanding sentences is still challenging. To this end, we propose an Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN), a novel architecture that is able to utilize the image to enhance the sentence semantic understanding at different scales. To be specific, we introduce the corresponding image of sentences as reference information, which can be helpful for sentence semantic understanding and inference relation evaluation. Since image information might be related to the sentence semantics at different scales, we design a multi-level architecture to understand sentences from different granularity and generate the sentence representation more precisely. Experimental results on the large-scale NLI corpus and real-world NLI alike corpus demonstrate that IEMLRN can simultaneously improve the performance. It is noteworthy that IEMLRN significantly outperforms the state-of-the-art sentence-encoding based models on the challenging hard subset and challenging lexical subset of SNLI corpus.

【Keywords】: Natural Language Inference; Sentence Semantic; Image-Enhanced Representation; Multiple Level

82. CADEN: A Context-Aware Deep Embedding Network for Financial Opinions Mining.

Paper Link】 【Pages】:757-766

【Authors】: Liang Zhang ; Keli Xiao ; Hengshu Zhu ; Chuanren Liu ; Jingyuan Yang ; Bo Jin

【Abstract】: Following the recent advances of artificial intelligence, financial text mining has gained new potential to benefit theoretical research with practice impacts. An essential research question for financial text mining is how to accurately identify the actual financial opinions (e.g., bullish or bearish) behind words in plain text. Traditional methods mainly consider this task as a text classification problem with solutions based on machine learning algorithms. However, most of them rely heavily on the hand-crafted features extracted from the text. Indeed, a critical issue along this line is that the latent global and local contexts of the financial opinions usually cannot be fully captured. To this end, we propose a context-aware deep embedding network for financial text mining, named CADEN, by jointly encoding the global and local contextual information. Especially, we capture and include an attitude-aware user embedding to enhance the performance of our model. We validate our method with extensive experiments based on a real-world dataset and several state-of-the-art baselines for investor sentiment recognition. Our results show a consistently superior performance of our approach for identifying the financial opinions from texts of different formats.

【Keywords】: financial opinions; text mining; embedding; neural networks; deep learning

83. Integrative Analysis of Patient Health Records and Neuroimages via Memory-Based Graph Convolutional Network.

Paper Link】 【Pages】:767-776

【Authors】: Xi Zhang ; Jingyuan Chou ; Fei Wang

【Abstract】: With the arrival of the big data era, more and more data are becoming readily available in various real-world applications and those data are usually highly heterogeneous. Taking computational medicine as an example, we have both Electronic Health Records (EHR) and medical images for each patient. For complicated diseases such as Parkinson's and Alzheimer's, both EHR and neuroimaging information are very important for disease understanding because they contain complementary aspects of the disease. However, EHR and neuroimage are completely different. So far the existing research has been mainly focusing on one of them. In this paper, we proposed a framework, Memory-Based Graph Convolution Network (MemGCN), to perform integrative analysis with such multi-modal data. Specifically, GCN is used to extract useful information from the patients' neuroimages. The information contained in the patient EHRs before the acquisition of each brain image is captured by a memory network because of its sequential nature. The information contained in each brain image is combined with the information read out from the memory network to infer the disease state at the image acquisition timestamp. To further enhance the analytical power of MemGCN, we also designed a multi-hop strategy that allows multiple reading and updating on the memory can be performed at each iteration. We conduct experiments using the patient data from the Parkinson's Progression Markers Initiative (PPMI) with the task of classification of Parkinson's Disease (PD) cases versus controls. We demonstrate that superior classification performance can be achieved with our proposed framework, compared with existing approaches involving a single type of data.

【Keywords】: graph convolutional network; memory network; neuroimages; EHR

84. Chinese Medical Concept Normalization by Using Text and Comorbidity Network Embedding.

Paper Link】 【Pages】:777-786

【Authors】: Yizhou Zhang ; Xiaojun Ma ; Guojie Song

【Abstract】: Chinese medical concept normalization, which maps non-standard medical concepts to standard expressions, is a NLP task with wide-ranging applications in medical big data research and clinical statistic. Many previous works apply supervised methods which require a lot of annotated data. However, they can not address the challenge brought by the high cost of medical data annotation, which requires sufficient professional knowledge and experience. Meanwhile, existing unsupervised methods perform poorly facing the various non-standard expression from different data sources. In this paper, we propose DUNE, Disease Unsupervised Normalization by Embedding, an unsupervised Chinese medical concept normalization framework by applying denoising auto-encoder (DAE) and network embedding. We formulate this task as finding mention-entity pairs with great text and comorbidity similarity. To handle the noise in text, we design a multi-view attention based denoising auto-encoder (MADAE) to capture text information from multiple views, reduce the influence of noise, and transform text to denoised vectors. To introduce comorbidity information, we construct a comorbidity network with both standard and non-standard disease names as nodes from medical records. Because of the diversity of nonstandard expressions, one disease perhaps corresponds to several different nodes, which causes noise in comorbidity network. To handle such network structure noise, we propose a denoising network embedding framework, which reduce the structure noise with the help of text information, to embed the nodes to vectors for comorbidity similarity measurement. Convincing experiment results show that our method performs better than existing unsupervised baselines and approaches the performance of classical supervised machine learning model on this task.

【Keywords】: medical concept normalization, comorbidity network embedding, denoising auto-encoder

85. Billion-Scale Network Embedding with Iterative Random Projection.

Paper Link】 【Pages】:787-796

【Authors】: Ziwei Zhang ; Peng Cui ; Haoyang Li ; Xiao Wang ; Wenwu Zhu

【Abstract】: Network embedding, which learns low-dimensional vector representation for nodes in the network, has attracted considerable research attention recently. However, the existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to be accelerated by distributed computing schemes. To address these problems, we propose RandNE (Iterative Random Projection Network Embedding), a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure to avoid the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient, and friendly to distributed computing schemes without any communication cost in the calculation. We also design a dynamic updating procedure which can efficiently incorporate the dynamic changes of the networks without error aggregation. Extensive experimental results demonstrate the efficiency and efficacy of RandNE over state-of-the-art methods in several tasks including network reconstruction, link prediction and node classification on multiple datasets with different scales, ranging from thousands to billions of nodes and edges.

【Keywords】: Network Embedding; High-order Proximity; Billion-Scale; Dynamic Networks; Distributed Computing

86. Zero-Shot Learning: An Energy Based Approach.

Paper Link】 【Pages】:797-806

【Authors】: Tianxiang Zhao ; Guiquan Liu ; Le wu ; Chao Ma ; Enhong Chen

【Abstract】: Zero-shot learning deals with the problem when the training domain and the test domain have different class sets of image instances. To tackle the problem of some classes in the test data never appeared in the training set, a most popular approach is to map both images and classes in a common space under the embedding based framework. Nevertheless, most embedding based models suffered from the semantic loss problem. Furthermore, the expressive power is limited by representing classes and images as mere points. To tackle these problems, in this paper, we propose an Energy-Based Zero-shot Learning model (EBZL) to encode the association between class attributes and input images for zero-shot learning. EBZL is composed of two parts. The first part is a variational autoencoder that reduces the input dimension of images with representative hidden representations. By feeding the hidden representations as the input of the second part, the second part works as the energy function part based on the deep Boltzmann machine. Specifically, we adapt tradition deep Boltzmann machine to a supervised setting without changing its property as an undirected probabilistic graphic model, which helps to preserve semantic integrity and circumvents semantic loss problem. We further utilize variational inference techniques and mean-field approximation to reduce time complexity in model training process. Finally, extensive experimental results on several real-world datasets clearly show the effectiveness of our proposed method.

【Keywords】: zero-shot learning; deep Boltzmann machine

87. Deep Learning Based Scalable Inference of Uncertain Opinions.

Paper Link】 【Pages】:807-816

【Authors】: Xujiang Zhao ; Feng Chen ; Jin-Hee Cho

【Abstract】: Subjective Logic (SL) is one of well-known belief models that can explicitly deal with uncertain opinions and infer unknown opinions based on a rich set of operators of fusing multiple opinions. Due to high simplicity and applicability, SL has been popularly applied in a variety of decision making in the area of cybersecurity, opinion models, and/or trust / social network analysis. However, SL has been facing an issue of scalability to deal with a large-scale network data. In addition, SL has shown a bounded prediction accuracy due to its inherent parametric nature by treating heterogeneous data and network structure homogeneously based on the assumption of a Bayesian network. In this work, we take one step further to deal with uncertain opinions for unknown opinion inference. We propose a deep learning (DL)-based opinion inference model while node-level opinions are still formalized based on SL. The proposed DL-based opinion inference model handles node-level opinions explicitly in a large-scale network using graph convoluational network (GCN) and variational autoencoder (VAE) techniques. We adopted the GCN and VAE due to their powerful learning capabilities in dealing with a large-scale network data without parametric fusion operators and/or Bayesian network assumption. This work is the first that leverages the merits of both DL (i.e., GCN and VAE) and a belief model (i.e., SL) where each node level opinion is modeled by the formalism of SL while GCN and VAE are used to achieve non-parametric learning with low complexity. By mapping the node-level opinions modeled by the GCN to their equivalent Beta PDFs (probability density functions), we develop a network-driven VAE to maximize prediction accuracy of unknown opinions while significantly reducing algorithmic complexity. We validate our proposed DL-based algorithm using real-world datasets via extensive simulation experiments for comparative performance analysis.

【Keywords】: Uncertainty; Subjective Logic; Graphical Convolutional Network (GCN); Variational Autoencoder (VAE); Network Mining

88. Dynamic Truth Discovery on Numerical Data.

Paper Link】 【Pages】:817-826

【Authors】: Shi Zhi ; Fan Yang ; Zheyi Zhu ; Qi Li ; Zhaoran Wang ; Jiawei Han

【Abstract】: Truth discovery aims at obtaining the most credible information from multiple sources that provide noisy and conflicting values. Due to the ubiquitous existence of data conflict in practice, truth discovery has been attracting a lot of research attention recently. Unfortunately, existing truth discovery models all miss an important issue of truth discovery - the truth evolution problem. In many real-life scenarios, the latent true value often keeps changing dynamically over time instead of staying static. We study the dynamic truth discovery problem in the space of numerical truth discovery. This problem cannot be addressed by existing models because of the new challenges of capturing time-evolving source dependency in a continuous space as well as handling missing data on the fly. We propose a model named EvolvT for dynamic truth discovery on numerical data. With the hidden Markov framework, EvolvT captures three key aspects of dynamic truth discovery with a unified model: truth transition regularity, source quality, and source dependency. The most distinguishable feature of the modeling part of EvolvT is that it employs Kalman filtering to model truth evolution. As such, EvolvT not only can principally infer source dependency in a continuous space, but also can handle missing data in a natural way. We establish an expectation-maximization (EM) algorithm for parameter inference of EvolvT and present an efficient online version for the parameter inference procedure. Our experiments on real-world applications demonstrate its advantages over the state-of-the-art truth discovery approaches.

【Keywords】: Truth Discovery; Kalman Filtering; Streaming Data

89. Independent Feature and Label Components for Multi-label Classification.

Paper Link】 【Pages】:827-836

【Authors】: Yongjian Zhong ; Chang Xu ; Bo Du ; Lefei Zhang

【Abstract】: Investigating correlation between example features and example labels is essential to solve classification problems. However, identification and calculation of the correlation between features and labels can be rather difficult for high-dimensional multi-label data. Both feature embedding and label embedding have been developed to tackle this challenge, and a shared subspace for both labels and features are usually learned by existing embedding methods to simultaneously reduce dimensionality of features and labels. In contrast, this paper suggests to learn separated subspaces for features and labels by maximizing the independence between components in each subspace and maximizing the correlation between these two subspaces. The learned independent label components indicates fundamental combinations of labels in multi-label datasets, which thus helps to reveals the correlation between labels. On the other hand, the learned independent feature components lead to a compact representation of example features. The connections between the proposed algorithm and existing embedding methods have been discussed. Experimental results on real-world multi-label datasets demonstrate the necessity of exploring independence components from multi-label data and the effectiveness of the proposed algorithm.

【Keywords】: Multi-label; Embedding; Canonical Correlation

90. Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speeds.

Paper Link】 【Pages】:837-846

【Authors】: Yan Zhu ; Chin-Chia Michael Yeh ; Zachary Zimmerman ; Kaveh Kamgar ; Eamonn J. Keogh

【Abstract】: Time series motif discovery is an important primitive for time series analytics, and is used in domains as diverse as neuroscience, music and sports analytics. In recent years, algorithmic advances (coupled with hardware improvements) have greatly expanded the purview of motif discovery. Nevertheless, we argue that there is an insatiable need for further scalability. This is because more than most types of analytics, motif discovery benefits from interactivity. The two state-of-the-art algorithms to find motifs are STOMP, which requires O(n2) time, and STAMP, which, despite being an O(logn) factor slower, is the preferred solution for most applications, as it is a fast converging anytime algorithm. In favorable scenarios STAMP needs only to be run to a small fraction of completion to provide a very accurate approximation of the top-k motifs. In this work we introduce SCRIMP++, an O(n2) time algorithm that is also an anytime algorithm, combining the best features of STOMP and STAMP. As we shall show, SCRIMP++ maintains all the desirable properties of the original algorithms, but converges much faster, in almost all scenarios producing the correct output after spending a tiny fraction of the full computation time. We argue that for many end-users, this allows motif discovery to be performed in interactive sessions. Moreover, this interactivity can be game changing in terms of the analytics that can be performed.

【Keywords】: Time Series; Anytime Algorithms; Motif Discovery

91. Fast Rectangle Counting on Massive Networks.

Paper Link】 【Pages】:847-856

【Authors】: Rong Zhu ; Zhaonian Zou ; Jianzhong Li

【Abstract】: Rectangle has been recognized as an essential motif in a large number of real-world networks. Counting rectangles in a network plays an important role in network analysis. This paper comprehensively studies the rectangle counting problem on large networks. We propose a novel counting paradigm called the wedge-centric counting, where a wedge is a simple path consisting of three vertices. Unlike the traditional edge-centric counting, the wedge-centric counting uses wedges instead of edges as building blocks of rectangles. The main advantage of the wedge-centric counting is that it does not need to access two-hop neighbors. Based on this paradigm, we develop a collection of rectangle counting algorithms, including an in-memory algorithm with lower time complexity, an external-memory algorithm with the optimal I/O complexity, and two randomized algorithms with provable error bounds. The experimental results on a variety of real networks verify the effectiveness and the efficiency of the proposed wedge-centric rectangle counting algorithms.

【Keywords】: rectangle counting; wedge centric; in memory algorithm; external memory algorithm; approximate algorithm

92. NetGist: Learning to Generate Task-Based Network Summaries.

Paper Link】 【Pages】:857-862

【Authors】: Sorour E. Amiri ; Bijaya Adhikari ; Aditya Bharadwaj ; B. Aditya Prakash

【Abstract】: Given a network, can we visualize it for any given task, highlighting the important characteristics? Networks are widespread, and hence summarizing and visualizing them is of primary interest for many applications such as viral marketing, extracting communities and immunization. Summaries can help in solving new problems in visualization, sense-making, and in many other goals. However, most prior work focuses on generic structural summarization techniques or on developing specific algorithms for specific tasks. This is both tedious and challenging. As a result, for several popular tasks, there do not exist readymade summarization methods. In this paper, we explore a promising alternative approach instead. We propose NetGist, a framework which automatically learns how to generate a summary for a given task on a given network. In addition to generating the required summary, this also allows us to reuse the learned process on other similar networks. We formulate a novel task-based graph summarization problem and leverage reinforcement learning to design a flexible framework for our solution. Via extensive experiments, we show that NetGist robustly and effectively learns meaningful summaries, and helps solve challenging problems, and aids in complex task-based sense-making of networks.

【Keywords】: Graph Summarization; Task Based; Reinforcement Learning

93. Maximizing the Diversity of Exposure in a Social Network.

Paper Link】 【Pages】:863-868

【Authors】: Çigdem Aslay ; Antonis Matakos ; Esther Galbrun ; Aristides Gionis

【Abstract】: Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles. In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users. We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints. The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.

【Keywords】: diversity maximization; filter bubbles; social influence; sampling

94. Semi-Supervised Community Detection Using Structure and Size.

Paper Link】 【Pages】:869-874

【Authors】: Arjun Bakshi ; Srinivasan Parthasarathy ; Kannan Srinivasan

【Abstract】: In recent years there have been a few semi-supervised community detection approaches that use community membership information, or node metadata to improve their performance. However, communities have always been thought of as clique-like structures, while the idea of finding and leveraging other patterns in communities is relatively unexplored. Online social networks provide a corpus of real communities in large graphs which can be used to understand dataset specific community patterns. In this paper, we design a way to represent communities concisely in an easy to compute feature space. We design an efficient community detection algorithm that uses size and structural information of communities from a training set to find communities in the rest of the graph. We show that our approach achieves 10% higher F1 scores on average compared to several other methods on large real-world graph datasets, even when the training set is small.

【Keywords】: community detection; semi-supervised

95. Heterogeneous Hyper-Network Embedding.

Paper Link】 【Pages】:875-880

【Authors】: Inci M. Baytas ; Cao Xiao ; Fei Wang ; Anil K. Jain ; Jiayu Zhou

【Abstract】: Heterogeneous hyper-networks is used to represent multi-modal and composite interactions between data points. In such networks, several different types of nodes form a hyperedge. Heterogeneous hyper-network embedding learns a distributed node representation under such complex interactions while preserving the network structure. However, this is a challenging task due to the multiple modalities and composite interactions. In this study, a deep approach is proposed to embed heterogeneous attributed hyper-networks with complicated and non-linear node relationships. In particular, a fully-connected and graph convolutional layers are designed to project different types of nodes into a common low-dimensional space, a tuple-wise similarity function is proposed to preserve the network structure, and a ranking based loss function is used to improve the similarity scores of hyperedges in the embedding space. The proposed approach is evaluated on synthetic and real world datasets and a better performance is obtained compared with baselines.

【Keywords】: heterogeneous hypergraph; network embedding; similarity learning

96. Accurate Causal Inference on Discrete Data.

Paper Link】 【Pages】:881-886

【Authors】: Kailash Budhathoki ; Jilles Vreeken

【Abstract】: Additive Noise Models (ANMs) provide a theoretically sound approach to inferring the most likely causal direction between pairs of random variables given only a sample from their joint distribution. The key assumption is that the effect is a function of the cause, with additive noise that is independent of the cause. In many cases ANMs are identifiable. Their performance, however, hinges on the chosen dependence measure, the assumption we make on the true distribution. In this paper we propose to use Shannon entropy to measure the dependence within an ANM, which gives us a general approach by which we do not have to assume a true distribution, nor have to perform explicit significance tests during optimization. The information-theoretic formulation gives us a general, efficient, identifiable, and, as the experiments show, highly accurate method for causal inference on pairs of discrete variables—achieving (near) 100% accuracy on both synthetic and real data.

【Keywords】: Causal Inference; Information Theory; Additive Noise Models; Discrete Data

97. A Variable-Order Regime Switching Model to Identify Significant Patterns in Financial Markets.

Paper Link】 【Pages】:887-892

【Authors】: Philippe Chatigny ; Rongbo Chen ; Jean-Marc Patenaude ; Shengrui Wang

【Abstract】: The identification and prediction of complex behaviors in time series are fundamental problems of interest in the field of financial data analysis. Autoregressive (AR) model and Regime switching (RS) models have been used successfully to study the behaviors of financial time series. However, conventional RS models evaluate regimes by using a fixed-order Markov chain and underlying patterns in the data are not considered in their design. In this paper, we propose a novel RS model to identify and predict regimes based on a weighted conditional probability distribution (WCPD) framework capable of discovering and exploiting the significant underlying patterns in time series. Experimental results on stock market data, with 200 stocks, suggest that the structures underlying the financial market behaviors exhibit different dynamics and can be leveraged to better define regimes with superior prediction capabilities than traditional models.

【Keywords】: Regime Switching model; pattern detection; Financial time series; WCPD

98. Exploiting Spatio-Temporal Correlations with Multiple 3D Convolutional Neural Networks for Citywide Vehicle Flow Prediction.

Paper Link】 【Pages】:893-898

【Authors】: Cen Chen ; Kenli Li ; Sin G. Teo ; Guizi Chen ; Xiaofeng Zou ; Xulei Yang ; Ramaseshan C. Vijay ; Jiashi Feng ; Zeng Zeng

【Abstract】: Predicting vehicle flows is of great importance to traffic management and public safety in smart cities, and very challenging as it is affected by many complex factors, such as spatio-temporal dependencies with external factors (e.g., holidays, events and weather). Recently, deep learning has shown remarkable performance on traditional challenging tasks, such as image classification, due to its powerful feature learning capabilities. Some works have utilized LSTMs to connect the high-level layers of 2D convolutional neural networks (CNNs) to learn the spatio-temporal features, and have shown better performance as compared to many classical methods in traffic prediction. However, these works only build temporal connections on the high-level features at the top layer while leaving the spatio-temporal correlations in the low-level layers not fully exploited. In this paper, we propose to apply 3D CNNs to learn the spatio-temporal correlation features jointly from lowlevel to high-level layers for traffic data. We also design an end-to-end structure, named as MST3D, especially for vehicle flow prediction. MST3D can learn spatial and multiple temporal dependencies jointly by multiple 3D CNNs, combine the learned features with external factors and assign different weights to different branches dynamically. To the best of our knowledge, it is the first framework that utilizes 3D CNNs for traffic prediction. Experiments on two vehicle flow datasets Beijing and New York City have demonstrated that the proposed framework, MST3D, outperforms the state-of-the-art methods.

【Keywords】: 3D CNNs, spatio-temporal dependencies, traffic prediction, vehicle flow prediction

99. DrugCom: Synergistic Discovery of Drug Combinations Using Tensor Decomposition.

Paper Link】 【Pages】:899-904

【Authors】: Huiyuan Chen ; Jing Li

【Abstract】: Personalized treatments and targeted therapies are the most promising approaches to treat complex diseases, especially for cancer. However, drug resistance is often acquired after treatments. To overcome or reduce drug resistance, treatments using drug combinations have been actively investigated in the literature. Existing methods mainly focus on chemical properties of drugs for potential combination therapies without considering relationships among different diseases. Also, they often do not consider the rich knowledge of drugs and diseases, which can enhance the prediction of drug combinations. This motivates us to develop a new computational method that can predict the beneficial drug combinations. We propose DrugCom, a tensor-based framework for computing drug combinations across different diseases by integrating multiple heterogeneous data sources of drugs and diseases. DrugCom first constructs a primary third-order tensor (i.e., drug×drug ×disease) and several similarity matrices from multiple data sources regarding drugs (e.g., chemical structure) and diseases (e.g., disease phenotype). DrugCom then formulates an objective function, which simultaneously factorizes coupled tensor and matrices to reveal the molecular mechanisms of drug synergy. We adopt the alternating direction method of multipliers algorithm to effectively solve the optimization problem. Extensive experimental studies using real-world datasets demonstrate superior performance of DrugCom.

【Keywords】: Drug Combinations, Tensor Decomposition, Disease, Drug Discovery, Multi-view Learning

100. Cost Effective Multi-label Active Learning via Querying Subexamples.

Paper Link】 【Pages】:905-910

【Authors】: Xia Chen ; Guoxian Yu ; Carlotta Domeniconi ; Jun Wang ; Zhao Li ; Zili Zhang

【Abstract】: Multi-label active learning addresses the scarce labeled example problem by querying the most valuable unlabeled examples, or example-label pairs, to achieve a better performance with limited query cost. Current multi-label active learning methods require the scrutiny of the whole example in order to obtain its annotation. In contrast, one can find positive evidence with respect to a label by examining specific patterns (i.e., subexample), rather than the whole example, thus making the annotation process more efficient. Based on this observation, we propose a novel two-stage cost effective multi-label active learning framework, called CMAL. In the first stage, a novel example-label pair selection strategy is introduced. Our strategy leverages label correlation and label space sparsity of multi-label examples to select the most uncertain example-label pairs. Specifically, the unknown relevant label of an example can be inferred from the correlated labels that are already assigned to the example, thus reducing the uncertainty of the unknown label. In addition, the larger the number of relevant examples of a particular label, the smaller the uncertainty of the label is. In the second stage, CMAL queries the most plausible positive subexample-label pairs of the selected example-label pairs. Comprehensive experiments on multi-label datasets collected from different domains demonstrate the effectiveness of our proposed approach on cost effective queries. We also show that leveraging label correlation and label sparsity contribute to saving costs.

【Keywords】: multi-label learning, active learning, cost effective, label correlation

101. Semi-Convex Hull Tree: Fast Nearest Neighbor Queries for Large Scale Data on GPUs.

Paper Link】 【Pages】:911-916

【Authors】: Yewang Chen ; Lida Zhou ; Nizar Bouguila ; Bineng Zhong ; Fei Wu ; Zhen Lei ; Jixiang Du ; Hailin Li

【Abstract】: A fast exact nearest neighbor search algorithm over large scale data is proposed based on semi-convex hull tree, where each node represents a semi-convex hull, which is made of a set of hyper planes. When performing the task of nearest neighbor queries, unnecessary distance computations can be greatly reduced by quadratic programming. GPUs are also used to accelerate the query process. Experiments conducted on both Intel(R) HD Graphics 4400 and Nvidia Geforce GTX1050 TI, as well as theoretical analysis show that the proposed algorithm yields significant improvements and outperforms current k-d tree based nearest neighbor query algorithms and others

【Keywords】: semi convex hull; k-d tree; kNN; buffer k-d tree

102. Dynamic Illness Severity Prediction via Multi-task RNNs for Intensive Care Unit.

Paper Link】 【Pages】:917-922

【Authors】: Weitong Chen ; Sen Wang ; Guodong Long ; Lina Yao ; Quan Z. Sheng ; Xue Li

【Abstract】: Most of the existing analytics on ICU data mainly focus on mortality risk prediction and phenotyping analysis. However, they have limitations in providing sufficient evidence for decision making in a dynamically changing clinical environment. In this paper, we propose a novel approach that simultaneously analyses different organ systems to predict the illness severity of patients in an ICU, which can intuitively reflect the condition of the patients in a timely fashion. Specifically, we develop a novel deep learning model, namely MTRNN-ATT, which is based on multi-task recurrent neural networks. The physiological features of each organ system in time-series representations are learned by a single long short-term memory unit as a specific task. To utilize the relationships between organ systems, we use a shared LSTM unit to exploit the correlations between different tasks for further performance improvement. Also, we apply an attention mechanism in our deep model to learn the selective features at each stage to achieve better prediction results. We conduct extensive experiments on a real-world clinical dataset (MIMIC-III) to compare our method with many state-of-the-art methods. The experiment results demonstrate that the proposed approach performs better on the prediction tasks of illness severity scores.

【Keywords】: Deep Learning; Multi-task learning; Clinical informatics; illness severity prediction

103. Discovering Topical Interactions in Text-Based Cascades Using Hidden Markov Hawkes Processes.

Paper Link】 【Pages】:923-928

【Authors】: Jayesh Choudhari ; Anirban Dasgupta ; Indrajit Bhattacharya ; Srikanta Bedathur

【Abstract】: Social media conversations unfold based on complex interactions between users, topics and time. While recent models have been proposed to capture network strengths between users, users' topical preferences and temporal patterns between posting and response times, interaction patterns between topics has not been studied. We propose the Hidden Markov Hawkes Process (HMHP) that incorporates topical Markov Chains within Hawkes processes to jointly model topical interactions along with user-user and user-topic patterns. We propose a Gibbs sampling algorithm for HMHP that jointly infers the network strengths, diffusion paths, the topics of the posts as well as the topic-topic interactions. We show using experiments on real and semi-synthetic data that HMHP is able to generalize better and recover the network strengths, topics and diffusion paths more accurately than state-of-the-art baselines. More interestingly, HMHP finds insightful interactions between topics in real tweets which no existing model is able to do.

【Keywords】: Hawkes Process; Markov Chain; Topical Cascades; Gibbs Sampling; Hidden Markov Hawkes Process

104. Signed Graph Convolutional Networks.

Paper Link】 【Pages】:929-934

【Authors】: Tyler Derr ; Yao Ma ; Jiliang Tang

【Abstract】: Due to the fact much of today's data can be represented as graphs, there has been a demand for generalizing neural network models for graph data. One recent direction that has shown fruitful results, and therefore growing interest, is the usage of graph convolutional neural networks (GCNs). They have been shown to provide a significant improvement on a wide range of tasks in network analysis, one of which being node representation learning. The task of learning low-dimensional node representations has shown to increase performance on a plethora of other tasks from link prediction and node classification, to community detection and visualization. Simultaneously, signed networks (or graphs having both positive and negative links) have become ubiquitous with the growing popularity of social media. However, since previous GCN models have primarily focused on unsigned networks (or graphs consisting of only positive links), it is unclear how they could be applied to signed networks due to the challenges presented by negative links. The primary challenges are based on negative links having not only a different semantic meaning as compared to positive links, but their principles are inherently different and they form complex relations with positive links. Therefore we propose a dedicated and principled effort that utilizes balance theory to correctly aggregate and propagate the information across layers of a signed GCN model. We perform empirical experiments comparing our proposed signed GCN against state-of-the-art baselines for learning node representations in signed networks. More specifically, our experiments are performed on four real-world datasets for the classical link sign prediction problem that is commonly used as the benchmark for signed network embeddings algorithms.

【Keywords】: Signed Networks; Graph Convolutional Networks; Network Embedding; Balance Theory

105. Outlier Detection in Urban Traffic Flow Distributions.

Paper Link】 【Pages】:935-940

【Authors】: Youcef Djenouri ; Arthur Zimek ; Marco Chiarandini

【Abstract】: Urban traffic data consists of observations like number and speed of cars or other vehicles at certain locations as measured by deployed sensors. These numbers can be interpreted as traffic flow which in turn relates to the capacity of streets and the demand of the traffic system. City planners are interested in studying the impact of various conditions on the traffic flow, leading to unusual patterns, i.e., outliers. Existing approaches to outlier detection in urban traffic data take into account only individual flow values (i.e., an individual observation). This can be interesting for real time detection of sudden changes. Here, we face a different scenario: The city planners want to learn from historical data, how special circumstances (e.g., events or festivals) relate to unusual patterns in the traffic flow, in order to support improved planing of both, events and the layout of the traffic system. Therefore, we propose to consider the sequence of traffic flow values observed within some time interval. Such flow sequences can be modeled as probability distributions of flows. We adapt an established outlier detection method, the local outlier factor (LOF), to handling flow distributions rather than individual observations. We apply the outlier detection online to extend the database with new flow distributions that are considered inliers. For the validation we consider a special case of our framework for comparison with state-of-the-art outlier detection on flows. In addition, a real case study on urban traffic flow data showcases that our method finds meaningful outliers in the traffic flow data.

【Keywords】: outlier detection; urban traffic data; flow distributions

106. The HyperKron Graph Model for Higher-Order Features.

Paper Link】 【Pages】:941-946

【Authors】: Nicole Eikmeier ; Arjun S. Ramani ; David F. Gleich

【Abstract】: In this manuscript we present the HyperKron Graph model: an extension of the Kronecker Model, but with a distribution over hyperedges. We prove that we can efficiently generate graphs from this model in time proportional to the number of edges times a small log-factor, and find that in practice the runtime is linear with respect to the number of edges. We illustrate a number of useful features of the HyperKron model including non-trivial clustering and highly skewed degree distributions. Finally, we fit the HyperKron model to real-world networks, and demonstrate the model's flexibility with a complex application of the HyperKron model to networks with coherent feed-forward loops.

【Keywords】: kronecker graph, graph model, hyperedges

107. Using Balancing Terms to Avoid Discrimination in Classification.

Paper Link】 【Pages】:947-952

【Authors】: Simon Enni ; Ira Assent

【Abstract】: From personalized ad delivery and healthcare to criminal sentencing, more decisions are made with help from methods developed in the fields of data mining and machine learning than ever before. However, their widespread use has raised concerns about the discriminatory impact which the methods may have on people subject to these decisions. Recently, imbalance in the misclassification rates between groups has been identified as a source of discrimination. Such discrimination is not handled by most existing work in discrimination-aware data mining, and it can persist even if other types of discrimination are alleviated. In this article, we present the Balancing Terms (BT) method to address this problem. BT balances the error rates of any classifier with a differentiable prediction function, and unlike existing work, it can incorporate a preference for the trade-off between fairness and accuracy. We empirically evaluate BT on real-world data, demonstrating that our method produces tradeoffs between error rate balance and total classification error that are superior and in only few cases comparable to the state-of-the-art.

【Keywords】: discrimination; classification; machine learning

108. SedanSpot: Detecting Anomalies in Edge Streams.

Paper Link】 【Pages】:953-958

【Authors】: Dhivya Eswaran ; Christos Faloutsos

【Abstract】: Given a stream of edges from a time-evolving (un) weighted (un) directed graph, we consider the problem of detecting anomalous edges in near real-time using sublinear memory. We propose SedanSpot, a principled randomized algorithm, which exploits two tell-tale signs of anomalous edges: they tend to (i) occur as bursts of activity and (ii) connect parts of the graph which are sparsely connected. SedanSpot has the following desirable properties: (a) Burst Resistance: It provably downsamples edges from bursty periods of network traffic, (b) Holistic scoring: It takes into account the whole (sampled) graph while scoring the anomalousness of an edge, giving diminishing importance to far-away neighbors, (c) Efficiency: It supports fast updates and scoring and hence can be efficiently maintained over stream; further, it can detect anomalous edges in sublinear space and constant time per edge. Through experiments on real-world data, we demonstrate that SedanSpot is 3x faster and 270% more accurate (in terms of AUC) than the state-of-the-art.

【Keywords】: anomaly detection; edge stream; sampling; random walk

109. Heterogeneous Data Integration by Learning to Rerank Schema Matches.

Paper Link】 【Pages】:959-964

【Authors】: Avigdor Gal ; Haggai Roitman ; Roee Shraga

【Abstract】: Schema matching is a task at the heart of integrating heterogeneous structured and semi-structured data with applications in data warehousing, process matching, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process and a common method of overcoming this uncertainty is by introducing a human expert with a ranked list of possible schema matches from which the expert may choose, known as top-K matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improves upon the ranking of the best match. The proposed algorithm assists the matching process by introducing a quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in a matching process altogether. A large scale empirical evaluation with real-world benchmark shows the effectiveness of the proposed algorithmic solution.

【Keywords】: Schema Matching, Heterogeneous Data Integration, Uncertainty, Learning to Rerank

110. Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios.

Paper Link】 【Pages】:965-970

【Authors】: Shaghayegh Gharghabi ; Shima Imani ; Anthony J. Bagnall ; Amirali Darvishzadeh ; Eamonn J. Keogh

【Abstract】: At their core, many time series data mining algorithms can be reduced to reasoning about the shapes of time series subsequences. This requires a distance measure, and most algorithms use Euclidean Distance or Dynamic Time Warping (DTW) as their core subroutine. We argue that these distance measures are not as robust as the community believes. The undue faith in these measures derives from an overreliance on benchmark datasets and self-selection bias. The community is reluctant to address more difficult domains, for which current distance measures are ill-suited. In this work, we introduce a novel distance measure MPdist. We show that our proposed distance measure is much more robust than current distance measures. Furthermore, it allows us to successfully mine datasets that would defeat any Euclidean or DTW distance-based algorithm. Additionally, we show that our distance measure can be computed so efficiently, it allows analytics on fast streams.

【Keywords】: Time Series; Distance Measure; Matrix Profile

111. DAPPER: Scaling Dynamic Author Persona Topic Model to Billion Word Corpora.

Paper Link】 【Pages】:971-976

【Authors】: Robert Giaquinto ; Arindam Banerjee

【Abstract】: Extracting common narratives from multi-author dynamic text corpora requires complex models, such as the Dynamic Author Persona (DAP) topic model. However, such models are complex and can struggle to scale to large corpora, often because of challenging non-conjugate terms. To overcome such challenges, we adapt new ideas in approximate inference to the DAP model, resulting in the DAP Performed Exceedingly Rapidly (DAPPER) topic model. Specifically, we develop Conjugate-Computation Variational Inference (CVI) based variational Expectation-Maximization (EM) for learning the model, yielding fast, closed form updates for each document, replacing iterative optimization in earlier work. Our results show significant improvements in model fit and training time without needing to compromise the model's temporal structure or the application of Regularized Variation Inference (RVI). We demonstrate the scalability and effectiveness of the DAPPER model on multiple datasets, including the CaringBridge corpus — a collection of 9 million journals written by 200,000 authors during health crises.

【Keywords】: topic modeling, graphical model, regularized variational inference, healthcare, text mining, approximate inference, non-conjugate models

112. Multi-level Hypothesis Testing for Populations of Heterogeneous Networks.

Paper Link】 【Pages】:977-982

【Authors】: Guilherme Gomes ; Vinayak Rao ; Jennifer Neville

【Abstract】: We consider hypothesis testing and anomaly detection on datasets where each observation is a weighted network. Current approaches to hypothesis testing for weighted networks typically require thresholding the edge-weights, to transform the data to binary networks. This results in a loss of information, and outcomes are sensitive to choice of threshold levels. Our work avoids this, and we consider weighted-graph observations in two situations, 1) where each graph belongs to one of two populations, and 2) where entities belong to one of two populations, with each entity possessing multiple graphs (indexed e.g. by time). We propose a hierarchical Bayesian hypothesis testing framework that models each population with a mixture of latent space models for weighted networks, and then tests populations of networks for differences in distribution over components.Our framework is capable of population-level, entity-specific, as well as edge-specific hypothesis testing. We apply it to synthetic data and two real-world datasets: a social media dataset involving word co-occurrences from discussions on Twitter of the political unrest in Brazil, and a medical dataset involving fMRI brain-scans of human subjects. The results show that our proposed method has lower Type-I error and higher statistical power compared to previous alternatives that need to threshold the edge weights.

【Keywords】: network analysis; hypothesis testing; populations of networks; weighted graphs; bayesian modeling; latent space model

113. Multi-view Feature Selection for Heterogeneous Face Recognition.

Paper Link】 【Pages】:983-988

【Authors】: Jie Gui ; Ping Li

【Abstract】: While the task of feature selection has been studied for many years, the topic of multi-view feature selection for heterogeneous face recognition (HFR) such as visible (VIS) image versus near infrared (NIR) image recognition, photo versus sketch recognition, and face recognition across pose, is rarely studied. In this paper, we propose a multi-view feature selection method (MvFS) for HFR. To the best of our knowledge, MvFS is the first algorithm to address the problem of multiview feature selection for HFR, in which the dimensionalities of different views are the same and the number of selected features of different views are the same. The proposed algorithm is simple and computationally efficient. Our experiments confirm the effectiveness of MvFS.

【Keywords】: multi-view feature selection, multi-view discriminant analysis, heterogeneous face recognition

114. Bitcoin Volatility Forecasting with a Glimpse into Buy and Sell Orders.

Paper Link】 【Pages】:989-994

【Authors】: Tian Guo ; Albert Bifet ; Nino Antulov-Fantulin

【Abstract】: Bitcoin is one of the most prominent decentralized digital cryptocurrencies. Ability to understand which factors drive the fluctuations of the Bitcoin price and to what extent they are predictable is interesting both from the theoretical and practical perspective. In this paper, we study the problem of the Bitcoin short-term volatility forecasting based on volatility history and order book data. Order book, consisting of buy and sell orders over time, reflects the intention of the market and is closely related to the evolution of volatility. We propose temporal mixture models capable of adaptively exploiting both volatility history and order book features. By leveraging rolling and incremental learning and evaluation procedures, we demonstrate the prediction performance of our model as well as studying the robustness, in comparison to a variety of statistical and machine learning baselines. Meanwhile, our temporal mixture model enables to decipher the time-varying effect of order book features on volatility. It demonstrates the prospect of our temporal mixture model as an interpretable forecasting framework over heterogeneous Bitcoin data.

【Keywords】: cryptocurrency; mixture models; prediction modes; machine learning; deep learning; robustness

115. Differentially Private Prescriptive Analytics.

Paper Link】 【Pages】:995-1000

【Authors】: Haripriya Harikumar ; Santu Rana ; Sunil Gupta ; Thin Nguyen ; Ramachandra Kaimal ; Svetha Venkatesh

【Abstract】: Privacy preservation is important. Prescriptive analytics is a method to extract corrective actions to avoid undesirable outcomes. We propose a privacy preserving prescriptive analytics algorithm to protect the data used during the construction of the prescriptive analytics algorithm. We use differential privacy mechanism to achieve strong privacy guarantee. Differential privacy mechanism requires computation of sensitivity: maximum change in the output between two training datasets, which is differed by only one instance. The main challenge we addressed is the computation of sensitivity of the prescription vector. In absence of any analytical form, we construct a nested global optimization problem to compute the sensitivity. We solve the optimization problem using constrained Bayesian optimization, as the nested structure makes the objective function expensive. We demonstrate our algorithm on two real world datasets and observe that the prescription vectors remains useful even after making them private.

【Keywords】: Differential Privacy, Prescriptive analytics, Laplace distribution, Bayesian optimization, Constrained optimization

116. A General Cross-Domain Recommendation Framework via Bayesian Neural Network.

Paper Link】 【Pages】:1001-1006

【Authors】: Jia He ; Rui Liu ; Fuzhen Zhuang ; Fen Lin ; Cheng Niu ; Qing He

【Abstract】: Collaborative filtering is an effective and widely used recommendation approach by applying the user-item rating matrix for recommendations, however, which usually suffers from cold-start and sparsity problems. To address these problems, hybrid methods are proposed to incorporate auxiliary information such as user/item profiles to collaborative filtering models; Cross-domain recommendation systems add a new dimension to solve these problems by leveraging ratings from other domains to improve recommendation performance. Among these methods, deep neural network based recommendation systems achieve excellent performance due to their excellent ability in learning powerful representations. However, these cross-domain recommendation systems based on deep neural network rarely consider the uncertainty of weights. Therefore, they maybe lack of calibrated probabilistic predictions and make overly confident decisions. Along this line, we propose a general cross-domain recommendation framework via Bayesian neural network to incorporate auxiliary information, which takes advantage of both the hybrid recommendation methods and the cross-domain recommendation systems. Specifically, our framework consists of two kinds of neural networks, one to learn the low dimensional representation from the one-hot codings of users/items, while the other one is to project the auxiliary information of users/items into another latent space. The final rating is produced by integrating the latent representations of the one-hot codings of users/items and the auxiliary information of users/items. The latent representations of users learnt from ratings and auxiliary information are shared across different domains for knowledge transfer. Moreover, we capture the uncertainty in all weights by representing weights with Gaussian distributions to make calibrated probabilistic predictions. We have done extensive experiments on real-world data sets to verify the effectiveness of our framework.

【Keywords】: Cross-domain learning; Recommendation systems; Bayesian neural network

117. A Self-Organizing Tensor Architecture for Multi-view Clustering.

Paper Link】 【Pages】:1007-1012

【Authors】: Lifang He ; Chun-Ta Lu ; Yong Chen ; Jiawei Zhang ; Linlin Shen ; Philip S. Yu ; Fei Wang

【Abstract】: In many real-world applications, data are often unlabeled and comprised of different representations/views which often provide information complementary to each other. Although several multi-view clustering methods have been proposed, most of them routinely assume one weight for one view of features, and thus inter-view correlations are only considered at the view-level. These approaches, however, fail to explore the explicit correlations between features across multiple views. In this paper, we introduce a tensor-based approach to incorporate the higher-order interactions among multiple views as a tensor structure. Specifically, we propose a multi-linear multi-view clustering (MMC) method that can efficiently explore the full-order structural information among all views and reveal the underlying subspace structure embedded within the tensor. Extensive experiments on realworld datasets demonstrate that our proposed MMC algorithm clearly outperforms other related state-of-the-art methods.

【Keywords】: Multi-view clustering; tensor; tensor decomposition; regression

118. Estimating Latent Relative Labeling Importances for Multi-label Learning.

Paper Link】 【Pages】:1013-1018

【Authors】: Shuo He ; Lei Feng ; Li Li

【Abstract】: In multi-label learning, each instance is associated with multiple labels simultaneously. Most of the existing approaches directly treat each label in a crisp manner, i.e. one class label is either relevant or irrelevant to the instance. However, the latent relative importance of each relevant label is regrettably ignored. In this paper, we propose a novel multi-label learning approach that aims to estimate the latent labeling importances while training the inductive model simultaneously. Specifically, we present a biconvex formulation with both instance and label graph regularization, and solve this problem using an alternating way. On the one hand, the inductive model is trained by minimizing the least squares loss of fitting the latent relative labeling importances. On the other hand, the latent relative labeling importances are estimated by the modeling outputs via a specially constrained label propagation procedure. Through the mutual adaption of the inductive model training and the specially constrained label propagation, an effective multi-label learning model is therefore built by optimally estimating the latent relative labeling importances. Extensive experimental results clearly show the effectiveness of the proposed approach.

【Keywords】: multi-label learning; relevant labels; latent relative labeling inportances

119. Characteristic Subspace Learning for Time Series Classification.

Paper Link】 【Pages】:1019-1024

【Authors】: Yuanduo He ; Jialiang Pei ; Xu Chu ; Yasha Wang ; Zhu Jin ; Guangju Peng

【Abstract】: This paper presents a novel time series classification algorithm. It exploits time-delay embedding to transform time series into a set of points as a distribution, and attempt to classify time series by classifying corresponding distributions. It proposes a novel geometrical feature, i.e. characteristic subspace, from embedding points for classification, and leverages class-weighted support vector machine (SVM) to learn for it. An efficient boosting strategy is also developed to enable a linear time training. The experiments show great potentials of this novel algorithm on accuracy, efficiency and interpretability.

【Keywords】: Time series classification, time-delay embedding, characteristic subspace learning, support vector machine

120. Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation.

Paper Link】 【Pages】:1025-1030

【Authors】: Yun He ; Haochen Chen ; Ziwei Zhu ; James Caverlee

【Abstract】: We propose PsiRec, a novel user preference propagation recommender that incorporates pseudo-implicit feedback for enriching the original sparse implicit feedback dataset. Three of the unique characteristics of PsiRec are: (i) it views user-item interactions as a bipartite graph and models pseudo-implicit feedback from this perspective; (ii) its random walks-based approach extracts graph structure information from this bipartite graph, toward estimating pseudo-implicit feedback; and (iii) it adopts a Skip-gram inspired measure of confidence in pseudo-implicit feedback that captures the pointwise mutual information between users and items. This pseudo-implicit feedback is ultimately incorporated into a new latent factor model to estimate user preference in cases of extreme sparsity. PsiRec results in improvements of 21.5% and 22.7% in terms of Precision@10 and Recall@10 over state-of-the-art Collaborative Denoising Auto-Encoders. Our implementation is available at https://github.com/heyunh2015/PsiRecICDM2018.

【Keywords】: data sparsity, random walk, recommender systems, implicit feedback, collaborative filtering

121. Confident Kernel Sparse Coding and Dictionary Learning.

Paper Link】 【Pages】:1031-1036

【Authors】: Babak Hosseini ; Barbara Hammer

【Abstract】: In recent years, kernel-based sparse coding (K-SRC) has received a special attention due to its efficient representation of nonlinear data structures in the feature space. Nevertheless, the existing K-SRC methods suffer from the lack of consistency between their training and test optimization frameworks. In this work, we propose a novel confident K-SRC and dictionary learning algorithm (CKSC) which focuses on the discriminative reconstruction of the data based on its representation in the kernel space. CKSC focuses on reconstructing each data sample via weighted contributions which are confident in its corresponding class of data. We employ novel discriminative terms to apply this scheme to both training and test frameworks in our algorithm. This increases the consistency of these optimization frameworks and improves the discriminative performance in the recall phase. In addition, CKSC directly employs the supervised information in its dictionary learning framework to enhance the discriminative structure of the dictionary. For empirical evaluations, we implement our CKSC algorithm on multivariate time-series benchmarks such as DynTex++ and UTKinect. Our claims regarding the superior performance of the proposed algorithm are justified throughout comparing its classification results to the state-of-the-art K-SRC algorithms.

【Keywords】: Discriminative dictionary learning; Kernel sparse coding; Non-negative reconstruction

122. Highly Parallel Sequential Pattern Mining on a Heterogeneous Platform.

Paper Link】 【Pages】:1037-1042

【Authors】: Yu-Heng Hsieh ; Chun-Chieh Chen ; Hong-Han Shuai ; Ming-Syan Chen

【Abstract】: Sequential pattern mining can be applied to various fields such as disease prediction and stock analysis. Many algorithms have been proposed for sequential pattern mining, together with acceleration methods. In this paper, we show that a heterogeneous platform with CPU and GPU is more suitable for sequential pattern mining than traditional CPU-based approaches since the support counting process is inherently succinct and repetitive. Therefore, we propose the PArallel SequenTial pAttern mining algorithm, referred to as PASTA, to accelerate sequential pattern mining by combining the merits of CPU and GPU computing. Explicitly, PASTA adopts the vertical bitmap representation of database to exploits the GPU parallelism. In addition, a pipeline strategy is proposed to ensure that both CPU and GPU on the heterogeneous platform operate concurrently to fully utilize the computing power of the platform. Furthermore, we develop a swapping scheme to mitigate the limited memory problem of the GPU hardware without decreasing the performance. Finally, comprehensive experiments are conducted to analyze PASTA with different baselines. The experiments show that PASTA outperforms the state-of-the-art algorithms by orders of magnitude on both real and synthetic datasets.

【Keywords】: Data Mining; Frequent Sequential Pattern; GPGPU; Parallel Computing; Heterogeneous Platform

123. A Harmonic Motif Modularity Approach for Multi-layer Network Community Detection.

Paper Link】 【Pages】:1043-1048

【Authors】: Ling Huang ; Chang-Dong Wang ; Hong-Yang Chao

【Abstract】: During the past several years, multi-layer network community detection has drawn an increasing amount of attention and many approaches have been developed from different perspectives. Despite the success, they mainly rely on the lower-order connectivity structure at the level of individual nodes and edges. However, the higher-order connectivity structure plays the essential role as the building block for multiplex networks, which may contain better signature of community than edge. The main challenge in utilizing higher-order structure for multi-layer network community detection is that the most representative higher-order structure may vary from one layer to another. In this paper, we propose a higher-order structural approach for multi-layer network community detection, termed harmonic motif modularity (HM-Modularity). The key idea is to design a novel higher-order structure, termed harmonic motif, which is able to integrate higher-order structural information from multiple layers to construct a primary layer. The higher-order structural information of each individual layer is also extracted, which is taken as the auxiliary information for discovering the multi-layer community structure. A coupling is established between the primary layer and each auxiliary layer. Finally, a harmonic motif modularity is designed to generate the community structure. By solving the optimization problem of the harmonic motif modularity, the community labels of the primary layer can be obtained to reveal the community structure of the original multi-layer network. Experiments have been conducted to show the effectiveness of the proposed method.

【Keywords】: community detection; multi-layer network; higher-order structure; motif; modularity

124. Learning Semantic Features for Software Defect Prediction by Code Comments Embedding.

Paper Link】 【Pages】:1049-1054

【Authors】: Xuan Huo ; Yang Yang ; Ming Li ; De-Chuan Zhan

【Abstract】: Software Quality Assurance (SQA) is essential in software development and many defect prediction methods based on machine learning have been proposed to identify defective modules. However, most existing defect prediction models do not provide good defect prediction results, and the semantic features reflecting the detective patterns may not be well-captured via traditional feature extraction methods. More information such as code comments should be also be embedded to generate semantic features respecting the source code functionality. Therefore, how to embed code comments for defect prediction is a big challenge, and another problem is that many comments of source code are missing in real-world applications. In this paper, we propose a novel defect prediction model named CAP-CNN (Convolutional Neural Network for Comments Augmented Programs), which is a deep learning model that automatically embeds code comments in generating semantic features from the source code for software defect prediction. To overcome the missing comments problem, a novel training strategy is used in CAP-CNN that the network encodes and absorb comments information to generate semantic features automatically during training process, which does not need testing modules to contain comments. Experimental results on several widely-used software data sets indicate that the comment features are able to improve defect prediction performance.

【Keywords】: data mining, software mining, software defect prediction

125. DeepDiffuse: Predicting the 'Who' and 'When' in Cascades.

Paper Link】 【Pages】:1055-1060

【Authors】: Mohammad Raihanul Islam ; Sathappan Muthiah ; Bijaya Adhikari ; B. Aditya Prakash ; Naren Ramakrishnan

【Abstract】: Cascades are an accepted model to capturing how information diffuses across social network platforms. A large body of research has been focused on dissecting the anatomy of such cascades and forecasting their progression. One recurring theme involves predicting the next stage(s) of cascades utilizing pertinent information such as the underlying social network, structural properties of nodes (e.g., degree) and (partial) histories of cascade propagation. However, such type of granular information is rarely available in practice. We study in this paper the problem of cascade prediction utilizing only two types of (coarse) information, viz. which node is infected and its corresponding infection time. We first construct several simple baselines to solve this cascade prediction problem. Then we describe the shortcomings of these methods and propose a new solution leveraging recent progress in embeddings and attention models from representation learning. We also perform an exhaustive analysis of our methods on several real world datasets. Our proposed model outperforms the baselines and several other state-of-the-art methods.

【Keywords】: Social Networks; Cascade and Diffusion Prediction

126. Interpretable Word Embeddings for Medical Domain.

Paper Link】 【Pages】:1061-1066

【Authors】: Kishlay Jha ; Yaqing Wang ; Guangxu Xun ; Aidong Zhang

【Abstract】: Word embeddings are finding their increasing application in a variety of biomedical Natural Language Processing (bioNLP) tasks, ranging from drug discovery to automated disease diagnosis. While these word embeddings in their entirety have shown meaningful syntactic and semantic regularities, however, the meaning of individual dimensions remains elusive. This becomes problematic both in general and particularly in sensitive domains such as bio-medicine, wherein, the interpretability of results is crucial to its widespread adoption. To address this issue, in this study, we aim to improve the interpretability of pre-trained word embeddings generated from a text corpora, and in doing so provide a systematic approach to formalize the problem. More specifically, we exploit the rich categorical knowledge present in the biomedical domain, and propose to learn a transformation matrix that transforms the input embeddings to a new space where they are both interpretable and retain their original expressive features. Experiments conducted on the largest available biomedical corpus suggests that the model is capable of performing interpretability that resembles closely to the human-level intuition.

【Keywords】: word embeddings, interpretability, biomedicine

127. FI-GRL: Fast Inductive Graph Representation Learning via Projection-Cost Preservation.

Paper Link】 【Pages】:1067-1072

【Authors】: Fei Jiang ; Lei Zheng ; Jin Xu ; Philip S. Yu

【Abstract】: Graph representation learning aims at transforming graph data into meaningful low-dimensional vectors to facilitate the employment of machine learning and data mining algorithms designed for general data. Most current graph representation learning approaches are transductive, which means that they require all the nodes in the graph are known when learning graph representations and these approaches cannot naturally generalize to unseen nodes. In this paper, we present a Fast Inductive Graph Representation Learning framework (FI-GRL) to learn nodes' low-dimensional representations. Our approach can obtain accurate representations for seen nodes with provable theoretical guarantees and can easily generalize to unseen nodes. Empirically, when the amount of seen nodes are larger than that of unseen nodes, FI-GRL always achieves excellent results. Our algorithm is fast, simple to implement and theoretically guaranteed. Extensive experiments on real datasets demonstrate the superiority of our algorithm on both efficacy and efficiency over both macroscopic level (clustering) and microscopic level (structural hole detection) applications. The full version of this paper is available on arxiv.

【Keywords】: Graph Representation Learning, Inductive Learning, Graph Mining

128. Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness.

Paper Link】 【Pages】:1073-1078

【Authors】: Ahmedul Kabir ; Carolina Ruiz ; Sergio A. Alvarez

【Abstract】: We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing and equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging – one where the bootstraps are grouped as easy, regular or hard, with all bootstraps in one group having the same hardness; and the other where the hardness of bootstraps change gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 Decision Trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.

【Keywords】: Ensemble learning; Instance hardness; Mixed Bagging

129. Clustering on Sparse Data in Non-overlapping Feature Space with Applications to Cancer Subtyping.

Paper Link】 【Pages】:1079-1084

【Authors】: Tianyu Kang ; Kourosh Zarringhalam ; Marieke L. Kuijjer ; Ping Chen ; John Quackenbush ; Wei Ding

【Abstract】: This paper presents a new algorithm, Reinforced and Informed Network-based Clustering(RINC), for finding unknown groups of similar data objects in sparse and largely non-overlapping feature space where a network structure among features can be observed. Sparse and non-overlapping unlabeled data become increasingly common and available especially in text mining and biomedical data mining. RINC inserts a domain informed model into a modelless neural network. In particular, our approach integrates physically meaningful feature dependencies into the neural network architecture and soft computational constraint. Our learning algorithm efficiently clusters sparse data through integrated smoothing and sparse auto-encoder learning. The informed design requires fewer samples for training and at least part of the model becomes explainable. The architecture of the reinforced network layers smooths sparse data over the network dependency in the feature space. Most importantly, through back-propagation, the weights of the reinforced smoothing layers are simultaneously constrained by the remaining sparse auto-encoder layers that set the target values to be equal to the raw inputs. Empirical results demonstrate that RINC achieves improved accuracy and renders physically meaningful clustering results.

【Keywords】: Unsupervised Learning; Clustering; Artificial Neural Networks

130. Time-Discounting Convolution for Event Sequences with Ambiguous Timestamps.

Paper Link】 【Pages】:1085-1090

【Authors】: Takayuki Katsuki ; Takayuki Osogami ; Akira Koseki ; Masaki Ono ; Michiharu Kudo ; Masaki Makino ; Atsushi Suzuki

【Abstract】: This paper proposes a method for modeling event sequences with ambiguous timestamps, a time-discounting convolution. Unlike in ordinary time series, time intervals are not constant, small time-shifts have no significant effect, and inputting timestamps or time durations into a model is not effective. The criteria that we require for the modeling are providing robustness against time-shifts or timestamps uncertainty as well as maintaining the essential capabilities of time-series models, i.e., forgetting meaningless past information and handling infinite sequences. The proposed method handles them with a convolutional mechanism across time with specific parameterizations, which efficiently represents the event dependencies in a time-shift invariant manner while discounting the effect of past events, and a dynamic pooling mechanism, which provides robustness against the uncertainty in timestamps and enhances the time-discounting capability by dynamically changing the pooling window size. In our learning algorithm, the decaying and dynamic pooling mechanisms play critical roles in handling infinite and variable length sequences. Numerical experiments on real-world event sequences with ambiguous timestamps and ordinary time series demonstrated the advantages of our method.

【Keywords】: Electronic health records, Electronic healthcare, Health information management, Event sequence, Convolutional neural networks, Time series analysis

131. Volatility Drift Prediction for Transactional Data Streams.

Paper Link】 【Pages】:1091-1096

【Authors】: Yun Sing Koh ; David Tse Jung Huang ; Chris Pearce ; Gillian Dobbie

【Abstract】: The reasons for concept drift in a data stream can vary widely, from deterioration of a machine to a change in peoples' buying patterns. In order to effectively detect concept drifts, most predictive stream mining systems contain a drift detector that monitors and signals concept drifts. However, few of these systems are designed to find drifts in transactional datasets, which have unlabelled data. Transactional datasets describe events, such as orders or payments, which are traditionally analysed using association rules. In this paper, we propose a novel drift detection technique, ProChange, that has two parts. The first part is a drift detector, VR-Change, that finds both real and virtual drifts in unlabelled transactional data streams using the Hellinger distance. The second part is a drift predictor, which models the volatility of drifts using a probabilistic network to predict the location of future drifts. Using the predictor, we can dynamically adapt the confidence threshold, enabling VR-Change to be more sensitive around potential future drift points. We evaluated the performance of ProChange by comparing it against traditional detectors showing that it detects both real and virtual drifts effectively and efficiently in terms of accuracy.

【Keywords】: Virtual Drift Detection; Association Rule Mining; Data Streams; Concept Drift

132. Summarizing Graphs at Multiple Scales: New Trends.

Paper Link】 【Pages】:1097

【Authors】: Danai Koutra ; Jilles Vreeken ; Francesco Bonchi

【Abstract】: Recent advances in computing resources have made it possible to collect enormous amounts of interconnected data, such as social media interactions, web activity, knowledge bases, product and service purchases, autonomous vehicle routing, smart home sensor data, and more. The massive scale and complexity of this data, however, not only vastly surpasses human processing power, but also goes beyond limitations with regard to computation and storage. That is, there is an urgent need for methods and tools that summarize large interconnected data to enable faster computations, storage reduction, interactive large-scale visualization and understanding, and pattern discovery. Network summarization-which aims to find a small representation of an original, larger graph-features a variety of methods with different goals and for different input data representations (e.g., attributed graphs, time-evolving or streaming graphs, heterogeneous graphs). The objective of this tutorial is to give a systematic overview of methods for summarizing and explaining graphs at different scales: the node-group level, the network level, and the multi-network level. We emphasize the current challenges, present real-world applications, and highlight the open research problems in this vibrant research area.

【Keywords】: network summarization; node group level; network level; multi network level; large scale data; graph summaries

133. Fast Tucker Factorization for Large-Scale Tensor Completion.

Paper Link】 【Pages】:1098-1103

【Authors】: Dongha Lee ; Jaehyung Lee ; Hwanjo Yu

【Abstract】: Tensor completion is the task of completing multi-aspect data represented as a tensor by accurately predicting missing entries in the tensor. It is mainly solved by tensor factorization methods, and among them, Tucker factorization has attracted considerable interests due to its powerful ability to learn latent factors and even their interactions. Although several Tucker methods have been developed to reduce the memory and computational complexity, the state-of-the-art method still 1) generates redundant computations and 2) cannot factorize a large tensor that exceeds the size of memory. This paper proposes FTcom, a fast and scalable Tucker factorization method for tensor completion. FTcom performs element-wise updates for factor matrices based on coordinate descent, and adopts a novel caching algorithm which stores frequently-required intermediate data. It also uses a tensor file for disk-based data processing and loads only a small part of the tensor at a time into the memory. Experimental results show that FTcom is much faster and more scalable compared to all other competitors. It significantly shortens the training time of Tucker factorization, especially on real-world tensors, and it can be executed on a billion-scale tensor which is bigger than the memory capacity within a single machine.

【Keywords】: Tensor completion; Tucker factorization; Coordinate descent; Caching algorithm; Disk-based data processing

134. Diagnosis Prediction via Medical Context Attention Networks Using Deep Generative Modeling.

Paper Link】 【Pages】:1104-1109

【Authors】: Wonsung Lee ; Sungrae Park ; Weonyoung Joo ; Il-Chul Moon

【Abstract】: Predicting the clinical outcome of patients from the historical electronic health records (EHRs) is a fundamental research area in medical informatics. Although EHRs contain various records associated with each patient, the existing work mainly dealt with the diagnosis codes by employing recurrent neural networks (RNNs) with a simple attention mechanism. This type of sequence modeling often ignores the heterogeneity of EHRs. In other words, it only considers historical diagnoses and does not incorporate patient demographics, which correspond to clinically essential context, into the sequence modeling. To address the issue, we aim at investigating the use of an attention mechanism that is tailored to medical context to predict a future diagnosis. We propose a medical context attention (MCA)-based RNN that is composed of an attention-based RNN and a conditional deep generative model. The novel attention mechanism utilizes the derived individual patient information from conditional variational autoencoders (CVAEs). The CVAE models a conditional distribution of patient embeddings and his/her demographics to provide the measurement of patient's phenotypic difference due to illness. Experimental results showed the effectiveness of the proposed model.

【Keywords】: sequential diagnosis prediction; healthcare informatics; recurrent neural networks; variational autoencoders; attention mechanism

135. Next Point-of-Interest Recommendation with Temporal and Multi-level Context Attention.

Paper Link】 【Pages】:1110-1115

【Authors】: Ranzhen Li ; Yanyan Shen ; Yanmin Zhu

【Abstract】: With the prosperity of the location-based social networks, next Point-of-Interest (POI) recommendation has become an important service and received much attention in recent years. The next POI is dynamically determined by the mobility pattern and various contexts associated with user check-in sequence. However, exploring spatial-temporal mobility patterns and incorporating heterogeneous contextual factors for recommendation are challenging issues to be resolved. In this paper, we introduce a novel neural network model named TMCA (Temporal and Multi-level Context Attention) for next POI recommendation. Our model employs the LSTM-based encoder-decoder framework, which is able to automatically learn deep spatial-temporal representations for historical check-in activities and integrate multiple contextual factors using the embedding method in a unified manner. We further propose the temporal and multi-level context attention mechanisms to adaptively select relevant check-in activities and contextual factors for next POI preference prediction. Extensive experiments have been conducted using two real-world check-in datasets. The results verify (1) the superior performance of our proposed method in different evaluation metrics, compared with several state-of-the-art methods; and (2) the effectiveness of the temporal and multi-level context attention mechanisms on recommendation performance.

【Keywords】: POI recommendation; attention mechanism; spatial and temporal; sequential prediction

136. Transfer Hawkes Processes with Content Information.

Paper Link】 【Pages】:1116-1121

【Authors】: Tianbo Li ; Pengfei Wei ; Yiping Ke

【Abstract】: Hawkes processes are widely used for modeling event cascades. However, content and cross-domain information which is also instrumental in modeling is usually neglected. In this paper, we propose a novel model called transfer Hybrid Least Square for Hawkes (trHLSH) that incorporates Hawkes processes with content and cross-domain information. We also present the effective learning algorithm for the model. Evaluation on both synthetic and real-world datasets demonstrates that the proposed model can jointly learn knowledge from temporal, content and cross-domain information, and has better performance in terms of network recovery and prediction.

【Keywords】: Hawkes processes, transfer learning, event cascades

137. DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN.

Paper Link】 【Pages】:1122-1127

【Authors】: Swee Kiat Lim ; Yi Loo ; Ngoc-Trung Tran ; Ngai-Man Cheung ; Gemma Roig ; Yuval Elovici

【Abstract】: Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which has been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples - normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the 'edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets.

【Keywords】: unsupervised learning; generative adversarial networks; adversarial autoencoders; data augmentation; anomaly detection

138. Privacy-Preserving Multi-task Learning.

Paper Link】 【Pages】:1128-1133

【Authors】: Kunpeng Liu ; Nitish Uplavikar ; Wei Jiang ; Yanjie Fu

【Abstract】: Multi-task learning (MTL), improving learning performance by transferring information between related tasks, has drawn more and more attention in the data mining field. To tackle tasks whose data are stored at different locations (or nodes), distributed MTL was proposed. It not only enhances the learning performance but also improves the computing efficiency since it transforms the original centralized computing framework into a distributed computing framework under which computations can be done in parallel. The major drawback of the distributed MTL is a potential violation of confidentiality when the data stored at each node contain sensitive information (e.g., medical records). Some distributed MTL algorithms were designed to protect the original by only transferring aggregate information (e.g., supports or gradients) from each node to a server who combines the received information to produce the desired models. However, since aggregate data may still leak sensitive information, the security guarantee of the existing solutions cannot be formally proved or verified. Thus, the goal of this paper is to develop a provable privacy-preserving multi-task learning (PP-MTL) protocol that incorporates the state of the art cryptographic techniques to achieve the best security guarantee. We also conducted experiments to demonstrate the efficiency of our proposed method.

【Keywords】: Privacy Preserving; Multi Task Learning

139. Distribution Preserving Multi-task Regression for Spatio-Temporal Data.

Paper Link】 【Pages】:1134-1139

【Authors】: Xi Liu ; Pang-Ning Tan ; Zubin Abraham ; Lifeng Luo ; Pouyan Hatami

【Abstract】: For many spatio-temporal applications, building regression models that can reproduce the true data distribution is often as important as building models with high prediction accuracy. For example, knowing the future distribution of daily temperature and precipitation can help scientists determine their long-term trends and assess their potential impact on human and natural systems. As conventional methods are designed to minimize residual errors, the shape of their predicted distribution may not be consistent with their actual distribution. To overcome this challenge, this paper presents a novel, distribution-preserving multi-task learning framework for multi-location prediction of spatio-temporal data. The framework employs a non-parametric density estimation approach with L2-distance to measure the divergence between the predicted and true distribution of the data. Experimental results using climate data from more than 1500 weather stations in the United States show that the proposed framework reduces the distribution error for more than 78% of the stations without degrading the prediction accuracy significantly.

【Keywords】: multi-task learning; spatio-temporal regression

140. TreeGAN: Syntax-Aware Sequence Generation with Generative Adversarial Networks.

Paper Link】 【Pages】:1140-1145

【Authors】: Xinyue Liu ; Xiangnan Kong ; Lei Liu ; Kuorong Chiang

【Abstract】: Generative Adversarial Networks (GANs) have shown great capacity on image generation, in which a discriminative model guides the training of a generative model to construct images that resemble real images. Recently, GANs have been extended from generating images to generating sequences (e.g., poems, music and codes). Existing GANs on sequence generation mainly focus on general sequences, which are grammar-free. In many real-world applications, however, we need to generate sequences in a formal language with the constraint of its corresponding grammar. For example, to test the performance of a database, one may want to generate a collection of SQL queries, which are not only similar to the queries of real users, but also follow the SQL syntax of the target database. Generating such sequences is highly challenging because both the generator and discriminator of GANs need to consider the structure of the sequences and the given grammar in the formal language. To address these issues, we study the problem of syntax-aware sequence generation with GANs, in which a collection of real sequences and a set of pre-defined grammatical rules are given to both discriminator and generator. We propose a novel GAN framework, namely TreeGAN, to incorporate a given Context-Free Grammar (CFG) into the sequence generation process. In TreeGAN, the generator employs a recurrent neural network (RNN) to construct a parse tree. Each generated parse tree can then be translated to a valid sequence of the given grammar. The discriminator uses a tree-structured RNN to distinguish the generated trees from real trees. We show that TreeGAN can generate sequences for any CFG and its generation fully conforms with the given syntax. Experiments on synthetic and real data sets demonstrated that TreeGAN significantly improves the quality of the sequence generation in context-free languages.

【Keywords】: Generative Adversarial Networks; Tree Generation; Sequence Generation; Context-Free Language

141. Deep Discriminative Features Learning and Sampling for Imbalanced Data Problem.

Paper Link】 【Pages】:1146-1151

【Authors】: Yi-Hsun Liu ; Chien-Liang Liu ; Shin-Mu Tseng

【Abstract】: The imbalanced data problem occurs in many application domains and is considered to be a challenging problem in machine learning and data mining. Most resampling methods for synthetic data focus on minority class without considering the data distribution of major classes. In contrast to previous works, the proposed method considers both majority classes and minority classes to learn feature embeddings and utilizes appropriate loss functions to make feature embedding as discriminative as possible. The proposed method is a comprehensive framework and different deep learning feature extractors can be utilized for different domains. We conduct experiments utilizing seven numerical datasets and one image dataset based on multiclass classification tasks. The experimental results indicate that the proposed method provides accurate and stable results.

【Keywords】: Imbalanced Data; Synthetic Sampling; Feature Embedding; Center Loss; Triplet Loss

142. D-CARS: A Declarative Context-Aware Recommender System.

Paper Link】 【Pages】:1152-1157

【Authors】: Rosni Lumbantoruan ; Xiangmin Zhou ; Yongli Ren ; Zhifeng Bao

【Abstract】: Context-aware recommendation has emerged as perhaps the most popular service over online sites, and has seen applications to domains as diverse as entertainment, e-business, e-health and government services. There has been recent significant progress on the quality and scalability of recommender systems. However, we believe that different target users concern different contexts when they select an online item, which can greatly affect the quality of recommendation, and have not been investigated yet. In this paper, we propose a new type of recommender system, Declarative Context-Aware Recommender System (D-CARS), which enables the personalization of the contexts exploited for each target user by automatically analysing the viewing history of users. First, we propose a novel User-Window Non-negative Matrix Factorization topic model (UW-NMF) that adaptively identifies the significant contexts of users and constructs user profiles in a personalized manner. Then, we design a novel declarative context-aware recommendation algorithm that exploits the user context preference to identify a group of item candidates and its context distribution, based on a Subspace Ensemble Tree Model (SETM), which is constructed in the identified context subspace for item recommendation. Finally, we propose an algorithm that incrementally maintains our SETM model. Extensive experiments are conducted to prove the high effectiveness and efficiency of our D-CARS system.

【Keywords】: declarative; context-aware; recommender system

143. Leveraging Hypergraph Random Walk Tag Expansion and User Social Relation for Microblog Recommendation.

Paper Link】 【Pages】:1158-1163

【Authors】: Huifang Ma ; Di Zhang ; Weizhong Zhao ; Yanru Wang ; Zhongzhi Shi

【Abstract】: Recommending valuable contents for microblog users is an important way to improve users' experiences. As high quality descriptors of user semantics, tags have always been used to represent users' interests or attributes. In this work, we propose a microblog recommendation approach via hypergraph random walk tag expansion and user social relation. More specifically, microblogs are considered as hyperedges and terms are taken as hypervertexs for each user, and the weighting strategies for both hyperedges and hypervertexs are established. Random walk is performed on the weighted hypergraph to obtain a number of terms as tags for users. And then the tag similarity matrix and the user-tag matrix can be constructed based on tag probability correlations and weight of each tag. Besides, the significance of user social relation is also considered for recommendation. Moreover, an iterative updating scheme is developed to get the final user-tag matrix for computing the similarities between microblogs and users. Experimental results show that the algorithm is effective in microblog recommendation.

【Keywords】: Hypergraph; Random Walk; Tag Expansion; Probability Correlation; Microblog Recommendation

144. Deep Heterogeneous Autoencoders for Collaborative Filtering.

Paper Link】 【Pages】:1164-1169

【Authors】: Yukun Ma ; Jiu Xu ; Björn Stenger ; Chen Liu ; Yu Hirate

【Abstract】: This paper leverages heterogeneous auxiliary information to address the data sparsity problem of recommender systems. We propose a model that learns a shared feature space from heterogeneous data, such as item descriptions, product tags and online purchase history, to obtain better predictions. Our model consists of autoencoders, not only for numerical and categorical data, but also for sequential data, which enables capturing user tastes, item characteristics and the recent dynamics of user preference. We learn the autoencoder architecture for each data source independently in order to better model their statistical properties. Our evaluation on two MovieLens datasets and an e-commerce dataset shows that mean average precision and recall improve over state-of-the-art methods.

【Keywords】: Deep Autoencoder; Heterogeneous Data; Shared Representation; Sequential Data Modeling; Collaborative Filtering

145. Text Segmentation on Multilabel Documents: A Distant-Supervised Approach.

Paper Link】 【Pages】:1170-1175

【Authors】: Saurav Manchanda ; George Karypis

【Abstract】: Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, is expensive and time-consuming. In this paper, we develop an approach that instead of using segment-level ground truth information, it instead uses the set of labels that are associated with a document and are easier to obtain as the training data essentially corresponds to a multilabel dataset. Our method, which can be thought of as an instance of distant supervision, improves upon the previous approaches by exploiting the fact that consecutive sentences in a document tend to talk about the same topic, and hence, probably belong to the same class. Experiments on the text segmentation task on a variety of datasets show that the segmentation produced by our method beats the competing approaches on four out of five datasets and performs at par on the fifth dataset. On the multilabel text classification task, our method performs at par with the competing approaches, while requiring significantly less time to estimate than the competing approaches.

【Keywords】: distant supervision; multilabel; segmentation

146. Spatial Contextualization for Closed Itemset Mining.

Paper Link】 【Pages】:1176-1181

【Authors】: Altobelli B. Mantuan ; Leandro Augusto Frata Fernandes

【Abstract】: We present the Spatial Contextualization for Closed Itemset Mining (SCIM) algorithm, an approach that builds a space for the target database in such a way that relevant itemsets can be retrieved regarding the relative spatial location of their items. Our algorithm uses Dual Scaling to map the items of the database to a multidimensional space called Solution Space. The representation of the database in the Solution Space assists in the interpretation and definition of overlapping clusters of related items. Therefore, instead of using the minimum support threshold, a distance threshold is defined concerning the reference and the maximum distances computed per cluster during the mapping procedure. Closed itemsets are efficiently retrieved by a new procedure that uses an FP-Tree, a CFI-Tree and the proposed spatial contextualization. Experiments show that the mean all-confidence measure of itemsets retrieved by our technique outperforms results from state-of-the-art algorithms. Additionally, we use the Minimum Description Length (MDL) metric to verify how descriptive are the collections of mined patterns.

【Keywords】: closed itemset; data mining; dual scaling; clustering

147. Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing.

Paper Link】 【Pages】:1182-1187

【Authors】: Sein Minn ; Yi Yu ; Michel C. Desmarais ; Feida Zhu ; Jill-Jênn Vie

【Abstract】: In Intelligent Tutoring System (ITS), tracing the student's knowledge state during learning has been studied for several decades in order to provide more supportive learning instructions. In this paper, we propose a novel model for knowledge tracing that i) captures students' learning ability and dynamically assigns students into distinct groups with similar ability at regular time intervals, and ii) combines this information with a Recurrent Neural Network architecture known as Deep Knowledge Tracing. Experimental results confirm that the proposed model is significantly better at predicting student performance than well known state-of-the-art techniques for student modelling.

【Keywords】: Student model, Deep knowledge tracing, K-means clustering, RNNs, LSTMs

148. Robust Densest Subgraph Discovery.

Paper Link】 【Pages】:1188-1193

【Authors】: Atsushi Miyauchi ; Akiko Takeda

【Abstract】: Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph G = (V, E) with an edge-weight vector w, we aim to find a subset of vertices S that maximizes the density, i.e., w(S) / |S|, where w(S) is the sum of the weights of the edges in the subgraph induced by S. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption; it is assumed that the weights of all the edges are known exactly as input. In real-world applications, there are often cases where we have only uncertain information of the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address such an uncertainty issue using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle that models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms.

【Keywords】: Graph mining; densest subgraph; uncertainty; robust optimization

149. Improving Deep Forest by Confidence Screening.

Paper Link】 【Pages】:1194-1199

【Authors】: Ming Pang ; Kai-Ming Ting ; Peng Zhao ; Zhi-Hua Zhou

【Abstract】: Most studies about deep learning are based on neural network models, where many layers of parameterized nonlinear differentiable modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules without backpropagation training called deep forest. The developed representation learning process is based on a cascade of cascades of decision tree forests, where the high memory requirement and the high time cost inhibit the training of large models. In this paper, we propose a simple yet effective approach to improve the efficiency of deep forest. The key idea is to pass the instances with high confidence directly to the final stage rather than passing through all the levels. We also provide a theoretical analysis suggesting a means to vary the model complexity from low to high as the level increases in the cascade, which further reduces the memory requirement and time cost. Our experiments show that the proposed approach achieves highly competitive predictive performance with significantly reduced time cost and memory requirement by up to one order of magnitude.

【Keywords】: classification, ensemble methods, deep forest, confidence screening

150. Query-Efficient Black-Box Attack by Active Learning.

Paper Link】 【Pages】:1200-1205

【Authors】: Pengcheng Li ; Jinfeng Yi ; Lijun Zhang

【Abstract】: Deep neural network (DNN) as a popular machine learning model is found to be vulnerable to adversarial attack. This attack constructs adversarial examples by adding small perturbations to the raw input, while appearing unmodified to human eyes but will be misclassified by a well-trained classifier. In this paper, we focus on the black-box attack setting where attackers have almost no access to the underlying models. To conduct black-box attack, a popular approach aims to train a substitute model based on the information queried from the target DNN. The substitute model can then be attacked using existing white-box attack approaches, and the generated adversarial examples will be used to attack the target DNN. Despite its encouraging results, this approach suffers from poor query efficiency, i.e., attackers usually needs to query a huge amount of times to collect enough information for training an accurate substitute model. To this end, we first utilize state-of-the-art white-box attack methods to generate samples for querying, and then introduce an active learning strategy to significantly reduce the number of queries needed. Besides, we also propose a diversity criterion to avoid the sampling bias. Our extensive experimental results on MNIST and CIFAR-10 show that the proposed method can reduce more than 90% of queries while preserve attacking success rates and obtain an accurate substitute model which is more than 85% similar with the target oracle.

【Keywords】: Deep Neural Network; Active Learning

151. Predicted Edit Distance Based Clustering of Gene Sequences.

Paper Link】 【Pages】:1206-1211

【Authors】: Sakti Pramanik ; A. K. M. Tauhidul Islam ; Shamik Sural

【Abstract】: Effective mining of huge amount of DNA and RNA fragments generated by next generation sequencing (NGS) technologies is facilitated by developing efficient tools to partition these sequence fragments (reads) based on their level of similarities using edit distance. However, edit distance calculation for all pairwise sequence fragments to cluster these huge data sets is a significant performance bottleneck. In this paper we propose a predicted Edit distance based clustering to significantly lower clustering time. Existing clustering methods for sequence fragments, such as, k-mer based VSEARCH and Locality Sensitive Hash based LSH-Div achieve much reduced clustering time but at the cost of significantly lower cluster quality. We show, through extensive performance analysis, clustering based on this predicted Edit distance provides more than 99% accurate clusters while providing an order of magnitude faster clustering time than actual Edit distance based clustering.

【Keywords】: Edit distance prediction; OTU clustering

152. Tracking and Forecasting Dynamics in Crowdfunding: A Basis-Synthesis Approach.

Paper Link】 【Pages】:1212-1217

【Authors】: Xiaoying Ren ; Linli Xu ; Tianxiang Zhao ; Chen Zhu ; Junliang Guo ; Enhong Chen

【Abstract】: Crowdfunding is an emerging online fundraising mechanism for creators to launch campaigns (projects) to solicit funds or expand their influence. Tracking the dynamics, i.e., daily funding amounts can be of great help to campaign creators as well as contributors. Previous works on this subject either fit the fluctuations of time-series with predefined stochastic process or apply a regularization term to constrain learned tendencies, resulting in limited generalization abilities. Patterns of funding-amount sequences in crowdfunding are often exclusive and non-linear, making previous predictors suboptimal. To tackle this problem, we propose a novel method based on synthesized bases which can be composed into arbitrary patterns. Concretely, we build a large set of candidate basis from which we select based on reliability, diversity and latent structures. We use representations of sequences in this basis space as a predictor, and adopt a dual-graph to exploit neighbouring information to enhance its prediction quality. Experimental results demonstrate the effectiveness of our method.

【Keywords】: crowdfunding; time-series forecasting

153. Demographic Inference Via Knowledge Transfer in Cross-Domain Recommender Systems.

Paper Link】 【Pages】:1218-1223

【Authors】: Jin Shang ; Mingxuan Sun ; Kevyn Collins-Thompson

【Abstract】: User demographics such as age and gender are very useful in recommender systems for applications such as personalization services and marketing, but may not always be available for individual users. Existing approaches can infer users' private demographics based on ratings, given labeled data from users who share demographics. However, such labeled information is not always available in many e-commerce services, particularly small online retailers and most media sites, for which no user registration is required. We introduce a novel probabilistic matrix factorization model for demographic transfer that enables knowledge transfer from the source domain, in which users' ratings and the corresponding demographics are available, to the target domain, in which we would like to infer unknown user demographics from ratings. Our proposed method is based on two observations: (1) Items from different but related domains may share the same latent factors such as genres and styles, and (2) Users who share similar demographics are likely to prefer similar genres across domains. This approach can align latent factors across domains that share neither common users nor common items, associating user demographics with latent factors in a unified framework. Experiments on cross-domain datasets demonstrate that the proposed method consistently improves demographic classification accuracy over existing methods.

【Keywords】: Demographic inference; Recommender systems; Matrix factorization

154. T2S: Domain Adaptation Via Model-Independent Inverse Mapping and Model Reuse.

Paper Link】 【Pages】:1224-1229

【Authors】: Zhi-Yu Shen ; Ming Li

【Abstract】: Domain adaptation, which is able to leverage the abundant supervision from the source domain and limited supervision in the target domain to construct a model for the data in the target domain, has drawn significant attentions. Most of the existing domain adaptation methods elaborate to map the information derived from the source domain to the target domain for model construction in the target domain. However, such a 'Source' (S) to 'Target' (T) mapping usually involves 'tailoring' the information from the source domain to fit the target domain, which may lose valuable information in the source domain for model construction. Moreover, such a mapping is usually tightly coupled with the model construction, which is more complex than a separate model construction or mapping construction. In this paper, we provide an alternative way for domain adaptation, named T2S. Instead of mapping the 'S' to 'T' and constructing a model in 'T', we inversely map 'T' to 'S' and reuse the model that has been well-trained with abundant information in 'S' for prediction. Such an approach enjoys the abundant information in source domain for model construction and the simplicity of learning mapping separately with limited supervision in target domain. Experiments on both synthetic and real-world data sets indicate the effectiveness of our framework.

【Keywords】: domain adaptation; model reuse

155. An Efficient Many-Class Active Learning Framework for Knowledge-Rich Domains.

Paper Link】 【Pages】:1230-1235

【Authors】: Weishi Shi ; Qi Yu

【Abstract】: The high cost for labeling data instances is a key bottleneck for training effective supervised learning models. This is especially the case in domains such as medicine and bioinformatics, where expert knowledge is required for understanding and extracting the underlying semantics of data. Active learning provides a means to reduce human labeling efforts by identifying the most informative data instances. In this paper, we propose a cost-effective active learning framework to further lessen human efforts, especially in knowledge-rich domains where a large number of classes may be subject to scrutiny during decision making. In particular, this framework employs a novel many-class sampling model, MC-S, for data sample selection. MC-S is further augmented with convex hull-based sampling to achieve faster convergence of active learning. Evaluation studies conducted over multiple real-world datasets with many classes demonstrate that the proposed framework significantly reduces the overall labeling efforts through fast convergence and early stop of active learning.

【Keywords】: active learning; data sampling; knowledge-rich domain

156. Record2Vec: Unsupervised Representation Learning for Structured Records.

Paper Link】 【Pages】:1236-1241

【Authors】: Adelene Y. L. Sim ; Andrew Borthwick

【Abstract】: Structured records - data with a fixed number of descriptive fields (or attributes) - are often represented by one-hot encoded or term frequency-inverse document frequency (TF-IDF) weighted vectors. These vectors are typically sparse and long, and are inefficient in representing structured records. Here, we introduce Record2Vec, a framework for generating dense embeddings of structured records by training associations between attributes within record instances. We build our embedding from a simple premise that structured records have attributes that are associated, and therefore we can train the embedding of an attribute based on other attributes (or context), much like how we train embeddings for words based on their surrounding context. Because this embedding technique is general and does not assume the availability of any labeled data, it is extendable across different domains and fields. We demonstrate its utility in the context of clustering, record matching, movie rating and movie genre prediction.

【Keywords】: Machine learning, Unsupervised learning, Knowledge representation

157. Multi-label Adversarial Perturbations.

Paper Link】 【Pages】:1242-1247

【Authors】: Qingquan Song ; Haifeng Jin ; Xiao Huang ; Xia Hu

【Abstract】: Adversarial examples are delicately perturbed inputs, which aim to mislead machine learning models towards incorrect outputs. While existing work focuses on generating adversarial perturbations in multiclass classification problems, many real-world applications fall into the multi-label setting, in which one instance could be associated with more than one label. To analyze the vulnerability and robustness of multi-label learning models, we investigate the generation of multi-label adversarial perturbations. This is a challenging task due to the uncertain number of positive labels associated with one instance, and the fact that multiple labels are usually not mutually exclusive with each other. To bridge the gap, in this paper, we propose a general attacking framework targeting multi-label classification problem and conduct a premier analysis on the perturbations for deep neural networks. Leveraging the ranking relationships among labels, we further design a ranking-based framework to attack multi-label ranking algorithms. Experiments on two different datasets demonstrate the effectiveness of the proposed frameworks and provide insights of the vulnerability of multi-label deep models under diverse targeted attacks.

【Keywords】: Adversarial Machine Learning; Multi Label Learning; Adversarial Attack

158. Clustered Lifelong Learning Via Representative Task Selection.

Paper Link】 【Pages】:1248-1253

【Authors】: Gan Sun ; Yang Cong ; Yu Kong ; Xiaowei Xu

【Abstract】: Consider the lifelong machine learning problem where the objective is to learn new consecutive tasks depending on previously accumulated experiences, i.e., knowledge library. In comparison with most state-of-the-arts which adopt knowledge library with prescribed size, in this paper, we propose a new incremental clustered lifelong learning model with two libraries: feature library and model library, called Clustered Lifelong Learning (CL3), in which the feature library maintains a set of learned features common across all the encountered tasks, and the model library is learned by identifying and adding representative models (clusters). When a new task arrives, the original task model can be firstly reconstructed by representative models measured by capped l2-norm distance, i.e., effectively assigning the new task model to multiple representative models under feature library. Based on this assignment knowledge of new task, the objective of our CL3 model is to transfer the knowledge from both feature library and model library to learn the new task. The new task 1) with a higher outlier probability will then be judged as a new representative, and used to refine both feature library and representative models over time; 2) with lower outlier probability will only update the feature library. For the model optimisation, we cast this problem as an alternating direction minimization problem. To this end, the performance of CL3 is evaluated through comparing with most lifelong learning models, even some batch clustered multi-task learning models.

【Keywords】: Lifelong Learning, Clustering Analysis, Multi-task Learning, Transfer Learning

159. Entire Regularization Path for Sparse Nonnegative Interaction Model.

Paper Link】 【Pages】:1254-1259

【Authors】: Mirai Takayanagi ; Yasuo Tabei ; Hiroto Saigo

【Abstract】: Building sparse combinatorial model with non-negative constraint is essential in solving real-world problems such as in biology, in which the target response is often formulated by additive linear combination of features variables. This paper presents a solution to this problem by combining itemset mining with non-negative least squares. However, once incorporation of modern regularization is considered, then a naive solution requires to solve expensive enumeration problem many times for every regularization parameter. In this paper, we devise a regularization path tracking algorithm such that combinatorial feature is searched and included one by one to the solution set. Our contribution is a proposal of novel bounds specifically designed for the feature search problem. In synthetic dataset, the proposed method is demonstrated to run orders of magnitudes faster than a naive counterpart which does not employ tree pruning. We also empirically show that non-negativity constraints can reduce the number of active features much less than that of LASSO, leading to significant speed-ups in pattern search. In experiments using HIV-1 drug resistance dataset, the proposed method could successfully model the rapidly increasing drug resistance triggered by accumulation of mutations in HIV-1 genetic sequences. We also demonstrate the effectiveness of non-negativity constraints in suppressing false positive features, resulting in a model with smaller number of features and thereby improved interpretability.

【Keywords】: sparse modeling, itemset mining, non negative least squares, lars, regularization path

160. Doc2Cube: Allocating Documents to Text Cube Without Labeled Data.

Paper Link】 【Pages】:1260-1265

【Authors】: Fangbo Tao ; Chao Zhang ; Xiusi Chen ; Meng Jiang ; Tim Hanratty ; Lance M. Kaplan ; Jiawei Han

【Abstract】: Data cube is a cornerstone architecture in multidimensional analysis of structured datasets. It is highly desirable to conduct multidimensional analysis on text corpora with cube structures for various text-intensive applications in healthcare, business intelligence, and social media analysis. However, one bottleneck to constructing text cube is to automatically put millions of documents into the right cube cells so that quality multidimensional analysis can be conducted afterwards-it is too expensive to allocate documents manually or rely on massively labeled data. We propose Doc2Cube, a method that constructs a text cube from a given text corpus in an unsupervised way. Initially, only the label names (e.g., USA, China) of each dimension (e.g., location) are provided instead of any labeled data. Doc2Cube leverages label names as weak supervision signals and iteratively performs joint embedding of labels, terms, and documents to uncover their semantic similarities. To generate joint embeddings that are discriminative for cube construction, Doc2Cube learns dimension-tailored document representations by selectively focusing on terms that are highly label-indicative in each dimension. Furthermore, Doc2Cube alleviates label sparsity by propagating the information from label names to other terms and enriching the labeled term set. Our experiments on real data demonstrate the superiority of Doc2Cube over existing methods.

【Keywords】: data cube; text classification; multidimensional analysis

161. Graph Pattern Mining and Learning through User-Defined Relations.

Paper Link】 【Pages】:1266-1271

【Authors】: Carlos H. C. Teixeira ; Leornado Cotta ; Bruno Ribeiro ; Wagner Meira Jr.

【Abstract】: In this work we propose R-GPM, a parallel computing framework for graph pattern mining (GPM) through a user-defined subgraph relation. More specifically, we enable the computation of statistics of patterns through their subgraph classes, generalizing traditional GPM methods. R-GPM provides efficient estimators for these statistics by employing a MCMC sampling algorithm combined with several optimizations. We provide both theoretical guarantees and empirical evaluations of our estimators in application scenarios such as stochastic optimization of deep high-order graph neural network models and pattern (motif) counting. We also propose and evaluate optimizations that enable improvements of our estimators accuracy, while reducing their computational costs in up to 3-orders-of-magnitude. Finally, we show that R-GPM is scalable, providing near-linear speedups.

【Keywords】: Graph mining; subgraph pattern; sampling; random walk

162. Robust Distributed Anomaly Detection Using Optimal Weighted One-Class Random Forests.

Paper Link】 【Pages】:1272-1277

【Authors】: Yu-Lin Tsou ; Hong-Min Chu ; Cong Li ; Shao-Wen Yang

【Abstract】: Wireless sensor networks (WSNs) have been widely deployed in various applications, e.g., agricultural monitoring and industrial monitoring, for their ease-of-deployment. The low-cost nature makes WSNs particularly vulnerable to changes of extrinsic factors, i.e., the environment, or changes of intrinsic factors, i.e., hardware or software failures. The problem can, often times, be uncovered via detecting unexpected behaviors (anomalies) of devices. However, anomaly detection in WSNs is subject to the following challenges: (1) the limited computation and connectivity, (2) the dynamicity of the environment and network topology, and (3) the need of taking real-time actions in response to anomalies. In this paper, we propose a novel framework using optimal weighted one-class random forests for unsupervised anomaly detection to address the aforementioned challenges in WSNs. The ample experiments showed that our framework not only is feasible but also outperforms the state-of-the-art unsupervised methods in terms of both detection accuracy and resource utilization.

【Keywords】: distributed anomaly detection; one class random forests; wireless sensor networks; internet of things; WSN; IoT; anomaly detection; unsupervised ensemble

163. Sparse Non-linear CCA through Hilbert-Schmidt Independence Criterion.

Paper Link】 【Pages】:1278-1283

【Authors】: Viivi Uurtio ; Sahely Bhadra ; Juho Rousu

【Abstract】: We present SCCA-HSIC, a method for finding sparse non-linear multivariate relations in high-dimensional settings by maximizing the Hilbert-Schmidt Independence Criterion (HSIC). We propose efficient optimization algorithms using a projected stochastic gradient and Nyström approximation of HSIC. We demonstrate the favourable performance of SCCA-HSIC over competing methods in detecting multivariate non-linear relations both in simulation studies, with varying numbers of related variables, noise variables, and samples, as well as in real datasets.

【Keywords】: dimensionality reduction; canonical correlation; sparsity; kernel methods; Hilbert-Schmidt Independence Criterion

164. Imputing Structured Missing Values in Spatial Data with Clustered Adversarial Matrix Factorization.

Paper Link】 【Pages】:1284-1289

【Authors】: Qi Wang ; Pang-Ning Tan ; Jiayu Zhou

【Abstract】: Missing data problem often poses a significant challenge as it may introduce uncertainties into the data analysis. Recent advances in matrix completion have shown competitive imputation performance when applied to many real-world domains. However, there are two major limitations when applying matrix completion methods to spatial data. First, they make a strong assumption that the entries are missing-at-random, which may not hold for spatial data. Second, they may not effectively utilize the underlying spatial structure of the data. To address these limitations, this paper presents a novel clustered adversarial matrix factorization method to explore and exploit the underlying cluster structure of the spatial data in order to facilitate effective imputation. The proposed method utilizes an adversarial network to learn the joint probability distribution of the variables and improve the imputation performance for the missing entries that are not randomly sampled.

【Keywords】: Missing value imputation, deep adversarial network, spatial data

165. Partial Multi-view Clustering via Consistent GAN.

Paper Link】 【Pages】:1290-1295

【Authors】: Qianqian Wang ; Zhengming Ding ; Zhiqiang Tao ; Quanxue Gao ; Yun Fu

【Abstract】: Multi-view clustering, as one of the most important methods to analyze multi-view data, has been widely used in many real-world applications. Most existing multi-view clustering methods perform well on the assumption that each sample appears in all views. Nevertheless, in real-world application, each view may well face the problem of the missing data due to noise, or malfunction. In this paper, a new consistent generative adversarial network is proposed for partial multi-view clustering. We learn a common low-dimensional representation, which can both generate the missing view data and capture a better common structure from partial multi-view data for clustering. Different from the most existing methods, we use the common representation encoded by one view to generate the missing data of the corresponding view by generative adversarial networks, then we use the encoder and clustering networks. This is intuitive and meaningful because encoding common representation and generating the missing data in our model will promote mutually. Experimental results on three different multi-view databases illustrate the superiority of the proposed method.

【Keywords】: partial multi-view, clustering, generative adversarial network

166. EPAB: Early Pattern Aware Bayesian Model for Social Content Popularity Prediction.

Paper Link】 【Pages】:1296-1301

【Authors】: Qitian Wu ; Chaoqi Yang ; Xiaofeng Gao ; Peng He ; Guihai Chen

【Abstract】: The boom of information technology enables social platforms (like Twitter) to disseminate social content (like news) in an unprecedented rate, which makes early-stage prediction for social content popularity of great practical significance. However, most existing studies assume a long-term observation before prediction and suffer from limited precision for early-stage prediction due to insufficient observation. In this paper, we take a fresh perspective, and propose a novel early pattern aware Bayesian model. The early pattern representation, which stands for early time series normalized on future popularity, can address what we call early-stage indistinctiveness challenge. Then we use an expressive evolving function to fit the time series and estimate three interpretable coefficients characterizing temporal effect of observed series on future evolution. Furthermore, Bayesian network is leveraged to model the probabilistic relations among features, early indicators and early patterns. Experiments on three real-world social platforms (Twitter, Weibo and WeChat) show that under different evaluation metrics, our model outperforms other methods in early-stage prediction and possesses low sensitivity to observation time.

【Keywords】: Popularity Prediction; Feature Driven; Early Stage Prediction; Bayesian Model

167. DeepAD: A Deep Learning Based Approach to Stroke-Level Abnormality Detection in Handwritten Chinese Character Recognition.

Paper Link】 【Pages】:1302-1307

【Authors】: Tie-Qiang Wang ; Cheng-Lin Liu

【Abstract】: Writing abnormality detection is very important in education applications, but has received little attention by the community. Considering that abnormally written strokes (writing error or largely distorted stroke) affect the decision confidence of classifier, we propose an approach named DeepAD to detect stroke-level abnormalities in handwritten Chinese characters by analyzing the decision process of deep neural network (DNN). Firstly, to minimize the effect of stroke width variation of handwritten characters, we propose a skeletonization method based on fully convolutional network (FCN) with cross detection. With a convolutional neural network (CNN) for character classification, we evaluate the role of each skeleton pixel by calculating its impact on the prediction of classifier, and detect abnormal strokes by connecting pixels of negative impact. For quantitative evaluation of performance, we build a template-free dataset named SA-CASIA-HW containing 3696 handwritten Chinese characters with various stroke-level abnormalities, and spanning 3000+ different classes written by 60 individual writers. We validate the usefulness of the proposed DeepAD with comparison to related methods.

【Keywords】: handwritten Chinese character recognition (HCCR); stroke-level handwriting abnormality detection; skeletonization; decision making process; conditional sampling

168. Multiple Co-clusterings.

Paper Link】 【Pages】:1308-1313

【Authors】: Xing Wang ; Guoxian Yu ; Carlotta Domeniconi ; Jun Wang ; Zhiwen Yu ; Zili Zhang

【Abstract】: The goal of multiple clusterings is to discover multiple independent ways of organizing a dataset into clusters. Current approaches to this problem just focus on one-way clustering. In many real-world applications, though, it's meaningful and desirable to explore alternative two-way clustering (or co-clusterings), where both samples and features are clustered. To tackle this challenge and unexplored problem, in this paper we introduce an approach, called Multiple Co-Clusterings (MultiCC), to discover non-redundant alternative co-clusterings. MultiCC makes use of matrix tri-factorization to optimize the sample-wise and feature-wise co-clustering indicator matrices, and introduces two non-redundancy terms to enforce diversity among co-clusterings. We then combine the objective of matrix tri-factorization and two non-redundancy terms into a unified objective function and introduce an iterative solution to optimize the function. Experimental results show that MultiCC outperforms existing multiple clustering methods, and it can find interesting co-clusters which cannot be discovered by current solutions.

【Keywords】: multiple clusterings, co-clustering, matrix tri factorization, redundancy

169. Uncluttered Domain Sub-Similarity Modeling for Transfer Regression.

Paper Link】 【Pages】:1314-1319

【Authors】: Pengfei Wei ; Ramón Sagarna ; Yiping Ke ; Yew-Soon Ong

【Abstract】: Transfer covariance functions, which can model domain similarities and adaptively control the knowledge transfer across domains, are widely used in Gaussian process (GP) based transfer learning. We focus on regression problems in a black-box learning scenario, and study a family of rather general transfer covariance functions, T*, that can model the similarity heterogeneity of domains through multiple kernel learning. A necessary and sufficient condition that (i) validates GPs using T* for any data and (ii) provides semantic interpretations is given. Moreover, building on this condition, we propose a computationally inexpensive model learning rule that can explicitly capture different sub-similarities of domains. Extensive experiments on one synthetic dataset and four real-world datasets demonstrate the effectiveness of the learned GP on the sub-similarity capture and the transfer performance.

【Keywords】: Transfer Covariance Function; Sub similarity Modeling

170. Finding Maximal Significant Linear Representation between Long Time Series.

Paper Link】 【Pages】:1320-1325

【Authors】: Jiaye Wu ; Yang Wang ; Peng Wang ; Jian Pei ; Wei Wang

【Abstract】: In some applications on time series data, finding linear correlation between time series is important. However, it is meaningless to measure the global correlation between two long time series. Moreover, more often than not, two time series may be correlated in various segments. To tackle the challenges in measuring linear correlation between two long time series, in this paper, we formulate the novel problem of finding maximal significant linear representation. The major idea is that, given two time series and a quality constraint, we want to find the longest gapped time interval on which a time series can be linearly represented by the other within the quality constraint requirement. We develop a point-based approach, which exploits a novel representation of linear correlation between time series on segments, and transforms the problem into geometric search. We present a systematic empirical study to verify its efficiency and effectiveness.

【Keywords】: time series; correlation; linear representation; geometric search

171. eOTD: An Efficient Online Tucker Decomposition for Higher Order Tensors.

Paper Link】 【Pages】:1326-1331

【Authors】: Houping Xiao ; Fei Wang ; Fenglong Ma ; Jing Gao

【Abstract】: A tensor (i.e., an N-mode array) is a natural representation for multidimensional data. Tucker Decomposition (TD) is one of the most popular methods, and a series of batch TD algorithms have been extensively studied and widely applied in signal/image processing, bioinformatics, etc. However, in many applications, the large-scale tensor is dynamically evolving at all modes, which poses significant challenges for existing approaches to track the TD for such dynamic tensors. In this paper, we propose an efficient Online Tucker Decomposition (eOTD) approach to track the TD of dynamic tensors with an arbitrary number of modes. We first propose corollaries on the multiplication of block tensor matrix. Based on this corollary, eOTD allows us 1) to update the projection matrices using those projection matrices from the previous timestamp and the auxiliary matrices from the current timestamp, and 2) to update the core tensor by a sum of tensors that are obtained by multiplying smaller tensors with matrices. The auxiliary matrices are obtained by solving a series of least square regression tasks, not by performing Singular Value Decompositions (SVD). This overcomes the bottleneck in computation and storage caused by computing SVDs on largescale data. A Modified Gram-Schmidt (MGS) process is further applied to orthonormalize the projection matrices. Theoretically, the output of the eOTD framework is guaranteed to be lowrank. We further prove that the MGS process will not increase Tucker decomposition error. Empirically, we demonstrate that the proposed eOTD achieves comparable accuracy with a significant speedup on both synthetic and real data, where the speedup can be more than 1,500 times on large-scale data.

【Keywords】: Tucker Decomposition; Low rankness; Online Learning

172. Prediction of MicroRNA Subcellular Localization by Using a Sequence-to-Sequence Model.

Paper Link】 【Pages】:1332-1337

【Authors】: Yiqun Xiao ; Jiaxun Cai ; Yang Yang ; Hai Zhao ; Hongbin Shen

【Abstract】: The subcellular localization of microRNAs (miR-NAs) is closely related with their biological functions. Some recent studies have discovered that microRNAs can target to various cellular compartments, and have abundant localization patterns in cells. However, to the best of our knowledge, there has been no computational tool for predicting miRNA subcellular locations to date. The major reason is that the lack of useful information source largely limits the prediction performance using traditional statistical learning approaches. In this study, we regard this prediction task as a Sequence-to-Sequence learning process and propose an attention-based encoder-decoder model, miRLocator, to identify subcellular locations of human miRNAs. The designed miRLocator uses a bidirectional long short-term memory (BiLSTM) module to encode the input sequences, and an LSTM module to decode these context vectors as location sets. Especially, a new encoding method for RNAs, RNA2Vec, and an entropy-based method are incorporated in the model to determine the input and output representations, respectively. The experimental results show that miRLocator achieves promising prediction accuracy with the limited input information, and outperforms the models using hand-designed features and conventional RNN models.

【Keywords】: MicroRNA; Subcellular Localization; RNA2Vec; Sequence-to-Sequence

173. Unsupervised User Identity Linkage via Factoid Embedding.

Paper Link】 【Pages】:1338-1343

【Authors】: Wei Xie ; Xin Mu ; Roy Ka-Wei Lee ; Feida Zhu ; Ee-Peng Lim

【Abstract】: User identity linkage (UIL), the problem of matching user account across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semi-supervised approach which generally suffer from scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other users. We represent such a piece of information by a factoid and model it as a triplet consisting of user identity, predicate, and an object or another user identity. By embedding these factoids, we learn the user identity latent representations and link two user identities from different OSNs if they are close to each other in the user embedding space. Our Factoid Embedding algorithm is designed such that as we learn the embedding space, each embedded factoid is "translated" into a motion in the user embedding space to bring similar user identities closer, and different user identities further apart. Extensive experiments are conducted to evaluate Factoid Embedding on two real-world OSNs data sets. The experiment results show that Factoid Embedding outperforms the state-of-the-art methods even without training data.

【Keywords】: user identity linkage, factoid embedding, network embedding

174. A TIMBER Framework for Mining Urban Tree Inventories Using Remote Sensing Datasets.

Paper Link】 【Pages】:1344-1349

【Authors】: Yiqun Xie ; Han Bao ; Shashi Shekhar ; Joseph K. Knight

【Abstract】: Tree inventories are important datasets for many societal applications (e.g., urban planning). However, tree inventories still remain unavailable in most urban areas. We aim to automate tree identification at individual levels in urban areas at a large scale using remote sensing datasets. The problem is challenging due to the complexity of the landscape in urban scenarios and the lack of ground truth data. In related work, tree identification algorithms have mainly focused on controlled forest regions where the landscape is mostly homogeneous with trees, making the methods difficult to generalize to urban environments. We propose a TIMBER framework to find individual trees in complex urban environments and a Core Object REduction (CORE) algorithm to improve the computational efficiency of TIMBER. Experiments show that TIMBER can efficiently detect urban trees with high accuracy.

【Keywords】: remote sensing; TIMBER; urban; tree detection

175. Active Learning on Heterogeneous Information Networks: A Multi-armed Bandit Approach.

Paper Link】 【Pages】:1350-1355

【Authors】: Doris Xin ; Ahmed El-Kishky ; De Liao ; Brandon Norick ; Jiawei Han

【Abstract】: Active learning exploits inherent structures in the unlabeled data to minimize the number of labels required to train an accurate model. It enables effective machine learning in applications with high labeling cost, such as document classification and drug response prediction. We investigate active learning on heterogeneous information networks, with the objective of obtaining accurate node classifications while minimizing the number of labeled nodes. Our proposed algorithm harnesses a multi-armed bandit (MAB) algorithm to determine network structures that identify the most important nodes to the classification task, accounting for node types and without assuming label assortativity. Evaluations on real-world network classification tasks demonstrate that our algorithm outperforms existing methods independent of the underlying classification model.

【Keywords】: active learning; heterogeneous information networks; multi armed bandit

176. Exploiting the Sentimental Bias between Ratings and Reviews for Enhancing Recommendation.

Paper Link】 【Pages】:1356-1361

【Authors】: Yuanbo Xu ; Yongjian Yang ; Jiayu Han ; En Wang ; Fuzhen Zhuang ; Hui Xiong

【Abstract】: In real-world recommendation scenarios, there are two common phenomena: 1) users only provide ratings but there is no review comment. As a result, the historical transaction data available for recommender system are usually unbalanced and sparse; 2) Users' opinions can be better grasped in their reviews than ratings. This indicates that there is always a bias between ratings and reviews. Therefore, it is important that users' ratings and reviews should be mutually reinforced to grasp the users' true opinions. To this end, in this paper, we develop an opinion mining model based on convolutional neural networks for enhancing recommendation (NeuO). Specifically, we exploit a two-step training neural networks, which utilize both reviews and ratings to grasp users' true opinions in unbalanced data. Moreover, we propose a Sentiment Classification scoring method (SC), which employs dual attention vectors to predict the users' sentiment scores of their reviews. A combination function is designed to use the results of SC and user-item rating matrix to catch the opinion bias. Finally, a Multilayer perceptron based Matrix Factorization (MMF) method is proposed to make recommendations with the enhanced user-item matrix. Extensive experiments on real-world data demonstrate that our approach can achieve a superior performance over state-of-the-art baselines on real-world datasets.

【Keywords】: Opinion bias, recommender systems, convolutional neural network, dual attention vectors

177. Enhancing Question Understanding and Representation for Knowledge Base Relation Detection.

Paper Link】 【Pages】:1362-1367

【Authors】: Zihan Xu ; Hai-Tao Zheng ; Zuoyou Fu ; Wei Wang

【Abstract】: Relation detection is a key step in Knowledge Base Question Answering (KBQA), but far from solved due to the significant differences between questions and relations. Previous studies usually treat relation detection as a text matching task, and mainly focus on reducing the detection error with better representations of KB relations. However, the understanding of questions is also important since they are generally more varied. And the text pair representation requires improvement because KB relations are not always counterparts of questions. In this paper, we propose a novel system with enhanced question understanding and representation processes for KB relation detection (QURRD). We design a KBQA-specific slot filling module based on Bi-LSTM-CRF for question understanding. Besides, with two CNNs for modeling and matching text pairs respectively, QURRD obtains richer question-relation representations for semantic analysis, and achieves better performance through learning from multiple tasks. We conduct experiments on both single-relation (Simple-Questions) and multi-relation (WebQSP) benchmarks. Results show that QURRD is robust against the diversity of questions and outperforms the state-of-the-art system on both tasks.

【Keywords】: Relation Detection, KBQA, Semantic Similarity

178. A Knowledge-Enhanced Deep Recommendation Framework Incorporating GAN-Based Models.

Paper Link】 【Pages】:1368-1373

【Authors】: Deqing Yang ; Zikai Guo ; Ziyi Wang ; Juyang Jiang ; Yanghua Xiao ; Wei Wang

【Abstract】: Although many researchers of recommender systems have noted that encoding user-item interactions based on DNNs promotes the performance of collaborative filtering, they ignore that embedding the latent features collected from external sources, e.g., knowledge graphs (KGs), is able to produce more precise recommendation results. Furthermore, CF-based models are still vulnerable to the scenarios of sparse known user-item interactions. In this paper, towards movie recommendation, we propose a novel knowledge-enhanced deep recommendation framework incorporating GAN-based models to acquire robust performance. Specifically, our framework first imports various feature embeddings distilled not only from user-movie interactions, but also from KGs and tags, to constitute initial user/movie representations. Then, user/movie representations are fed into a generator and a discriminator simultaneously to learn final optimal representations through adversarial training, which are conducive to generating better recommendation results. The extensive experiments on a real Douban dataset demonstrate our framework's superiority over some state-of-the-art recommendation models, especially in the scenarios of sparse observed user-movie interactions.

【Keywords】: recommendation; embedding; knowledge graphs; GAN; adversarial training; user-item interactions

179. Adaptive Affinity Learning for Accurate Community Detection.

Paper Link】 【Pages】:1374-1379

【Authors】: Fanghua Ye ; Shenghui Li ; Zhiwei Lin ; Chuan Chen ; Zibin Zheng

【Abstract】: The task of community detection has become a fundamental research problem in complex network analysis. Intuitively, similar nodes are more likely to be contained in the same community. However, most existing community detection methods cannot extract the intrinsic similarity between nodes. Thus, they may fail to identify the real community structures. In this paper, we propose to learn an affinity matrix adaptively, which can capture the intrinsic similarity between nodes accurately, and therefore benefit the community detection results. Specifically, the proposed model first embeds each node into a low-dimensional space through a transformation matrix with the community structures being preserved. Then, our model learns the affinity matrix in this low-dimensional space. The affinity matrix is further utilized to guide the learning of the community membership matrix via manifold regularization. The above three matrices are learned simultaneously and updated iteratively under the framework of Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that our model can outperform the state-of-the-art approaches.

【Keywords】: community detection; affinity learning; nonnegative matrix factorization; complex networks; ADMM

180. A Unified Theory of the Mobile Sequential Recommendation Problem.

Paper Link】 【Pages】:1380-1385

【Authors】: Zeyang Ye ; Keli Xiao ; Yuefan Deng

【Abstract】: A theory is developed to unify the original form, and its many variations, of the mobile sequential recommendation (MSR) problem. The unified theory, expressing the same MSR problem, is superior to the original form in many aspects including a more standardized form. In addition to a newly proposed expected traveling time (ETT) function to measure the quality of recommended routes, we introduce five additional improvements. Also, three essential mathematical properties of the new objective function enable the development of the methods to solve realistic MSR problems with complex conditions. The MSR solutions also support the discovered properties of the proposed objective function. The unified theory should support the long-term decision making for drivers and the traffic department in general.

【Keywords】: mobile sequential recommendation; potential traveling distance; expected traveling time

181. An Integrated Model for Crime Prediction Using Temporal and Spatial Factors.

Paper Link】 【Pages】:1386-1391

【Authors】: Fei Yi ; Zhiwen Yu ; Fuzhen Zhuang ; Xiao Zhang ; Hui Xiong

【Abstract】: Given its importance, crime prediction has attracted a lot of attention in the literature, and several methods have been proposed to discover different aspects of characteristics for crime prediction. In this paper, we propose a Clustered Continuous Conditional Random Field (Clustered-CCRF) model which is able to effectively exploit both spatial and temporal factors for crime prediction in an integrated way. In particular, we observe that the crime number at one specific area is not only conditioned on its own historical records but also has high correlation to crime records from similar areas. Therefore, we propose two factors: an auto-regressed temporal correlation and a feature-based inter-area spatial correlation, to measure such patterns for crime prediction. Further, we present a tree-structured clustering algorithm to discover high similar areas based on spatial characteristics to improve the performance of our proposed model. Experiments on real-world crime dataset demonstrate the superiority of our proposed model over the state-of-the-art methods.

【Keywords】: Crime Prediction; Temporal and Spatial Factors; Continuous Conditional Random Field; Clustering Algorithm

182. Coherent Graphical Lasso for Brain Network Discovery.

Paper Link】 【Pages】:1392-1397

【Authors】: Hang Yin ; Xiangnan Kong ; Xinyue Liu

【Abstract】: In brain network discovery, researchers are interested in discovering brain regions (nodes) and functional connections (edges) between these regions from fMRI scan of human brain. Some recent works propose coherent models to address both of these sub-tasks. However, these approaches either suffer from mathematical inconsistency or fail to distinguish direct connections and indirect connections between the nodes. In this paper, we study the problem of collective discovery of coherent brain regions and direct connections between these regions. Each node of the brain network represents a brain region, i.e., a set of voxels in fMRI with coherent activities. Each edge denotes a direct dependency between two nodes. The discovered brain network represents a Gaussian graphical model that encodes conditional independence between the activities of different brain regions. We propose a novel model, called CGLasso, which combines Graphical Lasso (GLasso) and orthogonal non-negative matrix tri-factorization (ONMtF), to perform nodes discovery and edge detection simultaneously. We perform experiments on synthetic datasets with ground-truth. The results show that the proposed method performs better than the compared baselines in terms of four quantitative metrics. Besides, we also apply the proposed method and other baselines on the real ADHD-200 fMRI dataset. The results demonstrate that our method produces more meaningful networks comparing with other baseline methods.

【Keywords】: Brain Network; Graphical Lasso; Non-negative Matrix Factorization; Coherent Model

183. Feature-Induced Partial Multi-label Learning.

Paper Link】 【Pages】:1398-1403

【Authors】: Guoxian Yu ; Xia Chen ; Carlotta Domeniconi ; Jun Wang ; Zhao Li ; Zili Zhang ; Xindong Wu

【Abstract】: Current efforts on multi-label learning generally assume that the given labels of training instances are noise-free. However, obtaining noise-free labels is quite difficult and often impractical, and the presence of noisy labels may compromise the performance of multi-label learning. Partial multi-label learning (PML) addresses the scenario in which each instance is annotated with a set of candidate labels, of which only a subset corresponds to the ground-truth. The PML problem is more challenging than partial-label learning, since the latter assumes that only one label is valid and may ignore the correlation among candidate labels. To tackle the PML challenge, we introduce a feature induced PML approach called fPML, which simultaneously estimates noisy labels and trains multi-label classifiers. In particular, fPML simultaneously factorizes the observed instance-label association matrix and the instance-feature matrix into low-rank matrices to achieve coherent low-rank matrices from the label and the feature spaces, and a low-rank label correlation matrix as well. The low-rank approximation of the instance-label association matrix is leveraged to estimate the association confidence. To predict the labels of unlabeled instances, fPML learns a matrix that maps the instances to labels based on the estimated association confidence. An empirical study on public multi-label datasets with injected noisy labels, and on archived proteomic datasets, shows that fPML can more accurately identify noisy labels than related solutions, and consequently can achieve better performance on predicting labels of instances than competitive methods.

【Keywords】: multi-label learning, low-rank matrix factorization, noisy labels, label correlation

184. Superlinear Convergence of Randomized Block Lanczos Algorithm.

Paper Link】 【Pages】:1404-1409

【Authors】: Qiaochu Yuan ; Ming Gu ; Bo Li

【Abstract】: The low rank approximation of a matrix is a crucial component in many data mining applications today. A competitive algorithm for this class of problems is the randomized block Lanczos algorithm - an amalgamation of the traditional block Lanczos algorithm with a randomized starting matrix. While empirically this algorithm performs well, there has been scant new theoretical insights on its convergence behavior and approximation accuracy, and those known results have been restricted to impractical settings in the all-important block-size parameter. In this paper, we present a unified singular value convergence analysis for this algorithm, for all valid choices of the block size. We present novel singular value lower bounds and demonstrate superlinear convergence in leading singular values for matrices with decaying singular values. Additionally, we provide results from numerical experiments that validate our analysis.

【Keywords】: low-rank approximation, randomized block Lanczos, block size, singular values

185. Neural Sentence-Level Sentiment Classification with Heterogeneous Supervision.

Paper Link】 【Pages】:1410-1415

【Authors】: Zhigang Yuan ; Fangzhao Wu ; Junxin Liu ; Chuhan Wu ; Yongfeng Huang ; Xing Xie

【Abstract】: Sentence-level sentiment classification aims to mine fine-grained sentiment information from texts. Existing methods for this task are usually based on supervised learning and rely on massive labeled sentences for model training. However, annotating sufficient sentences is expensive and time-consuming. In this paper, we propose a neural sentence-level sentiment classification approach which can exploit heterogeneous sentiment supervision and reduce the dependence on labeled sentences. Besides the sentence-level supervision from labeled sentences, our approach can also incorporate the word-level supervision extracted from sentiment lexicons, document-level supervision extracted from labeled documents and sentiment relations between sentences extracted from unlabeled documents. A unified neural framework is proposed to fuse heterogeneous sentiment supervision to train sentence-level sentiment classification model. Experiments on benchmark datasets validate the effectiveness of our approach.

【Keywords】: Sentiment Classification, Neural Networks, Heterogeneous Supervision

Paper Link】 【Pages】:1416-1421

【Authors】: Chen Zhang ; Xuyuanyu Zhang ; Hao Wang

【Abstract】: The extraction of featured snippet can be considered as the problem of Question Answering (QA). This paper presents a featured snippet extraction system by employing a technique of machine reading comprehension (MRC). Specifically, we first analyze the characteristics of questions with different types and their corresponding answers. Then, we classify a given question into various types, which is incorporated as key features in the subsequent model configuration. Based on that, we present a model to extract the candidate passages from recalled documents in a MRC fashion. Next, a novel MRC model with multiple stages of attention is proposed to extract answers from the selected passages. Last, in the answer re-ranking stage, we design a question type-adaptive model to produce the final answer. The experimental results on two open-domain QA Datasets clearly validate the effectiveness of our system and models in featured snippet extraction.

【Keywords】: Machine Reading Comprehension; Question Answering; Featured Snippet Extraction

187. Similarity-Based Active Learning for Image Classification Under Class Imbalance.

Paper Link】 【Pages】:1422-1427

【Authors】: Chuanhai Zhang ; Wallapak Tavanapong ; Gavin Kijkul ; Johnny Wong ; Piet C. de Groen ; Jung-Hwan Oh

【Abstract】: Many image classification tasks (e.g., medical image classification) have a severe class imbalance problem. Convolutional neural network (CNN) is currently a state-of-the-art method for image classification. CNN relies on a large training dataset to achieve high classification performance. However, manual labeling is costly and may not even be feasible for medical domain. In this paper, we propose a novel similarity-based active deep learning framework (SAL) that deals with class imbalance. SAL actively learns a similarity model to recommend unlabeled rare class samples for experts' manual labeling. Based on similarity ranking, SAL recommends high confidence unlabeled common class samples for automatic pseudo-labeling without experts' labeling effort. To the best of our knowledge, SAL is the first active deep learning framework that deals with a significant class imbalance. Our experiments show that SAL consistently outperforms two other recent active deep learning methods on two challenging datasets. What's more, SAL obtains nearly the upper bound classification performance (using all the images in the training dataset) while the domain experts labeled only 5.6% and 7.5% of all images in the Endoscopy dataset and the Caltech-256 dataset, respectively. SAL significantly reduces the experts' manual labeling efforts while achieving near optimal classification performance.

【Keywords】: Active deep learning; similarity learning; class imbalance; image classification

188. Layerwise Perturbation-Based Adversarial Training for Hard Drive Health Degree Prediction.

Paper Link】 【Pages】:1428-1433

【Authors】: Jianguo Zhang ; Ji Wang ; Lifang He ; Zhao Li ; Philip S. Yu

【Abstract】: With the development of cloud computing and big data, the reliability of data storage systems becomes increasingly important. Previous researchers have shown that machine learning algorithms based on SMART attributes are effective methods to predict hard drive failures. In this paper, we use SMART attributes to predict hard drive health degrees which are helpful for taking different fault tolerant actions in advance. Given the highly imbalanced SMART datasets, it is a nontrivial work to predict the health degree precisely. The proposed model would encounter overfitting and biased fitting problems if it is trained by the traditional methods. In order to resolve this problem, we propose two strategies to better utilize imbalanced data and improve performance. Firstly, we design a layerwise perturbation-based adversarial training method which can add perturbations to any layers of a neural network to improve the generalization of the network. Secondly, we extend the training method to the semi-supervised settings. Then, it is possible to utilize unlabeled data that have a potential of failure to further improve the performance of the model. Our extensive experiments on two real-world hard drive datasets demonstrate the superiority of the proposed schemes for both supervised and semi-supervised classification. The model trained by the proposed method can correctly predict the hard drive health status 5 and 15 days in advance.

【Keywords】: deep neural network; adversarial training; deep learning; hard drive; SMART

189. Heterogeneous Embedding Propagation for Large-Scale E-Commerce User Alignment.

Paper Link】 【Pages】:1434-1439

【Authors】: Vincent W. Zheng ; Mo Sha ; Yuchen Li ; Hongxia Yang ; Yuan Fang ; Zhenjie Zhang ; Kian-Lee Tan ; Kevin Chen-Chuan Chang

【Abstract】: We study the important problem of user alignment in e-commerce: to predict whether two online user identities that access an e-commerce site from different devices belong to one real-world person. As input, we have a set of user activity logs from Taobao and some labeled user identity linkages. User activity logs can be modeled using a heterogeneous interaction graph (HIG), and subsequently the user alignment task can be formulated as a semi-supervised HIG embedding problem. HIG embedding is challenging for two reasons: its heterogeneous nature and the presence of edge features. To address the challenges, we propose a novel Heterogeneous Embedding Propagation (HEP) model. The core idea is to iteratively reconstruct a node's embedding from its heterogeneous neighbors in a weighted manner, and meanwhile propagate its embedding updates from reconstruction loss and/or classification loss to its neighbors. We conduct extensive experiments on large-scale datasets from Taobao, demonstrating that HEP significantly outperforms state-of-the-art baselines often by more than 10% in F-scores.

【Keywords】: E-commerce User Alignment, Heterogeneous Interaction Graph, Heterogeneous Embedding Propagation

190. Robust Regression via Online Feature Selection Under Adversarial Data Corruption.

Paper Link】 【Pages】:1440-1445

【Authors】: Xuchao Zhang ; Shuo Lei ; Liang Zhao ; Arnold P. Boedihardjo ; Chang-Tien Lu

【Abstract】: The presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem that learns reliable regression coefficient when features are not accessible entirely at one time. Until now, several important challenges still cannot be handled concurrently: 1) corrupted data estimation when only partial features are accessible; 2) online feature selection when data contains adversarial corruption; and 3) scaling to a massive dataset. This paper proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that concurrently addresses all the above challenges. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. Extensive empirical experiments in both synthetic and real-world data sets demonstrated that the effectiveness of our new method is superior to that of existing methods in the recovery of both feature selection and regression coefficients, with very competitive efficiency.

【Keywords】: Robust Learning; Data Noise; Online Feature Selection

191. Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion.

Paper Link】 【Pages】:1446-1451

【Authors】: Zheng Zhang ; Cole Hawkins

【Abstract】: Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.

【Keywords】: streaming data; tensor factorization; tensor completion; variational bayesian

192. Forecasting Wavelet Transformed Time Series with Attentive Neural Networks.

Paper Link】 【Pages】:1452-1457

【Authors】: Yi Zhao ; Yanyan Shen ; Yanmin Zhu ; Junjie Yao

【Abstract】: This paper studies the problem of time series forecasting. A time series is defined as a sequence of data points listed in time order. Many real-life time series data are driven by multiple latent components which occur at different frequencies. Existing solutions to time series forecasting fail to identify and discriminate these frequency-domain components. Inspired by the recent advent of signal processing and speech recognition techniques that decompose a time series signal into its time-frequency representation - a scalogram (or spectrogram), this paper proposes to explicitly disclose frequency-domain information from a univariate time series using wavelet transform, towards improving forecasting accuracy. Based on the transformed data, we leverage different neural networks to capture local time-frequency features and global long-term trend simultaneously. We further employ the attention mechanism to fuse local and global features in an effective manner. The experimental results on real time series show that our proposed approach achieves better performance than various baseline methods.

【Keywords】: Time series forecasting, wavelet transform, attentive neural networks

193. Online CP Decomposition for Sparse Tensors.

Paper Link】 【Pages】:1458-1463

【Authors】: Shuo Zhou ; Sarah M. Erfani ; James Bailey

【Abstract】: Tensor decomposition techniques such as CANDECOMP/PARAFAC (CP) decomposition have achieved great success across a range of scientific fields. They have been traditionally applied to dense, static data. However, today's datasets are often highly sparse and dynamically changing over time. Traditional decomposition methods such as Alternating Least Squares (ALS) cannot be easily applied to sparse tensors, due to poor efficiency. Furthermore, existing online tensor decomposition methods mostly target dense tensors, and thus also encounter significant scalability issues for sparse data. To address this gap, we propose a new incremental algorithm for tracking the CP decompositions of online sparse tensors on-the-fly. Experiments on nine real-world datasets show that our algorithm is able to produce quality decompositions of comparable quality to the most accurate algorithm, ALS, whilst at the same time achieving speed improvements of up to 250 times and 100 times less memory.

【Keywords】: Tensor Decomposition; CP Decomposition; Sparse Tensor; Online Tensor

194. Density-Adaptive Local Edge Representation Learning with Generative Adversarial Network Multi-label Edge Classification.

Paper Link】 【Pages】:1464-1469

【Authors】: Yang Zhou ; Sixing Wu ; Chao Jiang ; Zijie Zhang ; Dejing Dou ; Ruoming Jin ; Pengwei Wang

【Abstract】: Traditional network representation learning techniques aim to learn latent low-dimensional representation of vertices in graphs. This paper presents a novel edge representation learning framework, GANDLERL, that combines generative adversarial network based multi-label classification with density-adaptive local edge representation learning for producing high-quality low-dimensional edge representations. First, we design a generative adversarial network based multi-label edge classification model to classify rarely labeled edges in graphs with a large amount of noise data into K classes. A four-player zero-sum game model, with the mixed training of true and real-looking fake edges as well as a contrastive loss containing a similar-loss and a dissimilar-loss, is proposed to improve the classification quality of unlabeled edges. Second, a local autoencoder edge representation learning method is developed to design K local representation learning models, each with individual parameters and structure to perform local representation learning on each of K classification-based subgraphs with unique local characteristics and jointly optimize the loss functions within and across classes. Third but last, we propose a density-adaptive edge representation learning method with the optimization at both edge and subgraph levels to address the representation learning of graph data with highly imbalanced vertex degree and edge distribution.

【Keywords】: Generative adversarial network multi-label edge classification; Density-adaptive local edge representation learning

195. Evaluating Top-k Meta Path Queries on Large Heterogeneous Information Networks.

Paper Link】 【Pages】:1470-1475

【Authors】: Zichen Zhu ; Reynold Cheng ; Loc Do ; Zhipeng Huang ; Haoci Zhang

【Abstract】: Heterogeneous information networks (HINs), which are typed graphs with labeled nodes and edges, have attracted tremendous interest from academia and industry. Given two HIN nodes s and t, and a natural number k, we study the discovery of the k most important paths in real time. The paths found can be used to support friend search, product recommendation, anomaly detection, and graph clustering. Although related algorithms have been proposed before, they were primarily designed to return the k shortest paths from unlabeled graphs. This leads to two problems: (1) there are often many shortest paths between s and t, and so it is not easy to choose the k best ones; and (2) it is arguable whether a shorter path implies a more crucial one. To address these issues, we study the top-k meta path query for a HIN. A meta path abstracts multiple path instances into a high-level path pattern, thereby giving more insight between two nodes. We further study several ranking functions that evaluate the importance of meta paths based on frequency and rarity, rather than on path length. We propose a solution that seamlessly integrates these functions into an A* search framework. The connectivity experiment on ACM dataset shows that our proposed method outperforms state-of-the-art algorithms.

【Keywords】: Heterogeneous Information Networks, top-k, meta path