19th ICDM 2019:Beijing, China

2019 IEEE International Conference on Data Mining, ICDM 2019, Beijing, China, November 8-11, 2019. IEEE 【DBLP Link】

Paper Num: 196 || Session Num: 3

Knowledge Graph Contest 2
Regular Papers 95
Short Papers 99

Regular Papers 95

1. Collaborative Graph Walk for Semi-Supervised Multi-label Node Classification.

【Paper Link】【Pages】:1-10

【Authors】: Uchenna Akujuobi ; Yufei Han ; Qiannan Zhang ; Xiangliang Zhang

【Abstract】: In this work, we study semi-supervised multi-label node classification problem in attributed graphs. Classic solutions to multi-label node classification follow two steps, first learn node embedding and then build a node classifier on the learned embedding. To improve the discriminating power of the node embedding, we propose a novel collaborative graph walk, named Multi-Label-Graph-Walk, to finely tune node representations with the available label assignments in attributed graphs via reinforcement learning. The proposed method formulates the multi-label node classification task as simultaneous graph walks conducted by multiple label-specific agents. Furthermore, policies of the label-wise graph walks are learned in a cooperative way to capture first the predictive relation between node labels and structural attributes of graphs; and second, the correlation among the multiple label-specific classification tasks. A comprehensive experimental study demonstrates that the proposed method can achieve significantly better multi-label classification performance than the state-of-the-art approaches and conduct more efficient graph exploration.

【Keywords】: Multi-label node classification; Semi-supervised attributed graph embedding; Reinforcement learning

2. Dataset Recommendation via Variational Graph Autoencoder.

【Paper Link】【Pages】:11-20

【Authors】: Basmah Altaf ; Uchenna Akujuobi ; Lu Yu ; Xiangliang Zhang

【Abstract】: This paper targets on designing a query-based dataset recommendation system, which accepts a query denoting a user's research interest as a set of research papers and returns a list of recommended datasets that are ranked by the potential usefulness for the user's research need. The motivation of building such a system is to save users from spending time on heavy literature review work to find usable datasets.We start by constructing a two-layer network: one layer of citation network, and the other layer of datasets, connected to the firstlayer papers in which they were used. A query highlights a set of papers in the citation layer. However, answering the query as a naive retrieval of datasets linked with these highlighted papers excludes other semantically relevant datasets, which widely exist several hops away from the queried papers. We propose to learn representations of research papers and datasets in the two-layer network using heterogeneous variational graph autoencoder, and then compute the relevance of the query to the dataset candidates based on the learned representations. Our ranked datasets shown in extensive evaluation results are validated to be more truly relevant than those obtained by naive retrieval methods and adoptions of existing related solutions.

【Keywords】: dataset-recommendation; query-based-recommendation; heterogeneous-variational-graph-autoencoder

3. CUDA: Contradistinguisher for Unsupervised Domain Adaptation.

【Paper Link】【Pages】:21-30

【Authors】: Sourabh Balgi ; Ambedkar Dukkipati

【Abstract】: Humans are very sophisticated in learning new information on a completely unknown domain because humans can contradistinguish, i.e., distinguish by contrasting qualities. We learn on a new unknown domain by jointly using unsupervised information directly from unknown domain and supervised information previously acquired knowledge from some other domain. Motivated by this supervised-unsupervised joint learning, we propose a simple model referred as Contradistinguisher (CTDR) for unsupervised domain adaptation whose objective is to jointly learn to contradistinguish on unlabeled target domain in a fully unsupervised manner along with prior knowledge acquired by supervised learning on an entirely different domain. Most recent works in domain adaptation rely on an indirect way of first aligning the source and target domain distributions and then learn a classifier on labeled source domain to classify target domain. This approach of indirect way of addressing the real task of unlabeled target domain classification has three main drawbacks. (i) The sub-task of obtaining a perfect alignment of the domain in itself might be impossible due to large domain shift (e.g., language domains). (ii) The use of multiple classifiers to align the distributions, unnecessarily increases the complexity of the neural networks leading to over-fitting in many cases. (iii) Due to distribution alignment, the domain specific information is lost as the domains get morphed. In this work, we propose a simple and direct approach that does not require domain alignment. We jointly learn CTDR on both source and target distribution for unsupervised domain adaptation task using contradistinguish loss for the unlabeled target domain in conjunction with supervised loss for labeled source domain. Our experiments show that avoiding domain alignment by directly addressing the task of unlabeled target domain classification using CTDR achieves state-of-the-art results on eight visual and four language benchmark domain adaptation datasets.

【Keywords】: computer vision; contrastive feature learning; deep learning; domain adaptation; sentiment analysis; transfer learning; unsupervised learning

4. An Efficient Policy Gradient Method for Conditional Dialogue Generation.

【Paper Link】【Pages】:31-40

【Authors】: Lei Cai ; Shuiwang Ji

【Abstract】: Encoder-decoder models have been commonly used in dialogue generation tasks. However, they tend to generate dull and generic response utterances. To tackle this problem, we consider dialogue generation as a conditional generation problem. For a given context history, our model can generate different response utterances with desirable dialog acts. Our model follows the SeqGAN framework, where the generator takes context history and dialog act as inputs and generates corresponding response utterances. The discriminator computes rewards by considering the quality of entire utterance and dialog act. Our model is trained by a policy gradient approach. To overcome the bottleneck of excessive time complexity incurred by the Monte Carlo search for training, we propose a local discriminator network to compute the individual reward in one forward propagation, thereby dramatically accelerating the training procedure. Experimental results demonstrate that our proposed method can achieve comparative performance with Monte Carlo search, while reducing the training time dramatically.

【Keywords】: deep learning, conditional dialogue generation, policy gradient, local discriminator network

5. Supervised Class Distribution Learning for GANs-Based Imbalanced Classification.

【Paper Link】【Pages】:41-50

【Authors】: Zixin Cai ; Xinyue Wang ; Mingjie Zhou ; Jian Xu ; Liping Jing

【Abstract】: Class imbalance is a challenging problem in many real-world applications such as fraudulent transactions detection in finance and diagnosis of rare diseases in medicine, which has attracted more and more attention in the community of machine learning and data mining. The main issue is how to capture the fundamental characteristics of the imbalanced data distribution. In particular, whether the hidden pattern can be truly mined from minority class is still a largely unanswered question after all it contains limited instances. The existing methods provide only a partial understanding of this issue and result in the biased and inaccurate classifiers. To overcome this issue, we propose a novel imbalanced classification framework with two stages. The first stage aims to accurately determine the class distributions by a supervised class distribution learning method under the Wasserstein auto-encoder framework. The second stage makes use of the generative adversarial networks to simultaneously generate instances according to the learnt class distributions and mine the discriminative structure among classes to train the final classifier. This proposed framework focuses on Supervised Class Distribution Learning for Generative Adversarial Networks-based imbalanced classification (SCDL-GAN). By comparing with the state-of-the-art methods, the experimental results demonstrate that SCDL-GAN consistently benefits the imbalanced classification task in terms of several widely-used evaluation metrics on five benchmark datasets.

【Keywords】: Imbalanced Classification; Class Distribution Learning; Generative Adversarial Networks

6. NVSRN: A Neural Variational Scaling Reasoning Network for Initiative Response Generation.

【Paper Link】【Pages】:51-60

【Authors】: Jinxin Chang ; Ruifang He ; Haiyang Xu ; Kun Han ; Longbiao Wang ; Xiangang Li ; Jianwu Dang

【Abstract】: Open-domain multi-turn dialogue systems are booming in human-machine interactions, which encourage to chat actively and freely in an intelligent natural way. Previous generative conversational models usually employ a single and deterministic encoder-decoder framework to model the semantic consistency between the context and corresponding response. However, they neglect the various dialog patterns (we denote the regularity of topic shifting as dialog pattern) in the conversations, leading to uninformative, non-initiative yet plausible responses. Although the existing variational methods have improved the response diversity to some extent by introducing a global variability into the generative process, they fail to simulate the transfer between topics with directional information due to the weak interpretability of the Gaussian-distributed latent variables. In this paper, we propose a novel Neural Variational Scaling Reasoning Network (NVSRN) for initiative response generation. To this end, our approach has two core ingredients: neural dialog pattern reasoner (reasoner) and topic scaling mechanism. Specifically, inspired by the advantage of von Mises-Fisher (vMF) distribution modeling the directional data (e.g., the topic transfer state), we employ it as the latent space of the reasoner to explore the regularity of topic shifting, which is then used to reason the topic of response. Based on this, a topic scaling mechanism is designed to control the transfer degree of topic in the response generator. The experimental results on two large dialog datasets demonstrate that the proposed model outperforms state-of-the-art baselines. The human evaluation shows the proposed model can produce more informative and initiative responses actively.

【Keywords】: dialogue generation, topic shifting, von Mises Fisher distribution, conditional variational autoencoder, topic scaling mechanism

7. InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms.

【Paper Link】【Pages】:61-70

【Authors】: Haipeng Chen ; Yan Jiao ; Zhiwei Qin ; Xiaocheng Tang ; Hao Li ; Bo An ; Hongtu Zhu ; Jieping Ye

【Abstract】: For both the traditional street-hailing taxi industry and the recently emerged on-line ride-hailing, it has been a major challenge to improve the ride-hailing marketplace efficiency due to spatio-temporal imbalance between the supply and demand, among other factors. Despite the numerous approaches to improve marketplace efficiency using pricing and dispatch strategies, they usually optimize pricing or dispatch separately. In this paper, we show that these two processes are in fact intrinsically interrelated. Motivated by this observation, we make an attempt to simultaneously optimize pricing and dispatch strategies. However, such a joint optimization is extremely challenging due to the inherent huge scale and lack of a uniform model of the problem. To handle the high complexity brought by the new problem, we propose InBEDE (Integrating contextual Bandit with tEmporal DiffErence learning), a learning framework where pricing strategies are learned via a contextual bandit algorithm, and the dispatch strategies are optimized with the help of temporal difference learning. The two learning components proceed in a mutual bootstrapping manner, in the sense that the policy evaluations of the two components are inter-dependent. Evaluated with real-world datasets of two Chinese cities from Didi Chuxing, an online ride-hailing platform, we show that the market efficiency of the ride-hailing platform can be significantly improved using InBEDE.

【Keywords】: Ride hailing platform; Joint pricing and dispatch; Reinforcement learning

8. Neural Feature Search: A Neural Architecture for Automated Feature Engineering.

【Paper Link】【Pages】:71-80

【Authors】: Xiangning Chen ; Bo Qiao ; Weiyi Zhang ; Wei Wu ; Murali Chintalapati ; Dongmei Zhang ; Qingwei Lin ; Chuan Luo ; Xudong Li ; Hongyu Zhang ; Yong Xu ; Yingnong Dang ; Kaixin Sui ; Xu Zhang

【Abstract】: Feature engineering is a crucial step for developing effective machine learning models. Traditionally, feature engineering is performed manually, which requires much domain knowledge and is time-consuming. In recent years, many automated feature engineering methods have been proposed. These methods improve the accuracy of a machine learning model by automatically transforming the original features into a set of new features. However, existing methods either lack ability to perform high-order transformations or suffer from the feature space explosion problem. In this paper, we present Neural Feature Search (NFS), a novel neural architecture for automated feature engineering. We utilize a recurrent neural network based controller to transform each raw feature through a series of transformation functions. The controller is trained through reinforcement learning to maximize the expected performance of the machine learning algorithm. Extensive experiments on public datasets illustrate that our neural architecture is effective and outperforms the existing state-of-the-art automated feature engineering methods. Our architecture can efficiently capture potentially valuable high-order transformations and mitigate the feature explosion problem.

【Keywords】: Feature Engineering, Automated Feature Engineering, Neural Architecture

9. Preference Relationship-Based CrossCMN Scheme for Answer Ranking in Community QA.

【Paper Link】【Pages】:81-90

【Authors】: Qing Chen ; Jianji Wang ; Xuguang Lan ; Nanning Zheng

【Abstract】: Community question answering (CQA) systems aim to provide users with high-quality answers. Nevertheless, unreliable answers are often returned to users in CQA systems, and the phenomenon causes that users have to browse multiple answers to find the best one. To improve such problem, we design a novel scheme, named PW-CrossCMN. The scheme ranks the candidate answers by pair-wise approach based on numerous historical documents. In the scheme, we apply the preference relationship into deep learning framework. Specifically, the scheme consists of two phases. In phase 1, the scheme extracts the features via automated feature engineering to construct the preference vectors and then divides the vectors into balanced positive and negative training samples based on the preference relationship. In phase 2, we build the CrossCMN model, which implements the multi-network parallel convolution and the cross forward propagation of full-connected layers, to achieve training and prediction tasks. Moreover, the multi-layer perception (MLP) is introduced to extract combination features in the prediction module. We perform extensive experiments on two typical datasets, and the results show that our scheme has more excellent performance in answer ranking task compared with several state-of-the-art baselines. In addition, we have released the relevant codes.

【Keywords】: answer ranking; preference relationship; parallel convolution network; community question answer

10. Automatic Clustering by Detecting Significant Density Dips in Multiple Dimensions.

【Paper Link】【Pages】:91-100

【Authors】: Pantelis Chronis ; Spiros Athanasiou ; Spiros Skiadopoulos

【Abstract】: Clustering algorithms are used to find groups of similar items in a dataset. Automatic clustering algorithms achieve this task without requiring users to input critical parameters. A recent automatic clustering methodology uses Hartigan's dip test to detect significant peaks in the distribution of a dataset. This test can detect peaks in the distribution of a one-dimensional variable. To perform clustering in multiple dimensions, algorithms of this methodology rely on one-dimensional transformations of the dataset, which limits their effectiveness. In this paper, we present M-Dip, an automatic clustering algorithm that works directly on multi-dimensional space. M-Dip also assumes that clusters correspond to different peaks in the distribution of the dataset. It separates clusters at the dips that form between neighboring peaks. Dips are detected directly in multi-dimensional space, using a graph-based method. Their statistical significance is evaluated through appropriate simulations. Our experimental evaluation indicates that M-Dip achieves significantly better results than existing algorithms based on Hartigan's dip, as well as other state-of-the-art automatic clustering algorithms.

【Keywords】: Clustering; Density Based Clustering; Dip; Density Dip

11. Generative Oversampling with a Contrastive Variational Autoencoder.

【Paper Link】【Pages】:101-109

【Authors】: Wangzhi Dai ; Kenney Ng ; Kristen A. Severson ; Wei Huang ; Fred Anderson ; Collin Stultz

【Abstract】: Although oversampling methods are widely used to deal with class imbalance problems, most only utilize observed samples in the minority class and ignore the rich information available in the majority class. In this work, we use an oversampling method that leverages information in both the majority and minority classes to mitigate the class imbalance problem. Experimental results on two clinical datasets with highly imbalanced outcomes demonstrate that prediction models can be significantly improved using data obtained from this oversampling method when the number of minority class samples is very small.

【Keywords】: class imbalance; oversampling; generative model; contrastive learning

12. MUSE-RNN: A Multilayer Self-Evolving Recurrent Neural Network for Data Stream Classification.

【Paper Link】【Pages】:110-119

【Authors】: Monidipa Das ; Mahardhika Pratama ; Septiviana Savitri ; Jie Zhang

【Abstract】: In this paper, we propose MUSE-RNN, a multilayer self-evolving recurrent neural network model for real-time classification of streaming data. Unlike the existing approaches, MUSE-RNN offers special treatment towards capturing temporal aspects of data stream through its novel recurrent learning approach based on the teacher forcing policy. Novelties here are twofold. First, in contrast to the traditional RNN models, MUSE-RNN has intrinsic ability to self-adjust its capacity by growing and pruning hidden nodes as well as layers, to handle the ever-changing characteristics of data stream. Second, MUSERNN adopts a unique scoring-based layer adaptation mechanism, which makes it capable of recalling prior tasks, with minimum exploitation of network parameters. The performance of MUSERNN is evaluated in comparison with a number of state-of-theart techniques, using seven popular data streams and continual learning problems under prequential test-then-train protocol. Experimental results demonstrate the effectiveness of MUSERNN in stream classification scenario.

【Keywords】: Recurrent neural network, Data stream, Online learning, Evolving network, Classification

13. Learning Dynamic Author Representations with Temporal Language Models.

【Paper Link】【Pages】:120-129

【Authors】: Edouard Delasalles ; Sylvain Lamprier ; Ludovic Denoyer

【Abstract】: Language models are at the heart of numerous works, notably in the text mining and information retrieval communities. These statistical models aim at extracting word distributions, from simple unigram models to recurrent approaches with latent variables that capture subtle dependencies in texts. However, those models are learned from word sequences only, and authors' identities, as well as publication dates, are seldom considered. We propose a neural model, based on recurrent language modeling, which aims at capturing language diffusion tendencies in author communities through time. By conditioning language models with author and temporal vector states, we are able to leverage the latent dependencies between the text contexts. This allows us to beat several temporal and non-temporal language baselines on two real-world corpora, and to learn meaningful author representations that vary through time.

【Keywords】: representation learning, dynamic language model, diachronic text analysis

14. Closed Form Word Embedding Alignment.

【Paper Link】【Pages】:130-139

【Authors】: Sunipa Dev ; Safia Hassan ; Jeff M. Phillips

【Abstract】: We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as Absolute Orientation, which are popular for aligning objects in three-dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. Then we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply aligning and averaging ensembles of embeddings.

【Keywords】: embeddings; word embeddings

15. Reinforcement Learning Based Monte Carlo Tree Search for Temporal Path Discovery.

【Paper Link】【Pages】:140-149

【Authors】: Pengfei Ding ; Guanfeng Liu ; Pengpeng Zhao ; An Liu ; Zhixu Li ; Kai Zheng

【Abstract】: An Attributed Dynamic Graph (ADG) contains multiple dynamic attributes associated with each edge. In ADG based applications, people usually can specify multiple constrains in the attributes to illustrate their requirements, such as the total cost, the total travel time and the stopover interval of a flight between two cities. This inspires a type of Multi-Constrained Temporal Path (MCTP) discovery in ADGs, which is a challenging NP-Complete problem. In order to deliver an efficient and effective temporal path discovery method to be used in real-time environment, we propose a Reinforcement Learning (RL) based, Monte Carlo Tree Search algorithm (RLMCTS). RL-MCTS uses a newly designed memory structure to address the challenges of Monte Carlo Tree Search (MCTS) in MCTP discovery. To the best of our knowledge, RL-MCTS is the first RL algorithm that supports path discovery in ADGs. The experimental results on ten real dynamic graphs demonstrate that our algorithm outperforms the state-of-the-art methods in terms of both efficiency and effectiveness.

【Keywords】: path discovery; monte carlo tree; dynamic graph

16. Learning Credible Deep Neural Networks with Rationale Regularization.

【Paper Link】【Pages】:150-159

【Authors】: Mengnan Du ; Ninghao Liu ; Fan Yang ; Xia Hu

【Abstract】: Recent explainability related studies have shown that state-of-the-art DNNs do not always adopt correct evidences to make decisions. It not only hampers their generalization but also makes them less likely to be trusted by end-users. In pursuit of developing more credible DNNs, in this paper we propose CREX, which encourages DNN models to focus more on evidences that actually matter for the task at hand, and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., a subset of features highlighted by domain experts as justifications for predictions, to enforce DNNs to generate local explanations that conform with expert rationales. Even when rationales are not available, CREX still could be useful by requiring the generated explanations to be sparse. Experimental results on two text classification datasets demonstrate the increased credibility of DNNs trained with CREX. Comprehensive analysis further shows that while CREX does not always improve prediction accuracy on the held-out test set, it significantly increases DNN accuracy on new and previously unseen data beyond test set, highlighting the advantage of the increased credibility.

【Keywords】: Deep neural network; Explainability; Credibility; Expert rationales

17. Beyond Geo-First Law: Learning Spatial Representations via Integrated Autocorrelations and Complementarity.

【Paper Link】【Pages】:160-169

【Authors】: Jiadi Du ; Yunchao Zhang ; Pengyang Wang ; Jennifer Leopold ; Yanjie Fu

【Abstract】: Spatial representation learning (SRL) is to automatically learn feature representations that characterize spatial entities. In this paper, we study the problem of improving spatial representation learning using spatial structure knowledge. We consider two types of structure knowledge: (1) spatial autocorrelations refer to the pattern that similar spatial entities are more likely to share similar roles and configurations. (2) spatial complementarity refers to the effect that the role of a spatial entity can be complemented and augmented by other different yet compatible spatial entities. Along this line, we develop a step-by-step SRL framework to integrate spatial autocorrelations and complementarity. This framework includes four testable steps. First, we construct multi-view POI-POI(Point of Interest) graphs to characterize the static and dynamic patterns of each spatial region. We then use the graphs as inputs to train an adversarial autoencoder (AAE) that can preserve the spatial autocorrelation property and learn representations of spatial entities. Later, with the learned representations extracted from AAE inputs, a Graph Convolutional Network (GCN) is trained in an unsupervised fashion in order to overcome label sparsity and capture the spatial complementarity effect. In this way, we significantly improve the quality of spatial representations. In addition, we apply the proposed method to characterize residential communities for predicting real estate prices. Finally, we present intensive experimental results with real-world real estate data to demonstrate the proposed method effectiveness.

【Keywords】: Spatial Representation; Autocorrelation; Complementarity; AAE; GCN

18. Improving Spectral Clustering with Deep Embedding and Cluster Estimation.

【Paper Link】【Pages】:170-179

【Authors】: Liang Duan ; Charu C. Aggarwal ; Shuai Ma ; Saket Sathe

【Abstract】: Spectral clustering is one of the most popular modern clustering algorithms. It is easy to implement, can be solved efficiently, and very often outperforms other traditional clustering algorithms such as k-means. However, pectral clustering would be insufficient when dealing with most datasets which have complex statistical properties and quires the user to specify the number of clusters (called k). To address these two problems, in this paper, we propose an approach to extending spectral clustering with deep embedding and estimation of the number of clusters. Specifically, we first generate the deep embedding via learning a deep autoencoder, which transforms the raw data into the lower dimensional representations that suitable for clustering. We then provide an effective method to estimate the number of clusters by learning a softmax autoencoder from the deep embedding. We finally extend spectral clustering with the learned embedding and the estimated number. An extensive experimental study on several image and text datasets illustrates the effectiveness and efficiency of our approach.

【Keywords】: spectral clustering; deep embedding; autoencoder; number of clusters; clustering

19. Modeling Engagement Dynamics of Online Discussions using Relativistic Gravitational Theory.

【Paper Link】【Pages】:180-189

【Authors】: Subhabrata Dutta ; Dipankar Das ; Tanmoy Chakraborty

【Abstract】: Online discussions are valuable resources to study user behaviour on a diverse set of topics. Unlike previous studies which model a discussion in a static manner, in the present study, we model it as a time-varying process and solve two inter-related problems - predict which user groups will get engaged with an ongoing discussion, and forecast the growth rate of a discussion in terms of the number of comments. We propose RGNet (Relativistic Gravitational Nerwork), a novel algorithm that uses Einstein Field Equations of gravity to model online discussions as 'cloud of dust' hovering over a user spacetime manifold, attracting users of different groups at different rates over time. We also propose GUVec, a global user embedding method for an online discussion, which is used by RGNet to predict temporal user engagement. RGNet leverages different textual and network-based features to learn the dust distribution for discussions. We employ four baselines - first two using LSTM architecture, third one using Newtonian model of gravity, and fourth one using a logistic regression adopted from a previous work on engagement prediction. Experiments on Reddit dataset show that RGNet achieves 0.72 Micro F1 score and 6.01% average error for temporal engagement prediction of user groups and growth rate forecasting, respectively, outperforming all the baselines significantly. We further employ RGNet to predict non-temporal engagement - whether users will comment to a given post or not. RGNet achieves 0.62 AUC for this task, outperforming existing baseline by 8.77% AUC.

【Keywords】: Engagement modeling; General Theory of Relativity; Online discussion platform

20. Large-Scale Personalized Delivery for Guaranteed Display Advertising with Real-Time Pacing.

【Paper Link】【Pages】:190-199

【Authors】: Zhen Fang ; Yang Li ; Chuanren Liu ; Wenxiang Zhu ; Yu Zheng ; Wenjun Zhou

【Abstract】: Guaranteed display (GD) has been a successful model for display advertising. Existing solutions usually model GD services as a crowd-level supply allocation problem. This formulation, however, not only ignores user heterogeneity within crowds, but also makes it difficult to incorporate individual-level constraints. In this paper, we present an large-scale system for personalized delivery in GD advertising services. A unique contribution is to model the allocation problem at the individual level that accounts for user-ad interactions. Therefore, our system can conveniently incorporate complex constraints, such as the priority of GD contracts, the display frequency of ads, and the effectiveness of ad slots arrangement. Moreover, we develop a real-time pacing strategy to fulfill GD contracts with smooth ad delivery and optimized ad performance, such as cost-per-click (CPC) and cost-per-action (CPA). Our system can be parallelized to efficiently compute the delivery solution with billions decision variables. Using both offline evaluation and online A/B tests, we demonstrate that our solution is effective in terms of both accuracy and efficiency.

【Keywords】: Display Advertising, Guaranteed Display, Sup ply Allocation, User Behavior, Large-Scale System

21. DCMN: Double Core Memory Network for Patient Outcome Prediction with Multimodal Data.

【Paper Link】【Pages】:200-209

【Authors】: Yujuan Feng ; Zhenxing Xu ; Lin Gan ; Ning Chen ; Bin Yu ; Ting Chen ; Fei Wang

【Abstract】: More and more healthcare data are becoming readily available nowadays. These data can help the healthcare professionals and patient themselves to better understand the patient status and potentially lead to improved care quality. However, the analysis of these data are challenging because they are large-scale and heterogeneous, high-dimensional and sparse, temporal but irregularly sampled. In this paper, we propose a method called Double Core Memory Networks (DCMN) to integrate information from different modalities of the longitudinal patient data and learn a joint patient representation effective for downstream analytical tasks such as risk prediction. DCMN is designed not only to disentangle the temporal and non-linear intra-modal dependencies for the data within each modality but also to capture the long-term inter-modal interactions. DCMN models are the end-to-end memory networks with two external memory cores where each modality of data is compressed and stored. Each memory core has an information-flow controller named query to interact with an external memory module. In addition, we incorporate a gating mechanism into basic DCMN model to perform dynamic regulation of memory interaction. DCMN models have multiple computational layers (hops) allowing data of different modalities interacting with each other recurrently along with a mechanism of alternating access of external memory for each memory core hop-by-hop. We evaluate DCMN models on two outcome prediction tasks, including a mortality prediction on the public Medical Information Mart for Intensive Care III (MIMIC-III) database and a cost prediction on the Hospital Quality Monitoring System (HQMS) dataset. Experimental results demonstrate that our DCMN models are more competitive over the baseline methods in the multimodal prediction setting.

【Keywords】: double-core-memory-networks; multimodal-patient-data; outcome-prediction

22. Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text.

【Paper Link】【Pages】:210-219

【Authors】: Oluwaseyi Feyisetan ; Tom Diethe ; Thomas Drake

【Abstract】: Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dx-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe > 20x greater guarantees on expected privacy against comparable worst case statistics.

【Keywords】: privacy; data sanitization; document redaction; privacy analysis

23. Discovering Subdimensional Motifs of Different Lengths in Large-Scale Multivariate Time Series.

【Paper Link】【Pages】:220-229

【Authors】: Yifeng Gao ; Jessica Lin

【Abstract】: Detecting repeating patterns of different lengths in time series, also called variable-length motifs, has received a great amount of attention by researchers and practitioners. Despite the significant progress that has been made in recent single dimensional variable-length motif discovery work, detecting variable-length sub dimensional motifs -patterns that are simultaneously occurring only in a subset of dimensions in multivariate time series-remains a difficult task. The main challenge is scalability. On the one hand, the brute-force enumeration solution, which searches for motifs of all possible lengths, is very time consuming even in single dimensional time series. On the other hand, previous work show that index-based fixed-length approximate motif discovery algorithms such as random projection are not suitable for detecting variable-length motifs due to memory requirement. In this paper, we introduce an approximate variable-length subdimensional motif discovery algorithm called Collaborative HIerarchy based Motif Enumeration (CHIME) to efficiently detect variable-length subdimensional motifs given a minimum motif length in large-scale multivariate time series. We show that the memory cost of the approach is significantly smaller than that of random projection. Moreover, the speed of the proposed algorithm is significantly faster than that of the state-of-the-art algorithms. We demonstrate that CHIME can efficiently detect meaningful variable-length subdimensional motifs in large real world multivariate time series datasets.

【Keywords】: Variable Length; Motif Discovery; Time Series

24. Tabular Cell Classification Using Pre-Trained Cell Embeddings.

【Paper Link】【Pages】:230-239

【Authors】: Majid Ghasemi-Gol ; Jay Pujara ; Pedro A. Szekely

【Abstract】: There is a large amount of data on the web in tabular form, such as excel sheets, CSVs, and web tables. Often, tabular data is meant for human consumption, using data layouts that are difficult for machines to interpret automatically. Previous work uses the stylistic features of tabular cells (e.g. font size, border type, background color) to classify tabular cells by their role in the data layout of the document (top attribute, data, metadata, etc.). In this paper, we propose a method to embed the semantic and contextual information about tabular cells in a low dimension cell embedding space. We then propose an RNN-based classification technique to use these cell vector representations, combining them with stylistic features introduced in previous work, in order to improve the performance of cell type classification in complex documents. We evaluate the performance of our system on three datasets containing documents with various data layouts, in two settings, in-domain, and cross-domain training. Our evaluation result shows that our proposed cell vector representations in combination with our RNN-based classification technique significantly improves cell type classification performance.

【Keywords】: transfer learning; cell type classification; table embeddings; table understanding; tabular data

25. Streaming Random Patches for Evolving Data Stream Classification.

【Paper Link】【Pages】:240-249

【Authors】: Heitor Murilo Gomes ; Jesse Read ; Albert Bifet

【Abstract】: Ensemble methods are a popular choice for learning from evolving data streams. This popularity is due to (i) the ability to simulate simple, yet, successful ensemble learning strategies, such as bagging and random forests; (ii) the possibility of incorporating drift detection and recovery in conjunction to the ensemble algorithm; (iii) the availability of efficient incremental base learners, such as Hoeffding Trees. In this work, we introduce the Streaming Random Patches (SRP) algorithm, an ensemble method specially adapted to stream classification which combines random subspaces and online bagging. We provide theoretical insights and empirical results illustrating different aspects of SRP. In particular, we explain how the widely adopted incremental Hoeffding trees are not, in fact, unstable learners, unlike their batch counterparts, and how this fact significantly influences ensemble methods design and performance. We compare SRP against state-of-the-art ensemble variants for streaming data in a multitude of datasets. The results show how SRP produce a high predictive performance for both real and synthetic datasets. Besides, we analyze the diversity over time and the average tree depth, which provides insights on the differences between local subspace randomization (as in random forest) and global subspace randomization (as in random subspaces).

【Keywords】: Stream Data Mining, Ensemble Learning, Random Subspaces, Random Patches

26. Deep Multi-attributed Graph Translation with Node-Edge Co-Evolution.

【Paper Link】【Pages】:250-259

【Authors】: Xiaojie Guo ; Liang Zhao ; Cameron Nowzari ; Setareh Rafatirad ; Houman Homayoun ; Sai Manoj Pudukotai Dinakarrao

【Abstract】: Generalized from image and language translation, graph translation aims to generate a graph in the target domain by conditioning an input graph in the source domain. This promising topic has attracted fast-increasing attention recently. Existing works are limited to either merely predicting the node attributes of graphs with fixed topology or predicting only the graph topology without considering node attributes, but cannot simultaneously predict both of them, due to substantial challenges: 1) difficulty in characterizing the interactive, iterative, and asynchronous translation process of both nodes and edges and 2) difficulty in discovering and maintaining the inherent consistency between the node and edge in predicted graphs. These challenges prevent a generic, end-to-end framework for joint node and edge attributes prediction, which is a need for real-world applications such as malware confinement in IoT networks and structural-to-functional network translation. These real-world applications highly depend on hand-crafting and ad-hoc heuristic models, but cannot sufficiently utilize massive historical data. In this paper, we termed this generic problem "multi-attributed graph translation" and developed a novel framework integrating both node and edge translations seamlessly. The novel edge translation path is generic which is proven to be a generalization of the existing topology translation models. Then, a spectral graph regularization based on our non-parametric graph Laplacian is proposed to learn and maintain the consistency of the predicted nodes and edges. Finally, extensive experiments on both synthetic and real-world application data demonstrated the effectiveness of the proposed method.

【Keywords】: Multi-attributed graphs; graph translation

27. Efficient Data Representation by Selecting Prototypes with Importance Weights.

【Paper Link】【Pages】:260-269

【Authors】: Karthik S. Gurumoorthy ; Amit Dhurandhar ; Guillermo A. Cecchi ; Charu C. Aggarwal

【Abstract】: Prototypical examples that best summarize and compactly represent an underlying complex data distribution, communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper, we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives that optimally describes them. Our work notably generalizes the recent work by Kim et al. (2016) where in addition to selecting prototypes, we also associate non-negative weights which are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys a key property of that of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for the same. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and on publicly available 40 health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST), and insight (on CDC datasets), which arguably are the hallmarks of an effective interpretable machine learning method.

【Keywords】: prototype selection; submodularity; outlier detection; data summarization

28. Interpretable Feature Learning of Graphs using Tensor Decomposition.

【Paper Link】【Pages】:270-279

【Authors】: Shah Muhammad Hamdi ; Rafal A. Angryk

【Abstract】: In recent years, node embedding algorithms, which learn low dimensional vector representations for nodes in a graph, have been one of the key research interests of the graph mining community. The existing algorithms either rely on computationally expensive eigendecomposition of the large matrices, or require tuning of the word embedding-based hyperparameters as a result of representing the graph as a node sequence similar to the sentences in a document. Moreover, the latent features produced by these algorithms are hard to interpret. In this paper, we present two novel tensor decomposition-based node embedding algorithms, that can learn node features from arbitrary types of graphs: undirected, directed, and/or weighted, without relying on eigendecomposition or word embedding-based hyperparameters. Both algorithms preserve the local and global structural properties of the graph by using k-step transition probability matrices to construct third-order multidimensional arrays or tensors and perform CANDECOMP/PARAFAC (CP) decomposition in order to produce an interpretable and low dimensional vector space for the nodes. Our experiments encompass different types of graphs (undirected/directed, unweighted/weighted, sparse/dense) of different domains such as social networking and neuroscience. Our experimental evaluation proves our models to be interpretable with respect to the understandability of the feature space, precise with respect to the network reconstruction and link prediction, and accurate with respect to node classification and graph classification.

【Keywords】: Node embedding; Tensor decomposition; Interpretability of feature space

29. Discriminatively Relabel for Partial Multi-label Learning.

【Paper Link】【Pages】:280-288

【Authors】: Shuo He ; Ke Deng ; Li Li ; Senlin Shu ; Li Liu

【Abstract】: Partial multi-label learning (PML) deals with the problem where each training example is assigned multiple candidate labels, only a part of which are correct. To learn from such PML examples, the straightforward model training tends to be misled by the noise candidate label set. To alleviate this problem, a coupled framework is established in this paper to learn the desired model and perform the relabeling procedure alternatively. In the relabeling procedure, instead of simply extracting relative label confidences, or deterministically eliminating low confidence labels and preserving high confidence labels as ground-truth ones, we introduce a soft sign thresholding operator to adaptively strengthen candidate labels with high confidence and weaken candidate labels with low confidence, which enlarges the difference of confidences of candidate labels within allowable range. We further show that the resulting nonconvex quadratic programming (QP) optimization problem can be relaxed into a convex QP problem with proper conditions. Extensive experiments on synthesized and real-world data sets demonstrate the effectiveness of our proposed approach.

【Keywords】: partial multi-label learning; relabel; soft sign thresholding

30. Distribution of Node Embeddings as Multiresolution Features for Graphs.

【Paper Link】【Pages】:289-298

【Authors】: Mark Heimann ; Tara Safavi ; Danai Koutra

【Abstract】: Graph classification is an important problem in many fields, from bioinformatics and neuroscience to computer vision and social network analysis. That said, the task of comparing graphs for the purpose of graph classification faces several major challenges. In particular, an effective graph comparison method must (1) expressively and inductively compare graphs; (2) efficiently compare large graphs; and (3) enable the use of fast machine learning models for graph classification. To address such challenges, we propose Randomized Grid Mapping (RGM), a fast-to-compute feature map that represents a graph via the distribution of its node embeddings in feature space. We justify RGM with close connections to kernel methods: RGM provably approximates the Laplacian kernel mean map and has the multiresolution properties of the pyramid match kernel. We also show that RGM can be extended to incorporate node labels using the Weisfeiler-Lehman framework. Extensive experiments show that graph classification accuracy with RGM feature maps is better than or competitive with many powerful graph kernels, unsupervised graph feature mappings, and deep neural networks. Moreover, comparing graphs based on their node embeddings with RGM is up to an order of magnitude faster than competitive baselines, while maintaining high classification accuracy.

【Keywords】: node embedding; representation learning; graph classification; graph kernel

31. Multi-aspect Mining of Complex Sensor Sequences.

【Paper Link】【Pages】:299-308

【Authors】: Takato Honda ; Yasuko Matsubara ; Ryo Neyama ; Mutsumi Abe ; Yasushi Sakurai

【Abstract】: In recent years, a massive amount of time-stamped sensor data has been generated and collected by many Internet of Things (IoT) applications, such as advanced automobiles and health care devices. Given such a large collection of complex sensor sequences, which consists of multiple attributes (e.g., sensor, user, timestamp), how can we automatically find important dynamic time-series patterns and the points of variation? How can we summarize all the complex sensor sequences, and achieve a meaningful segmentation? Also, can we see any hidden user-specific differences and outliers? In this paper we present CUBEMARKER, an efficient and effective method for capturing such multi-aspect features in sensor sequences. CUBEMARKER performs multi-way summarization for all attributes, namely, sensors, users, and time, and specifically it extracts multi-aspect features, such as important time-series patterns (i.e., time-aspect features) and hidden groups of users (i.e., user-aspect features), in complex sensor sequences. Our proposed method has the following advantages: (a) It is effective: it extracts multi-aspect features from complex sensor sequences and enables the efficient and effective analysis of complicated datasets; (b) It is automatic: it requires no prior training and no parameter tuning; (c) It is scalable: our method is carefully designed to be linear as regards dataset size and applicable to a large number of sensor sequences. Extensive experiments on real datasets show that CUBEMARKER is effective in that it can capture meaningful patterns for various real-world datasets, such as those obtained from smart factories, human activities, and automobiles. CUBEMARKER consistently outperforms the best state-of-the-art methods in terms of both accuracy and execution speed.

【Keywords】: Time series, IoT sensors, Tensor analysis, Automatic mining

32. Online Budgeted Least Squares with Unlabeled Data.

【Paper Link】【Pages】:309-318

【Authors】: Chen Huang ; Peiyan Li ; Chongming Gao ; Qinli Yang ; Junming Shao

【Abstract】: The scarcity of labeled data in real streaming environments has boosted the study of online semi-supervised learning (SSL). However, existing online SSL models often rely on some specific assumptions (e.g., manifold assumption) and need to maintain some extra constraints (e.g., the Laplacian matrix) on the fly, which is usually time and resource consuming. In this paper, we propose an efficient and effective online semi-supervised learning approach via Budgeted Least Square (BLS). Specifically, we first derive both closed-form transductive and inductive solutions for kernel least squares classification in the semi-supervised setting. Then, together with online kernel learning, BLS allows a concise online update. Besides, the theoretical regret bound of BLS is analysed, and empirical experiments on both static and streaming data further demonstrate its superiority over state-of-the-art algorithms.

【Keywords】: semi-supervised learning; online Learning; classification

【Paper Link】【Pages】:319-328

【Authors】: Hao Huang ; Chenxiao Xu ; Shinjae Yoo

【Abstract】: Discovering the causal graph in multivariate time series data is of great importance for industrial society, yet challenging due to the unknown nonlinearity in the data. Existing works only explore the data in chronological order, and rely on pre-assumed kernels or certain distribution assumption. In this paper, we present a Bi-directional neural network for Causal Graph Learning (Bi-CGL) through weight-sharing and low-rank neural network. It discovers the causal graph by simultaneously exploring input in forward and reverse chronological order. Both directions approach the same causal graph with shared low-rank approximation, which provides robustness and better accuracy against data noise. Experiments on synthetic and real world datasets prove our Bi-CGL's outperformance over existing baselines.

【Keywords】: causal graph learning; bi-directional neural network; weight-sharing

34. Matrix Profile XIX: Time Series Semantic Motifs: A New Primitive for Finding Higher-Level Structure in Time Series.

【Paper Link】【Pages】:329-338

【Authors】: Shima Imani ; Eamonn J. Keogh

【Abstract】: Time series motifs are approximately repeated patterns in real-valued temporal data. They are used for exploratory data mining methods including clustering, classification, segmentation, and rule discovery. Their current definition is limited to finding literal or near-exact matches and is unable to discover higher level semantic structure. Consider a time series generated by an accelerometer on a smartwatch. This data offers the possibility of finding motifs in human behavior. One such example is the motif generated by a handshake. Under current motif definitions, a single-pump handshake would not match a three-pump handshake, even though they are culturally and semantically equivalent events. In this work we generalize the definition of motifs to one which allows us to capture higher level semantic structure. We refer to these as time series semantic motifs. Surprisingly this increased expressiveness does not come at a great cost. Our algorithm Semantic-Motif-Finder takes approximately the same time as current state-of-the-art motif discovery algorithms. Furthermore, we demonstrate the utility of our ideas on diverse datasets.

【Keywords】: time series; motif discovery; semantic data; higher level motif

35. Forest Distance Closeness Centrality in Disconnected Graphs.

【Paper Link】【Pages】:339-348

【Authors】: Yujia Jin ; Qi Bao ; Zhongzhi Zhang

【Abstract】: Ranking the relative importance of nodes is a fundamental issue in network science, especially in the analysis of social and information networks. A variety of importance metrics and related algorithms have been proposed. However, these previous measures either do not apply to disconnected networks or have some weakness when applied to disconnected networks. In this paper, we use forest distance to assess the importance of nodes in a graph, whether connected or disconnected. For a node in a graph, its forest distance is defined as the sum of forest distances from the node to all other nodes in the graph. To illustrate the discriminating power of forest distance, we first compute the exact forest distances for all nodes in the path graph, and show that the order of node importance given by forest distances is in complete agreement with our intuition. We then show that forest distance centrality has a better discriminating power than alternate metrics such as betweenness, harmonic centrality, eigenvector centrality, and PageRank. Also, we present a theoretically guaranteed estimation algorithm, which approximates the forest distances for all nodes in a graph in nearly linear time with respect to the number of edges. Finally, we perform extensive experiments on real-world networks, which show that our approximation algorithm is both efficient and accurate for large networks.

【Keywords】: graph mining, node centrality, forest distance, fast algorithm, theory of computation, Laplacian solver

36. Computing Optimal Assignments in Linear Time for Approximate Graph Matching.

【Paper Link】【Pages】:349-358

【Authors】: Nils M. Kriege ; Pierre-Louis Giscard ; Franka Bause ; Richard C. Wilson

【Abstract】: Finding an optimal assignment between two sets of objects is a fundamental problem arising in many applications, including the matching of 'bag-of-words' representations in natural language processing and computer vision. Solving the assignment problem typically requires cubic time and its pairwise computation is expensive on large datasets. In this paper, we develop an algorithm which can find an optimal assignment in linear time when the cost function between objects is represented by a tree distance. We employ the method to approximate the edit distance between two graphs by matching their vertices in linear time. To this end, we propose two tree distances, the first of which reflects discrete and structural differences between vertices, and the second of which can be used to compare continuous labels. We verify the effectiveness and efficiency of our methods using synthetic and real-world datasets.

【Keywords】: assignment problem; graph matching; graph edit distance; tree distance

37. Multi-hop Knowledge Base Question Answering with an Iterative Sequence Matching Model.

【Paper Link】【Pages】:359-368

【Authors】: Yunshi Lan ; Shuohang Wang ; Jing Jiang

【Abstract】: Knowledge Base Question Answering (KBQA) has attracted much attention and recently there has been more interest in multi-hop KBQA. In this paper, we propose a novel iterative sequence matching model to address several limitations of previous methods for multi-hop KBQA. Our method iteratively grows the candidate relation paths that may lead to answer entities. The method prunes away less relevant branches and incrementally assigns matching scores to the paths. Empirical results demonstrate that our method can significantly outperform existing methods on three different benchmark datasets.

【Keywords】: Knowledge base question answering; Sequence matching model; Multi hop question answering

38. Collaborative Distillation for Top-N Recommendation.

【Paper Link】【Pages】:369-378

【Authors】: Jae-woong Lee ; Minjin Choi ; Jongwuk Lee ; Hyunjung Shim

【Abstract】: Knowledge distillation (KD) is a well-known method to reduce inference latency by compressing a cumbersome teacher model to a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging due to the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with the top-N recommendation. To address the issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for the top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher-and the student-guided training methods, selecting the most useful feedback from the teacher model. Via experimental results, we demonstrate that the proposed model outperforms the state-of-the-art method by 5.5-29.7% and 4.8-27.8% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves the performance comparable to the teacher model.

【Keywords】: collaborative filtering; recommender systems; knowledge distillation

39. HierCon: Hierarchical Organization of Technical Documents Based on Concepts.

【Paper Link】【Pages】:379-388

【Authors】: Keqian Li ; Shiyang Li ; Semih Yavuz ; Hanwen Zha ; Yu Su ; Xifeng Yan

【Abstract】: In this work we study the hierarchical organization of technical documents, where given a set of documents and a hierarchy of categories, the goal is to assign documents to their corresponding categories. Unlike prior work on supervised hierarchical document categorization that relies on large amount of labeled training data, which is expensive to obtain in closed technical domain and tends to stale as new knowledge emerges, we study this problem in a weak supervision setting, by leveraging semantic information from concepts. The core idea is to project both documents and categories into a common concept embedding space, where their fine-grained similarity can be easily and effectively computed. Experiments over real-world datasets from the subject of computer science, physics & mathematics, and medicine demonstrated the superior performance of our approach over a wide range of state of the art baseline approaches.

【Keywords】: Weakly supervised classification; Hierarchical classification; Concept mining

40. Classify EEG and Reveal Latent Graph Structure with Spatio-Temporal Graph Convolutional Neural Network.

【Paper Link】【Pages】:389-398

【Authors】: Xiaoyu Li ; Buyue Qian ; Jishang Wei ; An Li ; Xuan Liu ; Qinghua Zheng

【Abstract】: Electroencephalogram(EEG) is a test that detect brain activities using multiple electrodes placed on the scalp. Multiple channels of EEG signals are recorded through the electrodes and are widely used in applications such as neurological disease diagnosis, emotion recognition, and behavior modeling. Recently, deep learning methods have been applied to classify EEG signals, where the different EEG channels are almost treated as a 2D grid input to the machine learning model. This data formation doesn't consider The complex connection among the EEG channels is not considered in such data formation. In our work, we treat EEG signals as frames of graph, and propose an end-to-end edge-aware spatio-temporal graph convolutional neural network for EEG classification. Specifically, we iteratively apply graph convolutional layer spatially and standard convolutional layer temporally. Since there is no prior knowledge about the exact connection among EEG channels, in our model, we initialize the connection as complete graph and apply learnable mask to capture graph structure at different levels. Furthermore, we also propose an iterative method based on information aggregation in graph convolution mechanism to reveal the latent graph structure. Empirical evaluation shows that our model achieves superior performance over state-of-the-art methods for EEG classification, and the learnt and revealed latent EEG graph structure is verified to be meaningful by neuroscientists.

【Keywords】: EEG Classification, Graph Convolutional Neural Network, Spatio-Temporal Model, Deep Learning

41. Learning Classifiers on Positive and Unlabeled Data with Policy Gradient.

【Paper Link】【Pages】:399-408

【Authors】: Tianyu Li ; Chien-Chih Wang ; Yukun Ma ; Patricia Ortal ; Qifang Zhao ; Björn Stenger ; Yu Hirate

【Abstract】: Existing algorithms aiming to learn a binary classifier from positive (P) and unlabeled (U) data generally require estimating the class prior or label noises ahead of building a classification model. However, the estimation and classifier learning are normally conducted in a pipeline instead of being jointly optimized. In this paper, we propose to alternatively train the two steps using reinforcement learning. Our proposal adopts a policy network to adaptively make assumptions on the labels of unlabeled data, while a classifier is built upon the output of the policy network and provides rewards to learn a better strategy. The dynamic and interactive training between the policy maker and the classifier can exploit the unlabeled data in a more effective manner and yield a significant improvement on the classification performance. Furthermore, we present two different approaches to represent the actions sampled from the policy. The first approach considers continuous actions as soft labels, while the other uses discrete actions as hard assignment of labels for unlabeled examples. We validate the effectiveness of the proposed method on two benchmark datasets as well as one e-commerce dataset. The result shows the proposed method is able to consistently outperform state-of-the-art methods in various settings.

【Keywords】: Classification; Semi-supervised Learning; Reinforcement Learning; Deep Learning

42. Learning a Low-Rank Tensor of Pharmacogenomic Multi-relations from Biomedical Networks.

【Paper Link】【Pages】:409-418

【Authors】: Zhuliu Li ; Wei Zhang ; R. Stephanie Huang ; Rui Kuang

【Abstract】: Learning pharmacogenomic multi-relations among diseases, genes and chemicals from content-rich biomedical and biological networks can provide important guidance for drug discovery, drug repositioning and disease treatment. Most of the existing methods focus on imputing missing values in the disease-gene, disease chemical and gene-chemical pairwise relations from the observed relations instead of being designed for learning high-order disease-gene-chemical multi-relations. To achieve the goal, we propose a general tensor-based optimization framework and a scalable Graph-Regularized Tensor Completion from Observed Pairwise Relations (GT-COPR) algorithm to infer the multi-relations among the entities across multiple networks in a low-rank tensor, based on manifold regularization with the graph Laplacian of a Cartesian, tensor or strong product of the networks, and consistencies between the collapsed tensors and the observed bipartite relations. Our theoretical analyses also prove the convergence and efficiency of GT-COPR. In the experiments, the tensor fiber-wise and slice-wise evaluations demonstrate the accuracy of GT-COPR for predicting the diseasegene-chemical associations across the large-scale protein-protein interactions network, chemical structural similarity network and phenotype-based human disease network; and the validation on Genomics of Drug Sensitivity in Cancer cell line dataset shows a potential clinical application of GT-COPR for learning diseasespecific chemical-gene interactions. Statistical enrichment analysis demonstrates that GT-COPR is also capable of producing both topologically and biologically relevant disease, gene and chemical components with high significance.

【Keywords】: multi-relational learning, drug repositioning, disease gene prioritization, product graphs, tensor completion

43. One-Stage Deep Instrumental Variable Method for Causal Inference from Observational Data.

【Paper Link】【Pages】:419-428

【Authors】: Adi Lin ; Jie Lu ; Junyu Xuan ; Fujin Zhu ; Guangquan Zhang

【Abstract】: Causal inference from observational data aims to estimate causal effects when controlled experimentation is not feasible, but it faces challenges when unobserved confounders exist. The instrumental variable method resolves this problem by introducing a variable that is correlated with the treatment and affects the outcome only through the treatment. However, existing instrumental variable methods require two stages to separately estimate the conditional treatment distribution and the outcome generating function, which is not sufficiently effective. This paper presents a one-stage approach to jointly estimate the treatment distribution and the outcome generating function through a cleverly designed deep neural network structure. This study is the first to merge the two stages to leverage the outcome to the treatment distribution estimation. Further, the new deep neural network architecture is designed with two strategies (i.e., shared and separate) of learning a confounder representation account for different observational data. Such network architecture can unveil complex relationships between confounders, treatments, and outcomes. Experimental results show that our proposed method outperforms the state-of-the-art methods. It has a wide range of applications, from medical treatment design to policy making, population regulation and beyond.

【Keywords】: observational data, causal inference, instrumental variable, neural networks

44. Guiding Cross-lingual Entity Alignment via Adversarial Knowledge Embedding.

【Paper Link】【Pages】:429-438

【Authors】: Xixun Lin ; Hong Yang ; Jia Wu ; Chuan Zhou ; Bin Wang

【Abstract】: Cross-lingual Entity Alignment (CEA) aims at identifying entities with their counterparts in different language knowledge graphs. Knowledge embedding alignment plays an important role in CEA due to its advantages of easy implementation and run-time robustness. However, existing embedding alignment methods haven't considered the problem of embedding distribution alignment which refers to the alignment of spatial shapes of embedding spaces. To this end, we present a new Adversarial Knowledge Embedding framework (AKE for short) that jointly learns the representation, mapping and adversarial modules in an end-to-end manner. By reducing the discrepancy of embedding distributions, AKE can approximately preserve an isomorphism between source and target embeddings. In addition, we introduce two new orthogonality constraints into mapping to obtain the self-consistency and numerical stability of transformation. Experiments on real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines.

【Keywords】: Knowledge-graph; cross-lingual-entity-alignment; embedding-distribution; orthogonality-constraints

45. Exploring Semantic Relationships for Image Captioning without Parallel Data.

【Paper Link】【Pages】:439-448

【Authors】: Fenglin Liu ; Meng Gao ; Tianhao Zhang ; Yuexian Zou

【Abstract】: Recently, image captioning has aroused great interest in both academic and industrial worlds. Most existing systems are built upon large-scale datasets consisting of image-sentence pairs, which, however, are time-consuming to construct. In addition, even for the most advanced image captioning systems, it is still difficult to realize deep image understanding. In this work, we achieve unpaired image captioning by bridging the vision and the language domains with high-level semantic information. The motivation stems from the fact that the semantic concepts with the same modality can be extracted from both images and descriptions. To further improve the quality of captions generated by the model, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image. Extensive experiments on MSCOCO dataset show that we can generate desirable captions without paired datasets. Furthermore, the proposed approach boosts five strong baselines under the paired setting, where the most significant improvement in CIDEr score reaches 8%, demonstrating that it is effective and generalizes well to a wide range of models.

【Keywords】: vision and language, unpaired training data, image captioning, deep neural network

【Paper Link】【Pages】:449-458

【Authors】: Xuanwu Liu ; Zhao Li ; Jun Wang ; Guoxian Yu ; Carlotta Domeniconi ; Xiangliang Zhang

【Abstract】: Hashing has been widely studied for big data retrieval due to its low storage cost and fast query speed. Zero-shot hashing (ZSH) aims to learn a hashing model that is trained using only samples from seen categories, but can generalize well to samples of unseen categories. ZSH generally uses category attributes to seek a semantic embedding space to transfer knowledge from seen categories to unseen ones. As a result, it may perform poorly when labeled data are insufficient. ZSH methods are mainly designed for single-modality data, which prevents their application to the widely spread multi-modal data. On the other hand, existing cross-modal hashing solutions assume that all the modalities share the same category labels, while in practice the labels of different data modalities may be different. To address these issues, we propose a general Cross-modal Zero-shot Hashing (CZHash) solution to effectively leverage unlabeled and labeled multi-modality data with different label spaces. CZHash first quantifies the composite similarity between instances using label and feature information. It then defines an objective function to achieve deep feature learning compatible with the composite similarity preserving, category attribute space learning, and hashing coding function learning. CZHash further introduces an alternative optimization procedure to jointly optimize these learning objectives. Experiments on benchmark multi-modal datasets show that CZHash significantly outperforms related representative hashing approaches both on effectiveness and adaptability.

【Keywords】: Zero-shot Learning, Cross-modal Hashing, Labeled and Unlabeled Data, Deep Learning

47. Performing Co-membership Attacks Against Deep Generative Models.

【Paper Link】【Pages】:459-467

【Authors】: Kin Sum Liu ; Chaowei Xiao ; Bo Li ; Jie Gao

【Abstract】: In this paper we propose a new membership attack method called co-membership attacks against deep generative models including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Specifically, membership attack aims to check whether a given instance x was used in the training data or not. A co-membership attack checks whether the given bundle of n instances were in the training, with the prior knowledge that the bundle was either entirely used in the training or none at all. Successful membership attacks can compromise the privacy of training data when the generative model is published. Our main idea is to cast membership inference of target data x as the optimization of another neural network (called the attacker network) to search for the latent encoding to reproduce x. The final reconstruction error is used directly to conclude whether x was in the training data or not. We conduct extensive experiments on a variety of datasets and generative models showing that: our attacker network outperforms prior membership attacks; co-membership attacks can be substantially more powerful than single attacks; and VAEs are more susceptible to membership attacks compared to GANs.

【Keywords】: Privacy, Deep Generative Model, Unsupervised Learning, Adversarial Machine Learning

48. Matching Novelty While Training: Novel Recommendation Based on Personalized Pairwise Loss Weighting.

【Paper Link】【Pages】:468-477

【Authors】: Kachun Lo ; Tsukasa Ishigaki

【Abstract】: Most works of recommender system seek to provide highly accurate item prediction while having potentially great bias to popular items. Both users and items' providers will suffer if their system has strong preference for monotonous popular items. A better system should consider also item novelty. Previous works of novel recommendation focus mainly on re-ranking a top-N list generated by an accuracy-focused base model. As a result, these frameworks are 2-stage and essentially limited to the base model. In addition, when training the base model, the common BRP loss function treats all pairs in the same manner, consistently suppresses interesting negative items which should have been recommended. In this work, we propose a personalized pairwise novelty weighting for BPR loss function, which covers the limitations of BPR and effectively improves novelty with marginal loss in accuracy. Base model will be guided by the loss weights to learn user preference and to generate novel suggestion list in 1 stage. Comprehensive experiments on 3 public datasets show that our approach effectively promotes novelty with almost no decrease in accuracy.

【Keywords】: Recommender System; Novel Recommendation; Personalized Recommendation; Loss Weighting

49. RiWalk: Fast Structural Node Embedding via Role Identification.

【Paper Link】【Pages】:478-487

【Authors】: Xuewei Ma ; Geng Qin ; Zhiyang Qiu ; Mingxin Zheng ; Zhe Wang

【Abstract】: Nodes performing different functions in a network have different roles, and these roles can be gleaned from the structure of the network. Learning latent representations for the roles of nodes helps to understand the network and to transfer knowledge across networks. However, most existing structural embedding approaches suffer from high computation and space cost or rely on heuristic feature engineering. Here we propose RiWalk, a flexible paradigm for learning structural node representations as it decouples the structural embedding problem into a role identification procedure and a network embedding procedure. Through role identification, rooted kernels with structural dependencies kept are built to better integrate network embedding methods. To demonstrate the effectiveness of RiWalk, we develop two different role identification methods named RiWalk-SP and RiWalk-WL respectively and employ random walk based network embedding methods. Experiments on within-network classification tasks show that our proposed algorithms achieve comparable performance with other baselines while being an order of magnitude more efficient. Besides, we also conduct across-network role classification tasks. The results show potential of structural embeddings in transfer learning. RiWalk is also scalable, making it capable of capturing structural roles in massive networks.

【Keywords】: structural embedding; network embedding; graph kernel; structural role

50. Sharp Characterization of Optimal Minibatch Size for Stochastic Finite Sum Convex Optimization.

【Paper Link】【Pages】:488-497

【Authors】: Atsushi Nitanda ; Tomoya Murata ; Taiji Suzuki

【Abstract】: The minibatching technique has been extensively adopted to facilitate stochastic first-order methods because of their computational efficiency in parallel computing for large-scale machine learning and data mining. However, the optimal minibatch size determination for accelerated stochastic gradient methods is not completely understood. Actually, there appears trade-off between the iteration complexity and the total computational complexity; that is, the number of iterations (minibatch queries) can be decreased by increasing the minibatch size, but too large minibatch size would result in an unnecessarily large total computational cost. In this study, we give a sharp characterization of the minimax optimal minibatch size to achieve the optimal iteration complexity by providing a reachable lower bound for minimizing finite sum of convex functions and, surprisingly, show that the optimal method with the minimax optimal minibatch size can achieve both of the optimal iteration complexity and the optimal total computational complexity simultaneously. Finally, this feature is verified experimentally.

【Keywords】: finite-sum problem, optimal minibatch method

51. Temporal Self-Attention Network for Medical Concept Embedding.

【Paper Link】【Pages】:498-507

【Authors】: Xueping Peng ; Guodong Long ; Tao Shen ; Sen Wang ; Jing Jiang ; Michael Blumenstein

【Abstract】: In longitudinal electronic health records (EHRs), the event records of a patient are distributed over a long period of time and the temporal relations between the events reflect sufficient domain knowledge to benefit prediction tasks such as the rate of inpatient mortality. Medical concept embedding as a feature extraction method that transforms a set of medical concepts with a specific time stamp into a vector, which will be fed into a supervised learning algorithm. The quality of the embedding significantly determines the learning performance over the medical data. In this paper, we propose a medical concept embedding method based on applying a self-attention mechanism to represent each medical concept. We propose a novel attention mechanism which captures the contextual information and temporal relationships between medical concepts. A light-weight neural net, "Temporal Self-Attention Network (TeSAN)", is then proposed to learn medical concept embedding based solely on the proposed attention mechanism. To test the effectiveness of our proposed methods, we have conducted clustering and prediction tasks on two public EHRs datasets comparing TeSAN against five state-of-the-art embedding methods. The experimental results demonstrate that the proposed TeSAN model is superior to all the compared methods. To the best of our knowledge, this work is the first to exploit temporal self-attentive relations between medical events.

【Keywords】: Medical Concept Embedding; Temporal Self Attention; electronic health records; TeSAN; Healthcare

52. Efficient Sketching Algorithm for Sparse Binary Data.

【Paper Link】【Pages】:508-517

【Authors】: Rameshwar Pratap ; Debajyoti Bera ; Karthik Revanuru

【Abstract】: Recent advancement of the WWW, IOT, social network, e-commerce, etc. have generated a large volume of data. These datasets are mostly represented by high dimensional and sparse datasets. Many fundamental subroutines of common data analytic tasks such as clustering, classification, ranking, nearest neighbour search, etc. scale poorly with the dimension of the dataset. In this work, we address this problem and propose a sketching (alternatively, dimensionality reduction) algorithm - BinSketch (Binary Data Sketch) - for sparse binary datasets. BinSketch preserves the binary version of the dataset after sketching and maintains estimates for multiple similarity measures such as Jaccard, Cosine, Inner-Product similarities, and Hamming distance, on the same sketch. We present a theoretical analysis of our algorithm and complement it with extensive experimentation on several real-world datasets. We compare the performance of our algorithm with the state-of-the-art algorithms on the task of mean-square-error and ranking. Our proposed algorithm offers a comparable accuracy while suggesting a significant speedup in the dimensionality reduction time, with respect to the other candidate algorithms. Our proposal is simple, easy to implement, and therefore can be adopted in practice.

【Keywords】: Dimensionality Reduction; Sketching; Locality Sensitive Hashing; MinHash; SimHash

53. Exploiting Multi-domain Visual Information for Fake News Detection.

【Paper Link】【Pages】:518-527

【Authors】: Peng Qi ; Juan Cao ; Tianyun Yang ; Junbo Guo ; Jintao Li

【Abstract】: The increasing popularity of social media promotes the proliferation of fake news. With the development of multimedia technology, fake news attempts to utilize multimedia content with images or videos to attract and mislead readers for rapid dissemination, which makes visual content an important part of fake news. Fake-news images, images attached to fake news posts, include not only fake images that are maliciously tampered but also real images that are wrongly used to represent irrelevant events. Hence, how to fully exploit the inherent characteristics of fake-news images is an important but challenging problem for fake news detection. In the real world, fake-news images may have significantly different characteristics from real-news images at both physical and semantic levels, which can be clearly reflected in the frequency and pixel domain, respectively. Therefore, we propose a novel framework Multi-domain Visual Neural Network (MVNN) to fuse the visual information of frequency and pixel domains for detecting fake news. Specifically, we design a CNN-based network to automatically capture the complex patterns of fake-news images in the frequency domain; and utilize a multi-branch CNN-RNN model to extract visual features from different semantic levels in the pixel domain. An attention mechanism is utilized to fuse the feature representations of frequency and pixel domains dynamically. Extensive experiments conducted on a real world dataset demonstrate that MVNN outperforms existing methods with at least 9.2% in accuracy, and can help improve the performance of multi-modal fake news detection by over 5.2%.

【Keywords】: fake news detection; fake-news images; multi domain; social media

54. Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket.

【Paper Link】【Pages】:528-537

【Authors】: Tara Safavi ; Caleb Belth ; Lukas Faber ; Davide Mottin ; Emmanuel Müller ; Danai Koutra

【Abstract】: The increasing scale of encyclopedic knowledge graphs (KGs) calls for summarization as a way to help users efficiently access and distill world knowledge. Motivated by the disparity between individuals' limited information needs and the massive scale of KGs, in this paper we propose a new problem called personalized knowledge graph summarization. The goal is to construct compact "personal summaries" of KGs containing only the facts most relevant to individuals' interests. Such summaries can be stored and utilized on-device, allowing individuals private, anytime access to the information that interests them most. We formalize the problem as one of constructing a sparse graph, or summary, that maximizes a user's inferred "utility" over a given KG, subject to a user-and device-specific constraint on the summary's size. To solve it, we propose GLIMPSE, a summarization framework that provides theoretical guarantees on the summary's utility and is linear in the number of edges in the KG. In an evaluation with real user queries to open-source, encyclopedic KGs of up to one billion triples, we show that GLIMPSE efficiently creates summaries that outperform strong baselines by up to 19% in query answering F1 score.

【Keywords】: knowledge graphs; graph summarization; personalization

55. Learning to Sample: An Active Learning Framework.

【Paper Link】【Pages】:538-547

【Authors】: Jingyu Shao ; Qing Wang ; Fangbing Liu

【Abstract】: Meta-learning algorithms for active learning are emerging as a promising paradigm for learning the "best" active learning strategy. However, current learning-based active learning approaches still require sufficient training data so as to generalize meta-learning models for active learning. This is contrary to the nature of active learning which typically starts with a small number of labeled samples. The unavailability of large amounts of labeled samples for training meta-learning models would inevitably lead to poor performance (e.g., instabilities and overfitting). In our paper, we tackle these issues by proposing a novel learning-based active learning framework, called Learning To Sample (LTS). This framework has two key components: a sampling model and a boosting model, which can mutually learn from each other in iterations to improve the performance of each other. Within this framework, the sampling model incorporates uncertainty sampling and diversity sampling into a unified process for optimization, enabling us to actively select the most representative and informative samples based on an optimized integration of uncertainty and diversity. To evaluate the effectiveness of the LTS framework, we have conducted extensive experiments on three different classification tasks: image classification, salary level prediction, and entity resolution. The experimental results show that our LTS framework significantly outperforms all the baselines when the label budget is limited, especially for datasets with highly imbalanced classes. In addition to this, our LTS framework can effectively tackle the cold start problem occurring in many existing active learning approaches.

【Keywords】: active learning, meta-learning, uncertainty sampling, diversity sampling, boosting

56. Reinforced Molecule Generation with Heterogeneous States.

【Paper Link】【Pages】:548-557

【Authors】: Fangzhou Shi ; Shan You ; Chang Xu

【Abstract】: De novo molecular design and generation are frequently prescribed in the field of chemistry and biology, for it plays a critical role in maintaining the prosperity of the chemical industry and benefiting the drug discovery. In recent years, reinforcement learning-based methods leverage graphs to represent molecules and generate molecules as a decision making process. However, this vanilla graph representation may neglect the intrinsic context information with molecules and limits the generation performance accordingly. In this paper, we propose to augment the original graph states with the SMILES context vectors. As a result, SMILES representations are easily processed by a simple language model such that the general semantic features of a molecule can be extracted; and the graph representations perform better in handling the topology relationship of each atom. Moreover, we propose a framework that combines supervised learning and reinforcement learning algorithm to take a solid consideration of these two heterogeneous state representations of a molecule, which can fuse the information from both of them and extract more comprehensive features so that more sophisticated decisions can be made by the policy network. Our model also introduces two attention mechanisms, i.e., action-attention, and graph-attention, to further improve the performance. We conduct our experiments on a practical dataset, ZINC, and the experiment results demonstrate that our framework can outperform other baselines in the learning performance of molecule generation and chemical property optimization.

【Keywords】: molecule generation; property optimization; graph representation; SMILES representation; attention mechanism; reinforcement learning

57. Modeling Graphs with Vertex Replacement Grammars.

【Paper Link】【Pages】:558-567

【Authors】: Satyaki Sikdar ; Justus Hibshman ; Tim Weninger

【Abstract】: One of the principal goals of graph modeling is to capture the building blocks of network data in order to study various physical and natural phenomena. Recent work at the intersection of formal language theory and graph theory has explored the use of graph grammars for graph modeling. However, existing graph grammar formalisms, like Hyperedge Replacement Grammars, can only operate on small tree-like graphs. The present work relaxes this restriction by revising a different graph grammar formalism called Vertex Replacement Grammars (VRGs). We show that a variant of the VRG called Clustering-based Node Replacement Grammar (CNRG) can be efficiently extracted from many hierarchical clusterings of a graph. We show that CNRGs encode a succinct model of the graph, yet faithfully preserves the structure of the original graph. In experiments on large real-world datasets, we show that graphs generated from the CNRG model exhibit a diverse range of properties that are similar to those found in the original networks.

【Keywords】: vertex replacement grammar, graph model, graph generators

58. M-estimation in Low-Rank Matrix Factorization: A General Framework.

【Paper Link】【Pages】:568-577

【Authors】: Wei Tu ; Peng Liu ; Jingyu Zhao ; Yi Liu ; Linglong Kong ; Guodong Li ; Bei Jiang ; Guangjian Tian ; Hengshuai Yao

【Abstract】: Many problems in science and engineering can be reduced to the recovery of an unknown large matrix from a small number of random linear measurements. Matrix factorization arguably is the most popular approach for low-rank matrix recovery. Many methods have been proposed using different loss functions, for example the most widely used L ₂ loss, more robust choices such as L ₁ and Huber loss, quantile and expectile loss for skewed data. All of them can be unified into the framework of M-estimation. In this paper, we present a general framework of low-rank matrix factorization based on M-estimation in statistics. The framework mainly involves two steps: firstly we apply Nesterov's smoothing technique to obtain an optimal smooth approximation for non-smooth loss function, such as L ₁ and quantile loss; secondly we exploit an alternative updating scheme along with Nesterov's momentum method at each step to minimize the smoothed loss function. Strong theoretical convergence guarantee has been developed for the general framework, and extensive numerical experiments have been conducted to illustrate the performance of proposed algorithm.

【Keywords】: matrix recovery; M-estimation; matrix factorization; robustness; statistical foundation

59. Towards Making Deep Transfer Learning Never Hurt.

【Paper Link】【Pages】:578-587

【Authors】: Ruosi Wan ; Haoyi Xiong ; Xingjian Li ; Zhanxing Zhu ; Jun Huan

【Abstract】: Transfer learning have been frequently used to improve deep neural network training through incorporating weights of pre-trained networks as the starting-point of optimization for regularization. While deep transfer learning can usually boost the performance with better accuracy and faster convergence, transferring weights from inappropriate networks hurts training procedure and may lead to even lower accuracy. In this paper, we consider deep transfer learning as minimizing a linear combination of empirical loss and regularizer based on pre-trained weights, where the regularizer would restrict the training procedure from lowering the empirical loss, with conflicted descent directions (e.g., derivatives). Following the view, we propose a novel strategy making regularization-based Deep Transfer learning Never Hurt (DTNH) that, for each iteration of training procedure, computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization affects from the pre-trained weights. Extensive experiments have been done using common transfer learning regularizers, such as L2-SP and knowledge distillation, on top of a wide range of deep transfer learning benchmarks including Caltech, MIT indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent direction estimation strategy DTNH can always improve the performance of deep transfer learning tasks based on all above regularizers, even when transferring pre-trained weights from inappropriate networks. All in all, DTNH strategy can improve state-of-the-art regularizers in all cases with 0.1%-7% higher accuracy in all experiments.

【Keywords】: Deep learning, transfer learning, fine tuning, knowledge distillation, and starting point as regularization.

60. Generative Correlation Discovery Network for Multi-label Learning.

【Paper Link】【Pages】:588-597

【Authors】: Lichen Wang ; Zhengming Ding ; Seungju Han ; Jae-Joon Han ; Changkyu Choi ; Yun Fu

【Abstract】: The goal of Multi-label learning is to predict multiple labels of each single instance. This is a challenging problem since the training data is limited, long-tail label distribution, and complicated label correlations. Generally, more training samples and label correlation knowledge would benefit the learning performance. However, it is difficult to obtain large-scale well-labeled datasets, and building such a label correlation map requires sophisticated semantic knowledge. To this end, we propose an end-to-end Generative Correlation Discovery Network (GCDN) method for multi-label learning in this paper. GCDN captures the existing data distribution, and synthesizes diverse data to enlarge the diversity of the training features; meanwhile, it also learns the label correlations based on a specifically-designed, simple but effective correlation discovery network to automatically discover the label correlations and considerately improve the label prediction accuracy. Extensive experiments on several benchmarks are provided to demonstrate the effectiveness, efficiency, and high accuracy of our approach.

【Keywords】: Generative model; Multi label learning; Relation network

61. A Semi-Supervised Graph Attentive Network for Financial Fraud Detection.

【Paper Link】【Pages】:598-607

【Authors】: Daixin Wang ; Yuan Qi ; Jianbin Lin ; Peng Cui ; Quanhui Jia ; Zhen Wang ; Yanming Fang ; Quan Yu ; Jun Zhou ; Shuang Yang

【Abstract】: With the rapid growth of financial services, fraud detection has been a very important problem to guarantee a healthy environment for both users and providers. Conventional solutions for fraud detection mainly use some rule-based methods or distract some features manually to perform prediction. However, in financial services, users have rich interactions and they themselves always show multifaceted information. These data form a large multiview network, which is not fully exploited by conventional methods. Additionally, among the network, only very few of the users are labelled, which also poses a great challenge for only utilizing labeled data to achieve a satisfied performance on fraud detection. To address the problem, we expand the labeled data through their social relations to get the unlabeled data and propose a semi-supervised attentive graph neural network, named SemiGNN to utilize the multi-view labeled and unlabeled data for fraud detection. Moreover, we propose a hierarchical attention mechanism to better correlate different neighbors and different views. Simultaneously, the attention mechanism can make the model interpretable and tell what are the important factors for the fraud and why the users are predicted as fraud. Experimentally, we conduct the prediction task on the users of Alipay, one of the largest third-party online and offline cashless payment platform serving more than 4 hundreds of million users in China. By utilizing the social relations and the user attributes, our method can achieve a better accuracy compared with the state-of-the-art methods on two tasks. Moreover, the interpretable results also give interesting intuitions regarding the tasks.

【Keywords】: graph neural networks; fraud detection; graph embedding; semi supervised model

62. DMFP: A Dynamic Multi-faceted Fine-Grained Preference Model for Recommendation.

【Paper Link】【Pages】:608-617

【Authors】: Huizhao Wang ; Guanfeng Liu ; Yan Zhao ; Bolong Zheng ; Pengpeng Zhao ; Kai Zheng

【Abstract】: The time signals behind a user's historical behaviors are important for better inferring what she prefers to interact with at the next time. For the attention-based recommendation methods, relative position encoding and time intervals division are two common ways to model the time signal behind each behavior. They either only consider the relative position of each behavior in the behavior sequence, or process the continuous temporal features into discrete category features for subsequent tasks, which can hardly capture the dynamic preferences of a user. In addition, although the existing recommendation methods have considered both long-term preference and short-term preference, they ignore the fact that the long-term preference of a user may be multi-faceted, and it is difficult to learn a user's fine-grained short-term preference. In this paper, we propose a Dynamic Multi-faceted Fine-grained Preference model (DMFP), where the multi-hops attention mechanism and the feature-level attention mechanism together with a vertical convolution operation are adopted to capture users' multi-faceted long-term preference and fine-grained short-term preference, respectively. Therefore, DMFP can better support the next item recommendation. Extensive experiments on three real-world datasets illustrate that our model can improve the effectiveness of the recommendation compared with the state-of-the-art methods.

【Keywords】: recommendation; attention; dynamic preference

63. DeepTrust: A Deep User Model of Homophily Effect for Trust Prediction.

【Paper Link】【Pages】:618-627

【Authors】: Qi Wang ; Weiliang Zhao ; Jian Yang ; Jia Wu ; Wenbin Hu ; Qianli Xing

【Abstract】: Trust prediction in online social networks is crucial for information dissemination, product promotion, and decision making. Existing work on trust prediction mainly utilizes the network structure or the low-rank approximation of a trust network. These approaches can suffer from the problem of data sparsity and prediction accuracy. Inspired by the homophily theory, which shows a pervasive feature of social and economic networks that trust relations tend to be developed among similar people, we propose a novel deep user model for trust prediction based on user similarity measurement. It is a comprehensive data sparsity insensitive model that combines a user review behavior and the item characteristics that this user is interested in. With this user model, we firstly generate a user's latent features mined from user review behavior and the item properties that the user cares. Then we develop a pair-wise deep neural network to further learn and represent these user features. Finally, we measure the trust relations between a pair of people by calculating the user feature vector cosine similarity. Extensive experiments are conducted on two real-world datasets, which demonstrate the superior performance of the proposed approach over the representative baseline works.

【Keywords】: Trust prediction, Online social networks, User modeling

64. A Coarse-to-Fine Multi-stream Hybrid Deraining Network for Single Image Deraining.

【Paper Link】【Pages】:628-637

【Authors】: Yanyan Wei ; Zhao Zhang ; Haijun Zhang ; Richang Hong ; Meng Wang

【Abstract】: Single image deraining task is still a very challenging task due to its ill-posed nature in reality. Recently, researchers have tried to fix this issue by training the CNN-based end-to-end models, but they still cannot extract the negative rain streaks from rainy images precisely, which usually leads to an over de-rained or under de-rained result. To handle this issue, this paper proposes a new coarse-to-fine single image deraining framework termed Multi-stream Hybrid Deraining Network (shortly, MH-DerainNet). To obtain the negative rain streaks during training process more accurately, we present a new module named dual path residual dense block, i.e., Residual path and Dense path. The Residual path is used to reuse com-mon features from the previous layers while the Dense path can explore new features. In addition, to concatenate different scaled features, we also apply the idea of multi-stream with shortcuts between cascaded dual path residual dense block based streams. To obtain more distinct derained images, we combine the SSIM loss and perceptual loss to preserve the per-pixel similarity as well as preserving the global structures so that the deraining result is more accurate. Extensive experi-ments on both synthetic and real rainy images demonstrate that our MH-DerainNet can deliver significant improvements over several recent state-of-the-art methods.

【Keywords】: Single image deraining, dual path residual dense block, multi-stream hybrid deraining neural detwork

65. XOR-Based Boolean Matrix Decomposition.

【Paper Link】【Pages】:638-647

【Authors】: Jörg Wicker ; Yan Cathy Hua ; Rayner Rebello ; Bernhard Pfahringer

【Abstract】: Boolean matrix factorization (BMF) is a data summarizing and dimension-reduction technique. Existing BMF methods build on matrix properties defined by Boolean algebra, where the addition operator is the logical inclusive OR and the multiplication operator the logical AND. As a consequence, this leads to the lack of an additive inverse in all Boolean matrix operations, which produces an indelible type of approximation error. Previous research adopted various methods to address such an issue and produced reasonably accurate approximation. However, an exact factorization is rarely found in the literature. In this paper, we introduce a new algorithm named XBMaD (XOR-based Boolean Matrix Decomposition) where the addition operator is defined as the exclusive OR (XOR). This change completely removes the error-mitigation issue of OR-based BMF methods, and allows for an exact error-free factorization. An evaluation comparing XBMaD and classic OR-based methods suggested that XBMAD performed equal or in most cases more accurately and faster.

【Keywords】: Booelan matrix factorization; matrix decomposition; matrix factorization; Boolean algebra; data summarization; data mining

66. Domain-Adversarial Graph Neural Networks for Text Classification.

【Paper Link】【Pages】:648-657

【Authors】: Man Wu ; Shirui Pan ; Xingquan Zhu ; Chuan Zhou ; Lei Pan

【Abstract】: Text classification, in cross-domain setting, is a challenging task. On the one hand, data from other domains are often useful to improve the learning on the target domain; on the other hand, domain variance and hierarchical structure of documents from words, key phrases, sentences, paragraphs, etc. make it difficult to align domains for effective learning. To date, existing cross-domain text classification methods mainly strive to minimize feature distribution differences between domains, and they typically suffer from three major limitations - (1) difficult to capture semantics in non-consecutive phrases and long-distance word dependency because of treating texts as word sequences, (2) neglect of hierarchical coarse-grained structures of document for feature learning, and (3) narrow focus of the domains at instance levels, without using domains as supervisions to improve text classification. This paper proposes an end-to-end, domain-adversarial graph neural networks (DAGNN), for cross-domain text classification. Our motivation is to model documents as graphs and use a domain-adversarial training principle to lean features from each graph (as well as learning the separation of domains) for effective text classification. At the instance level, DAGNN uses a graph to model each document, so that it can capture non-consecutive and long-distance semantics. At the feature level, DAGNN uses graphs from different domains to jointly train hierarchical graph neural networks in order to learn good features. At the learning level, DAGNN proposes a domain-adversarial principle such that the learned features not only optimally classify documents but also separates domains. Experiments on benchmark datasets demonstrate the effectiveness of our method in cross-domain classification tasks.

【Keywords】: Graph neural networks; cross-domain learning; text classification

67. Discriminative Regularized Deep Generative Models for Semi-Supervised Learning.

【Paper Link】【Pages】:658-667

【Authors】: Qianqian Xie ; Jimin Huang ; Min Peng ; Yihan Zhang ; Kaifei Peng ; Hua Wang

【Abstract】: Deep generative models (DGMs) have shown strong performance in semi-supervised learning (SSL), which incorporate discrete class information into the learning process. Yet existing methods generally overfit to the given labeled data, for only considering the conditional probability of labels. In this paper, we propose a novel discriminative regularized deep generative method for SSL, which fully exploits the discriminative and geometric information of data to address the aforementioned issue. Our method introduces the cluster and manifold assumption that maximizes the classification margin between clusters and simultaneously smooths the predictions of the data which is close in the sub-manifold of each cluster, to regularize the learning of the classifier in DGMs. To derive the regularization based on introduced assumptions, we adopt the generated data of DGMs along with labelled and unlabelled data, to model the data manifold and yield clusters based on the Gumbel-softmax distribution. Experimental results on both text and image datasets demonstrate the effectiveness and flexibility of our method, and prove that two introduced assumptions are complementary in guiding the classification boundary, thus improving the discriminative ability of the classifier.

【Keywords】: Deep Generative Models; Discriminative Regularization; Semi-supervised Learning

68. Privacy-Preserving Auto-Driving: A GAN-Based Approach to Protect Vehicular Camera Data.

【Paper Link】【Pages】:668-677

【Authors】: Zuobin Xiong ; Wei Li ; Qilong Han ; Zhipeng Cai

【Abstract】: The autonomous driving (auto-driving) technology has been promoted significantly by the rapid advances in computer vision and deep neural networks. Auto-driving vehicles, nowadays, are fully equipped with numerous sensors such as cameras, geo-sensors, and radar sensors, to capture real-time data inside the vehicles and outside surroundings. Meanwhile, the captured data contains lots of private information about vehicles, drivers and passengers and thus faces a high risk of privacy breaches. Especially, side-channel information can be mined from camera data to identify vehicles' locations and even trajectories, raising serious privacy issues. Unfortunately, the issue, how to resist location-inference attack for camera data in auto-driving, has never been addressed in literature. In this paper, we intend to fill this blank by developing a GAN-based image-toimage translation method named Auto-Driving GAN (ADGAN). Through performance comparisons between ADGAN and the state-of-the-art, the superiority of ADGAN can be validated - offering an effective tradeoff between recognition utility and privacy protection for camera data.

【Keywords】: Autonomous driving; Location privacy; Generative Adversarial Networks; Image generation

【Paper Link】【Pages】:678-687

【Authors】: Pinghua Xu ; Wenbin Hu ; Jia Wu ; Weiwei Liu ; Bo Du ; Jian Yang

【Abstract】: Developing effective network embedding methods for social trust networks (STNs) is a non-trivial problem because two key pieces of information need to be preserved simultaneously: a user's relations to latent factors and the trust transfer patterns that govern what type of relationship will form. In this study, we propose a novel social trust network embedding method (STNE) to address these issues. Specifically, we present a modified Skip-Gram model with negative sampling to jointly learn latent factor features, along with the trust transfer pattern features. Moreover, we define a flexible notion about a user's latent relationships with other users, which generates reliable negative samples for optimization. Extensive experiments on several real-world networks demonstrate the efficacy of the proposed STNE.

【Keywords】: social network; network embedding

70. VSB-DVM: An End-to-End Bayesian Nonparametric Generalization of Deep Variational Mixture Model.

【Paper Link】【Pages】:688-697

【Authors】: Xi Yang ; Yuyao Yan ; Kaizhu Huang ; Rui Zhang

【Abstract】: Mixture of factor analyzers is a fundamental model in unsupervised learning, which is particularly useful for high dimensional data. Recent efforts on deep auto-encoding mixture models made a fruitful progress in clustering. However, in most cases, their performance depends highly on the results of pre-training. Moreover, they tend to ignore the prior information when making clustering assignment, leading to a less strict inference and consequently limiting the performance. In this paper, we propose an end-to-end Bayesian nonparametric generalization of deep mixture model with a Variational Auto-Encoder (VAE) framework. Specifically, we develop a novel model called VSB-DVM exploiting the Variational Stick-Breaking Process to design a Deep Variational Mixture Model. Distinct from the existing deep auto-encoding mixture models, this novel unsupervised deep generative model can learn low-dimensional representations and clustering simultaneously without pre-training. Importantly, a strict inference is proposed using weights of stick-breaking process in a variational way. Furthermore, able to capture the richer statistical structure of the data, VSB-DVM can also generate highly realistic samples for any specified cluster. A series of experiments are carried out, both qualitatively and quantitatively, on benchmark clustering and generation tasks. Comparative results show that the proposed model is able to generate diverse and high-quality samples of data, and also achieves encouraging clustering results outperforming the state-of-the-art algorithms on four real-world datasets.

【Keywords】: Finite Mixture Model, Variational Auto Encoder, Stick-breaking Prior, Deep Embedded Clustering

71. Neural Embedding Propagation on Heterogeneous Networks.

【Paper Link】【Pages】:698-707

【Authors】: Carl Yang ; Jieyu Zhang ; Jiawei Han

【Abstract】: Classification is one of the most important problems in machine learning. To address label scarcity, semi-supervised learning (SSL) has been intensively studied over the past two decades, which mainly leverages data affinity modeled by networks. Label propagation (LP), however, as the most popular SSL technique, mostly only works on homogeneous networks with single-typed simple interactions. In this work, we focus on the more general and powerful heterogeneous networks, which accommodate multi-typed objects and links, and thus endure multi-typed complex interactions. Specifically, we propose neural embedding propagation (NEP), which leverages distributed embeddings to represent objects and dynamically composed modular networks to model their complex interactions. While generalizing LP as a simple instance, NEP is far more powerful in its natural awareness of different types of objects and links, and the ability to automatically capture their important interaction patterns. Further, we develop a series of efficient training strategies for NEP, leading to its easy deployment on real-world heterogeneous networks with millions of objects. With extensive experiments on three datasets, we comprehensively demonstrate the effectiveness, efficiency, and robustness of NEP compared with state-of-the-art network embedding and SSL algorithms.

【Keywords】: network embedding; heterogeneous networks; embedding propagation; label propagation; semi supervised learning; modular networks

72. Discrete Overlapping Community Detection with Pseudo Supervision.

【Paper Link】【Pages】:708-717

【Authors】: Fanghua Ye ; Chuan Chen ; Zibin Zheng ; Rong-Hua Li ; Jeffrey Xu Yu

【Abstract】: Community detection is of significant importance in understanding the structures and functions of networks. Recently, overlapping community detection has drawn much attention due to the ubiquity of overlapping community structures in real-world networks. Nonnegative matrix factorization (NMF), as an emerging standard framework, has been widely employed for overlapping community detection, which obtains nodes' soft community memberships by factorizing the adjacency matrix into low-rank factor matrices. However, in order to determine the ultimate community memberships, we have to post-process the real-valued factor matrix by manually specifying a threshold on it, which is undoubtedly a difficult task. Even worse, a unified threshold may not be suitable for all nodes. To circumvent the cumbersome post-processing step, we propose a novel discrete overlapping community detection approach, i.e., Discrete Nonnegative Matrix Factorization (DNMF), which seeks for a discrete (binary) community membership matrix directly. Thus DNMF is able to assign explicit community memberships to nodes without post-processing. Moreover, DNMF incorporates a pseudo supervision module into it to exploit the discriminative information in an unsupervised manner, which further enhances its robustness. We thoroughly evaluate DNMF using both synthetic and real-world networks. Experiments show that DNMF has the ability to outperform state-of-the-art baseline approaches.

【Keywords】: community detection; overlapping communities; discrete nonnegative matrix factorization; pseudo supervision

【Paper Link】【Pages】:718-727

【Authors】: Yuyang Ye ; Hengshu Zhu ; Tong Xu ; Fuzhen Zhuang ; Runlong Yu ; Hui Xiong

【Abstract】: How to identify high-potential talent (HIPO) earlier in their career always has strategic importance for human resource management. While tremendous efforts have been made in this direction, most existing approaches are still based on the subjective selection of human resource experts. This could lead to unintentional bias and inconsistencies. To this end, in this paper, we propose a neural network based dynamic social profiling approach for quantitatively identifying HIPOs from the newly-enrolled employees by modeling the dynamics of their behaviors in organizational social networks. A basic assumption is that HIPOs usually perform more actively and have higher competencies than their peers to accumulate their social capitals during their daily work practice. Along this line, we first propose to model the social profiles of employees with both Graph Convolutional Network (GCN) and social centrality analysis in a comprehensive way. Then, an adaptive Long Short Term Memory (LSTM) network with global attention mechanism is designed to capture the profile dynamics of employees in the organizational social networks during their early career. Finally, extensive experiments on real-world data clearly validate the effectiveness of our approach as well as the interpretability of our results.

【Keywords】: Human Resource Management, HIPO identification, Social Profiling

74. Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network.

【Paper Link】【Pages】:728-737

【Authors】: Changchang Yin ; Buyue Qian ; Jishang Wei ; Xiaoyu Li ; Xianli Zhang ; Yang Li ; Qinghua Zheng

【Abstract】: Medical images are widely used in the medical domain for the diagnosis and treatment of diseases. Reading a medical image and summarizing its insights is a routine, yet nonetheless time-consuming task, which often represents a bottleneck in the clinical diagnosis process. Automatic report generation can relieve the issues. However, generating medical reports presents two major challenges: (i) it is hard to accurately detect all the abnormalities simultaneously, especially the rare diseases; (ii) a medical image report consists of many paragraphs and sentences, which are longer than natural image captions. We present a new framework to accurately detect the abnormalities and automatically generate medical reports. The report generation model is based on hierarchical recurrent neural network (HRNN). We introduce a topic matching mechanism to HRNN, so as to make generated reports more accurate and diverse. The soft attention mechanism is also introduced to HRNN model. Experimental results on two image-paragraph pair datasets show that our framework outperforms all the state-of-art methods.

【Keywords】: deep learning; CNN; RNN; image captioning; medical report generation

75. Domain Knowledge Guided Deep Learning with Electronic Health Records.

【Paper Link】【Pages】:738-747

【Authors】: Changchang Yin ; Rongjian Zhao ; Buyue Qian ; Xin Lv ; Ping Zhang

【Abstract】: Due to their promising performance in clinical risk prediction with Electronic Health Records (EHRs), deep learning methods have attracted significant interest from healthcare researchers. However, there are 4 challenges: (i) Data insufficiency. Many methods require large amounts of training data to achieve satisfactory results. (ii) Interpretability. Results from many methods are hard to explain to clinicians (e.g., why the models make particular predictions and which events cause clinical outcomes). (iii) Domain knowledge integration. No existing method dynamically exploits complicated medical knowledge (e.g., relations such as cause and is-caused-by between clinical events). (iv) Time interval information. Most existing methods only consider the relative order of visits from EHRs, but ignore the irregular time intervals between neighboring visits. In the study, we propose a new model, Domain Knowledge Guided Recurrent Neural Networks (DG-RNN), by directly introducing domain knowledge from the medical knowledge graph into an RNN architecture, as well as taking the irregular time intervals into account. Experimental results on heart failure risk prediction tasks show that our model not only outperforms state-of-the-art deep-learning based risk prediction models, but also associates individual medical events with heart failure onset, thus paving the way for interpretable accurate clinical risk predictions.

【Keywords】: deep learning; RNN; EHR; knowledge graph; risk prediction

76. Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.

【Paper Link】【Pages】:748-757

【Authors】: Yang You ; Yuxiong He ; Samyam Rajbhandari ; Wenhan Wang ; Cho-Jui Hsieh ; Kurt Keutzer ; James Demmel

【Abstract】: LSTM (long short-term memory) is a powerful deep learning technique that has been widely used in many real-world data-mining applications such as language modeling and machine translation. In this paper we aim to minimize the latency of LSTM inference on cloud systems without losing accuracy. If an LSTM model does not fit in cache, the latency due to data movement will likely be greater than that due to computation. In this case we reduce model parameters. If, as in most applications we consider, the LSTM models are able to fit the cache of cloud server processors, we focus on reducing the number of floating point operations, which has a corresponding linear impact on the latency of the inference calculation. Thus, in our system, we dynamically reduce model parameters or flops depending on which most impacts latency. Our inference system is based on Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD). We evaluate our system based on models from a series of real-world applications like language modeling, computer vision, question answering, and sentiment analysis. Users of our system can use either pre-trained models or start from scratch. Our system achieves 15X average speedup for six real-world applications without losing accuracy in inference.

【Keywords】: LSTM, Fast Inference, Dynamic Decomposition

77. Self-Attentive Attributed Network Embedding Through Adversarial Learning.

【Paper Link】【Pages】:758-767

【Authors】: Wenchao Yu ; Wei Cheng ; Charu C. Aggarwal ; Bo Zong ; Haifeng Chen ; Wei Wang

【Abstract】: Network embedding aims to learn the low-dimensional representations/embeddings of vertices which preserve the structure and inherent properties of the networks. The resultant embeddings are beneficial to downstream tasks such as vertex classification and link prediction. A vast majority of real-world networks are coupled with a rich set of vertex attributes, which could be potentially complementary in learning better embeddings. Existing attributed network embedding models, with shallow or deep architectures, typically seek to match the representations in topology space and attribute space for each individual vertex by assuming that the samples from the two spaces are drawn uniformly. The assumption, however, can hardly be guaranteed in practice. Due to the intrinsic sparsity of sampled vertex sequences and incompleteness in vertex attributes, the discrepancy between the attribute space and the network topology space inevitably exists. Furthermore, the interactions among vertex attributes, a.k.a cross features, have been largely ignored by existing approaches. To address the above issues, in this paper, we propose Nettention, a self-attentive network embedding approach that can efficiently learn vertex embeddings on attributed network. Instead of sample-wise optimization, Nettention aggregates the two types of information through minimizing the difference between the representation distributions in the low-dimensional topology and attribute spaces. The joint inference is encapsulated in a generative adversarial training process, yielding better generalization performance and robustness. The learned distributions consider both locality-preserving and global reconstruction constraints which can be inferred from the learning of the adversarially regularized autoencoders. Additionally, a multi-head self-attention module is developed to explicitly model the attribute interactions. Extensive experiments on benchmark datasets have verified the effectiveness of the proposed Nettention model on a variety of tasks, including vertex classification and link prediction.

【Keywords】: network embedding; attributed network; deep embedding; generative adversarial networks; self-attention

【Paper Link】【Pages】:768-777

【Authors】: Junliang Yu ; Min Gao ; Hongzhi Yin ; Jundong Li ; Chongming Gao ; Qinyong Wang

【Abstract】: Most of the recent studies of social recommendation assume that people share similar preferences with their friends and the online social relations are helpful in improving traditional recommender systems. However, this assumption is often untenable as the online social networks are quite sparse and a majority of users only have a small number of friends. Besides, explicit friends may not share similar interests because of the randomness in the process of building social networks. Therefore, discovering a number of reliable friends for each user plays an important role in advancing social recommendation. Unlike other studies which focus on extracting valuable explicit social links, our work pays attention to identifying reliable friends in both the observed and unobserved social networks. Concretely, in this paper, we propose an end-to-end social recommendation framework based on Generative Adversarial Nets (GAN). The framework is composed of two blocks: a generator that is used to produce friends that can possibly enhance the social recommendation model, and a discriminator that is responsible for assessing these generated friends and ranking the items according to both the current user and her friends' preferences. With the competition between the generator and the discriminator, our framework can dynamically and adaptively generate reliable friends who can perfectly predict the current user' preference at a specific time. As a result, the sparsity and unreliability problems of explicit social relations can be mitigated and the social recommendation performance is significantly improved. Experimental studies on real-world datasets demonstrate the superiority of our framework and verify the positive effects of the generated reliable friends.

【Keywords】: Social Recommendation, Recommender Systems, Adversarial Learning, Generative Adversarial Network

79. Transfer Learning with Dynamic Adversarial Adaptation Network.

【Paper Link】【Pages】:778-786

【Authors】: Chaohui Yu ; Jindong Wang ; Yiqiang Chen ; Meiyu Huang

【Abstract】: The recent advances in deep transfer learning reveal that adversarial learning can be embedded into deep networks to learn more transferable features to reduce the distribution discrepancy between two domains. Existing adversarial domain adaptation methods either learn a single domain discriminator to align the global source and target distributions, or pay attention to align subdomains based on multiple discriminators. However, in real applications, the marginal (global) and conditional (local) distributions between domains are often contributing differently to the adaptation. There is currently no method to dynamically and quantitatively evaluate the relative importance of these two distributions for adversarial learning. In this paper, we propose a novel Dynamic Adversarial Adaptation Network (DAAN) to dynamically learn domain-invariant representations while quantitatively evaluate the relative importance of global and local domain distributions. To the best of our knowledge, DAAN is the first attempt to perform dynamic adversarial distribution adaptation for deep adversarial learning. DAAN is extremely easy to implement and train in real applications. We theoretically analyze the effectiveness of DAAN, and it can also be explained in an attention strategy. Extensive experiments demonstrate that DAAN achieves better classification accuracy compared to state-of-the-art deep and adversarial methods. Results also imply the necessity and effectiveness of the dynamic distribution adaptation in adversarial transfer learning.

【Keywords】: domain adaptation; dynamic; global and local; adversarial learning

80. Mining Audio, Text and Visual Information for Talking Face Generation.

【Paper Link】【Pages】:787-795

【Authors】: Lingyun Yu ; Jun Yu ; Qiang Ling

【Abstract】: Providing methods to support audio-visual interaction with growing volumes of video data is an increasingly important challenge for data mining. To this end, there has been some success in speech-driven lip motion generation or talking face generation. Among them, talking face generation aims to generate realistic talking heads synchronized with the audio or text input. This task requires mining the relationship between audio signal/text and lip-sync video frames and ensures the temporal continuity between frames. Due to the issues such as polysemy, ambiguity, and fuzziness of sentences, creating visual images with lip synchronization is still challenging. To overcome the problems above, we present a data-mining framework to learn the synchronous pattern between different channels from large recorded audio/text dataset and visual dataset, and apply it to generate realistic talking face animations. Specifically, we decompose this task into two steps: mouth landmarks prediction and video synthesis. First, a multimodal learning method is proposed to generate accurate mouth landmarks with multimedia inputs (both text and audio). Second, a network named Face2Vid is proposed to generate video frames conditioned on the predicted mouth landmarks. In Face2Vid, optical flow is employed to model the temporal dependency between frames, meanwhile, a self-attention mechanism is introduced to model the spatial dependency across image regions. Extensive experiments demonstrate that our method can generate realistic videos with background, and exhibit the superiorities on accurate synchronization of lip movements and smooth transition of facial movements.

【Keywords】: Data Mining; Facial animation; Adversarial Training; Optical Flow; Self-attention mechanism

81. Jointly Embedding the Local and Global Relations of Heterogeneous Graph for Rumor Detection.

【Paper Link】【Pages】:796-805

【Authors】: Chunyuan Yuan ; Qianwen Ma ; Wei Zhou ; Jizhong Han ; Songlin Hu

【Abstract】: The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propagation graph have not been well utilized by previous works. In this paper, we present a novel global-local attention network (GLAN) for rumor detection, which jointly encodes the local semantic and global structural information. We first generate a better integrated representation for each source tweet by fusing the semantic information of related retweets with the attention mechanism. Then, we model the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture the rich structural information for rumor detection. We conduct experiments on three real-world datasets, and the results demonstrate that GLAN significantly outperforms the state-of-the-art models in both rumor detection and early detection scenarios.

【Keywords】: Rumor detection, heterogeneous graph, attention mechanism, local and global relations, social networks

82. Learning Hierarchical and Shared Features for Improving 3D Neuron Reconstruction.

【Paper Link】【Pages】:806-815

【Authors】: Hao Yuan ; Na Zou ; Shaoting Zhang ; Hanchuan Peng ; Shuiwang Ji

【Abstract】: Neuron tracing, also known as neuron reconstruction, studies 3D morphologies of neurons based on imaging data. Neuron reconstruction is of fundamental importance in computational neuroscience since it is a crucial step towards reverse engineering of the wiring and functions of a brain. On the other hand, it is not possible to manually trace all neurons due to the complexity and cost of this task. Hence, it raises the need of building a computational pipeline to perform automatic neuron reconstruction. In this work, we propose a deep learning approach for improving the accuracy of 3D neuron reconstruction. First, we propose to learn shared features among different images in the whole dataset. Such shared features are learned automatically by our model at different scales. Second, we propose to incorporate such features to guide the information flow in the network. Specifically, we propose to build skip connections between the encoder and the decoder of our networks by incorporating the hierarchical and shared features. Our proposed skip connections are built based on the attention mechanism, where the hierarchical shared features serve as the query matrix and the local input features serve as the key and value matrices. Since the parameters are learned automatically, we expect that only useful spatial information is transmitted to the decoder. We conduct both qualitative and quantitative experiments to demonstrate the effectiveness of our proposed method. Experimental results show that our proposed model has the ability to capture detailed structural information for neurons. Our results also demonstrate that the proposed model is robust to noise. In addition, quantitative evaluations show that our method achieves better performance than other approaches.

【Keywords】: Deep Learning, 3D Neuron Reconstruction, Image Segmentation

83. PERCeIDs: PERiodic CommunIty Detection.

【Paper Link】【Pages】:816-825

【Authors】: Lin Zhang ; Alexander Gorovits ; Petko Bogdanov

【Abstract】: Many complex networked systems, both natural and human-made, exhibit periodic behavior driven by underlying seasonal processes: election cycles and regular sporting events in social networks, cell cycle phases in gene networks, and load variation in infrastructure networks due to weather or daylight patterns. The “natural” periodicity may vary across network communities. At the same time this periodic community behavior is central to (i) understating the overall system dynamics and (ii) for detection of the communities themselves. The predominant approach to dynamic community detection first detects communities and then as a second step quantify seasonality in their activity. How to jointly detect communities and their inherent periodicity, while also accounting for non-periodic one-off events? We propose PERCeIDs, a framework for periodic overlapping community detection from temporal interaction data. We model observed pairwise interaction activity as a mixture of periodic and outlier (non-periodic) components. We explicitly enforce periodic structure within our model by learning a succinct Ramanujan basis dictionary for community behaviors. By explicitly modeling periodicity, PERCeIDs outperforms baselines on both detecting highly overlapping communities with up to 2fold improvement in NMI compared to state-of-the-art baselines, while offering an interpretable temporal structure for discovered communities in the dataset. Implementation of our method is available for download [64].

【Keywords】: Dynamic networks, Community detection, Periodicity detection, Tensor factorization, Optimization

84. Generalized Adversarial Training in Riemannian Space.

【Paper Link】【Pages】:826-835

【Authors】: Shufei Zhang ; Kaizhu Huang ; Rui Zhang ; Amir Hussain

【Abstract】: Adversarial examples, referred to as augmented data points generated by imperceptible perturbations of input samples, have recently drawn much attention. Well-crafted adversarial examples may even mislead state-of-the-art deep neural network (DNN) models to make wrong predictions easily. To alleviate this problem, many studies have focused on investigating how adversarial examples can be generated and/or effectively handled. All existing works tackle this problem in the Euclidean space. In this paper, we extend the learning of adversarial examples to the more general Riemannian space over DNNs. The proposed work is important in that (1) it is a generalized learning methodology since Riemmanian space will be degraded to the Euclidean space in a special case; (2) it is the first work to tackle the adversarial example problem tractably through the perspective of Riemannian geometry; (3) from the perspective of geometry, our method leads to the steepest direction of the loss function, by considering the second order information of the loss function. We also provide a theoretical study showing that our proposed method can truly find the descent direction for the loss function, with a comparable computational time against traditional adversarial methods. Finally, the proposed framework demonstrates superior performance over traditional counterpart methods, using benchmark data including MNIST, CIFAR-10 and SVHN.

【Keywords】: Adversarial examples; deep neural network; riemannian manifold; adversarial training; regularization

85. Learning Structured Twin-Incoherent Twin-Projective Latent Dictionary Pairs for Classification.

【Paper Link】【Pages】:836-845

【Authors】: Zhao Zhang ; Yulin Sun ; Zheng Zhang ; Yang Wang ; Guangcan Liu ; Meng Wang

【Abstract】: In this paper, we extend the popular dictionary pair learning (DPL) into the scenario of twin-projective latent flexible DPL under a structured twin-incoherence. Technically, a novel framework called Twin-Projective Latent Flexible DPL (TP-DPL) is proposed, which minimizes the twin-incoherence constrained flexibly-relaxed reconstruction error to avoid the possible over-fitting issue and produce accurate reconstruction. In this setting, TP-DPL integrates the twin-incoherence based latent flexible DPL and the joint embedding of codes as well as salient features by twin-projection into a unified model in an adaptive neighborhood-preserving manner. Therefore, TP-DPL can unify the procedures of salient feature representation and classification. The twin-incoherence constraint on coefficients and features can explicitly ensure high intra-class compactness and inter-class separation over them. TP-DPL also integrates the adaptive weighting to preserve local neighborhood of both coefficients and salient features within each class explicitly. For efficiency, TP-DPL selects the Frobenius-norm and abandons the costly l0/l1-norm for group sparse representation. Another byproduct is that TP-DPL can directly apply the class-specific twin-projective reconstruction residual to compute the label of data. Extensive results on public databases show that TP-DPL can deliver the state-of-the-art performance.

【Keywords】: Structured twin incoherence, twin-projective latent dictionary pair learning, structured adaptive weighting, discriminative classification

86. Adaptive Structure-Constrained Robust Latent Low-Rank Coding for Image Recovery.

【Paper Link】【Pages】:846-855

【Authors】: Zhao Zhang ; Lei Wang ; Sheng Li ; Yang Wang ; Zheng Zhang ; Zhengjun Zha ; Meng Wang

【Abstract】: In this paper, we propose a robust representation learning model called Adaptive Structure-constrained Low-Rank Coding (AS-LRC) for the latent representation of data. To recover the underlying subspaces more accurately, AS-LRC seamlessly integrates an adaptive weighting based block-diagonal structure-constrained low-rank representation and the group sparse salient feature extraction into a unified framework. Specifically, AS-LRC performs the latent decomposition of given data into a low-rank reconstruction by a block-diagonal codes matrix, a group sparse locality-adaptive salient feature part and a sparse error part. To enforce the block-diagonal structures adaptive to different real datasets for the low-rank recovery, AS-LRC clearly computes an auto-weighting matrix based on the locality-adaptive features and multiplies by the low-rank coefficients for direct minimization at the same time. This encourages the codes to be block-diagonal and can avoid the tricky issue of choosing optimal neighborhood size or kernel width for the weight assignment, suffered in most local geometrical structures-preserving low-rank coding methods. In addition, our AS-LRC selects the L2, 1-norm on the projection for extracting group sparse features rather than learning low-rank features by Nuclear-norm regularization, which can make learnt features robust to noise and outliers in samples, and can also make the feature coding process efficient. Extensive visualizations and numerical results demonstrate the effectiveness of our AS-LRC for image representation and recovery.

【Keywords】: Robust latent subspace recovery, adaptive structure constrained low-rank coding, auto-weighting learning, group sparse salient feature extraction

87. Machine Comprehension-Incorporated Relevance Matching.

【Paper Link】【Pages】:856-865

【Authors】: Chen Zhang ; Hao Wang ; Liang Zhou ; Yijun Wang ; Can Chen

【Abstract】: In current web search engines, the relevance between a query and web pages (i.e. documents) is measured by Text Matching (TM) models. The documents retrieved mainly focus on matching text queries themselves but fail to find target information towards the user intent (e.g. the direct answer to question-style queries). Thus the document containing target information that users want may not be ranked at the top 1 in the search result or even not be recalled. Besides, as voice search and voice-powered assistants are entering our life, queries tend to be long tail ones, which needs a search engine evolved into a higher level of semantic relevance matching. Therefore, it is necessary to build an intent-target relevance matching model in modern search scenarios. This paper proposes a unified model of Machine Comprehension-incorporated Relevance Matching (MCRM). Totally, MCRM models how web users choose the relevant documents to read by observing the titles or summaries, and further look for target information from them. To accomplish that, we first formulate two tasks as Text Matching and Target Extracting. For learning each task, a Context-augmented Matching network (ContMatch) and a Matching-fused machine Comprehension network (MatComprehend) are proposed. Then, they are integrated into an end-to-end framework that can not only measure the semantic relevance but also extract the intent-related target, by deeply comprehending the semantics hidden in queries and exploiting intent-target relations between queries and documents. In MCRM, the two tasks are jointly learned by a multi-task learning approach, where the semantic relevance measured by ContMatch, and the intent-target relevance captured by MatComprehend, are combined and enhanced mutually for boosting the final performance. We conduct extensive experiments on real-world data. The experimental results demonstrate the superiority of MCRM against the state-of-the-art relevance models.

【Keywords】: Relevance matching, Machine comprehension, Multi-task learning

88. Boosted Trajectory Calibration for Traffic State Estimation.

【Paper Link】【Pages】:866-875

【Authors】: Xitong Zhang ; Liyang Xie ; Zheng Wang ; Jiayu Zhou

【Abstract】: Traffic state estimation is among the most critical issues in intelligent transport systems, and it has been widely explored for years considering its significance for diverse industrial applications such as vehicle dispatch and route planning. The availability of vehicle trajectories has provided a promising data source for traffic estimation, as it shed light on traffic status and characterizes human transition activates that are otherwise extremely hard to track. However, there are major challenges in incorporating trajectories in traffic estimation largely because of its inherent uncertainty from data sparsity and semantic ambiguity. Entangled in complicated road topology and spatiotemporal traffic dynamics, properly embedding trajectories in modeling is especially difficult. To address the aforementioned challenges, in this paper, we propose a Boosted Trajectory Calibration (BTRAC) framework to model traffic states of complex road networks that effectively integrates trajectory information. In the framework, we first use a deep neural network to predict the traffic states of each road individually from historical traffic information, along with the prediction uncertainty. Then we refine the predictions by an iterative boosting calibration procedure with embedded trajectories. We conduct extensive large-scale empirical studies on two cities and demonstrate the effectiveness of the proposed approach.

【Keywords】: urban traffic state estimation, trajectories em bedding, spatiotemporal calibration

89. HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories.

【Paper Link】【Pages】:876-885

【Authors】: Yu Zhang ; Frank F. Xu ; Sha Li ; Yu Meng ; Xuan Wang ; Qi Li ; Jiawei Han

【Abstract】: GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding the utility of search and topic-based analysis. This work targets the automatic repository classification problem as keyword-driven hierarchical classification. Specifically, users only need to provide a label hierarchy with keywords to supply as supervision. This setting is flexible, adaptive to the users' needs, accounts for the different granularity of topic labels and requires minimal human effort. We identify three key challenges of this problem, namely (1) the presence of multi-modal signals; (2) supervision scarcity and bias; (3) supervision format mismatch. In recognition of these challenges, we propose the HiGitClass framework, comprising of three modules: heterogeneous information network embedding; keyword enrichment; topic modeling and pseudo document generation. Experimental results on two GitHub repository collections confirm that HiGitClass is superior to existing weakly-supervised and dataless hierarchical classification methods, especially in its ability to integrate both structured and unstructured data for repository classification. Code and datasets related to this paper are available at https://github.com/yuzhimanhua/HiGitClass.

【Keywords】: hierarchical classification, GitHub, weakly supervised learning

90. Aftershock Detection with Multi-scale Description Based Neural Network.

【Paper Link】【Pages】:886-895

【Authors】: Qi Zhang ; Tong Xu ; Hengshu Zhu ; Lifu Zhang ; Hui Xiong ; Enhong Chen ; Qi Liu

【Abstract】: Aftershocks refer to the smaller earthquakes that occur following large earthquakes, in the same area of the main shock. The task of aftershocks detection, as a crucial and challenging issue in disaster monitoring, has attracted wide research attention in relevant fields. Compared with the traditional detection methods like STA/LTA algorithms or heuristic matching, neural network techniques are regarded as an advanced choice with better pattern recognition ability. However, current neural network-based solutions mainly formulate the seismic wave as ordinary time series, where existing techniques are directly deployed without adaption, and thus fail to obtain competitive performance on the intensive and highly-noise waveforms of aftershocks. To that end, in this paper, we propose a novel framework named Multi-Scale Description based Neural Network (MSDNN) for enhancing aftershock detection. Specifically, MSDNN contains a delicately-designed network structure for capturing both short-term scale and long-term scale seismic features. Therefore, the unique characteristics of seismic waveforms can be fully-exploited for aftershock detection. Furthermore, a multi-task learning strategy is introduced to model the seismic waveforms of multiple monitoring stations simultaneously, which can not only refine the detection performance but also provide additionally quantitative clues for discovering homologous earthquakes. Finally, comprehensive experiments on the data set from aftershocks of the Wenchuan M8.0 Earthquake have clearly validated the effectiveness of our framework compared with several state-of-the-art baselines.

【Keywords】: Multi-Scale Description; Multi-Task Learning; Aftershock Detection

91. Know Your Mind: Adaptive Cognitive Activity Recognition with Reinforced CNN.

【Paper Link】【Pages】:896-905

【Authors】: Xiang Zhang ; Lina Yao ; Xianzhi Wang ; Wenjie Zhang ; Shuai Zhang ; Yunhao Liu

【Abstract】: Electroencephalography (EEG) signals reflect and measure activities in certain brain areas. Its zero clinical risk and easy-to-use features make it a good choice of providing insights into the cognitive process. However, effective analysis of time-varying EEG signals remains challenging. First, EEG signal processing and feature engineering are time-consuming and highly rely on expert knowledge, and most existing studies focus on domain-specific classification algorithms, which may not apply to other domains. Second, EEG signals usually have low signal-to-noise ratios and are more chaotic than other sensor signals. In this regard, we propose a generic EEG-based cognitive activity recognition framework that can adaptively support a wide range of cognitive applications to address the above issues. The framework uses a reinforced selective attention model to choose the characteristic information among raw EEG signals automatically. It employs a convolutional mapping operation to dynamically transform the selected information into a feature space to uncover the implicit spatial dependency of EEG sample distribution. We demonstrate the effectiveness of the framework under three representative scenarios: intention recognition with motor imagery EEG, person identification, and neurological diagnosis, and further evaluate it on three widely used public datasets. The experimental results show our framework outperforms multiple state-of-the-art baselines and achieves competitive accuracy on all the datasets while achieving low latency and high resilience in handling complex EEG signals across various domains. The results confirm the suitability of the proposed generic approach for a range of problems in the realm of brain-computer Interface applications.

【Keywords】: deep learning; reinforcement learning; attention mechanism; brain-computer interface

92. A Parallel Simulated Annealing Enhancement of the Optimal-Matching Heuristic for Ridesharing.

【Paper Link】【Pages】:906-915

【Authors】: Lihao Zhang ; Zeyang Ye ; Keli Xiao ; Bo Jin

【Abstract】: In this paper, we develop an efficient parallel heuristic method to solve the global optimization problem associated with the ridesharing system. Based on the carefully formalized problem and objective function, we fully utilize the heuristic characteristics of the algorithm for handling the real-life constraints in ridesharing. Following the principles of simulated annealing, our method is adaptive in handling the matching and route optimization tasks. We develop an efficient parallel scheme with simulated annealing, named PCSA, for solving the global optimization problem for ridesharing. Our algorithm is capable of efficiently addressing the potential of ridesharing by exploiting the mobility information of the ride requests. Based on extensive experiments on large real-world data, we validate the performance of our parallel heuristic algorithm. Our results confirm the effectiveness and efficiency of the proposed method and its superiority over all other benchmarks.

【Keywords】: Ridesharing; Simulated-Annealing; Global-optimization; Parallel-Computing; Heuristic-method

93. Rank-Based Multi-task Learning for Fair Regression.

【Paper Link】【Pages】:916-925

【Authors】: Chen Zhao ; Feng Chen

【Abstract】: In this work, we develop a novel fairness learning approach for multi-task regression models based on a biased training dataset, using a popular rank-based non-parametric independence test, i.e., Mann Whitney U statistic, for measuring the dependency between target variable and protected variables. To solve this learning problem efficiently, we first reformulate the problem as a new non-convex optimization problem, in which a non-convex constraint is defined based on group-wise ranking functions of individual objects. We then develop an efficient model-training algorithm based on the framework of non-convex alternating direction method of multipliers (NC-ADMM), in which one of the main challenges is to implement an efficient projection oracle to the preceding non-convex set defined based on ranking functions. Through the extensive experiments on both synthetic and real-world datasets, we validated the out-performance of our new approach against several state-of-the-art competitive methods on several popular metrics relevant to fairness learning.

【Keywords】: sum rank, Mann Whitney U statistic, multi-task learning, NC-ADMM, fairness

94. Adversarial Robustness of Similarity-Based Link Prediction.

【Paper Link】【Pages】:926-935

【Authors】: Kai Zhou ; Tomasz P. Michalak ; Yevgeniy Vorobeychik

【Abstract】: Link prediction is one of the fundamental problems in social network analysis. A common set of techniques for link prediction rely on similarity metrics which use the topology of the observed subnetwork to quantify the likelihood of unobserved links. Recently, similarity metrics for link prediction have been shown to be vulnerable to attacks whereby observations about the network are adversarially modified to hide target links. We propose a novel approach for increasing robustness of similarity-based link prediction by endowing the analyst with a restricted set of reliable queries which accurately measure the existence of queried links. The analyst aims to robustly predict a collection of possible links by optimally allocating the reliable queries. We formalize the analyst's problem as a Bayesian Stackelberg game in which they first choose the reliable queries, followed by an adversary who deletes a subset of links among the remaining (unreliable) queries by the analyst. The analyst in our model is uncertain about the particular target link the adversary attempts to hide, whereas the adversary has full information about the analyst and the network. Focusing on similarity metrics using only local information, we show that the problem is NP-Hard for both players, and devise two principled and efficient approaches for solving it approximately. Extensive experiments with real and synthetic networks demonstrate the effectiveness of our approach.

【Keywords】: social network analysis; link prediction; adversarial robustness; game theory

95. Matrix Profile XVIII: Time Series Mining in the Face of Fast Moving Streams using a Learned Approximate Matrix Profile.

【Paper Link】【Pages】:936-945

【Authors】: Zachary Zimmerman ; Nader Shakibay Senobari ; Gareth J. Funning ; Evangelos E. Papalexakis ; Samet Oymak ; Philip Brisk ; Eamonn J. Keogh

【Abstract】: In recent years, the Matrix Profile has emerged as a promising approach to allow data mining on large time series archives. By efficiently computing all of the "essential" distance information between subsequences in a time series, the Matrix Profile makes many analytic problems, including classification and anomaly detection, easy or even trivial. However, for many tasks, in addition to archives of data, we may face never-ending streams of newly arriving data. While there is an algorithm to maintain a Matrix Profile in the face of newly arriving data, it is limited to streams arriving on the order of one Hz and with small archives of historical data. However, in domains as diverse as seismology, neuroscience and entomology, we may encounter datasets that stream at rates that are orders of magnitude faster. In this work we introduce LAMP, a model that predicts, in constant time, the Matrix Profile value that would have been assigned to an incoming subsequence. This allows us to exploit the utility of the Matrix Profile in settings that would otherwise be untenable. While learning LAMP models is computationally expensive, this stage is done offline with an arbitrary computational paradigm. The models can then be deployed on resource-constrained devices including wearable sensors. We demonstrate the utility of LAMP with experiments on diverse and challenging datasets with billions of datapoints on a simple desktop machine. We achieve more than 10000x speedup over exact methods on the same data.

【Keywords】: Time Series; Streaming Data; Matrix Profile

Short Papers 99

96. ChainNet: Learning on Blockchain Graphs with Topological Features.

【Paper Link】【Pages】:946-951

【Authors】: Nazmiye Ceren Abay ; Cuneyt Gurcan Akcora ; Yulia R. Gel ; Murat Kantarcioglu ; Umar D. Islambekov ; Yahui Tian ; Bhavani M. Thuraisingham

【Abstract】: With emergence of blockchain technologies and the associated cryptocurrencies, such as Bitcoin, understanding network dynamics behind Blockchain graphs has become a rapidly evolving research direction. Unlike other financial networks, such as stock and currency trading, blockchain based cryptocurrencies have the entire transaction graph accessible to the public (i.e., all transactions can be downloaded and analyzed). A natural question is then to ask whether dynamics of the transaction graph impacts price of the underlying cryptocurrency. We show that standard graph features such as degree distribution of the transaction graph may not be sufficient to capture network dynamics and its potential impact on fluctuations of Bitcoin price. In contrast, topological features computed from the blockchain graph using the tools of persistent homology, are found to exhibit higher utility for predicting Bitcoin price dynamics.

【Keywords】: blockchain, bitcoin, persistent homology, graph

97. MTEX-CNN: Multivariate Time Series EXplanations for Predictions with Convolutional Neural Networks.

【Paper Link】【Pages】:952-957

【Authors】: Roy Assaf ; Ioana Giurgiu ; Frank Bagehorn ; Anika Schumann

【Abstract】: In this work we present MTEX-CNN, a novel explainable convolutional neural network architecture which can not only be used for making predictions based on multivariate time series data, but also for explaining these predictions. The network architecture consists of two stages and utilizes particular kernel sizes. This allows us to apply gradient based methods for generating saliency maps for both the time dimension and the features. The first stage of the architecture explains which features are most significant to the predictions, while the second stage explains which time segments are the most significant. We validate our approach on two use cases, namely to predict rare server outages in the wild, as well as the average energy production of photovoltaic power plants based on a benchmark data set. We show that our explanations shed light over what the model has learned. We validate this by retraining the network using the most significant features extracted from the explanations and retaining similar performance to training with the full set of features.

【Keywords】: Explainable Machine Learning; Multivariate Time Series; Convolutional Neural Network; Deep Learning

98. Efficient Approximate Solution Path Algorithm for Order Weight L_1-Norm with Accuracy Guarantee.

【Paper Link】【Pages】:958-963

【Authors】: Runxue Bao ; Bin Gu ; Heng Huang

【Abstract】: Variable selection is a challenging problem in high-dimensional linear regression problems with a large number of predictors. Thus, sparsity-inducing and clustering-inducing regularization methods are widely used to identify highly correlated covariates. Ordered Weight L ₁ (OWL) family of regularizers for linear regression perform well to identify precise clusters of correlated covariates and interpret the effect of each variable. Solution path algorithms are helpful to select hyperparameters to tune the OWL model. Due to over-complex representation of the penalty, so far the OWL model has no solution path algorithms for hyperparameter selection. To address this challenge, in this paper, we propose an efficient approximate solution path algorithm (OWLAGPath) to solve the OWL model with accuracy guarantee. For a given accuracy bound ε, OWLAGPath can find the corresponding solutions for the OWL model with numerous hyperparameters while keeping the sparsity and precise features grouping properties. Theoretically, we prove that all the solutions produced by OWLAGPath can strictly satisfy the given accuracy bound ε. The experimental results on three benchmark datasets not only confirm the effectiveness and efficiency of our OWLAGPath algorithm, but also show the advantages of OWLAGPath for model selection than the existing algorithms.

【Keywords】: Sparse regression, variable selection, hyperparameter selection, solution path algorithm.

99. A Wasserstein Subsequence Kernel for Time Series.

【Paper Link】【Pages】:964-969

【Authors】: Christian Bock ; Matteo Togninalli ; M. Elisabetta Ghisu ; Thomas Gumbsch ; Bastian Rieck ; Karsten M. Borgwardt

【Abstract】: Kernel methods are a powerful approach for learning on structured data. However, as we show in this paper, simple but common instances of the popular R-convolution kernel framework can be meaningless when assessing the similarity of two time series through their subsequences. We therefore propose a meaningful approach based on optimal transport theory that simultaneously captures local and global characteristics of time series. Moreover, we demonstrate that our method achieves competitive classification accuracy in comparison to state-of-the art methods across a wide variety of data sets.

【Keywords】: Time Series Classification; Optimal Transport; Wasserstein; R-Convolution Kernels

100. Nearest Neighbor Ensembles: An Effective Method for Difficult Problems in Streaming Classification with Emerging New Classes.

【Paper Link】【Pages】:970-975

【Authors】: Xin-Qiang Cai ; Peng Zhao ; Kai-Ming Ting ; Xin Mu ; Yuan Jiang

【Abstract】: This paper re-examines existing systems in streaming classification with emerging new classes (SENC) problems, where new classes that have not been used to train a classifier may emerge in a data stream. We identify that existing systems have an unspecified assumption that emerging new classes are geometrically far from known classes, or instances of known classes are densely distributed, in the feature space. Using a class separation indicator alpha, we refine the SENC problem into an alpha-SENC problem, where alpha indicates a geometric distance between two classes in the feature space. We show that while most existing systems work well in high-alpha SENC problems (i.e., a new class is geometrically far from a known class or instances of known classes are densely distributed), they perform poorly in low-alpha SENC problems. To solve low-alpha SENC problems effectively, we propose an approach using nearest neighbor ensembles or SENNE. We demonstrate that SENNE is able to handle both the low-alpha and high-alpha SENC problems which can appear at different times in a single data stream.

【Keywords】: classification; data stream; new class; ensemble method; open and dynamic environments

101. VASE: A Twitter-Based Vulnerability Analysis and Score Engine.

【Paper Link】【Pages】:976-981

【Authors】: Haipeng Chen ; Jing Liu ; Rui Liu ; Noseong Park ; V. S. Subrahmanian

【Abstract】: When a new vulnerability is discovered, a Common Vulnerability and Exposure (CVE) number is publicly assigned to it. The vulnerability is then analyzed by the US National Institute of Standards and Technology (NIST) whose Common Vulnerability Scoring System (CVSS) evaluates a severity score that ranges from 0 to 10 for the vulnerability. On average, NIST takes 132.7 days for this - but early knowledge of the CVSS score is critical for enterprise security managers to take defensive actions (e.g. patch prioritization). We present VASE (Vulnerability Analysis and Scoring Engine) that uses Twitter discussions about CVEs to predict CVSS scores before the official assessments from NIST. In order to leverage the intrinsic correlations between different vulnerabilities, VASE adopts a graph convolutional network (GCN) model in which nodes correspond to CVEs. In addition, we propose a novel attention-based input embedding method to extract useful latent features for each CVE node. We show on real-world data that VASE obtains a mean absolute error (MAE) of 1.255 for predicting the CVSS score using only three days of Twitter discussion data after the date a vulnerability is first mentioned on Twitter. VASE can provide predictions for the CVSS scores for 37.85% of the CVEs at least one week earlier than the official assessments by NIST.

【Keywords】: Vulnerability Severity Prediction; Social Media Data Mining; Graph Convolution Networks; Input Embedding

102. Scalable Explanation of Inferences on Large Graphs.

【Paper Link】【Pages】:982-987

【Authors】: Chao Chen ; Yifei Liu ; Xi Zhang ; Sihong Xie

【Abstract】: Probabilistic inferences distill knowledge from graphs to aid human make important decisions. Due to the inherent uncertainty in the model and the complexity of the knowledge, it is desirable to help the end-users understand the inference outcomes. Different from deep or high dimensional parametric models, the lack of interpretability in graphical models is due to the cyclic and long-range dependencies and the byzantine inference procedures. Prior works did not tackle cycles and make the inferences interpretable. We formulate the explanation of probabilistic inferences as a constrained cross-entropy minimization problem to find simple subgraphs that faithfully approximate the inferences. We prove that the optimization is NP-hard, while the objective is not monotonic and submodular to guarantee efficient greedy approximation. We propose a beam search algorithm to find trees to enhance the explanation interpretability and diversity. To allow efficient search on large and dense graphs without hurting faithfulness, we further propose parallelization and a pruning strategy. We demonstrate superior performance on four networks from distinct applications, comparing favorably to other explanation methods, including LIME.

【Keywords】: interpretability, graphical model

103. AMENDER: An Attentive and Aggregate Multi-layered Network for Dataset Recommendation.

【Paper Link】【Pages】:988-993

【Authors】: Yujun Chen ; Yuanhong Wang ; Yutao Zhang ; Juhua Pu ; Xiangliang Zhang

【Abstract】: In this paper, we study the problem of recommending the appropriate datasets for authors, which is implemented to infer the proximity between authors and datasets by leveraging the information from a three-layered network, composed by authors, papers and datasets. To link author-dataset semantically by taking advantage of the rich content information of papers in the intermediate layer, we design an attentive and aggregate multi-layer network learning model. The aggregation is for integrating the intra-layer information of paper content and citations, while the attention is used for coordinating authors at the top-layer and datasets at the bottom-layer in the semantic space learned from papers in the intermediate layer. The experimental study demonstrates the superiority of our method compared with the solutions that extend existing models to our problem.

【Keywords】: Recommendation systems; Multi-layer network; Network embeddinng

104. Session-Based Recommendation with Local Invariance.

【Paper Link】【Pages】:994-999

【Authors】: Tianwen Chen ; Raymond Chi-Wing Wong

【Abstract】: Session-based recommendation is a task to predict users' next actions given a sequence of previous actions in the same session. Existing methods either encode the previous actions in a strict order or completely ignore the order. However, sometimes the order of actions in a short sub-sequence, called the detailed order, may not be important, e.g., when a user is just comparing the same kind of products from different brands. Nevertheless, the high-level ordering information is still useful because the data is sequential in nature. Therefore, a good session-based recommender should pay different attention to the sequential information in different levels of granularity. To this end, we propose a novel model to automatically ignore the insignificant detailed ordering information in some sub-sessions, while keeping the high-level sequential information of the whole sessions. In the model, we first use a full self-attention layer with Gaussian weighting to extract features of sub-sessions, and then we apply a recurrent neural network to capture the high-level sequential information. Extensive experiments on two real-world datasets show that our method outperforms or matches the state-of-the-art methods.

【Keywords】: session based recommendation; behavior modeling; recommender systems; attention model; neural networks

105. Alpha-Beta Sampling for Pairwise Ranking in One-Class Collaborative Filtering.

【Paper Link】【Pages】:1000-1005

【Authors】: Mingyue Cheng ; Runlong Yu ; Qi Liu ; Vincent W. Zheng ; Hongke Zhao ; Hefu Zhang ; Enhong Chen

【Abstract】: This paper introduces Alpha-Beta Sampling (ABS) strategy, which is particularly intended for the sampling problem of pairwise ranking in one-class collaborative filtering (PROCCF). Specifically, ABS strategy places more emphasis on such training examples, including positive item with a lower preference score and negative items with a higher preference score for each gradient step. Then, we provide the corresponding proofs for the ABS strategy from both gradient and ranking perspectives. First, we prove that sampled training examples by ABS strategy can update the model parameters with a large magnitude and analyze two instantiations by combining two specific pairwise algorithms. Second, it can be proved that ABS strategy is equivalent to optimizing for ranking-aware evaluation metrics like Normalized Discounted Cumulative Gain (NDCG). Furthermore, ABS strategy can be very general and applicable in a lot of pairwise structures of pairwise algorithms. Based on ABS strategy, we provide an effective sampling algorithm to dynamically draw items for each SGD update. Finally, we evaluate the ABS strategy by conducting sampling tasks in two representative pairwise algorithms. The experiment results show that the ABS strategy performs significantly better than the baseline strategies.

【Keywords】: Sampling, Pairwise Preference Learning, One Class Collaborative Filtering

106. Counterfactual Attention Supervision.

【Paper Link】【Pages】:1006-1011

【Authors】: Seungtaek Choi ; Haeju Park ; Seung-won Hwang

【Abstract】: Neural attention mechanism has been used as a form of explanation for model behavior. Users can either passively consume explanation, or actively disagree with explanation then supervise attention into more proper values (attention supervision). Though attention supervision was shown to be effective in some tasks, we find the existing attention supervision is biased, for which we propose to augment counterfactual observations to debias and contribute to accuracy gains. To this end, we propose a counterfactual method to estimate such missing observations and debias the existing supervisions. We validate the effectiveness of our counterfactual supervision on widely adopted image benchmark datasets: CUFED and PEC.

【Keywords】: attention supervision; event specific image ranking; event recognition

107. Inductive Embedding Learning on Attributed Heterogeneous Networks via Multi-task Sequence-to-Sequence Learning.

【Paper Link】【Pages】:1012-1017

【Authors】: Yunfei Chu ; Caili Guo ; Tongze He ; Yaqing Wang ; Jenq-Neng Hwang ; Chunyan Feng

【Abstract】: In the paper, we study the problem of inductive embedding learning on attributed heterogeneous networks, and propose a Multi-task sequence-to-sequence learning based Inductive Network Embedding framework (MINE) capturing the attribute similarity, network proximity, and partial label information simultaneously. In particular, MINE trains an encoder function that aggregates information from a node's long-range scope of contexts, with the node attribute sequences generated by the proposed type-guided heterogeneous random walk as inputs. We present an one-to-many multi-task sequence-to-sequence model where the encoder is shared between two related tasks: an unsupervised node identity sequence generation task to learn context-aware embeddings, and a semi-supervised label prediction task to learn semantics-rich embeddings. Extensive experiments on real-world datasets demonstrate that the proposed method significantly outperforms several state-of-the-art methods.

【Keywords】: network embedding; graph embedding; inductive learning; attributed heterogeneous network

108. A Factorized Version Space Algorithm for "Human-In-the-Loop" Data Exploration.

【Paper Link】【Pages】:1018-1023

【Authors】: Luciano Di Palma ; Yanlei Diao ; Anna Liu

【Abstract】: While active learning (AL) has been recently applied to help the user explore a large database to retrieve data instances of interest, existing methods often require a large number of instances to be labeled in order to achieve good accuracy. To address this slow convergence problem, our work augments version space-based AL algorithms, which have strong theoretical results on convergence but are very costly to run, with additional insights obtained in the user labeling process. These insights lead to a novel algorithm that factorizes the version space to perform active learning in a set of subspaces. Our work offers theoretical results on optimality and approximation for this algorithm, as well as optimizations for better performance. Evaluation results show that our factorized version space algorithm significantly outperforms other version space algorithms, as well as a recent factorization-aware algorithm, for large database exploration.

【Keywords】: Active learning; version space; data exploration

109. Optimal Timelines for Network Processes.

【Paper Link】【Pages】:1024-1029

【Authors】: Daniel J. DiTursi ; Carolyn S. Kaminski ; Petko Bogdanov

【Abstract】: Structural models for network dynamics typically assume a discrete timeline of network events (node activation or link creation) and a stochastic generative process giving rise to new events based on the event history and the network structure. In order to employ these models for prediction, observational data is often aggregated at a fixed temporal resolution (e.g., minutes or days). However, the underlying network processes may “speed up” or “slow down” at different points in time, rendering observations unlikely and predictions incorrect. The challenge is to optimize the timescale for the analysis of network event data, which in turn is based on structural models of the underlying network processes. We introduce the general problem of inferring the optimal temporal resolution for network event data. The goal is to map observed network events to discrete time steps by aggregation and/or disaggregation of their original timeline such that they are collectively well-explained by structural dynamics models. We unify network growth and information diffusion models and differentiate between short- and long-memory processes. We demonstrate that while optimal temporal aggregation can be performed in polynomial time, disaggregation-and thus, the general timescale inference problem-is NP-hard. We propose scalable heuristics for the problem, some with approximation guarantees, and employ them for missing event recovery and temporal link prediction, demonstrating significant improvements (absolute increase of 10% in F1 measure for event recovery and of 5% in AUC for link prediction) compared to employing the same algorithms on the default timescale of data collection.

【Keywords】: emporal-networks; Network-clock; Timeline-reconstruction; Structural-network-models

110. Intervention-Aware Early Warning.

【Paper Link】【Pages】:1030-1035

【Authors】: Dhivya Eswaran ; Christos Faloutsos ; Nina Mishra ; Yonatan Naamad

【Abstract】: How can we early warn against an impending student drop out or an adverse health condition in near real-time? More challengingly, how do we learn to early warn from data containing confounding interventions-e.g., tutoring or medicines-while remaining interpretable to the human decision maker? We consider the problem of learning to interpretably early warn from labeled data tainted by interventions. We first identify three principles that an early warning system should follow. We then propose SmokeAlarm which provably obeys these principles and produces early warning scores in an online manner. Notably, learned model is "bi-inspectable", i.e., it can be visualized both in the presence and in the absence of interventions. Experiments demonstrate the efficacy of SmokeAlarm over prior approaches.

【Keywords】: early warning; early prediction; time series; intervention awareness; event detection

111. CAMP: Co-Attention Memory Networks for Diagnosis Prediction in Healthcare.

【Paper Link】【Pages】:1036-1041

【Authors】: Jingyue Gao ; Xiting Wang ; Yasha Wang ; Zhao Yang ; Junyi Gao ; Jiangtao Wang ; Wen Tang ; Xing Xie

【Abstract】: Diagnosis prediction, which aims to predict future health information of patients from historical electronic health records (EHRs), is a core research task in personalized healthcare. Although some RNN-based methods have been proposed to model sequential EHR data, these methods have two major issues. First, they cannot capture fine-grained progression patterns of patient health conditions. Second, they do not consider the mutual effect between important context (e.g., patient demographics) and historical diagnosis. To tackle these challenges, we propose a model called Co-Attention Memory networks for diagnosis Prediction (CAMP), which tightly integrates historical records, fine-grained patient conditions, and demographics with a three-way interaction architecture built on co-attention. Our model augments RNNs with a memory network to enrich the representation capacity. The memory network enables analysis of fine-grained patient conditions by explicitly incorporating a taxonomy of diseases into an array of memory slots. We instantiate the READ/WRITE operations of the memory network so that the memory cooperates effectively with the patient demographics through co-attention mechanism. Experiments on real-world datasets demonstrate that CAMP consistently performs better than state-of-the-art methods.

【Keywords】: diagnosis prediction; memory networks; attention mechanism; healthcare informatics

112. DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums.

【Paper Link】【Pages】:1042-1047

【Authors】: Yuyang Gao ; Lingfei Wu ; Houman Homayoun ; Liang Zhao

【Abstract】: Online health communities such as the online breast cancer forum enable patients (i.e., users) to interact and help each other within various subforums, which are subsections of the main forum devoted to specific health topics. The changing nature of the users' activities in different subforums can be strong indicators of their health status changes. This additional information could allow health-care organizations to respond promptly and provide additional help for the patient. However, modeling complex transitions of an individual user's activities among different subforums over time and learning how these correspond to his/her health stage are extremely challenging. In this paper, we first formulate the transition of user activities as a dynamic graph with multi-attributed nodes, then formalize the health stage inference task as a dynamic graph-to-sequence learning problem, and hence propose a novel dynamic graph-to-sequence neural networks architecture (DynGraph2Seq) to address all the challenges. Our proposed DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages. We go on to propose dynamic graph hierarchical attention mechanisms to facilitate the necessary multi-level interpretability. A comprehensive experimental analysis of its use for a health stage prediction task demonstrates both the effectiveness and the interpretability of the proposed models.

【Keywords】: deep learning; dynamic graph; sequence prediction; health stage prediction

113. DRCGR: Deep Reinforcement Learning Framework Incorporating CNN and GAN-Based for Interactive Recommendation.

【Paper Link】【Pages】:1048-1053

【Authors】: Rong Gao ; Haifeng Xia ; Jing Li ; Donghua Liu ; Shuai Chen ; Gang Chun

【Abstract】: Recently, the application of deep reinforcement learning into the field of session-based interactive recommendation has attracted great attention from researchers. However, despite that some interactive recommendation models based on deep reinforcement learning have been proposed, they still suffer to the following limitations: (1) these works ignore the skip behaviors of sequential patterns in users' clicking behavior; (2) these works fail to incorporate positive feedback and negative feedback into the proposed deep reinforcement recommender system when the positive feedback is sparse. Therefore, to solve the problems mentioned above, a novel Deep Q-Network based recommendation framework incorporating CNN and GAN-based models is proposed to acquire robust performance, named DRCGR. Specifically, in DRCGR, a CNN model is used to capture the sequential features for positive feedback. Then, an adversarial training is adopted to learn optimal negative feedback representations Then, positive/negative representations are fed into DQN simultaneously, which are conducive to generating better action-value function The experimental results based on real-world e-commerce data demonstrate our framework's superiority over some state-of-the-art recommendation models.

【Keywords】: recommendation; deep reinforcement learning; GAN; convolutional filters

114. An Integrated Model for Urban Subregion House Price Forecasting: A Multi-source Data Perspective.

【Paper Link】【Pages】:1054-1059

【Authors】: Chuancai Ge ; Yang Wang ; Xike Xie ; Hengchang Liu ; Zhengyang Zhou

【Abstract】: Urban housing price is widely accepted as an economic indicator of both business and research interest in urban computing. In this work, we propose an effective and fine-grained model for urban subregion housing price predictions. Compared to existing works, our proposal improves the forecasting granularity from city-level to mile-level in spite of data sparsity and complex factors. The fine-grained housing price forecasting has the potential to support a broad scope of applications, ranging from urban planning to housing market recommendations. To achieve that, in this paper, we propose a novel integrated framework, FTD_DenseNet, which incorporates more social and economic features and makes full use of all-level spatiotemporal features. Specifically, the Kalman Filter-based future expection is firstly involved as an influence factor in our model. Extensive empirical studies on real data show the effectiveness of our proposals.

【Keywords】: urban computing, subregion housing price forecasting

115. Fair Adversarial Gradient Tree Boosting.

【Paper Link】【Pages】:1060-1065

【Authors】: Vincent Grari ; Boris Ruf ; Sylvain Lamprier ; Marcin Detyniecki

【Abstract】: Fair classification has become an important topic in machine learning research. While most bias mitigation strategies focus on neural networks, we noticed a lack of work on fair classifiers based on decision trees even though they have proven very efficient. In an up-to-date comparison of state-of-the-art classification algorithms in tabular data, tree boosting outperforms deep learning. For this reason, we have developed a novel approach of adversarial gradient tree boosting. The objective of the algorithm is to predict the output Y with gradient tree boosting while minimizing the ability of an adversarial neural network to predict the sensitive attribute S. The approach incorporates at each iteration the gradient of the neural network directly in the gradient tree boosting. We empirically assess our approach on 4 popular data sets and compare against state-of-the-art algorithms. The results show that our algorithm achieves a higher accuracy while obtaining the same level of fairness, as measured using a set of different common fairness definitions.

【Keywords】: Adversarial; Gradient; Boosting; Fair; machine; learning

116. Quantized Adversarial Training: An Iterative Quantized Local Search Approach.

【Paper Link】【Pages】:1066-1071

【Authors】: Yifan Guo ; Tianxi Ji ; Qianlong Wang ; Lixing Yu ; Pan Li

【Abstract】: Studies find that deep learning models are vulnerable to deliberate adversarial manipulations by attackers. Adversarial training is an effective approach to address this problem. Previous works quantize the input sample space to find appropriate perturbations on the benign samples so as to generate adversarial samples for adversarial training. However, since only the input sample space is quantized with the perturbation space being still continuous, finding the optimal perturbation noise is still a non-convex and computationally expensive problem. Moreover, in this case, the found perturbation noise that will be used to generate an adversarial sample may be strong in the continuous search space, but may become weak after quantization in the input sample space. In this paper, we first develop an Iterative Quantized Local Search (IQLS) algorithm that finds strong perturbation noises by quantizing both the input space and perturbation space. Then, we theoretically analyze and prove the upper bound on the number of iterations needed for the IQLS algorithm, based on which we devise an efficient and effective Quantized Adversarial Training (QAT) scheme. Experiment results on six public datasets show that our proposed scheme outperforms state-of-the-art methods to defend against different adversarial attacks. Particularly, QAT improves the system performance by 14%, 11%, 16% on average on CIFAR-10, SVHN, and CIFAR-100 datasets respectively compared with the existing defense schemes, and reduces the computing time by about 60%.

【Keywords】: Adversarial training; input space quantization; perturbation space quantization; deep learning; image classification

117. A Weighted Aggregating SGD for Scalable Parallelization in Deep Learning.

【Paper Link】【Pages】:1072-1077

【Authors】: Pengzhan Guo ; Zeyang Ye ; Keli Xiao

【Abstract】: We investigate the stochastic optimization problem and develop a scalable parallel computing algorithm for deep learning tasks. The key of our study involves a reformation of the objective function for the stochastic optimization in neural network models. We propose a novel update rule, named weighted aggregating stochastic gradient decent, after theoretically analyzing the characteristics of the newly formalized objective function. The new rule introduces a weighted aggregation scheme based on the performance of local workers and does not require a center variable. It assesses the relative importance of local workers and accepts them according to their contributions. Our new rule also allows the implementation of both synchronous and asynchronous parallelization and can result in varying convergence rates. For method evaluation, we benchmark our schemes against the mainstream algorithms, including the elastic averaging SGD in training deep neural networks for classification tasks. We conduct extensive experiments on several classic datasets, and the results confirm the strength of our scheme in accelerating the training of deep architecture and scalable parallelization.

【Keywords】: Deep learning, Stochastic gradient descent, Parallel Computing

118. Improving Disentangled Representation Learning with the Beta Bernoulli Process.

【Paper Link】【Pages】:1078-1083

【Authors】: Prashnna K. Gyawali ; Zhiyuan Li ; Cameron Knight ; Sandesh Ghimire ; B. Milan Horácek ; John L. Sapp ; Linwei Wang

【Abstract】: To improve the ability of variational auto-encoders (VAE) to disentangle in the latent space, existing works mostly focus on enforcing the independence among the learned latent factors. However, the ability of these models to disentangle often decreases as the complexity of the generative factors increases. In this paper, we investigate the little-explored effect of the modeling capacity of a posterior density on the disentangling ability of the VAE. We note that the independence within and the complexity of the latent density are two different properties we constrain when regularizing the posterior density: while the former promotes the disentangling ability of VAE, the latter - if overly limited - creates an unnecessary competition with the data reconstruction objective in VAE. Therefore, if we preserve the independence but allow richer modeling capacity in the posterior density, we will lift this competition and thereby allow improved independence and data reconstruction at the same time. We investigate this theoretical intuition with a VAE that utilizes a non-parametric latent factor model, the Indian Buffet Process (IBP), as a latent density that is able to grow with the complexity of the data. Across two widely-used benchmark data sets (MNIST and dSprites) and two clinical data sets little explored for disentangled learning, we qualitatively and quantitatively demonstrated the improved disentangling performance of IBP-VAE over the state of the art. In the latter two clinical data sets riddled with complex factors of variations, we further demonstrated that unsupervised disentangling of nuisance factors via IBP-VAE - when combined with a supervised objective - can not only improve task accuracy in comparison to relevant supervised deep architectures, but also facilitate knowledge discovery related to task decision-making.

【Keywords】: Variational Autoencoder; Non-parametric latent factor model; Disentangled Representation

119. Diversity-Aware Recommendation by User Interest Domain Coverage Maximization.

【Paper Link】【Pages】:1084-1089

【Authors】: Yifan He ; Haitao Zou ; Hualong Yu ; Qi Wang ; Shang Gao

【Abstract】: Diversity-oriented models have been developed to recommend top-K items, which utilize some static parameters to make a trade-off or construct an objective function, derived from item relevance and item diversity. However, such a process directly narrows the interest points of the item list, and would not satisfy users' preferences very much. Besides, the static parameters mentioned above make recommendation algorithms a lack of adaptability and limit their application scenarios. Aiming at improving the adaptability and efficiency of diversity-aware recommendations, we propose a coverage-based approach according to the concepts of user-coverage and users' interest domain we have defined in this paper. Our method is parameter-free and suitable for either implicit data or explicit data. From a technique perspective, we design an improved greedy algorithm, which is used to achieve user interest domain coverage maximization, and provide solid theoretical proof about performance guarantee on efficiency and recommendation quality. During the experiments, we compare our model against two novel methods on two real-world data sets. Experimental results demonstrate the superiority of our method over the state-of-the-art techniques in terms of item relevance and diversity.

【Keywords】: Recommender system, Diversity, User coverage

120. Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem.

【Paper Link】【Pages】:1090-1095

【Authors】: John Holler ; Risto Vuorio ; Zhiwei Qin ; Xiaocheng Tang ; Yan Jiao ; Tiancheng Jin ; Satinder Singh ; Chenxi Wang ; Jieping Ye

【Abstract】: Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace. Hand-crafting heuristic solutions that account for the dynamics in these resource allocation problems is difficult, and may be better handled by an end-to-end machine learning method. Previous works have explored machine learning methods to the problem from a high-level perspective, where the learning method is responsible for either repositioning the drivers or dispatching orders, and as a further simplification, the drivers are considered independent agents maximizing their own reward functions. In this paper we present a deep reinforcement learning approach for tackling the full fleet management and dispatching problems. In addition to treating the drivers as individual agents, we consider the problem from a system-centric perspective, where a central fleet management agent is responsible for decision-making for all drivers.

【Keywords】: reinforcement learning; ride-sharing; fleet management; order dispatching

121. Deep-Aligned Convolutional Neural Network for Skeleton-Based Action Recognition and Segmentation.

【Paper Link】【Pages】:1096-1101

【Authors】: Babak Hosseini ; Romain Montagné ; Barbara Hammer

【Abstract】: Convolutional neural networks (CNNs) are deep learning frameworks which are well-known for their notable performance in classification tasks. Hence, many skeleton-based action recognition and segmentation (SBARS) algorithms benefit from them in their designs. However, a shortcoming of such applications is the general lack of spatial relationships between the input features in such data types. Besides, non-uniform temporal scalings is a common issue in skeleton-based data streams which leads to having different input sizes even within one specific action category. In this work, we propose a novel deep-aligned convolutional neural network (DACNN) to tackle the above challenges for the particular problem of SBARS. Our network is designed by introducing a new type of filters in the context of CNNs which are trained based on their alignments to the local subsequences in the inputs. These filters result in efficient predictions as well as learning interpretable patterns in the data. We empirically evaluate our framework on real-world benchmarks showing that the proposed DACNN algorithm obtains a competitive performance compared to the state-of-the-art while benefiting from a less complicated yet more interpretable model.

【Keywords】: Action Recognition; Deep Learning; Skeleton based data; interpretation

122. A Distributed Fair Machine Learning Framework with Private Demographic Data Protection.

【Paper Link】【Pages】:1102-1107

【Authors】: Hui Hu ; Yijun Liu ; Zhen Wang ; Chao Lan

【Abstract】: Fair machine learning has become a significant research topic with broad societal impact. However, most fair learning methods require direct access to personal demographic data, which is increasingly restricted to use for protecting user privacy (e.g. by the EU General Data Protection Regulation). In this paper, we propose a distributed fair learning framework for protecting the privacy of demographic data. We assume this data is privately held by a third party, which can communicate with the data center (responsible for model development) without revealing the demographic information. We propose a principled approach to design fair learning methods under this framework, exemplify four methods and show they consistently outperform their existing counterparts in both fairness and accuracy across two real-world data sets. We theoretically analyze the framework, and prove it can learn models with high fairness or high accuracy, with their trade-offs balanced by a threshold variable.

【Keywords】: machine learning; fairness; data privacy; random projection; distributed learning

123. Constructing Educational Concept Maps with Multiple Relationships from Multi-Source Data.

【Paper Link】【Pages】:1108-1113

【Authors】: Xiaoqing Huang ; Qi Liu ; Chao Wang ; Haoyu Han ; Jianhui Ma ; Enhong Chen ; Yu Su ; Shijin Wang

【Abstract】: Concept map is an useful tool to help people organize and improve knowledge. Particularly in educational domain, it is beneficial for students and teachers to improve the learning and teaching quality. Traditionally, manual educational concept maps, provided by teachers, are quite time-consuming and limited to teachers' experience. Thus, it is meaningful to automatically construct high-quality concept maps. However, existing data-driven solutions only focus on either separate data source or single pedagogic relationship, which are not sufficient to satisfy actual demands. To this end, we propose a novel framework, named Extracting Multiple Relationships Concept Map (EMRCM), to construct multiple relations concept maps from Multi-source Data. Specifically, we design various targeted evidences to explore diverse information of multi-source data from different perspectives. Then, we employ three classic classifiers to bulid the predictive model for extracting key concepts and multiple concept relationships using the proposed evidences. We create a real dataset for empirically studying this problem. Extensive experiments on a real-world dataset show the effectiveness of our method.

【Keywords】: Educational concept map; Multi-source data; Multiple relationships

124. Unsupervised Qualitative Scoring for Binary Item Features.

【Paper Link】【Pages】:1114-1119

【Authors】: Koji Ichikawa ; Hiroshi Tamano

【Abstract】: The qualitative score of tags is widely used to describe which product is better in terms of the given property. For example, in a restaurant-navigation site, properties such as food, location, and mood are given in the form of numerical values, representing the goodness of each aspect. In this paper, we propose a novel approach to estimate the qualitative score from the binary features of products. Based on a natural assumption that an item with a better property is more popular among users who prefer that property, in short, "experts know best", we introduce one discriminative and two generative models with which user preferences and item-qualitative scores are inferred from user-item interactions. Our approach contributes to resolving the following difficulties: (1) no supervised data for the score estimation, (2) implicit user purpose, and (3) irrelevant tag contamination. We evaluate our models by using two artificial datasets and two real-world datasets of movie and book ratings and observe that our models outperform baseline models.

【Keywords】: label enhancement; unsupervised learning; collaborative filtering; topic model

125. Semi-Supervised Adversarial Domain Adaptation for Seagrass Detection in Multispectral Images.

【Paper Link】【Pages】:1120-1125

【Authors】: Kazi Aminul Islam ; Victoria Hill ; Blake A. Schaeffer ; Richard Zimmerman ; Jiang Li

【Abstract】: Seagrass form the basis for critically important marine ecosystems. Previously, we implemented a deep convolutional neural network (CNN) model to detect seagrass in multispectral satellite images of three coastal habitats in northern Florida. However, a deep CNN model trained at one location usually does not generalize to other locations due to data distribution shifts. In this paper, we developed a semi-supervised domain adaptation method to generalize a trained deep CNN model to other locations for seagrass detection. First, we utilized a generative adversarial network (GAN) loss to align marginal data distribution between source domain and target domain using unlabeled data from both data domains. Second, we used a few labelled samples from the target domain to align class specific data distributions between the two domains, based on the contrastive semantic alignment loss. We achieved the best results in 28 out of 36 scenarios as compared to other state-of-the-art domain adaptation methods.

【Keywords】: Deep convolutional neural network; Seagrass detection; Domain adaptation

126. CSNN: Contextual Sentiment Neural Network.

【Paper Link】【Pages】:1126-1131

【Authors】: Tomoki Ito ; Kota Tsubouchi ; Hiroki Sakaji ; Kiyoshi Izumi ; Tatsuo Yamashita

【Abstract】: Although deep neural networks are excellent for text sentiment analysis, their applications in real-world practice are occasionally limited owing to their black-box property. In response, we propose a novel neural network model called contextual sentiment neural network (CSNN) model that can explain the process of its sentiment analysis prediction in a way that humans find natural and agreeable. The CSNN has the following interpretable layers: the word-level original sentiment layer, word-level sentiment shift layer, word-level local contextual sentiment layer, word-level global importance layer, and word-level global contextual sentiment layer. Because of these layers, this network can explain the process of its document-level sentiment analysis results in a human-like way using these layers. Realizing the interpretability of each layer in the CSNN is a crucial problem in the development of this CSNN because the general back-propagation method cannot realize such interpretability. To realize this interpretability, we propose a novel learning strategy called initialization propagation (IP) learning. Using real textual datasets, we experimentally demonstrate that the proposed IP learning is effective for improving the interpretability of each layer in CSNN. We then experimentally demonstrate that both the predictability and explanation ability of the CSNN are high.

【Keywords】: Interpretable Neural Networks; Text-mining; Support System

127. Multi-view Outlier Detection in Deep Intact Space.

【Paper Link】【Pages】:1132-1137

【Authors】: Yu-Xuan Ji ; Ling Huang ; Heng-Ping He ; Chang-Dong Wang ; Guangqiang Xie ; Wei Shi ; Kun-Yu Lin

【Abstract】: Recently, multi-view outlier detection has emerged as a challenging research topic in outlier detection because of complex distributions of data across different views. There are mainly three types of outliers, i.e., attribute outliers, class outliers and class-attribute outliers. Most existing multi-view outlier detection approaches only detect part of the three types of outliers in a pairwise manner across different views, which is not able to accomplish the task of multi-view outlier detection comprehensively and uniformly. Outlier detection in a pairwise manner across different views also leads to time-consuming computation. We propose a new algorithm termed Multi-view Outlier Detection in Deep Intact Space (MODDIS) to find all the three types of outliers simultaneously and avoid comparing different views in a pairwise manner. Rather than leveraging subspace clustering, the performance of which is seriously affected by the dependence of subspaces on most real datasets, neural networks are employed in MODDIS in that neural networks have a stronger representation learning ability. Meanwhile, based on the view insufficiency assumption, a multi-view intact outlierness space assumption is proposed. Based on this assumption, a multi-view latent intact space is constructed to encode outlierness information of all views, where outlierness in any view is a snapshot from some perspective. Finally, an outlier detection measurement is defined in the latent intact space. Experiments are conducted on several UCI datasets and the empirical results demonstrate the effectiveness of our proposed method.

【Keywords】: outlier detection; multi-view data; intact space learning; deep model

128. Block-Structured Optimization for Anomalous Pattern Detection in Interdependent Networks.

【Paper Link】【Pages】:1138-1143

【Authors】: Fei Jie ; Chunpai Wang ; Feng Chen ; Lei Li ; Xindong Wu

【Abstract】: We propose a generalized optimization framework for detecting anomalous patterns (subgraphs that are interesting or unexpected) in interdependent networks, such as multi-layer networks, temporal networks, networks of networks, and many others. We frame the problem as a non-convex optimization that has a general nonlinear score function and a set of block-structured and non-convex constraints. We develop an effective, efficient, and parallelizable projection-based algorithm, namely Graph Block-structured Gradient Projection (GBGP), to solve the problem. It is proved that our algorithm 1) runs in nearly-linear time on the network size, and 2) enjoys a theoretical approximation guarantee. Moreover, we demonstrate how our framework can be applied to two very practical applications, and we conduct comprehensive experiments to show the effectiveness and efficiency of our proposed algorithm.

【Keywords】: subgraph detection; sparse optimization; interdependent networks

129. Network Identification and Authentication.

【Paper Link】【Pages】:1144-1149

【Authors】: Shengmin Jin ; Vir V. Phoha ; Reza Zafarani

【Abstract】: Research on networks is commonly performed using anonymized network data for various reasons such as protecting data privacy. Under such circumstances, it is difficult to verify the source of network data, which leads to questions such as: Given an anonymized graph, can we identify the network from which it is collected? Or if one claims the graph is sampled from a certain network, can we verify it? The intuitive approach is to check for subgraph isomorphism. However, subgraph isomorphism is NP-complete; hence, infeasible for most large networks. Inspired by biometrics studies, we address these challenges by formulating two new problems: network identification and network authentication. To tackle these problems, similar to research on human fingerprints, we introduce two versions of a network identity: (1) embedding-based identity and (2) distribution-based identity. We demonstrate the effectiveness of these network identities on various real-world networks. Using these identities, we propose two approaches for network identification. One method uses supervised learning and can achieve an identification accuracy rate of 94.7%, and the other, which is easier to implement, relies on distances between identities and achieves an accuracy rate of 85.5%. For network authentication, we propose two methods to build a network authentication system. The first is a supervised learner and provides a low false accept rate and the other method allows one to control the false reject rate with a reasonable false accept rate across networks. Our study can help identify or verify the source of network data, validate network-based research, and be used for network-based biometrics.

【Keywords】: Network Identification; Network Authentication; Network Representation Learning; Network Embedding

130. Discovering Robustly Connected Subgraphs with Simple Descriptions.

【Paper Link】【Pages】:1150-1155

【Authors】: Janis Kalofolias ; Mario Boley ; Jilles Vreeken

【Abstract】: We study the problem of discovering robustly connected subgraphs that have simple descriptions. Our aim is, hence, to discover vertex sets which not only a) induce a subgraph that is difficult to fragment into disconnected components, but also b) can be selected from the entire graph using just a simple conjunctive query on their vertex attributes. Since many subgraphs do not have such a simple logical description, first mining robust subgraphs and post-hoc discovering their description leads to sub-optimal results. Instead, we propose to optimise over describable subgraphs only. To do so efficiently we propose a non-redundant iterative deepening approach, which we equip with a linear-time tight optimistic estimator that allows pruning large parts of the search space. Extensive empirical evaluation shows that our method can handle large real-world graphs, and discovers easily interpretable and meaningful subgraphs.

【Keywords】: Dense subgraph mining; Subgroup discovery; pattern mining; k core decomposition; robust connectedness

131. Matrix Profile XV: Exploiting Time Series Consensus Motifs to Find Structure in Time Series Sets.

【Paper Link】【Pages】:1156-1161

【Authors】: Kaveh Kamgar ; Shaghayegh Gharghabi ; Eamonn J. Keogh

【Abstract】: In recent years the data mining community has largely coalesced around the idea that many problems in time series analytics essentially reduce to finding and then reasoning about repeated structure in time series. Existing tools can find conserved structure within a single time series (motifs) and between pairs of time series (joins). However, to date there are no tools to find repeated structure in sets of time series, an idea we call time series consensus motifs in recognition of their similarity to their discrete analogs in DNA strings. In this work we introduce a definition of time series consensus motifs and a scalable algorithm to discover them in large data collections. We further show that given this new primitive, we can solve multiple higherlevel problems in time series data mining. We demonstrate the utility of our ideas with case studies in diverse domains.

【Keywords】: time series; motif discovery

132. Iterative Graph Alignment via Supermodular Approximation.

【Paper Link】【Pages】:1162-1167

【Authors】: Aritra Konar ; Nicholas D. Sidiropoulos

【Abstract】: Graph matching, the problem of aligning a pair of graphs so as to minimize their edge disagreements, has received widespread attention owing to its broad spectrum of applications in data science. As the problem is NP-hard in the worst-case, a variety of approximation algorithms have been proposed for obtaining high quality, suboptimal solutions. In this paper, we approach the task of designing an efficient polynomial-time approximation algorithm for graph matching from a previously unconsidered perspective. Our key result is that graph matching can be formulated as maximizing a monotone, supermodular set function subject to matroid intersection constraints. We leverage this fact to apply a discrete optimization variant of the minorization-maximization algorithm which exploits supermodularity of the objective function to iteratively construct and maximize a sequence of global lower bounds on the objective. At each step, we solve a maximum weight matching problem in a bipartite graph. Differing from prior approaches, the algorithm exploits the combinatorial structure inherent in the problem to generate a sequence of iterates featuring monotonically non-decreasing objective value while always adhering to the combinatorial matching constraints. Experiments on real-world data demonstrate the empirical effectiveness of the algorithm relative to the prevailing state-of-the-art.

【Keywords】: Graph matching; supermodular optimization; matroid intersection; majorization minimization

133. Transfer Metric Learning for Unseen Domains.

【Paper Link】【Pages】:1168-1173

【Authors】: Atsutoshi Kumagai ; Tomoharu Iwata ; Yasuhiro Fujiwara

【Abstract】: We propose a transfer metric learning method to infer domain-specific data embeddings for unseen domains, from which no data are given in the training phase, by using knowledge transferred from related domains. When training and test distributions are different, the standard metric learning cannot infer appropriate data embeddings. The proposed method can infer appropriate data embeddings for the unseen domains by using latent domain vectors, which are latent representations of domains and control the property of data embeddings for each domain. This latent domain vector is inferred by using a neural network that takes the set of feature vectors in the domain as an input. The neural network is trained without the unseen domains. The proposed method can instantly infer data embeddings for the unseen domains without (re)-training once the sets of feature vectors in the domains are given. To accumulate knowledge in advance, the proposed method uses labeled and unlabeled data in multiple source domains. Labeled data, i.e., data with label information such as class labels or pair (similar/dissimilar) constraints, are used for learning data embeddings in such a way that similar data points are close and dissimilar data points are separated in the embedding space. Although unlabeled data do not have labels, they have geometric information that characterizes domains. The proposed method incorporates this information in a natural way on the basis of a probabilistic framework. The conditional distributions of the latent domain vectors, the embedded data, and the observed data are parametrized by neural networks and are optimized by maximizing the variational lower bound. The effectiveness of the proposed method was demonstrated through experiments using three clustering tasks.

【Keywords】: Metric Learning; Transfer Learning

【Paper Link】【Pages】:1174-1179

【Authors】: Jinning Li ; Yirui Gao ; Xiaofeng Gao ; Yan Shi ; Guihai Chen

【Abstract】: Topic popularity prediction is an important task on social media, which aims at predicting the ongoing trends of topics according to logged historical text-based records. However, only limited existing approaches apply sentiment analysis to facilitate popularity prediction. Public sentiment is worth taking into consideration because the topics with strong sentiment tend to spread faster and broader on social media. In this paper, we propose a novel framework, SENTI2POP, to predict topic popularity utilizing sentiment information. We first adapt a state-of-art popularity quantification method to capture the topic popularity, and then design a novel tree-like network (Tree-Net) combining Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) for sentiment analysis. In addition, we propose a sentiment-aware time series prediction approach based on Dynamic Time Warping (DTW) and Autoregressive Integrated Moving Average model (ARIMA) to predict topic popularity. We prove by experiments that SENTI2POP outperforms the existing popularity prediction models on a real-world Twitter dataset by reducing the prediction error. Experimental results also show that SENTI2POP could be applied to improve the accuracy of most non-sentiment popularity prediction models.

【Keywords】: Topic Popularity Prediction, Social Media, Sentiment Analysis, Time Series Alignment, Neural Networks

【Paper Link】【Pages】:1180-1185

【Authors】: Ye Li ; Hong Xie ; Yishi Lin ; John C. S. Lui

【Abstract】: Firms are now considering to offer rewards to customers who recommend the firms' products/services in online social networks (OSN). However, the pros and cons of such social recommendation scheme are still unclear. Thus, it is difficult for firms to design rewarding schemes. Via empirical analysis of data, we first identify key factors that affect the spreading of a firm's product in OSNs. These findings enable us to develop an accurate (i.e., with a high validation accuracy) mathematical model on social recommendations. In particular, our model captures how users decide whether to recommend an item, which is a key factor but often ignored by previous social recommendation models such as the "Independent Cascade model". We also design algorithms to infer model parameters. Using these parameters in our model, we uncover conditions when social recommendation can (or cannot) improves a firm's profit. These conditions help a firm to design rewarding schemes. Finally, we extend our model to an online setting and design reinforcement learning algorithms for a firm to dynamically optimize its rewarding schemes.

【Keywords】: Social recommendation; Economics; Incentive; Reinforcement Learning

136. Dense Semantic Matching Network for Multi-turn Conversation.

【Paper Link】【Pages】:1186-1191

【Authors】: Yongrui Li ; Jun Yu ; Zengfu Wang

【Abstract】: Mining the semantic information in the text to model the multi-turn conversation has attracted great interests. Previous models ignore the capturing and utilizing of the matching patterns at different levels, which causes the loss of the valuable information for the calculation of the matching degree. To address the problem, we propose a dense semantic matching network (DMN). Given a context-response pair, DMN first constructs semantic representations for the response candidate and each utterance in the context. Then, DMN models the interaction between the response candidate and each utterance in the context to generate interactive matrices. By processing the interactive matrices with dense convolutional blocks, the hierarchical matching patterns are generated for each utterance-response pair. The matching patterns of all the utterance-response pairs are finally accumulated in chronological order with bidirectional long short-term memory network. The final matching score of the response candidate and the multi-turn context is finally calculated. We evaluate the performance of our network on the benchmark dataset. The results show that our network yields a significant performance gain compared with other methods.

【Keywords】: multi-turn conversation; semantic model; text mining; deep learning, attention mechanism

137. Contrast Feature Dependency Pattern Mining for Controlled Experiments with Application to Driving Behavior.

【Paper Link】【Pages】:1192-1197

【Authors】: Qingzhe Li ; Liang Zhao ; Yi-Ching Lee ; Yanfang Ye ; Jessica Lin ; Lingfei Wu

【Abstract】: A controlled experiment is an empirical interventional study method to evaluate the causal impact of an intervention, by identifying the dynamic feature dependency patterns in the contrast multivariate time series (CMTS) collected from the control and experimental groups. Manually labeling or interpreting the effects caused by the intervention from the CMTS data has become an infeasible task even for domain experts. Thus, it is imperative to develop an integrated technique, preferably in an unsupervised manner, that can simultaneously identify and characterize feature dynamic dependencies and their contrast patterns in CMTS, which we call the contrast dynamic feature dependency (CDFD) patterns. In this paper, we propose a generative model with partial correlation-based feature dependency regularization to help analysts understand the CMTS data by jointly 1) characterizing a set of comparable multivariate Gaussian distributions from CMTS, and 2) determining whether the intervention causes the changes between two comparable distributions. Extensive experiments demonstrate the effectiveness and scalability of the proposed method. The proposed method applied to a driving behavior application demonstrates its utility and interpretability.

【Keywords】: Contrast Pattern; Time Series; Markov Random Fields; Feature Dependency; Driving Behavior; Controlled Experiment

138. Mining Maximal Clique Summary with Effective Sampling.

【Paper Link】【Pages】:1198-1203

【Authors】: Xiaofan Li ; Rui Zhou ; Yujun Dai ; Lu Chen ; Chengfei Liu ; Qiang He ; Yun Yang

【Abstract】: Maximal clique enumeration (MCE) is a fundamental problem in graph theory and is used in many applications, such as social network analysis, bioinformatics, intelligent agent systems, cyber security. Most existing MCE algorithms focus on improving the efficiency rather than reducing the size of the output, which could consist of a large number of maximal cliques. In this paper, we study how to report a summary of less overlapping maximal cliques. The problem was studied before, however, after examining the pioneer approach, we consider it still not satisfactory. To advance the research along this line, this paper attempts to make two contributions: (a) We propose a more effective sampling strategy, which produces a much smaller summary but still ensures that the summary can somehow witness all the maximal cliques and the expectation of each maximal clique witnessed by the summary is above a predefined threshold. (b) To verify experimentally, we tested ten real benchmark datasets that have a variety of graph characteristics. The results show that our new sampling strategy consistently outperforms the state-of-the-art method by producing smaller summaries and running faster on all the datasets.

【Keywords】: maximal clique; clique summary; clique enumeration; clique sampling

139. Consistency Meets Inconsistency: A Unified Graph Learning Framework for Multi-view Clustering.

【Paper Link】【Pages】:1204-1209

【Authors】: Youwei Liang ; Dong Huang ; Chang-Dong Wang

【Abstract】: Graph Learning has emerged as a promising technique for multi-view clustering, and has recently attracted lots of attention due to its capability of adaptively learning a unified and probably better graph from multiple views. However, the existing multi-view graph learning methods mostly focus on the multi-view consistency, but neglect the potential multi-view inconsistency (which may be incurred by noise, corruptions, or view-specific characteristics). To address this, this paper presents a new graph learning-based multi-view clustering approach, which for the first time, to our knowledge, simultaneously and explicitly formulates the multi-view consistency and the multi-view inconsistency in a unified optimization model. To solve this model, a new alternating optimization scheme is designed, where the consistent and inconsistent parts of each single-view graph as well as the unified graph that fuses the consistent parts of all views can be iteratively learned. It is noteworthy that our multi-view graph learning model is applicable to both similarity graphs and dissimilarity graphs, leading to two graph fusion-based variants, namely, distance (dissimilarity) graph fusion and similarity graph fusion. Experiments on various multi-view datasets demonstrate the superiority of our approach. The MATLAB source code is available at https://github.com/youweiliang/ConsistentGraphLearning.

【Keywords】: Multi-view graph learning; Multi-view clustering; Graph fusion; Consistency; Inconsistency

140. Predicting Water Quality for the Woronora Delivery Network with Sparse Samples.

【Paper Link】【Pages】:1210-1215

【Authors】: Bin Liang ; Dammika Vitanage ; Corinna Doolan ; Zhidong Li ; Ronnie Taib ; George Mathews ; Yang Wang ; Shiyang Lu ; Fang Chen ; Tin Hua ; Andrew Peters

【Abstract】: Monitoring drinking water quality in the entire delivery network, mainly indicated by total chlorine (TC), is a critical component of overall water supply management. However, it is extremely difficult to collect sufficient TC data from the network at customer sites, which makes it sparse for comprehensive modelling. This paper details an approach that provides TC prediction within the entire Woronora delivery network in Sydney in the next 24 hours. First, the hydraulic system is employed to capture the topology of the delivery network, so that the water travel time can be estimated using predicted water demand. The travel time links the upstream (reservoir) data to the downstream (resident) data. Then, a two-step strategy is proposed as a semi-parametric method to determine the crucial factors and build Bayesian model for TC decay to predict TC with the travel time. Lastly, the uncertainties of both data and the model are analysed to define the boundaries of prediction for better decision making. Several operational stages are involved when the approach is being deployed, including prediction interpretation, interactive tool development for water quality mapping and visualisation, and proactive optimisation. This has established a successful initiative to improve the overall water supply management for the entire Woronora delivery network.

【Keywords】: Bayesian models, uncertainty analysis, sparse sample, data mining model deployment, water quality prediction

141. First Index-Free Manifold Ranking-Based Image Retrieval with Output Bound.

【Paper Link】【Pages】:1216-1221

【Authors】: Dandan Lin ; Victor Junqiu Wei ; Raymond Chi-Wing Wong

【Abstract】: Image retrieval keeps attracting a lot of attention from both academic and industry over past years due to its variety of useful applications. Due to the rapid growth of deep learning approaches, more better feature vectors of images could be discovered for improving image retrieval. However, most (if not all) existing deep learning approaches consider the similarity between 2 images locally without considering the similarity among a group of similar images globally, and thus could not return accurate results. In this paper, we study the image retrieval with manifold ranking (MR) which considers both the local similarity and the global similarity, which could give more accurate results. However, existing best-known algorithms have one of the following issues: (1) They require a bulky index, (2) some of them do not have any theoretical bound on the output, and (3) some of them are time-consuming. Motivated by this, we propose an algorithm, namely Monte Carlo-based MR (MCMR) for image retrieval, which does not have the above issues. We are the first one to propose an index-free manifold ranking-based image retrieval with the output theoretical bound. Lastly, our experiments show that MCMR outperforms existing algorithms by up to 4 orders of magnitude in terms of query time.

【Keywords】: Manifold Ranking; Image Retrieval; Random Walk sampling

142. What is the Value of Experimentation & Measurement?

【Paper Link】【Pages】:1222-1227

【Authors】: C. H. Bryan Liu ; Benjamin Paul Chamberlain

【Abstract】: Experimentation and Measurement (E&M) capabilities allow organizations to accurately assess the impact of new propositions and to experiment with many variants of existing products. However, until now, the question of measuring the measurer, or valuing the contribution of an E&M capability to organizational success has not been addressed. We tackle this problem by analyzing how, by decreasing estimation uncertainty, E&M platforms allow for better prioritization. We quantify this benefit in terms of expected relative improvement in the performance of all new propositions and provide guidance for how much an E&M capability is worth and when organizations should invest in one.

【Keywords】: experimentation; measurement; ranking under uncertainty; prioritization; operational research

143. Spatio-Temporal GRU for Trajectory Classification.

【Paper Link】【Pages】:1228-1233

【Authors】: Hongbin Liu ; Hao Wu ; Weiwei Sun ; Ickjai Lee

【Abstract】: Spatio-temporal trajectory classification is a fundamental problem for location-based services with many real-world applications such as travel mode classification, animal mobility detection, and location recommendation. In the literature, many approaches have been proposed to solve this classification task including deep learning models like LSTM recently for sequence classification. However, these approaches fail to consider both spatial and temporal interval information simultaneously, but share some common drawbacks: omitting either the spatial information or the temporal interval information out. Some models like Time-LSTM, have been proposed to handle the temporal interval information for spatio-temporal trajectories, but they do not take into account the spatial information. Note that, considering both spatial and temporal interval information is crucial for spatio-temporal data mining in order not to miss any spatio-temporal pattern. In this study, we propose a trajectory classifier called Spatio-Temporal GRU to better model the spatio-temporal correlations and irregular temporal intervals prevalently present in spatio-temporal trajectories. We introduce a novel segmented convolutional weight mechanism to capture short-term local spatial correlations in trajectories and propose an additional temporal gate to control the information flow related to the temporal interval information. Performance evaluation demonstrates that our proposed model outperforms popular deep learning approaches for the travel model classification problem.

【Keywords】: trajectory classification; spatio-temporal trajectory; GRU; travel model classification; deep learning

144. Learning Local and Global Multi-context Representations for Document Classification.

【Paper Link】【Pages】:1234-1239

【Authors】: Yi Liu ; Hao Yuan ; Shuiwang Ji

【Abstract】: Context-based attention models achieve state-of-the-art results on constructing document representations. Current methods build document representations based on a trainable global context vector for all documents, while local contexts capturing document-specific information are ignored. Representing document context as a vector also restricts the model's capacity to capture complete semantic components of documents. In this work, we address these limitations by proposing novel attention-based approaches. We first propose a local and global context attention (LGCA) model, which constructs the context representations of a document as a combination of global and local contexts. The local context is used to capture local document-specific information. Furthermore, a multi-context attention (MCA) model is proposed to capture multiple local contexts within a document, thereby fusing the overall semantic information of documents. We conduct experiments on document-level sentiment classification tasks where extra user and product information is shown to be important. Experimental results indicate that our methods significantly outperform previous models with and without user and product information. It is also demonstrated that our model has great potential of handling extra information to improve document classification performance.

【Keywords】: document classification; attention; local and global features

145. I-CARS: An Interactive Context-Aware Recommender System.

【Paper Link】【Pages】:1240-1245

【Authors】: Rosni Lumbantoruan ; Xiangmin Zhou ; Yongli Ren ; Lei Chen

【Abstract】: Context-aware recommendation has attracted significant attentions over online sites due to its smart context adaption in improving recommendation quality. However, the user's instant contexts do not follow his/her regular user behaviour patterns, thus have not been well captured for advanced personalization of recommendation generation. In this work, we propose an Interactive Context-Aware Recommender System (I-CARS), which allows users to interact and present their needs, so the system can personalize and refine user preferences. I-CARS iteratively asks a question to a user to trigger feedback in term of her recent contexts and incorporates the response to recommend items most likely satisfying his/her instant interests. Specifically, we first propose a Personalized Weighted Context-Aware Matrix Factorization (PW-CAMF) that enables the personalization of important contexts for each user. Then we propose two question selection strategies that exploit user preferences through feedback. We have conducted comprehensive experiments over two real datasets. The experimental results prove the effectiveness of our I-CARS system compare to existing competitors.

【Keywords】: interactive, context-aware, recommender system

146. Triple-Shapelet Networks for Time Series Classification.

【Paper Link】【Pages】:1246-1251

【Authors】: Qianli Ma ; Wanqing Zhuang ; Garrison W. Cottrell

【Abstract】: Shapelets are discriminative subsequences for time series classification (TSC). Although shapelet-based methods have achieved good performance and interpretability, they still have two issues that can be improved. First, previous methods only assess a shapelet by how accurately it can classify all the samples. However, for multi-class imbalanced classification tasks, these methods will ignore the shapelets that can distinguish minority class from other classes and will tend to use the shapelets that are useful for discriminating the majority classes. Second, the shapelets are fixed after the training phase and cannot adapt to time series with deformations, which will lead to poor matches to the shapelets. In this paper, we propose a novel end-to-end shapelet learning model called Triple Shapelet Networks (TSNs) to extract multi-level feature representations. Specifically, TSN learns the most discriminative shapelets by gradient descent similar to previous methods. In addition, it learns category-specific shapelets for each class by using auxiliary binary classifiers. Finally, it uses a shapelet generator to produce sample-specific shapelets conditioned on subsequences of the input time series. The addition of category-level and sample-level shapelets to the standard model improves the performance. Experiments conducted on extensive time series data sets show that TSN is state-of-the-art compared to existing shapelet-based methods, and the visualization analysis also shows its effectiveness.

【Keywords】: Shapelet transformation, time series classification, temporal feature learning

147. Discovering Reliable Correlations in Categorical Data.

【Paper Link】【Pages】:1252-1257

【Authors】: Panagiotis Mandros ; Mario Boley ; Jilles Vreeken

【Abstract】: In many scientific tasks we are interested in finding correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on data distribution or the type of correlation, and, how to search efficiently for the most correlated attribute sets. We answer these questions for discovery tasks with categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, in order to obtain a reliable, interpretable, and non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through a case study we confirm that our discovery framework identifies interesting and meaningful correlations.

【Keywords】: knowledge discovery, information theory, total correlation, optimization, branch-and-bound

148. Deep Embedded Cluster Tree.

【Paper Link】【Pages】:1258-1263

【Authors】: Dominik Mautz ; Claudia Plant ; Christian Böhm

【Abstract】: The idea of combining the high representational power of deep learning techniques with clustering methods has gained much interest in recent years. Optimizing representation and clustering simultaneously has been shown to have an advantage over optimizing them separately. However, so far all proposed methods have been using a flat clustering strategy, with the true number of clusters known a priori. In this paper, we propose the Deep Embedded Cluster Tree (DeepECT), the first divisive hierarchical embedded clustering method. The cluster tree does not need to know the true number of clusters during optimization. Instead, the level of detail to be analyzed can be chosen afterward and for each sub-tree separately. An optional data-augmentation-based extension allows DeepECT to ignore prior-known invariances of the dataset, such as affine transformations in image data. We evaluate and show the advantages of DeepECT in extensive experiments.

【Keywords】: embedded clustering; hierarchical clustering; autoencoder; deep learning

149. Recognizing Variables from Their Data via Deep Embeddings of Distributions.

【Paper Link】【Pages】:1264-1269

【Authors】: Jonas Mueller ; Alex Smola

【Abstract】: A key obstacle in automated analytics and meta-learning is failing to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be more robustly addressed by leveraging the data values themselves, rather than relying on their arbitrarily selected variable names. Here, we present a computationally efficient method to identify high-confidence variable matches between a given set of data values and a large repository of previously encountered datasets. Our approach enjoys numerous advantages over distributional similarity based techniques because we leverage learned vector embeddings of datasets which adaptively account for natural forms of data variation encountered in practice. Based on the neural architecture of deep sets, our embeddings can be computed for both numeric and string data, and empirically outperform standard statistical techniques for dataset search.

【Keywords】: Data integration; Neural networks

150. Efficient Bayesian Optimization for Uncertainty Reduction Over Perceived Optima Locations.

【Paper Link】【Pages】:1270-1275

【Authors】: Vu Nguyen ; Sunil Gupta ; Santu Rana ; My Thai ; Cheng Li ; Svetha Venkatesh

【Abstract】: Bayesian optimization (BO) is concerned with efficient optimization using probabilistic methods. Predictive entropy search (PES) is a popular and successful BO strategy to find a point that maximizes the information gained about the optima location of an unknown function. Since the PES analytical form is intractable, it requires approximations and is computationally expensive. These approximations may degrade PES performance in terms of accuracy and efficiency. In this paper, we propose an alternative scheme - predictive variance reduction search (PVRS) - to find a point that maximally reduces the uncertainty at the perceived optima locations. The optimization converges to the true optimum when the uncertainty at all perceived optima locations is vanished. Our novel modification is beneficial in two ways. First, PVRS can be computed in closed-form, unlike the approximations made in PES. Second, PVRS is simple and easy to implement. As a result, the proposed PVRS gains huge speed up for scalable BO whilst showing favorable optimization efficiency. Furthermore, we extend our PVRS framework for batch setting where we select multiple experiments for parallel evaluations at each iteration. Empirically, we demonstrate the effectiveness of the PVRS on both benchmark functions and real-world applications in standard and batch BO settings.

【Keywords】: Bayesian optimization, uncertainty reduction, hyper-parameter tuning, experimental design

151. Efficient Mining and Exploration of Multiple Axis-Aligned Intersecting Objects.

【Paper Link】【Pages】:1276-1281

【Authors】: Tilemachos Pechlivanoglou ; Vincent Chu ; Manos Papagelis

【Abstract】: Identifying and quantifying the size of multiple intersections among a large number of axis-aligned geometric objects is an essential computational geometry problem. The ability to solve this problem can effectively inform a number of spatial data mining methods and can provide support in decision making for a variety of applications. Currently, the state-of-the-art approach for addressing such intersection problems resorts to an algorithmic paradigm, collectively known as the sweep-line algorithm. However, its application on specific instances of the problem inherits a number of limitations. With that mind, we design and implement a novel, exact, fast and scalable yet versatile, sweep-line based algorithm, named SLIG. Our algorithm can be employed in a number of problems and applications involving the efficient computation of numerous axis-aligned object intersection problems in multiple dimensions. The key idea of our algorithm lies in constructing an auxiliary data structure when the sweep line algorithm is applied, an intersection graph. This graph can effectively be used to provide connectivity properties among overlapping objects, as well as to inform the much harder problem of finding the location and size of the common area defined by multiple overlapping objects. A thorough experimental evaluation on synthetic data of various characteristics and sizes, demonstrates that SLIG performs significantly faster than classic sweep-line based algorithms. SLIG is not only faster and more versatile, but also provides a suite of powerful querying capabilities. To support the reproducibility of our methods, we make source code and datasets available.

【Keywords】: geometric intersection problems; multiple intersections; intersection graph; sweep-line

152. An Integrated Multimodal Attention-Based Approach for Bank Stress Test Prediction.

【Paper Link】【Pages】:1282-1287

【Authors】: Farid Razzak ; Fei Yi ; Yang Yang ; Hui Xiong

【Abstract】: Since the financial crisis in late 2008-2009, several global regulatory authorities have mandated stress-testing exercises to evaluate the potential capital shortfalls & systemic impacts that large banks may face during adverse economic conditions. Thus, having the ability to analyze economic conditions & banking performance profiles together to determine relationships among their respective features may provide insights for stress testing tasks. In this paper, we propose an Integrated Multimodal Bank Stress Test Prediction (IMBSTP) model framework consisting of a two-stages; (1) economic conditions estimator to approximate joint representation among the exogenous factors using generative models, (2) bank capital & loss forecaster to project stress test measures based on dimensional & temporal features selected from the exogenous economic conditions & banking performance profiles using a dual-attention recurrent neural network. Extensive experimentation is performed on historical economic conditions & consolidated financial statements of U.S. bank holdings companies to show the effectiveness of our approach when compared to state-of-the-art baseline methods.

【Keywords】: Deep Learning; Multimodal Conditional Generative Models; Recurrent Neural Networks; Bank Stress-Test

153. Dual Adversarial Learning Based Network Alignment.

【Paper Link】【Pages】:1288-1293

【Authors】: Jiaxiang Ren ; Yang Zhou ; Ruoming Jin ; Zijie Zhang ; Dejing Dou ; Pengwei Wang

【Abstract】: Network alignment, which aims to learn a matching between the same entities across multiple information networks, often suffers challenges from feature inconsistency, high-dimensional features, to unstable alignment results. This paper presents a novel network alignment framework, RANA, that combines dual generative adversarial network (GAN) techniques to match the distributions of two networks based on two dimensions of distance and shape. First, we propose an adversarial network distribution matching model to perform the bidirectional cross-network alignment translations between two networks, such that the cross-network transformed distributions of two networks move closer to each other and finally meet with each other halfway. In addition, a homophily consistency loss is introduced to maintain the vertex homophily consistency between pairwise vertices on two networks in both the embedding space. Second, in order to address the feature inconsistency issue, we integrate a dual adversarial autoencoder module with an adversarial two-class classification model together to twist the cross-network transformed distributions of two networks, such that two distributions could have the same shape. This facilitates the translations of the distributions of two networks in the adversarial network distribution matching model. Moreover, a semantic preservation loss is introduced to preserve the original embedding semantics of one network when this network is translated to another network and returned to itself. Third but last, the competition game by integrating the above two adversarial models together can help project two copies of the same vertices with high-dimensional inconsistent features into the same low-dimensional embedding space, and thus guarantee the distribution consistency between two networks in terms of both distance and shape.

【Keywords】: Network alignment, high dimensional inconsistent features, adversarial matching, adversarial classification

154. On Privacy of Socially Contagious Attributes.

【Paper Link】【Pages】:1294-1299

【Authors】: Aria Rezaei ; Jie Gao

【Abstract】: A common approach to protect user's privacy in data collection is to perform random perturbations on user's sensitive data before collection in a way that aggregated statistics can still be inferred without endangering individual secrets. In this paper, we take a closer look at the validity of Differential Privacy guarantees, when sensitive attributes are subject to social contagion. We first show that in the absence of any knowledge about the contagion network, an adversary that tries to predict the real values from perturbed ones, cannot train a classifier that achieves an area under the ROC curve (AUC) above 1-(1-δ)/(1+e ^ε ), if the dataset is perturbed using an (ε,δ)-differentially private mechanism. Then, we show that with the knowledge of the contagion network and model, one can do substantially better. We demonstrate that our method passes the performance limit imposed by differential privacy. Our experiments also reveal that nodes with high influence on others are at more risk of revealing their secrets than others. Our method's superior performance is demonstrated through extensive experiments on synthetic and real-world networks.

【Keywords】: Data Mining; Social Computing; Data Privacy

155. Nearest Neighbor Classifiers Versus Random Forests and Support Vector Machines.

【Paper Link】【Pages】:1300-1305

【Authors】: Saket Sathe ; Charu C. Aggarwal

【Abstract】: Recent experimental studies have shown that the support vector machine and the random forest are "stand out" models that outperform most other supervised learning methods in benchmarked evaluations on real-world data sets. Studies have also established that both these models are closely related to nearest neighbor classifiers. Due to this connection, one would expect a similar performance from a nearest neighbor classifier; however, the latter lags greatly behind the support vector machine and the random forest in most evaluations. On the contrary, the nearest neighbor classifier has great theoretical potential because of its ability to model complex decision boundaries with low bias. Furthermore, it theoretically guarantees an error of no more than twice the Bayes error rate on an infinite data set. In this paper, we examine this obvious disconnect between the great theoretical promise and empirical reality of the nearest neighbor classifier. An additional point is that the random forest and support vector machine can be considered eager forms of the (lazy) nearest neighbor classifier. Inspired by the connections between nearest neighbor classifiers, support vector machines and random forest models, we propose a nearest neighbor classifier that is competitive to these models.

【Keywords】: nearest neighbor classification, random forests, support vector machines

156. Multi-graph Convolution Collaborative Filtering.

【Paper Link】【Pages】:1306-1311

【Authors】: Jianing Sun ; Yingxue Zhang ; Chen Ma ; Mark Coates ; Huifeng Guo ; Ruiming Tang ; Xiuqiang He

【Abstract】: Personalized recommendation is ubiquitous, playing an important role in many online services. Substantial research has been dedicated to learning vector representations of users and items with the goal of predicting a user's preference for an item based on the similarity of the representations. Techniques range from classic matrix factorization to more recent deep learning based methods. However, we argue that existing methods do not make full use of the information that is available from user-item interaction data and the similarities between user pairs and item pairs. In this work, we develop a graph convolution-based recommendation framework, named Multi-Graph Convolution Collaborative Filtering (Multi-GCCF), which explicitly incorporates multiple graphs in the embedding learning process. Multi-GCCF not only expressively models the high-order information via a bipartite user-item interaction graph, but integrates the proximal information by building and processing user-user and item-item graphs. Furthermore, we consider the intrinsic difference between user nodes and item nodes when performing graph convolution on the bipartite graph. We conduct extensive experiments on four publicly accessible benchmarks, showing significant improvements relative to several state-of-the-art collaborative filtering and graph neural network-based recommendation models. Further experiments quantitatively verify the effectiveness of each component of our proposed model and demonstrate that the learned embeddings capture the important relationship structure.

【Keywords】: Graph neural networks; Recommendation system; Collaborative filtering

157. Space-Efficient Feature Maps for String Alignment Kernels.

【Paper Link】【Pages】:1312-1317

【Authors】: Yasuo Tabei ; Yoshihiro Yamanishi ; Rasmus Pagh

【Abstract】: String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications. However, alignment kernels have a crucial drawback in that they scale poorly due to their quadratic computation complexity in the number of input strings, which limits large-scale applications in practice. We address this need by presenting the first approximation for string alignment kernels, which we call space-efficient feature maps for edit distance with moves (SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP) and feature maps (FMs) of random Fourier features (RFFs). The original FMs for RFFs consume a huge amount of memory proportional to the dimension d of input vectors and the dimension D of output vectors. Thus, we present novel space-efficient feature maps (SFMs) of RFFs for a space reduction from O(dD) of the original FMs to O(d) of SFMs with a theoretical guarantee with respect to concentration bounds. We experimentally test SFMEDM on its ability to learn SVM for large-scale string classifications with various massive string data, and we demonstrate the superior performance of SFMEDM with respect to prediction accuracy, scalability and computation efficiency.

【Keywords】: feature maps; kernel approximation; string alignment kernels

158. An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits.

【Paper Link】【Pages】:1318-1323

【Authors】: Kei Takemura ; Shinji Ito

【Abstract】: Combinatorial linear semi-bandits (CLS) are widely applicable frameworks of sequential decision-making, in which a learner chooses a subset of arms from a given set of arms associated with feature vectors. Existing algorithms work poorly for the clustered case, in which the feature vectors form several large clusters. This shortcoming is critical in practice because it can be found in many applications, including recommender systems. In this paper, we clarify why such a shortcoming occurs, and we introduce a key technique of arm-wise randomization to overcome it. We propose two algorithms with this technique: the perturbed C^2 UCB (PC^2 UCB) and the Thompson sampling (TS). Our empirical evaluation with artificial and real-world datasets demonstrates that the proposed algorithms with the arm-wise randomization technique outperform the existing algorithms without this technique, especially for the clustered case. Our contributions also include theoretical analyses that provide high probability asymptotic regret bounds for our algorithms.

【Keywords】: multi-armed bandit; combinatorial semi-bandit; contextual bandit; recommender system

159. User Response Driven Content Understanding with Causal Inference.

【Paper Link】【Pages】:1324-1329

【Authors】: Fei Tan ; Zhi Wei ; Abhishek Pani ; Zhenyu Yan

【Abstract】: Content understanding with many potential industrial applications, is spurring interest by researchers in many areas in artificial intelligence. We propose to revisit the content understanding problem in digital marketing from three novel perspectives. First, our problem is to explore the way how user experience is delivered with divergent key multimedia elements. Second, we treat understanding as to elucidate their causal implications in driving user responses. Third, we propose to understand content based on observational audience visit logs. To approach this problem, we measure and generate heterogeneous content features and model them as binary, multivalued or continuous genres. Multiple key performance indicators (KPIs) are introduced to quantify user responses. We then develop a flexible and adaptive doubly robust estimator to identify the causality between these features and user responses from observational data. The comprehensive experiments are performed on real-world data sets. We show that the further analysis of the experimental results can shed actionable insights on how to improve KPIs. Our work will benefit content distribution and optimization in digital marketing.

【Keywords】: Content Understanding, User Engagement, Digital Marketing, Causal Inference

160. Permutation Strategies for Mining Significant Sequential Patterns.

【Paper Link】【Pages】:1330-1335

【Authors】: Andrea Tonon ; Fabio Vandin

【Abstract】: The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate - FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple hypothesis testing, a more powerful method than the commonly used Bonferroni correction. PROMISE crucially hinges on the generation of (random) permuted datasets with features similar to the input dataset, for which we provide two efficient strategies. We also provide a rigorous analysis of one of such strategies, which is based on a properly defined swap operation, proving a rigorous bound on the number of swaps it requires. The results of our experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.

【Keywords】: Sequential Patterns, Statistical Pattern Mining, Hypothesis Testing, Permutation Test

161. Curve Fitting from Probabilistic Emissions and Applications to Dynamic Item Response Theory.

【Paper Link】【Pages】:1336-1341

【Authors】: Ajay Shanker Tripathi ; Benjamin W. Domingue

【Abstract】: Item response theory (IRT) models are widely used in psychometrics and educational measurement, being deployed in many high stakes tests such as the GRE aptitude test. IRT has largely focused on estimation of a single latent trait (e.g. ability) that remains static through the collection of item responses. However, in contemporary settings where item responses are being continuously collected, such as Massive Open Online Courses (MOOCs), interest will naturally be on the dynamics of ability, thus complicating usage of traditional IRT models. We propose DynAEsti, an augmentation of the traditional IRT Expectation Maximization algorithm that allows ability to be a continuously varying curve over time. In the process, we develop CurvFiFE, a novel non-parametric continuous-time technique that handles the curve-fitting/regression problem extended to address more general probabilistic emissions (as opposed to simply noisy data points). Furthermore, to accomplish this, we develop a novel technique called grafting, which can successfully approximate distributions represented by graphical models when other popular techniques like Loopy Belief Propogation (LBP) and Variational Inference (VI) fail. The performance of DynAEsti is evaluated through simulation, where we achieve results comparable to the optimal of what is observed in the static ability scenario. Finally, DynAEsti is applied to a longitudinal performance dataset (80-years of competitive golf at the 18-hole Masters Tournament) to demonstrate its ability to recover key properties of human performance and the heterogeneous characteristics of the different holes. Python code for CurvFiFE and DynAEsti is publicly available at github.com/chausies/DynAEstiAndCurvFiFE.

【Keywords】: Curve Fitting and Regression; Dynamic Item Response Theory; Graphical Models; Approximate Inference; Psychometrics; Longitudinal Analysis

162. Adaptive Teacher-and-Student Model for Heterogeneous Domain Adaptation.

【Paper Link】【Pages】:1342-1347

【Authors】: Yunyun Wang ; Xuzhang Chen ; Songcan Chen ; Hui Xue

【Abstract】: In heterogeneous domain adaptation (HDA), since the feature spaces of the source and target domains are different, knowledge transfer from the source to the target domain is really challenging. How to align the different feature spaces and then adaptively transfer the related knowledge is critical for HDA. In this paper, we develop an adaptive teacher-and-student model for heterogeneous domain adaptation (AtsHDA). In AtsHDA, the source domain as a teacher and the target domain as a student are aligned or co-adapted to each other first, so that their correlation can be maximized. Then the target domain adaptively learns from the source domain. Specifically, there is a balance between the learning by the target domain itself and the instruction from the source domain. That is, when the guidance from the source domain is helpful for learning, the learning of target classifier emphasizes the instruction of source knowledge, and considers its own knowledge more, otherwise. Further, an ensemble method is designed to decide such a balance. Finally, empirical results show that AtsHDA can achieve competitive results compared with the state-of-arts.

【Keywords】: Heterogeneous domain adaptation; teacher-and-student; correlation; ensemble

【Paper Link】【Pages】:1348-1353

【Authors】: Xingzhi Wang ; Xin Liu ; Shu-Juan Peng ; Yiu-ming Cheung ; Zhikai Hu ; Nannan Wang

【Abstract】: Most Cross-modal hashing methods do not sufficiently exploit the discrimination power of semantic information when learning hash codes, while often involving time-consuming training procedures for large-scale dataset. To tackle these issues, we first formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data to hamming space, and then propose a novel Fast Semantic Preserving Hashing (FSePH) approach to large-scale cross-modal retrieval. Specifically, FSePH introduces an orthonormal basis to regress the targeted hash codes of training examples to their corresponding reasonably relaxed class labels, featuring significantly reducing the quantization error. Meanwhile, an effective optimization algorithm is derived for modality-specific projection function learning and an efficient closed-form solution for hash code learning, which are computationally tractable. Extensive experiments have shown that the proposed FSePH approach runs sufficiently fast, and also significantly improves the retrieval performances over the state-of-the-arts.

【Keywords】: Cross-modal hashing, fast semantic preserving, orthonormal basis, relaxed class label

164. Fast Classification Algorithms via Distributed Accelerated Alternating Direction Method of Multipliers.

【Paper Link】【Pages】:1354-1359

【Authors】: Huihui Wang ; Shunmei Meng ; Yiming Qiao ; Jing Zhang

【Abstract】: Distributed machine learning has gained lots of attention due to the rapid growth of data. In this paper, we focus regularized empirical risk minimization problems, and propose two novel Distributed Accelerated Alternating Direction Method of Multipliers (D-A2DM2) algorithms for distributed classification. Based on the framework of Alternating Direction Method of Multipliers (ADMM), we decentralize the distributed classification problem as a global consensus optimization problem with a series of sub-problems. In D-A2DM2, we exploit ADMM with variance reduction for sub-problem optimization in parallel. Taking global update and local update into consideration respectively, we propose two acceleration mechanisms in the framework of D-A2DM2. In particular, inspired by Nesterov's accelerated gradient descent, we utilize it for global update to further improve time efficiency. Moreover, we also introduce Nesterov's acceleration for local update, and develop the corrected local update and symmetric dual update to accelerate the convergence with only a little change in the computational effort. Theoretically, D-A2DM2 has a linear convergence rate. Empirically, experimental results show that D-A2DM2 converge faster than existing distributed ADMM-based classification, and could be a highly efficient algorithm for practical use.

【Keywords】: ADMM, Nesterov's Acceleration, Distributed Classification, Accelerated ADMM

165. Learning to Hash for Efficient Search Over Incomplete Knowledge Graphs.

【Paper Link】【Pages】:1360-1365

【Authors】: Meng Wang ; Haomin Shen ; Sen Wang ; Lina Yao ; Yinlin Jiang ; Guilin Qi ; Yang Chen

【Abstract】: Knowledge graph (KG) embedding techniques represent entities and relations as low-dimensional, continuous vectors, and thus enables machine learning models to be easily adapted to KG completion and querying tasks. However, learned dense vectors are inefficient for large-scale similarity computations. Learning-to-hash is to learn compact binary codes from high-dimensional input data and provides a promising way to accelerate efficiency by measuring Hamming distance instead of Euclidean distance or dot-product. Unfortunately, most of learning-to-hash methods cannot be directly applied to KG structure encoding. In this paper, we introduce a novel framework for encoding incomplete KGs and graph queries in Hamming space. To preserve KG structure information from embeddings to hash codes and address the ill-posed gradient issue in optimization, we utilize a continuation method with convergence guarantees to jointly encode queries and KG entities with geometric operations. The hashed embedding of a query can be utilized to discover target answers from incomplete KGs whilst the efficiency has been greatly improved.We compared our model with state-of-the-art methods on real-world KGs. Experimental results show that our framework not only significantly speeds up the searching process, but also provides good results for unanswerable queries caused by incomplete information.

【Keywords】: knowledge-graph-embedding; learning-to-hashing; graph-query

166. Competitive Multi-agent Deep Reinforcement Learning with Counterfactual Thinking.

【Paper Link】【Pages】:1366-1371

【Authors】: Yue Wang ; Yao Wan ; Chenwei Zhang ; Lu Bai ; Lixin Cui ; Philip S. Yu

【Abstract】: Counterfactual thinking describes a psychological phenomenon that people re-infer the possible results with different solutions about things that have already happened. It helps people to gain more experience from mistakes and thus to perform better in similar future tasks. This paper investigates the counterfactual thinking for agents to find optimal decision-making strategies in multi-agent reinforcement learning environments. In particular, we propose a multi-agent deep reinforcement learning model with a structure which mimics the human-psychological counterfactual thinking process to improve the competitive abilities for agents. To this end, our model generates several possible actions (intent actions) with a parallel policy structure and estimates the rewards and regrets for these intent actions based on its current understanding of the environment. Our model incorporates a scenario-based framework to link the estimated regrets with its inner policies. During the iterations, our model updates the parallel policies and the corresponding scenario-based regrets for agents simultaneously. To verify the effectiveness of our proposed model, we conduct extensive experiments. Experimental results show that counterfactual thinking can actually benefit the agents to obtain more accumulative rewards from the environments with fair information by comparing to their opponents.

【Keywords】: Multi-agent, reinforcement learning, counterfactual thinking, competitive game

167. TMDA: Task-Specific Multi-source Domain Adaptation via Clustering Embedded Adversarial Training.

【Paper Link】【Pages】:1372-1377

【Authors】: Haotian Wang ; Wenjing Yang ; Zhipeng Lin ; Yue Yu

【Abstract】: Beyond classical domain-specific adversarial training, a recently proposed task-specific framework has achieved a great success in single source domain adaptation by utilizing task-specific decision boundaries. However, compared to single-source-single-target setting, multi-source domain adaptation (MDA) shows more powerful capability to handle with most real-life cases. To align target domain with diverse multi-source domains using task-specific decision boundaries, we provide a deep insight of task-specific framework on MDA for the first time. Accordingly, we propose a novel task-specific multi-source domain adaptation method (TMDA) with a clustering embedded adversarial training process. Specifically, the proposed TMDA detects and refines less discriminative target representations through a max-min optimization over two adversarial task-specific classifiers. Moreover, our analysis implies that scattered multi-source representations disturb the adversarial training under the task-specific framework. To tight up the dispersed source representations, we embeds a relationship-based domain clustering into TMDA. Empirical results demonstrate that our TMDA outperforms state-of-the-art methods on toy dataset, sentiment analysis and digit classification.

【Keywords】: Multi-source domain adaptation, Adversarial Training, Task-specific, Domain Clustering

168. Learning Robust Representations with Graph Denoising Policy Network.

【Paper Link】【Pages】:1378-1383

【Authors】: Lu Wang ; Wenchao Yu ; Wei Wang ; Wei Cheng ; Wei Zhang ; Hongyuan Zha ; Xiaofeng He ; Haifeng Chen

【Abstract】: Existing representation learning methods based on graph neural networks and their variants rely on the aggregation of neighborhood information, which makes it sensitive to noises in the graph, e.g. erroneous links between nodes, incorrect/missing node features. In this paper, we propose Graph Denoising Policy Network (short for GDPNet) to learn robust representations from noisy graph data through reinforcement learning. GDPNet first selects signal neighborhoods for each node, and then aggregates the information from the selected neighborhoods to learn node representations for the down-stream tasks. Specifically, in the signal neighborhood selection phase, GDPNet optimizes the neighborhood for each target node by formulating the process of removing noisy neighborhoods as a Markov decision process and learning a policy with task-specific rewards received from the representation learning phase. In the representation learning phase, GDPNet aggregates features from signal neighbors to generate node representations for down-stream tasks, and provides task-specific rewards to the signal neighbor selection phase. These two phases are jointly trained to select optimal sets of neighbors for target nodes with maximum cumulative task-specific rewards, and to learn robust representations for nodes. Experimental results on node classification task demonstrate the effectiveness of GDNet, outperforming the state-of-the-art graph representation learning methods on several well-studied datasets.

【Keywords】: graph representation learning; graph neural networks; graph embedding; reinforcement learning

169. Personalized Neural Usefulness Network for Rating Prediction.

【Paper Link】【Pages】:1384-1389

【Authors】: Qianqian Wang ; Min Zhang

【Abstract】: Reviews are crucial to rating prediction on e-commerce websites such as Amazon and Yelp. They reflect user preferences and item properties. Considering not all parts of the reviews are equally important for rating prediction, some works select the more informative words of reviews, others focus on finding the more useful reviews for each user and item. However, on the one hand, the meaning of words may be incomplete, further they may distort the original meaning of sentences, on the other hand, for "the more useful" reviews, they still contain more useful parts and less useful parts for rating prediction. To address these problems, we propose to model reviews in a view of sentence usefulness, since different sentences have varied importance for reviews modeling. In addition, the preferences of a user may vary with different item properties, it leads to the inconstant usefulness of a user sentence for different items, and vice verse. Hence, in this paper, we show a Personalized Neural Usefulness Network (PUN) to capture the varied sentence usefulness of reviews for rating prediction. Expensive experiments on benchmark datasets demonstrate that PUN significantly outperforms the state-of-the-art baseline models. In the meantime, according to personalized sentence usefulness, PUN also attains highly-useful sentences to better understand the varied user preferences and item properties.

【Keywords】: recommendation, reviews, sentences, deep learn ing, rating prediction

170. Collaborative Label Correction via Entropy Thresholding.

【Paper Link】【Pages】:1390-1395

【Authors】: Hao Wu ; Jiangchao Yao ; Jiajie Wang ; Yinru Chen ; Ya Zhang ; Yanfeng Wang

【Abstract】: Deep neural networks (DNNs) have the capacity to fit extremely noisy labels nonetheless they tend to learn data with clean labels first and then memorize those with noisy labels. We examine this behavior in light of the Shannon entropy of the predictions and demonstrate the low entropy predictions determined by a given threshold are much more reliable as the supervision than the original noisy labels. It also shows the advantage in maintaining more training samples than previous methods. Then, we power this entropy criterion with the Collaborative Label Correction (CLC) framework to further avoid undesired local minimums of the single network. A range of experiments have been conducted on multiple benchmarks with both synthetic and real-world settings. Extensive results indicate that our CLC outperforms several state-of-the-art methods.

【Keywords】: deep learning, noisy supervision, entropy thresholding, label correction, sample selection

171. Deep Technology Tracing for High-Tech Companies.

【Paper Link】【Pages】:1396-1401

【Authors】: Han Wu ; Kun Zhang ; Guangyi Lv ; Qi Liu ; Runlong Yu ; Weihao Zhao ; Enhong Chen ; Jianhui Ma

【Abstract】: Technological change and innovation are vitally important, especially for high-tech companies. However, factors influencing their future research and development (R&D) trends are both complicated and various, leading it a quite difficult task to make technology tracing for high-tech companies. To this end, in this paper, we develop a novel data-driven solution, i.e., Deep Technology Forecasting (DTF) framework, to automatically find the most possible technology directions customized to each high-tech company. Specially, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and Deep Technology Tracing (DTT) neural network. For one thing, PCR and CTR aim to capture competitive relations among enterprises and collaborative relations among technologies, respectively. For another, DTT is designed for modeling dynamic interactions between companies and technologies with the above relations involved. Finally, we evaluate our DTF framework on real-world patent data, and the experimental results clearly prove that DTF can precisely help to prospect future technology emphasis of companies by exploiting hybrid factors.

【Keywords】: Technology Prospecting, Patent Mining, Business Planning, Deep Learning Application

172. Adaptive Neural Network for Node Classification in Dynamic Networks.

【Paper Link】【Pages】:1402-1407

【Authors】: Dongkuan Xu ; Wei Cheng ; Dongsheng Luo ; Yameng Gu ; Xiao Liu ; Jingchao Ni ; Bo Zong ; Haifeng Chen ; Xiang Zhang

【Abstract】: Given a network with the labels for a subset of nodes, transductive node classification targets to predict the labels for the remaining nodes in the network. This technique has been used in a variety of applications such as voxel functionality detection in brain network and group label prediction in social network. Most existing node classification approaches are performed in static networks. However, many real-world networks are dynamic and evolve over time. The dynamics of both node attributes and network topology jointly determine the node labels. In this paper, we study the problem of classifying the nodes in dynamic networks. The task is challenging for three reasons. First, it is hard to effectively learn the spatial and temporal information simultaneously. Second, the network evolution is complex. The evolving patterns lie in both node attributes and network topology. Third, for different networks or even different nodes in the same network, the node attributes, the neighborhood node representations and the network topology usually affect the node labels differently, it is desirable to assess the relative importance of different factors over evolutionary time scales. To address the challenges, we propose AdaNN, an adaptive neural network for transductive node classification. AdaNN learns node attribute information by aggregating the node and its neighbors, and extracts network topology information with a random walk strategy. The attribute information and topology information are further fed into two connected gated recurrent units to learn the spatio-temporal contextual information. Additionally, a triple attention module is designed to automatically model the different factors that influence the node representations. AdaNN is the first node classification model that is adaptive to different kinds of dynamic networks. Extensive experiments on real datasets demonstrate the effectiveness of AdaNN.

【Keywords】: Dynamic Networks, Node Classification, Spatio Temporal, RNN, Attention Mechanism

173. MIX: A Joint Learning Framework for Detecting Both Clustered and Scattered Outliers in Mixed-Type Data.

【Paper Link】【Pages】:1408-1413

【Authors】: Hongzuo Xu ; Yijie Wang ; Yongjun Wang ; Zhiyue Wu

【Abstract】: Mixed-type data are pervasive in real life, but very limited outlier detection methods are available for these data. Some existing methods handle mixed-type data by feature converting, whereas their performance is downgraded by information loss and noise caused by the transformation. Another kind of approaches separately evaluates outlierness in numerical and categorical features. However, they fail to adequately consider the behaviours of data objects in different feature spaces, often leading to suboptimal results. As for outlier form, both clustered outliers and scattered outliers are contained in many real-world data, but a number of outlier detectors are inherently restricted by their outlier definitions to simultaneously detect both of them. To address these issues, an unsupervised outlier detection method MIX is proposed. MIX constructs a joint learning framework to establish a cooperation mechanism to make separate outlier scoring constantly communicate and sufficiently grasp the behaviours of data objects in another feature space. Specifically, MIX iteratively performs outlier scoring in numerical and categorical space. Each outlier scoring phase can be iteratively and cooperatively enhanced by the prior knowledge given by another feature space. To target both clustered and scattered outliers, the outlier scoring phases capture the essential characteristic of outliers, i.e., evaluating outlierness via the deviation from the normal model. We show that MIX significantly outperforms eight state-of-the-art outlier detectors on twelve real-world datasets and obtains good scalability.

【Keywords】: Outlier Detection; Mixed-Type Data; Joint Learning; Unsupervised Learning

174. From Joint Feature Selection and Self-Representation Learning to Robust Multi-view Subspace Clustering.

【Paper Link】【Pages】:1414-1419

【Authors】: Hui Yan ; Siyu Liu ; Philip S. Yu

【Abstract】: In era of big data, we have easier access to the data with multi-view representations from heterogeneous feature spaces, where each view is often unlabeled, partial and even full of noises. These unique challenges and properties motivate us to develop a novel robust multi-view subspace clustering framework (RMSC), which learns a consensus affinity matrix with the ideal subspace structure, by extending our joint feature selection and self-representation model (JFSSR). Concretely, RMSC learns the consensus graph across diverse views with exactly k connected components (k is the number of clusters), which is encoded by a block diagonal self-representation matrix. Besides, we emphasize l2;1-norm minimization on the loss function to reduce redundant and irrelevant features, and implicitly assign an adaptive weight to each view without introducing additional parameters. Lastly, an alternating optimization algorithm is derived to solve the nonconvex formulated objective. Extensive empirical results on both synthetic data and real-world benchmark data sets show that RMSC consistently outperforms several representative multiview clustering approaches.

【Keywords】: multi-view, clustering, robust, sparsity

175. On the Robust Splitting Criterion of Random Forest.

【Paper Link】【Pages】:1420-1425

【Authors】: Bin-Bin Yang ; Wei Gao ; Ming Li

【Abstract】: Splitting criteria have played an important role in the construction of decision trees, and various trees have been developed based on different criteria. This work presents a unified framework on various splitting criteria from the perspective of loss functions, and most classical splitting criteria can be viewed essentially as the optimizations of loss functions in this framework. We further introduce a new splitting criterion, named pairwise gain, which is motivated from a lower bound on the mutual coupling of pairwise loss. Theoretically, we prove that this new criterion is robust to symmetric and asymmetric label noises simultaneously. Based on this new criterion, we develop another variant of random forests, and extensive experiments are provided to verify its robustness.

【Keywords】: decision tree; splitting criterion; random forest; label noise; robustness

176. Scene Text Recognition with Auto-Aligned Feature Generator.

【Paper Link】【Pages】:1426-1431

【Authors】: Qiangpeng Yang ; Hongsheng Jin ; Mengli Cheng ; Wenmeng Zhou ; Jun Huang ; Wei Lin

【Abstract】: Scene text recognition has attracted increasing attention in computer vision due to its various applications. Most of the existing scene text recognition methods are under the encoder-decoder framework. In order to improve text feature learning of these methods, Generative Adversarial Networks (GANs) are recently integrated to generate clean text images without distorted letters. However, the existing GANs assume the input images are spatially aligned, while the words in natural images are often in irregular shapes. The misalignment brings a big problem for both image generation and text recognition. In this paper, we present a novel text feature alignment network to solve this problem. Our method can handle both horizontal and vertical images with irregular texts. Our proposed framework is end-to-end trainable, and extensive experiments on several public benchmarks demonstrate its superiority in terms of both effectiveness and efficiency.

【Keywords】: Scene Text Recognition; Feature Alignment; Image Generator

177. ACE: Adaptively Similarity-Preserved Representation Learning for Individual Treatment Effect Estimation.

【Paper Link】【Pages】:1432-1437

【Authors】: Liuyi Yao ; Sheng Li ; Yaliang Li ; Mengdi Huai ; Jing Gao ; Aidong Zhang

【Abstract】: Treatment effect estimation refers to the estimation of causal effects, which benefits decision-making process across various domains, but it is a challenging problem in real practice. The estimation of causal effects from observational data at the individual level faces two major challenges, i.e., treatment selection bias and missing counterfactuals. Existing methods tackle the selection bias problem by learning a balanced representation and infer the missing counterfactuals based on the learned representation. However, most existing methods learn the representation in a global manner and ignore the local similarity information, which is essential for an accurate estimation of causal effects. Motivated by the above observations, we propose a novel representation learning method, which adaptively extracts fine-grained similarity information from the original feature space and minimizes the distance between different treatment groups as well as the similarity loss during the representation learning procedure. Experiments on three public datasets demonstrate that the proposed method achieves the best performance in causal effect estimation among all the compared methods and is robust to the treatment selection bias.

【Keywords】: treatment estimation; similarity preserving; representation learning

178. EDiT: Interpreting Ensemble Models via Compact Soft Decision Trees.

【Paper Link】【Pages】:1438-1443

【Authors】: Jaemin Yoo ; Lee Sael

【Abstract】: Given feature-based data, how can we accurately classify individual input and interpret the result of it? Ensemble models are often the best choice in terms of accuracy when dealing with feature-based datasets. However, interpreting the decision made by the ensemble model for individual input seems intractable. On the other hand, decision trees, although being prone to overfit, are considered as the most interpretable in terms of being able to trace the decision process of individual input. In this work, we propose Ensemble to Distilled Tree (EDiT), a novel distilling method that generates compact soft decision trees from ensemble models. EDiT exploits the interpretability of a tree-based structure by removing redundant branches and learning sparse weights, while enhancing accuracy by distilling the knowledge of ensemble models such as random forests (RF). Our experiments on eight datasets show that EDiT reduces the number of parameters of an RF by 6.4 to 498.4 times with a minor loss of classification accuracy.

【Keywords】: interpretable learning; soft decision trees; random forests; knowledge distillation; weight pruning

179. Learning Review Representations from user and Product Level Information for Spam Detection.

【Paper Link】【Pages】:1444-1449

【Authors】: Chunyuan Yuan ; Wei Zhou ; Qianwen Ma ; Shangwen Lv ; Jizhong Han ; Songlin Hu

【Abstract】: Opinion spam has become a widespread problem in social media, where hired spammers write deceptive reviews to promote or demote products to mislead the consumers for profit or fame. Existing works mainly focus on manually designing discrete textual or behavior features, which cannot capture complex global semantics of reviews. Although recent works apply deep learning methods to learn review-level semantic features, their models ignore the impact of the user-level and product-level information on learning review semantics and the inherent user-review-product relationship information. In this paper, we propose a Hierarchical Fusion Attention Network (HFAN) to automatically learn the semantics of reviews from user and product level. Specifically, we design a multiattention unit to extract user(product)-related review information. Then, we use orthogonal decomposition and fusion attention to learn a user, review, and product representation from the review information. Finally, we take the review as a relation between user and product entity and apply TransH to jointly encode this relationship into review representation. Experimental results obtained more than 10% absolute precision improvement over the state-of-the-art performances on four real-world datasets, which show the effectiveness and versatility of the model.

【Keywords】: Opinion mining, Opinion spam detection, Hierarchical fusion attention network, User-level and product-level information

180. Learning Attentional Temporal Cues of Brainwaves with Spatial Embedding for Motion Intent Detection.

【Paper Link】【Pages】:1450-1455

【Authors】: Dalin Zhang ; Kaixuan Chen ; Debao Jian ; Lina Yao ; Sen Wang ; Po Li

【Abstract】: As brain dynamics fluctuate considerably across different subjects, it is challenging to design effective handcrafted features based on prior knowledge. Regarding this gap, this paper proposes a Graph-based Convolutional Recurrent Attention Model (G-CRAM) to explore EEG features across different subjects for movement intention recognition. A graph structure is first developed to embed the positioning information of EEG nodes, and then a convolutional recurrent attention model learns EEG features from both spatial and temporal dimensions and adaptively emphasizes on the most distinguishable temporal periods. The proposed approach is validated on two public movement intention EEG datasets. The results show that the GCRAM achieves superior performance to state-of-the-art methods regarding recognition accuracy and ROC-AUC. Furthermore, model interpreting studies reveal the learning process of different neural network components and demonstrate that the proposed model can extract detailed features efficiently.

【Keywords】: EEG, subject-independent, movement intention, deep learning, graph representation

181. Dynamic News Recommendation with Hierarchical Attention Network.

【Paper Link】【Pages】:1456-1461

【Authors】: Hui Zhang ; Xu Chen ; Shuai Ma

【Abstract】: News recommendation is an effective information dissemination solution in modern society. In general, news articles can be modeled from multiple granularities: sentence-, element-and news-level. However, the first two levels have been largely ignored in existing methods and it is also unclear how such multi-granularity modeling can enhance news recommendation. In this paper, we propose a novel dynamic model for news recommendation. A unique perspective of our model is to discriminate the contributions of previously interacted contents for triggering the next news-reading, in sentence-, element-and news-level simultaneously. To this end, we design a hierarchical attention network of which the lower layer learns the impacts of sentences and elements, while the upper layer captures disparity of news. Moreover, we incorporate a time-decaying factor to reflect the dynamism, as well as convolution neural networks for learning sequential influence. Using three real-world datasets, we conduct extensive experiments to verify the superiority of our model, compared with several state-of-the-art approaches.

【Keywords】: news recommendation; attention model; dynamic model; convolutional neural networks

182. Fast Sparse Coding Inference with Historical Information.

【Paper Link】【Pages】:1462-1467

【Authors】: Zhenchang Zhang ; Fei Jiang ; Ruimin Shen

【Abstract】: Recently, time-unfolded recurrent neural network (RNN) based algorithms are successfully applied for fast sparse coding (SC) inference, such as LISTA and SLSTM. However, these methods do not properly exploit the historical information which is proved to speed up the convergence. In this paper, we propose a novel RNN-based SC inference framework with attention mechanism to efficiently incorporate the related historical information. The proposed framework consists of an attention network and a time-unfolded RNN, where the RNN generates the historical information and the attention network automatically determines the importance of these historical values for the current updating. The final sparse code is obtained by passing the context vectors generated from the attention network to a soft shrinkage function. Extensive experiments on convergence analysis and image classification tasks demonstrate that our network achieves impressive improvements on SC inference in terms of the quality of estimated sparse codes and the inference time. Moreover, the proposed network can be easily extended into a supervised version to further improve the classification accuracy.

【Keywords】: sparse coding, attention mechanism, historical information, recurrent neural network

【Paper Link】【Pages】:1468-1473

【Authors】: Shiwei Zhang ; Jey Han Lau ; Xiuzhen Zhang ; Jeffrey Chan ; Cécile Paris

【Abstract】: With the increasing popularity of e-commerce, the number of product-related queries generated by customers is growing. Answering these queries manually in real time is infeasible, and so automatic question-answering systems can be immensely helpful. Product queries are, however, very different from open-domain questions: they tend to be product-specific and the answers they demand can be very subjective. Previous research suggests that reviews are a valuable resource for answering product queries, but a key challenge is the language mismatch between user queries and reviews. To address this, we propose two neural models that discover relevant reviews for answering product queries. We demonstrate that our best model produces strong performance, outperforming state-of-the-art systems by consistently finding the most relevant reviews for product queries.

【Keywords】: Product Question Answering, Mixtures of Experts, Deep Learning

184. TrafficGAN: Off-Deployment Traffic Estimation with Traffic Generative Adversarial Networks.

【Paper Link】【Pages】:1474-1479

【Authors】: Yingxue Zhang ; Yanhua Li ; Xun Zhou ; Xiangnan Kong ; Jun Luo

【Abstract】: The rapid progress of urbanization has expedited the process of urban planning, e.g., new residential, commercial areas, which in turn boosts the local travel demand. We propose a novel "off-deployment traffic estimation problem", namely, to foresee the traffic condition changes of a region prior to the deployment of a construction plan. This problem is important to city planners to evaluate and develop urban deployment plans. However, this task is challenging. Traditional traffic estimation approaches lack the ability to solve this problem, since no data about the impact can be collected before the deployment and old data fails to capture the traffic pattern changes. In this paper, we define the off-deployment traffic estimation problem as a traffic generation problem, and develop a novel deep generative model TrafficGAN that captures the shared patterns across spatial regions of how traffic conditions evolve according to travel demand changes and underlying road network structures. In particular, TrafficGAN captures the road network structures through a dynamic filter in the dynamic convolutional layer. We evaluate our TrafficGAN using a large-scale traffic data collected from Shenzhen, China. Results show that TrafficGAN can more accurately estimate the traffic conditions compared with all baselines.

【Keywords】: traffic estimation; TrafficGAN; generative model

185. Unveiling Taxi Drivers' Strategies via cGAIL: Conditional Generative Adversarial Imitation Learning.

【Paper Link】【Pages】:1480-1485

【Authors】: Xin Zhang ; Yanhua Li ; Xun Zhou ; Jun Luo

【Abstract】: Smart passenger-seeking strategies employed by taxi drivers contribute not only to drivers' incomes, but also higher quality of service passengers received. Therefore, understanding taxi drivers' behaviors and learning the good passenger-seeking strategies are crucial to boost taxi drivers' well-being and public transportation quality of service. However, we observe that drivers' preferences of choosing which area to find the next passenger are diverse and dynamic across locations and drivers. It is hard to learn the location-dependent preferences given the partial data (i.e., an individual driver's trajectory may not cover all locations). In this paper, we make the first attempt to develop conditional generative adversarial imitation learning (cGAIL) model, as a unifying collective inverse reinforcement learning framework that learns the driver's decision-making preferences and policies by transferring knowledge across taxi driver agents and across locations. Our evaluation results on three months of taxi GPS trajectory data in Shenzhen, China, demonstrate that the driver's preferences and policies learned from cGAIL are on average 34.7% more accurate than those learned from other state-of-the-art baseline approaches.

【Keywords】: Urban Computing, Inverse Reinforcement Learning, Generative Adversarial Imitation Learning

186. Generation of Low Distortion Adversarial Attacks via Convex Programming.

【Paper Link】【Pages】:1486-1491

【Authors】: Tianyun Zhang ; Sijia Liu ; Yanzhi Wang ; Makan Fardad

【Abstract】: As deep neural networks (DNNs) achieve extraordinary performance in a wide range of tasks, testing their robustness under adversarial attacks becomes paramount. Adversarial attacks, also known as adversarial examples, are used to measure the robustness of DNNs and are generated by incorporating imperceptible perturbations into the input data with the intention of altering a DNN's classification. In prior work in this area, most of the proposed optimization based methods employ gradient descent to find adversarial examples. In this paper, we present an innovative method which generates adversarial examples via convex programming. Our experiment results demonstrate that we can generate adversarial examples with lower distortion and higher transferability than the C&W attack, which is the current state-of-the-art adversarial attack method for DNNs. We achieve 100% attack success rate on both the original undefended models and the adversarially-trained models. Our distortions of the L_inf attack are respectively 31% and 18% lower than the C&W attack for the best case and average case on the CIFAR-10 data set.

【Keywords】: Deep neural networks; adversarial attack; convex programming

187. KnowRisk: An Interpretable Knowledge-Guided Model for Disease Risk Prediction.

【Paper Link】【Pages】:1492-1497

【Authors】: Xianli Zhang ; Buyue Qian ; Yang Li ; Changchang Yin ; Xudong Wang ; Qinghua Zheng

【Abstract】: Thanks to the widespread adoption of Electronic Health Record (EHR) systems, a variety of data-driven clinical risk prediction approaches have been spawned in recent years. However, there remain three challenges, which if addressed would improve the performance and applicability of such models. (i) Due to the limited data sharing between different health care institutions, the EHR data collected by a single institution is often inadequate or missing some visits records. The limited number of data cannot meet the large sample required of recent approaches especially deep learning models. In addition, the missing records (due to visiting different institution) may contain important health condition of the patient, which if ignored would cause prediction bias. (ii) Few existing approaches take clinical knowledge into account. The auxiliary knowledge if included can greatly reduce the data dependency of many modern learning algorithms. (iii) Most existing deep learning based methods are unable to identify the contribution of each medical event to the final results, which prohibits such models from being widely accepted in practical clinical applications. In this paper, we propose an interpretable and knowledge-guided deep model to address these challenges. Specifically, we distill knowledge from a clinical knowledge graph both explicitly and implicitly, which can not only supplement inadequate patient records but also guide the predicting process of the model. Furthermore, skip-connections and attention mechanisms are adopted to improve the interpretability of our model. In the context of heart failure prediction task, our model outperforms several state-of-the-art methods. Finally, a series of case studies are presented to prove the interpretability of our model.

【Keywords】: Risk Prediction; Deep Learning; Interpretability; Knowledge Graph

188. Collective Protection: Preventing Sensitive Inferences via Integrative Transformation.

【Paper Link】【Pages】:1498-1503

【Authors】: Dalin Zhang ; Lina Yao ; Kaixuan Chen ; Guodong Long ; Sen Wang

【Abstract】: Sharing ubiquitous mobile sensor data, especially physiological data, raises potential risks of leaking physical and demographic information that can be inferred from the time series sensor data. Existing sensitive information protection mechanisms that depend on data transformation are effective only on a particular sensitive attribute, together with usually requiring the labels of sensitive information for training. Considering this gap, we propose a novel user sensitive information protection framework without using a sensitive training dataset or being validated on protecting only one specific sensitive information. The presented approach transforms raw sensor data into a new format that has a "style" (sensitive information) of random noise and a "content" (desired information) of the raw sensor data, thus is free of user sensitive information for training and able to collectively protect all sensitive information at once. Our implementation and experiments on two real-world multisensor human activity datasets demonstrate that the proposed data transformation technique can achieve the protection for all sensitive information at once without requiring the knowledge of users' personal attributes for training, and simultaneously preserve the usability of the new transformed data with regard to inferring human activities with insignificant performance loss.

【Keywords】: mobile sensor, sensitive inference, data transformation, activity recognition

189. Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning.

【Paper Link】【Pages】:1504-1509

【Authors】: Xing Zhao ; Manos Papagelis ; Aijun An ; Bao Xin Chen ; Junfeng Liu ; Yonggang Hu

【Abstract】: The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is that it requires workers to wait for the straggler at every iteration. To ameliorate this shortcoming of classic BSP, we propose ELASTICBSP a model that aims to relax its strict synchronization requirement. The proposed model offers more flexibility and adaptability during the training phase, without sacrificing on the accuracy of the trained model. We also propose an efficient method that materializes the model, named ZIPLINE. The algorithm is tunable and can effectively balance the trade-off between quality of convergence and iteration throughput, in order to accommodate different environments or applications. A thorough experimental evaluation demonstrates that our proposed ELASTICBSP model converges faster and to a higher accuracy than the classic BSP. It also achieves comparable (if not higher) accuracy than the other sensible synchronization models.

【Keywords】: Distributed deep learning, parameter server framework, GPU cluster, data parallelism, BSP, SSP, ASP

190. Constrained Matrix Factorization for Course Score Prediction.

【Paper Link】【Pages】:1510-1515

【Authors】: Shi-Ting Zhong ; Ling Huang ; Chang-Dong Wang ; Jian-Huang Lai

【Abstract】: Recommender system is widely used in e-commercial platforms to recommend users suitable items according to users's preferences. In recent years, an increasing amount of attention has been paid to the application of recommender system in education. There are many online learning systems that can recommend students suitable courses according to students' learning performances. However, there are few universities using recommender system to recommend students suitable elective courses. It is generally known that students in higher grade take the courses earlier than those in lower grade. Therefore, the elective course scores of sophomores can be predicted by using the course score information from students of higher grades. However, the unbalanced distribution of course-enrollment data makes it hard to predict the scores of the courses that are in a low selection rate. Therefore, we propose a Constrained Matrix Factorization (ConMF) algorithm to predict sophomores' elective course scores, which integrates the course average score into the objective function so as to make up the prediction deviation caused by the unbalanced course selection rate and make more accurate prediction than the traditional Matrix Factorization (MF) approach. The experimental results show that our proposed model outperforms the state-of-the-art methods in the task of university students' course score prediction.

【Keywords】: recommender system, collaborative filtering, matrix factorization, score prediction

191. Inquiry Spam Detection via Jointly Exploiting Temporal-Categorical Behavior and Linguistics.

【Paper Link】【Pages】:1516-1521

【Authors】: Qiwei Zhong ; Jiayu Tang ; Jinghua Feng ; Jianfeng Chi

【Abstract】: Inquiry performs one of the backbones in current E-commerce websites. Detecting spams in inquiries is essential for these platforms but is surprisingly underexplored by current research. In this paper, we propose a system coined ISTBEL to detect spam inquiries. Motivated by the observations on both behavioral and linguistic differences between spammers and benign users, ISTBEL jointly utilizes the temporal-categorical behavioral sequence and text sequence to accomplish its purpose. At its heart, a variant of LSTM equipped interactive attentions is devised to strengthen the coherence between the categorical and temporal behaviors. A character-level CNN is adopted to capture the capricious linguistic patterns. These two modules are integrated and collaboratively contribute to the final predictions. Our system is trained through an end-to-end manner and can directly utilize raw data as input without tedious feature engineering. We have applied the system to inquiry spam detection on Alibaba.com, and results show ISTBEL is promising in detection spammers and outperforms baselines by a large margin. Meanwhile, ISTBEL can be easily applied to other applications which contain both behavioral and linguistic information.

【Keywords】: inquiry spam detection; user behavior

192. ADMIRING: Adversarial Multi-network Mining.

【Paper Link】【Pages】:1522-1527

【Authors】: Qinghai Zhou ; Liangyue Li ; Nan Cao ; Lei Ying ; Hanghang Tong

【Abstract】: Multi-sourced networks naturally appear in many application domains, ranging from bioinformatics, social networks, neuroscience to management. Although state-of-the-art offers rich models and algorithms to find various patterns when input networks are given, it has largely remained nascent on how vulnerable the mining results are due to the adversarial attacks. In this paper, we address the problem of attacking multi-network mining through the way of deliberately perturbing the networks to alter the mining results. The key idea of the proposed method (Admiring) is effective influence functions on the Sylvester equation defined over the input networks, which plays a central and unifying role in various multi-network mining tasks. The proposed algorithms bear two main advantages, including (1) effectiveness, being able to accurately quantify the rate of change of the mining results in response to attacks; and (2) generality, being applicable to a variety of multi-network mining tasks ( e.g., graph kernel, network alignment, cross-network node similarity) with different attacking strategies (e.g., edge/node removal, attribute alteration).

【Keywords】: Adversarial attacks; Sylvester Equation; Multi network mining

193. A Model-Agnostic Approach for Explaining the Predictions on Clustered Data.

【Paper Link】【Pages】:1528-1533

【Authors】: Zihan Zhou ; Mingxuan Sun ; Jianhua Chen

【Abstract】: Machine learning models especially deep neural network models have shown great potential in making decisions when analyzing clustered or longitudinal data. However, lack of model transparency is a major concern in risk sensitive domains such as social science and medical diagnosis. Despite the early success of explaining machine learning models, there is a lack of explanation methods that can be applied to any predictors on clustered data since most of the existing models assume that all observations are independent of each other. In this paper, we address this deficiency and propose to use a linear mixed model to mimic the local behavior of any complex model on clustered data, which can also improve the fidelity of the explanation method to the complex models. We apply our method to explain several models including a deep neural network model on two tasks including movie recommendation and medical record diagnosis. Experiment results show that our model outperforms the baseline models on several metrics such as fidelity and exactness.

【Keywords】: Explainable machine learning, Deep neural network, Clustered data

194. Relation Structure-Aware Heterogeneous Graph Neural Network.

【Paper Link】【Pages】:1534-1539

【Authors】: Shichao Zhu ; Chuan Zhou ; Shirui Pan ; Xingquan Zhu ; Bin Wang

【Abstract】: Heterogeneous graphs with different types of nodes and edges are ubiquitous and have immense value in many applications. Existing works on modeling heterogeneous graphs usually follow the idea of splitting a heterogeneous graph into multiple homogeneous subgraphs. This is ineffective in exploiting hidden rich semantic associations between different types of edges for large-scale multi-relational graphs. In this paper, we propose Relation Structure-Aware Heterogeneous Graph Neural Network (RSHN), a unified model that integrates graph and its coarsened line graph to embed both nodes and edges in heterogeneous graphs without requiring any prior knowledge such as metapath. To tackle the heterogeneity of edge connections, RSHN first creates a Coarsened Line Graph Neural Network (CL-GNN) to excavate edge-centric relation structural features that respect the latent associations of different types of edges based on coarsened line graph. After that, a Heterogeneous Graph Neural Network (H-GNN) is used to leverage implicit messages from neighbor nodes and edges propagating among nodes in heterogeneous graphs. As a result, different types of nodes and edges can enhance their embedding through mutual integration and promotion. Experiments and comparisons, based on semi-supervised classification tasks on large scale heterogeneous networks with over a hundred types of edges, show that RSHN significantly outperforms state-of-the-arts.

【Keywords】: heterogenous graph, coarsened line graph, graph neural network

Knowledge Graph Contest 2

195. Automatic Knowledge Graph Construction: A Report on the 2019 ICDM/ICBK Contest.

【Paper Link】【Pages】:1540-1545

【Authors】: Xindong Wu ; Jia Wu ; Xiaoyi Fu ; Jiachen Li ; Peng Zhou ; Xu Jiang

【Abstract】: Automatic knowledge graph construction seeks to build a knowledge graph from unstructured text in a specific domain or cross multiple domains, without human intervention. IEEE ICDM 2019 and ICBK 2019 invited teams from both degree-granting institutions and industrial labs to compete in the 2019 Knowledge Graph Contest by automatically constructing knowledge graphs in at least two different domains. This article reports the outcomes of the Contest. The participants were expected to build a model to extract knowledge represented as triplets from text data and develop a web application to visualize the triplets. Awards were given to five teams. Their models and key techniques used to construct knowledge graphs are summarized.

【Keywords】: Knowledge-Graph-Construction

196. ICDM 2019 Knowledge Graph Contest: Team UWA.

【Paper Link】【Pages】:1546-1551

【Authors】: Michael Stewart ; Majigsuren Enkhsaikhan ; Wei Liu

【Abstract】: We present an overview of our triple extraction system for the ICDM 2019 Knowledge Graph Contest. Our system uses a pipeline-based approach to extract a set of triples from a given document. It offers a simple and effective solution to the challenge of knowledge graph construction from domain-specific text. It also provides the facility to visualise useful information about each triple such as the degree, betweenness, structured relation type(s), and named entity types.

【Keywords】: knowledge graph construction; triple extraction