26th KDD 2020: Virtual Conference, USA

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020. ACM. 【DBLP Link】

Paper Num: 421 || Session Num: 8

Keynote & Invited Talks 4

1. AI for Intelligent Financial Services: Examples and Discussion.

【Paper Link】 【Pages】:1-2

【Authors】: Manuela Veloso

【Abstract】: There are many opportunities to pursue AI and ML in the financial domain. In this talk, I will overview several research directions we are pursuing in engagement with the lines of business, ranging from data and knowledge, learning from experience, reasoning and planning, multi agent systems, and secure and private AI. I will offer concrete examples of projects, and conclude with the many challenges and opportunities that AI can offer in the financial domain.

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Security and privacy; Cryptography

2. Keynote Speaker: Emery N. Brown.

【Paper Link】 【Pages】:3

【Authors】: Emery N. Brown

【Abstract】: Emery Brown, M.D., Ph.D. is an American statistician, neuroscientist, and anesthesiologist. He is the Warren M. Zapol Professor of Anesthesia at Harvard Medical School and at Massachusetts General Hospital (MGH), and a practicing anesthesiologist at MGH. At MIT he is the Edward Hood Taplin Professor of Medical Engineering and professor of computational neuroscience, the Associate Director of the Institute for Medical Engineering and Science, and the Director of the Harvard-MIT Program in Health Sciences and Technology. Brown is one of only 19 individuals who have been elected to all three branches of the National Academies of Sciences, Engineering, and Medicine; he is also the first African American and the first anesthesiologist to be elected to all three National Academies.

【Keywords】:

3. Keynote Speaker: Yolanda Gil.

【Paper Link】 【Pages】:4

【Authors】: Yolanda Gil

【Abstract】: Dr. Yolanda Gil is Director of Knowledge Technologies and Associate Division Director at the Information Sciences Institute of the University of Southern California, and Research Professor in Computer Science and in Spatial Sciences. She is also Associate Director of Interdisciplinary Programs in Informatics. She received her M.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University, with a focus on artificial intelligence. Her research is on intelligent interfaces for knowledge capture and discovery, which she investigates in a variety of projects concerning knowledge-based planning and problem solving, information analysis and assessment of trust, semantic annotation and metadata, and community-wide development of knowledge bases. Dr. Gil collaborates with scientists in different domains on semantic workflows and metadata capture, social knowledge collection, computer-mediated collaboration, and automated discovery. Dr. Gil has served on the Advisory Committee of the Computer Science and Engineering Directorate of the National Science Foundation. She initiated and chaired the W3C Provenance Group that led to a community standard in this area. Dr. Gil is a Fellow of the Association for Computing Machinery (ACM) and Past Chair of its Special Interest Group on Artificial Intelligence. She is also a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and was elected as its 24th President in 2016.

【Keywords】:

4. Keynote Speaker: Alessandro Vespignani.

【Paper Link】 【Pages】:5

【Authors】: Alessandro Vespignani

【Abstract】: Alessandro Vespignani's research activity focuses on the study of "techno-social" systems, where infrastructures composed of different technological layers interoperate with the social component that drives their use and development. In this context, the aim is to understand how the very same elements, assembled in large numbers, can give rise, according to the various forces and elements at play, to different macroscopic and dynamical behaviors, opening the path to quantitative computational approaches and forecasting power. The main research lines currently pursued are: developing analytical and computational models for the co-evolution and interdependence of large-scale social, technological, and biological networks; modeling contagion processes in structured populations; developing predictive computational tools for the analysis of the spatial spread of emerging diseases; analyzing the dynamics and evolution of information and social networks; and modeling the adaptive behavior of social systems. Prof. Vespignani holds a joint appointment between the College of Science, the College of Computer and Information Science, and the Bouvé College of Health Sciences.

【Keywords】:

Research Track Papers 217

5. Learning Effective Road Network Representation with Hierarchical Graph Neural Networks.

【Paper Link】 【Pages】:6-14

【Authors】: Ning Wu ; Wayne Xin Zhao ; Jingyuan Wang ; Dayan Pan

【Abstract】: The road network is the core component of urban transportation, and it is widely used in various traffic-related systems and applications. Given this important role, it is essential to develop general, effective, and robust road network representation models. Although several efforts have been made in this direction, they cannot fully capture the complex characteristics of road networks.

【Keywords】: Information systems; Information systems applications; Spatial-temporal systems

6. Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense.

【Paper Link】 【Pages】:15-24

【Authors】: Jingyuan Wang ; Yufan Wu ; Mingxuan Li ; Xin Lin ; Junjie Wu ; Chao Li

【Abstract】: While having achieved great success in rich real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research efforts have been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear, and most existing methods are still vulnerable to hybrid attacks and suffer from counterattacks. In light of this, in this paper, we first reveal a gradient-based correlation between sensitivity analysis-based DNN interpreters and the generation process of adversarial examples, which indicates the Achilles' heel of adversarial attacks and sheds light on linking together two long-standing challenges of DNNs: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process, building multiple sub-detectors and a rectifier upon various types of interpretation information toward the target classifiers. Moreover, X-Ensemble employs the Random Forests (RF) model to combine sub-detectors into an ensemble detector for defense against hybrid adversarial attacks. The non-differentiable property of RF further makes it a valuable choice against the counterattacks of adversaries. Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble over competitive baseline methods.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks

7. Higher-order Clustering in Complex Heterogeneous Networks.

【Paper Link】 【Pages】:25-35

【Authors】: Aldo G. Carranza ; Ryan A. Rossi ; Anup Rao ; Eunyee Koh

【Abstract】: Heterogeneous networks are seemingly ubiquitous in the real world. Yet most graph mining methods, such as clustering, have focused on homogeneous graphs, ignoring the semantic information present in real-world systems. Moreover, most methods are based on first-order connectivity patterns (edges), even though higher-order connectivity patterns are known to be important in understanding the structure and organization of such networks. In this work, we propose a framework for higher-order spectral clustering in heterogeneous networks through the notions of typed graphlets and typed-graphlet conductance. The proposed method builds clusters that preserve the connectivity of higher-order structures built up from typed graphlets. The approach generalizes previous work on higher-order spectral clustering. We theoretically prove a number of important results, including a Cheeger-like inequality for typed-graphlet conductance that shows near-optimal bounds for the method. The theoretical results greatly simplify previous work while providing a unifying theoretical framework for analyzing higher-order spectral methods. Empirically, we demonstrate the effectiveness of the framework quantitatively for three important applications: clustering, compression, and link prediction.

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Machine learning approaches; Logical and relational learning; Information systems; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Combinatorics; Graph theory; Approximation algorithms; Graph algorithms; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis

8. Preserving Dynamic Attention for Long-Term Spatial-Temporal Prediction.

【Paper Link】 【Pages】:36-46

【Authors】: Haoxing Lin ; Rufan Bai ; Weijia Jia ; Xinyu Yang ; Yongjian You

【Abstract】: Effective long-term predictions have been increasingly demanded in urban-wise data mining systems. Many practical applications, such as accident prevention and resource pre-allocation, require an extended period for preparation. However, challenges arise because long-term prediction is highly error-sensitive, which becomes more critical when predicting urban-wise phenomena with complicated and dynamic spatial-temporal correlation. Specifically, since the amount of valuable correlation is limited, enormous irrelevant features introduce noise that triggers increased prediction errors. Besides, after each time step, the errors can traverse through the correlations and reach the spatial-temporal positions in every future prediction, leading to significant error propagation. To address these issues, we propose a Dynamic Switch-Attention Network (DSAN) with a novel Multi-Space Attention (MSA) mechanism that measures the correlations between inputs and outputs explicitly. To filter out irrelevant noise and alleviate the error propagation, DSAN dynamically extracts valuable information by applying self-attention over the noisy input and bridges each output directly to the purified inputs via a switch-attention mechanism. Through extensive experiments on two spatial-temporal prediction tasks, we demonstrate the advantage of DSAN in both short-term and long-term predictions. The source code can be obtained from https://github.com/hxstarklin/DSAN.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining; Spatial-temporal systems

9. Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach.

【Paper Link】 【Pages】:47-55

【Authors】: Qifan Wang ; Li Yang ; Bhargav Kanagal ; Sumit Sanghai ; D. Sivakumar ; Bin Shu ; Zac Yu ; Jon Elsas

【Abstract】: Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. It is an important research topic which has been widely studied in e-Commerce and relation learning. There are two main limitations in existing attribute value extraction methods: scalability and generalizability. Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications. Moreover, very limited research has focused on generalizing extraction to new attributes.

【Keywords】: Information systems; Information systems applications; Data mining

10. Kernel Assisted Learning for Personalized Dose Finding.

【Paper Link】 【Pages】:56-65

【Authors】: Liangyu Zhu ; Wenbin Lu ; Michael R. Kosorok ; Rui Song

【Abstract】: An individualized dose rule recommends a dose level within a continuous safe dose range based on patient-level information such as physical conditions, genetic factors, and medication histories. Traditionally, the personalized dose finding process requires repeated clinical visits of the patient and frequent adjustments of the dosage. Thus the patient is constantly exposed to the risk of underdosing and overdosing during the process. Statistical methods for finding an optimal individualized dose rule can lower the costs and risks for patients. In this article, we propose a kernel assisted learning method for estimating the optimal individualized dose rule. The proposed methodology can also be applied to other continuous decision-making problems. Advantages of the proposed method include robustness to model misspecification and the capability of providing statistical inference for the estimated parameters. In simulation studies, we show that this method is capable of identifying the optimal individualized dose rule and produces favorable expected outcomes in the population. Finally, we illustrate our approach using data from a warfarin dosing study for thrombosis patients.

【Keywords】: Applied computing; Life and medical sciences; Health care information systems; Operations research; Decision analysis; Physical sciences and engineering; Mathematics and statistics; Computing methodologies; Machine learning; Machine learning algorithms; Dynamic programming for Markov decision processes; Q-learning; Machine learning approaches; Kernel methods; Modeling and simulation; Model development and analysis; Mathematics of computing; Probability and statistics; Multivariate statistics; Nonparametric statistics; Probabilistic inference problems; Hypothesis testing and confidence interval computation; Probabilistic representations; Nonparametric representations; Kernel density estimators; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning; Sequential decision making
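
A minimal numpy sketch of the kernel-smoothing intuition behind the paper above: estimate the expected outcome as a smooth function of dose from observed (dose, outcome) pairs and pick the dose that maximizes it. The function names, Gaussian kernel, bandwidth, and toy data are illustrative assumptions; the paper's estimator additionally handles patient covariates and provides statistical inference.

```python
import numpy as np

def kernel_dose_value(doses, outcomes, query_dose, bandwidth=0.5):
    """Nadaraya-Watson style estimate of the expected outcome at a query dose."""
    weights = np.exp(-0.5 * ((doses - query_dose) / bandwidth) ** 2)
    return np.sum(weights * outcomes) / (np.sum(weights) + 1e-12)

# Toy data: outcomes peak around dose 2.0 by construction.
rng = np.random.default_rng(0)
doses = rng.uniform(0.0, 4.0, size=200)
outcomes = -(doses - 2.0) ** 2 + rng.normal(scale=0.1, size=200)

grid = np.linspace(0.0, 4.0, 41)
values = [kernel_dose_value(doses, outcomes, d) for d in grid]
best = grid[int(np.argmax(values))]
print(f"estimated best dose ~ {best:.2f}")
```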

11. Graph Structure Learning for Robust Graph Neural Networks.

【Paper Link】 【Pages】:66-74

【Authors】: Wei Jin ; Yao Ma ; Xiaorui Liu ; Xianfeng Tang ; Suhang Wang ; Jiliang Tang

【Abstract】: Graph Neural Networks (GNNs) are powerful tools in representation learning for graphs. However, recent studies show that GNNs are vulnerable to carefully-crafted perturbations, called adversarial attacks. Adversarial attacks can easily fool GNNs into making wrong predictions for downstream tasks. The vulnerability to adversarial attacks has raised increasing concerns about applying GNNs in safety-critical applications. Therefore, developing robust algorithms to defend against adversarial attacks is of great significance. A natural idea for defending against adversarial attacks is to clean the perturbed graph. It is evident that real-world graphs share some intrinsic properties. For example, many real-world graphs are low-rank and sparse, and the features of two adjacent nodes tend to be similar. In fact, we find that adversarial attacks are likely to violate these graph properties. Therefore, in this paper, we explore these properties to defend against adversarial attacks on graphs. In particular, we propose a general framework, Pro-GNN, which can jointly learn a structural graph and a robust graph neural network model from the perturbed graph guided by these properties. Extensive experiments on real-world graphs demonstrate that the proposed framework achieves significantly better performance compared with state-of-the-art defense methods, even when the graph is heavily perturbed. We release the implementation of Pro-GNN in our DeepRobust repository for adversarial attacks and defenses. The specific experimental settings to reproduce our results can be found at https://github.com/ChandlerBang/Pro-GNN.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning approaches; Neural networks
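
A hedged sketch of the kind of joint objective the abstract above describes: a GNN task loss on a learnable adjacency plus low-rank, sparsity, and feature-smoothness penalties. The weights, the dense-adjacency parameterization, and the single gradient step below are assumptions for illustration; the paper's alternating optimization and exact formulation differ.

```python
import torch

def pro_gnn_style_loss(S, X, gnn_task_loss, alpha=5e-4, beta=1.5, lam=1.0):
    """Task loss plus structure-learning penalties on a learnable adjacency S.

    S: dense learnable adjacency (n x n), X: node features (n x d),
    gnn_task_loss: classification loss of a GNN run on S (scalar tensor).
    alpha, beta, lam weight the sparsity, low-rank, and smoothness terms.
    """
    sparsity = S.abs().sum()                            # prefer a sparse graph
    low_rank = torch.linalg.matrix_norm(S, ord="nuc")   # prefer a low-rank graph
    laplacian = torch.diag(S.sum(dim=1)) - S
    smoothness = torch.trace(X.t() @ laplacian @ X)     # neighbors keep similar features
    return gnn_task_loss + alpha * sparsity + beta * low_rank + lam * smoothness

# Toy usage: gradients flow back into the adjacency being learned.
n, d = 6, 4
S = torch.rand(n, n, requires_grad=True)
X = torch.rand(n, d)
loss = pro_gnn_style_loss(S, X, gnn_task_loss=torch.tensor(0.7))
loss.backward()
print(S.grad.shape)   # torch.Size([6, 6])
```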

12. An Efficient Neighborhood-based Interaction Model for Recommendation on Heterogeneous Graph.

【Paper Link】 【Pages】:75-84

【Authors】: Jiarui Jin ; Jiarui Qin ; Yuchen Fang ; Kounianhua Du ; Weinan Zhang ; Yong Yu ; Zheng Zhang ; Alexander J. Smola

【Abstract】: There has been an influx of heterogeneous information network (HIN) based recommender systems in recent years, since HINs are capable of characterizing complex graphs and contain rich semantics. Although existing approaches have achieved performance improvements and are practical, they still face the following problems. On one hand, most existing HIN-based methods rely on explicit path reachability to leverage path-based semantic relatedness between users and items, e.g., metapath-based similarities. These methods are hard to use and integrate since path connections are sparse or noisy and are often of different lengths. On the other hand, other graph-based methods aim to learn effective heterogeneous network representations by compressing each node together with its neighborhood information into a single embedding before prediction. This weakly coupled manner of modeling overlooks the rich interactions among nodes, which introduces an early summarization issue. In this paper, we propose an end-to-end Neighborhood-based Interaction Model for Recommendation (NIRec) to address the above problems. Specifically, we first analyze the significance of learning interactions in HINs and then propose a novel formulation to capture the interactive patterns between each pair of nodes through their metapath-guided neighborhoods. Then, to explore complex interactions between metapaths and deal with the learning complexity on large-scale networks, we formulate interaction in a convolutional way and learn efficiently with the fast Fourier transform. Extensive experiments on four different types of heterogeneous graphs demonstrate the performance gains of NIRec compared with the state of the art. To the best of our knowledge, this is the first work providing an efficient neighborhood-based interaction model for HIN-based recommendation.

【Keywords】: Computer systems organization; Architectures; Other architectures; Heterogeneous (hybrid) systems; Information systems; Information systems applications; Data mining

13. Directional Multivariate Ranking.

【Paper Link】 【Pages】:85-94

【Authors】: Nan Wang ; Hongning Wang

【Abstract】: User-provided multi-aspect evaluations manifest users' detailed feedback on the recommended items and enable fine-grained understanding of their preferences. Extensive studies have shown that modeling such data greatly improves the effectiveness and explainability of the recommendations. However, as ranking is essential in recommendation, there is no principled solution yet for collectively generating multiple item rankings over different aspects.

【Keywords】: Information systems; Information retrieval; Retrieval models and ranking; Probabilistic retrieval models; Retrieval tasks and goals; Recommender systems; Users and interactive retrieval; Personalization; World Wide Web; Web searching and information discovery; Content ranking; Personalization; Social recommendation

14. Truth Discovery against Strategic Sybil Attack in Crowdsourcing.

【Paper Link】 【Pages】:95-104

【Authors】: Yue Wang ; Ke Wang ; Chunyan Miao

【Abstract】: Crowdsourcing is an information system for recruiting online workers to perform human intelligence tasks (HITs) that are hard for computers. Due to the openness of crowdsourcing, dynamic online workers with different knowledge backgrounds might give conflicting labels to a task. With the assumption that workers provide their labels independently, most existing works aggregate worker labels in a voting manner, which is vulnerable to Sybil attack, where the attacker earns easy rewards by coordinating several Sybil workers to share a randomized label on each task in order to dominate the aggregation result. A strategic Sybil attacker also attempts to evade Sybil detection. In this paper, we propose a novel approach, called TDSSA (Truth Discovery against Strategic Sybil Attack), to defend against strategic Sybil attack. Experimental results on real-world and synthetic datasets indicate that TDSSA ensures more accurate inference of true labels under various Sybil attacking scenarios, as compared to state-of-the-art methods.

【Keywords】: Information systems; World Wide Web; Web applications; Crowdsourcing; Reputation systems; Trust
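
For context on the paper above, a generic iterative truth-discovery loop (reliability-weighted voting) is sketched below; it shows the aggregation style that Sybil coordination can dominate, but it does not implement TDSSA's Sybil-specific defenses. All names and the reliability smoothing are illustrative assumptions.

```python
from collections import defaultdict

def iterative_truth_discovery(labels, n_iters=10):
    """Generic weighted-voting truth discovery over (worker, task, label) triples.

    Alternates between (a) inferring each task's label by reliability-weighted
    vote and (b) re-estimating worker reliability as agreement with the
    inferred labels. Golden tasks and Sybil grouping from TDSSA are not modeled.
    """
    reliability = defaultdict(lambda: 0.8)
    truth = {}
    for _ in range(n_iters):
        # (a) reliability-weighted vote per task
        votes = defaultdict(lambda: defaultdict(float))
        for worker, task, label in labels:
            votes[task][label] += reliability[worker]
        truth = {task: max(v, key=v.get) for task, v in votes.items()}
        # (b) reliability = smoothed fraction of answers matching the current truth
        correct, total = defaultdict(int), defaultdict(int)
        for worker, task, label in labels:
            total[worker] += 1
            correct[worker] += int(truth[task] == label)
        for worker in total:
            reliability[worker] = (correct[worker] + 1) / (total[worker] + 2)
    return truth, dict(reliability)

answers = [("w1", "t1", "A"), ("w2", "t1", "A"), ("sybil", "t1", "B"),
           ("w1", "t2", "C"), ("sybil", "t2", "B")]
print(iterative_truth_discovery(answers)[0])
```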

15. Partial Multi-Label Learning via Probabilistic Graph Matching Mechanism.

【Paper Link】 【Pages】:105-113

【Authors】: Gengyu Lyu ; Songhe Feng ; Yidong Li

【Abstract】: Partial Multi-Label learning (PML) learns from ambiguous data in which each instance is associated with a candidate label set, of which only a part is correct. The key to solving such a problem is to disambiguate the candidate label sets and identify the correct assignments between instances and their ground-truth labels. In this paper, we interpret such assignments as instance-to-label matchings and formulate the task of PML as a matching selection problem. To model this problem, we propose a novel grapH mAtching based partial muLti-label lEarning (HALE) framework, where a graph matching scheme is incorporated owing to its strength in exploiting the instance and label relationships. Meanwhile, since the conventional one-to-one graph matching algorithm does not satisfy the constraint of the PML problem that multiple instances may correspond to multiple labels, we extend the traditional probabilistic graph matching algorithm from the one-to-one constraint to the many-to-many constraint, making the proposed framework accommodate the PML problem. Moreover, to improve the performance of the predictive model, both minimum error reconstruction and a k-nearest-neighbor weighted voting scheme are employed to assign more accurate labels to unseen instances. Extensive experiments on various data sets demonstrate the superiority of our proposed method.

【Keywords】: Computing methodologies; Machine learning

16. Spectrum-Guided Adversarial Disparity Learning.

【Paper Link】 【Pages】:114-124

【Authors】: Zhe Liu ; Lina Yao ; Lei Bai ; Xianzhi Wang ; Can Wang

【Abstract】: It has been a significant challenge to portray intraclass disparity precisely in the area of activity recognition, as it requires a robust representation of the correlation between subject-specific variations for each activity class. In this work, we propose a novel end-to-end knowledge-directed adversarial learning framework, which portrays the class-conditioned intraclass disparity using two competitive encoding distributions and learns the purified latent codes by denoising the learned disparity. Furthermore, domain knowledge is incorporated in an unsupervised manner to guide the optimization and further boost the performance. Experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art baselines. We further prove the effectiveness of automatic domain knowledge incorporation in performance enhancement.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Neural networks

17. Attention and Memory-Augmented Networks for Dual-View Sequential Learning.

【Paper Link】 【Pages】:125-134

【Authors】: Yong He ; Cheng Wang ; Nan Li ; Zhenyu Zeng

【Abstract】: In recent years, sequential learning has been of great interest due to the advance of deep learning, with applications in time-series forecasting, natural language processing, and speech recognition. Recurrent neural networks (RNNs) have achieved superior performance in single-view and synchronous multi-view sequential learning compared to traditional machine learning models. However, the method remains less explored in asynchronous multi-view sequential learning, and the unaligned nature of multiple sequences poses a great challenge to learning the inter-view interactions. We develop an AMANet (Attention and Memory-Augmented Networks) architecture by integrating both attention and memory to solve the asynchronous multi-view learning problem in general, and we focus on experiments with dual-view sequences in this paper. Self-attention and inter-attention are employed to capture intra-view interaction and inter-view interaction, respectively. History attention memory is designed to store the historical information of a specific object, which serves as local knowledge storage. Dynamic external memory is used to store global knowledge for each view. We evaluate our model on three tasks: medication recommendation from a patient's medical records, diagnosis-related group (DRG) classification from a hospital record, and invoice fraud detection through a company's taxation behaviors. The results demonstrate that our model outperforms all baselines and other state-of-the-art models on all tasks. Moreover, the ablation study of our model indicates that the inter-attention mechanism plays a key role in the model and that it can boost predictive power by effectively capturing the inter-view interactions from asynchronous views.

【Keywords】: Applied computing; Life and medical sciences; Health care information systems; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Clustering and classification

18. Semantic Search in Millions of Equations.

【Paper Link】 【Pages】:135-143

【Authors】: Lukas Pfahler ; Katharina Morik

【Abstract】: Given the increasing number of publications, the search for relevant papers becomes tedious. In particular, search across disciplines or schools of thinking is not supported. This is mainly due to retrieval with keyword queries: technical terms differ across sciences and over time. Relevant articles might better be identified by their mathematical problem descriptions. Just looking at the equations in a paper already gives a hint as to whether the paper is relevant. Hence, we propose a new approach for retrieval of mathematical expressions based on machine learning. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks, we embed mathematical expressions into low-dimensional vector spaces that allow efficient nearest-neighbor queries. To train our models, we collect a huge dataset with over 29 million mathematical expressions from over 900,000 publications published on arXiv.org. The math is converted into an XML format, which we view as graph data. Our empirical evaluations, involving a new dataset of manually annotated search queries, show the benefits of using embedding models for mathematical retrieval.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Specialized information retrieval; Structure and multilingual text search; Mathematics retrieval

19. SSumM: Sparse Summarization of Massive Graphs.

【Paper Link】 【Pages】:144-154

【Authors】: Kyuhan Lee ; Hyeonsoo Jo ; Jihoon Ko ; Sungsu Lim ; Kijung Shin

【Abstract】: Given a graph G and the desired size k in bits, how can we summarize G within k bits, while minimizing the information loss?

【Keywords】: Computing methodologies; Modeling and simulation; Simulation theory; Network science; Human-centered computing; Collaborative and social computing; Collaborative and social computing design and evaluation methods; Social network analysis; Information systems; Information retrieval; Retrieval tasks and goals; Summarization; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

20. Rethinking Pruning for Accelerating Deep Inference At the Edge.

【Paper Link】 【Pages】:155-164

【Authors】: Dawei Gao ; Xiaoxi He ; Zimu Zhou ; Yongxin Tong ; Ke Xu ; Lothar Thiele

【Abstract】: There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique to accelerate these applications on resource-constrained devices is network pruning, which compresses the size of the deep neural network without severe drop in inference accuracy. However, we observe that although existing network pruning algorithms prove effective to speed up the prior deep neural network, they lead to dramatic slowdown of the subsequent decoding and may not always reduce the overall latency of the entire application. To rectify such drawbacks, we propose entropy-based pruning, a new regularizer that can be seamlessly integrated into existing network pruning algorithms. Our key theoretical insight is that reducing the information entropy of the deep neural network outputs decreases the upper bound of the subsequent decoding search space. We validate our solution with two state-of-the-art network pruning algorithms on two model architectures. Experimental results show that compared with existing network pruning algorithms, our entropy-based pruning method notably suppresses and even eliminates the increase of decoding time, and achieves shorter overall latency with only negligible extra accuracy loss in the applications.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Human-centered computing; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms; Ubiquitous computing
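
A small sketch of the central idea in the paper above: regularize the output entropy during training so the subsequent decoder's search space shrinks. The weight gamma and where the penalty is applied are illustrative assumptions, and the integration with a concrete pruning algorithm is not shown.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(logits, targets, gamma=0.1):
    """Cross-entropy task loss plus a penalty on the output entropy.

    Minimizing this loss pushes the network toward lower-entropy output
    distributions, which is the lever the paper uses to bound decoding cost.
    """
    task_loss = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
    return task_loss + gamma * entropy

logits = torch.randn(8, 50, requires_grad=True)   # e.g. per-frame acoustic scores
targets = torch.randint(0, 50, (8,))
loss = entropy_regularized_loss(logits, targets)
loss.backward()
print(float(loss))
```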

21. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems.

【Paper Link】 【Pages】:165-175

【Authors】: Hao-Jun Michael Shi ; Dheevatsa Mudigere ; Maxim Naumov ; Jiyan Yang

【Abstract】: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category's representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Computational advertising; World Wide Web; Online advertising
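
A compact sketch of the quotient-remainder style of compositional embedding described above: two small tables indexed by complementary partitions of the id space, combined element-wise so every category still gets a unique vector. The table sizes and the multiplicative combination below are illustrative choices among the operations the abstract alludes to.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Compositional embedding from two complementary partitions of the id space.

    Each id maps to (id // num_buckets, id % num_buckets); the two small
    tables are combined element-wise, so every id gets a unique vector at a
    fraction of the memory of one |categories| x dim table.
    """
    def __init__(self, num_categories, num_buckets, dim):
        super().__init__()
        num_quotients = (num_categories + num_buckets - 1) // num_buckets
        self.remainder = nn.Embedding(num_buckets, dim)
        self.quotient = nn.Embedding(num_quotients, dim)
        self.num_buckets = num_buckets

    def forward(self, ids):
        return self.remainder(ids % self.num_buckets) * self.quotient(ids // self.num_buckets)

emb = QREmbedding(num_categories=1_000_000, num_buckets=1000, dim=16)
ids = torch.tensor([3, 999_999, 123_456])
print(emb(ids).shape)   # torch.Size([3, 16]), with ~2k stored rows instead of 1M
```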

22. Structural Patterns and Generative Models of Real-world Hypergraphs.

【Paper Link】 【Pages】:176-186

【Authors】: Manh Tuan Do ; Se-eun Yoon ; Bryan Hooi ; Kijung Shin

【Abstract】: Graphs have been utilized as a powerful tool to model pairwise relationships between people or objects. Such structure is a special type of a broader concept referred to as hypergraph, in which each hyperedge may consist of an arbitrary number of nodes, rather than just two. A large number of real-world datasets are of this form - for example, lists of recipients of emails sent from an organization, users participating in a discussion thread or subject labels tagged in an online question. However, due to complex representations and lack of adequate tools, little attention has been paid to exploring the underlying patterns in these interactions.

【Keywords】: Information systems; Information systems applications; Data mining; World Wide Web; Web mining; Mathematics of computing; Discrete mathematics; Graph theory; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis; Randomness, geometry and discrete structures; Random network models

23. Efficient Algorithm for the b-Matching Graph.

【Paper Link】 【Pages】:187-197

【Authors】: Yasuhiro Fujiwara ; Atsutoshi Kumagai ; Sekitoshi Kanai ; Yasutoshi Ida ; Naonori Ueda

【Abstract】: The b-matching graph is a useful approach to computing a graph from high-dimensional data. Unlike the k-NN graph, which greedily connects each data point to its k nearest neighbors and typically has more than k edges per point, each data point in the b-matching graph uniformly has b edges; the idea is to reduce edges between cross-clusters that have different semantics. In addition, edge weights are obtained from regression results for each data point and are restricted to be non-negative to improve robustness to data noise. The b-matching graph can more effectively model high-dimensional data than the traditional k-NN graph. However, the construction cost of the b-matching graph is impractical for large-scale data sets. This is because, to determine the edges in the graph, it needs to iteratively update messages between all pairs of data points until convergence, and it computes the non-negative edge weights of each data point by applying a solver intended for quadratic programming problems. Our proposal, b-dash, can efficiently construct a b-matching graph because of its two key techniques: (1) it prunes unnecessary update messages when determining edges, and (2) it incrementally computes edge weights by exploiting the Sherman-Morrison formula. Experiments show that our approach is up to 58.6 times faster than previous approaches while guaranteeing result optimality.

【Keywords】: Computing methodologies; Machine learning; Information systems; Information systems applications; Data mining
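
Since the abstract above names the Sherman-Morrison formula as the key to incremental edge-weight computation, here is a self-contained reminder of that rank-one inverse update; the b-matching message passing and pruning themselves are not reproduced, and the matrices are toy data.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} from A^{-1} via the Sherman-Morrison formula (O(n^2))."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)   # a well-conditioned matrix
A_inv = np.linalg.inv(A)
u, v = rng.normal(size=n), rng.normal(size=n)

fast = sherman_morrison_update(A_inv, u, v)   # incremental update
slow = np.linalg.inv(A + np.outer(u, v))      # recompute from scratch
print(np.allclose(fast, slow))                # True
```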

24. Isolation Distributional Kernel: A New Tool for Kernel based Anomaly Detection.

【Paper Link】 【Pages】:198-206

【Authors】: Kai Ming Ting ; Bi-Cun Xu ; Takashi Washio ; Zhi-Hua Zhou

【Abstract】: We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which converts a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is data independent. This paper shows that Isolation Distributional Kernel (IDK), which is based on a data dependent point kernel, addresses both key issues. We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection. Without explicit learning, using IDK alone outperforms existing kernel based anomaly detector OCSVM and other kernel mean embedding methods that rely on Gaussian kernel. We reveal for the first time that an effective kernel based anomaly detector based on kernel mean embedding must employ a characteristic kernel which is data dependent.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Kernel methods
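
A sketch of the kernel-mean-embedding view the abstract above builds on: a distributional kernel is the inner product of averaged feature maps. The random Fourier feature map below is a stand-in assumption; the paper's contribution is to replace it with a data-dependent, isolation-based map.

```python
import numpy as np

def make_rff(dim_in, dim_out=256, gamma=1.0, seed=0):
    """Random Fourier feature map approximating a Gaussian kernel (a stand-in phi)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim_in, dim_out))
    b = rng.uniform(0, 2 * np.pi, size=dim_out)
    return lambda X: np.sqrt(2.0 / dim_out) * np.cos(X @ W + b)

def distributional_kernel(X, Y, phi):
    """K(P, Q) = <mean feature map of P, mean feature map of Q>."""
    return phi(X).mean(axis=0) @ phi(Y).mean(axis=0)

rng = np.random.default_rng(2)
P = rng.normal(loc=0.0, size=(300, 2))
Q = rng.normal(loc=0.0, size=(300, 2))   # drawn from the same distribution as P
R = rng.normal(loc=3.0, size=(300, 2))   # drawn from a shifted distribution
phi = make_rff(dim_in=2)
print(distributional_kernel(P, Q, phi) > distributional_kernel(P, R, phi))   # True
```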

25. NodeAug: Semi-Supervised Node Classification with Data Augmentation.

【Paper Link】 【Pages】:207-217

【Authors】: Yiwei Wang ; Wei Wang ; Yuxuan Liang ; Yujun Cai ; Juncheng Liu ; Bryan Hooi

【Abstract】: By using Data Augmentation (DA), we present a new method to enhance Graph Convolutional Networks (GCNs), which are the state-of-the-art models for semi-supervised node classification. DA for graph data remains under-explored. Due to the connections built by edges, DA operations on different nodes influence each other and lead to undesired results, such as uncontrollable DA magnitudes and changes of ground-truth labels. To address this issue, we present the NodeAug (Node-Parallel Augmentation) scheme, which creates a 'parallel universe' for each node to conduct DA, in order to block the undesired effects from other nodes. NodeAug regularizes the model prediction of every node (including unlabeled ones) to be invariant with respect to changes induced by Data Augmentation (DA), so as to improve effectiveness. To augment the input features from different aspects, we propose three DA strategies that modify both node attributes and the graph structure. In addition, we introduce subgraph mini-batch training for the efficient implementation of NodeAug. The approach takes the subgraph corresponding to the receptive fields of a batch of nodes as the input per iteration, rather than the whole graph that prior full-batch training takes. Empirically, NodeAug yields significant gains for strong GCN models on the Cora, Citeseer, Pubmed, and two co-authorship networks, with a more efficient training process thanks to the proposed subgraph mini-batch training approach.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning algorithms; Regularization; Machine learning approaches; Neural networks
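
A toy sketch of the consistency-regularization idea described above (predictions of every node should be invariant under data augmentation). The stand-in model, the feature-masking augmentation, and the KL form of the penalty are assumptions; NodeAug's three DA strategies and per-node 'parallel universes' are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNodeClassifier(nn.Module):
    """Stand-in for a GCN: one linear layer applied to adjacency-mixed features."""
    def __init__(self, dim_in, n_classes):
        super().__init__()
        self.lin = nn.Linear(dim_in, n_classes)

    def forward(self, x, adj):
        return self.lin(adj @ x)

def drop_features(x, adj, p=0.2):
    """One simple augmentation (random attribute masking) used as a placeholder."""
    mask = (torch.rand_like(x) > p).float()
    return x * mask, adj

def consistency_loss(model, x, adj, augment):
    """Predictions of every node (labeled or not) should survive augmentation."""
    with torch.no_grad():
        p_clean = F.softmax(model(x, adj), dim=-1)
    x_aug, adj_aug = augment(x, adj)
    log_p_aug = F.log_softmax(model(x_aug, adj_aug), dim=-1)
    return F.kl_div(log_p_aug, p_clean, reduction="batchmean")

n, d, c = 10, 8, 3
x, adj = torch.rand(n, d), torch.eye(n)
model = ToyNodeClassifier(d, c)
reg = consistency_loss(model, x, adj, drop_features)
# Total training loss = cross-entropy on labeled nodes + lambda * reg
print(float(reg))
```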

26. An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks.

【Paper Link】 【Pages】:218-228

【Authors】: Ruixiang Tang ; Mengnan Du ; Ninghao Liu ; Fan Yang ; Xia Hu

【Abstract】: With the widespread use of deep neural networks (DNNs) in high-stake applications, the security problem of the DNN models has received extensive attention. In this paper, we investigate a specific security problem called trojan attack, which aims to attack deployed DNN systems relying on the hidden trigger patterns inserted by malicious hackers. We propose a training-free attack approach which is different from previous work, in which trojaned behaviors are injected by retraining model on a poisoned dataset. Specifically, we do not change parameters in the original model but insert a tiny trojan module (TrojanNet) into the target model. The infected model with a malicious trojan can misclassify inputs into a target label when the inputs are stamped with the special trigger. The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts comparing to conventional trojan attack methods. The experimental results show that TrojanNet can inject the trojan into all labels simultaneously (all-label trojan attack) and achieves 100% attack success rate without affecting model accuracy on original tasks. Experimental analysis further demonstrates that state-of-the-art trojan detection algorithms fail to detect TrojanNet attack. The code is available at https://github.com/trx14/TrojanNet.

【Keywords】: Security and privacy; Intrusion/anomaly detection and malware mitigation; Malware and its mitigation

27. Kronecker Attention Networks.

【Paper Link】 【Pages】:229-237

【Authors】: Hongyang Gao ; Zhengyang Wang ; Shuiwang Ji

【Abstract】: Attention operators have been applied on both 1-D data like texts and higher-order data such as images and videos. Use of attention operators on high-order data requires flattening of the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive requirements on computational resources, but also fails to preserve structures in data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving competitive performance as those with original attention operators.

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Machine learning algorithms; Machine learning approaches; Neural networks

28. GRACE: Generating Concise and Informative Contrastive Sample to Explain Neural Network Model's Prediction.

【Paper Link】 【Pages】:238-248

【Authors】: Thai Le ; Suhang Wang ; Dongwon Lee

【Abstract】: Despite the recent development in the topic of explainable AI/ML for image and text data, the majority of current solutions are not suitable to explain the prediction of neural network models when the datasets are tabular and their features are in high-dimensional vectorized formats. To mitigate this limitation, therefore, we borrow two notable ideas (i.e., "explanation by intervention" from causality and "explanation are contrastive" from philosophy) and propose a novel solution, named as GRACE, that better explains neural network models' predictions for tabular datasets. In particular, given a model's prediction as label X, GRACE intervenes and generates a minimally-modified contrastive sample to be classified as Y, with an intuitive textual explanation, answering the question of "Why X rather than Y?" We carry out comprehensive experiments using eleven public datasets of different scales and domains (e.g., # of features ranges from 5 to 216) and compare GRACE with competing baselines on different measures: fidelity, conciseness, info-gain, and influence. The user-studies show that our generated explanation is not only more intuitive and easy-to-understand but also facilitates end-users to make as much as 60% more accurate post-explanation decisions than that of Lime.

【Keywords】: Human-centered computing; Visualization; Information systems; Information systems applications; Data mining; Decision support systems

29. Hierarchical Attention Propagation for Healthcare Representation Learning.

【Paper Link】 【Pages】:249-256

【Authors】: Muhan Zhang ; Christopher R. King ; Michael Avidan ; Yixin Chen

【Abstract】: Medical ontologies are widely used to represent and organize medical terminologies. Examples include ICD-9, ICD-10, UMLS, etc. The ontologies are often constructed in hierarchical structures, encoding the multi-level subclass relationships among different medical concepts and allowing very fine distinctions between concepts. Medical ontologies provide a great source for incorporating domain knowledge into a healthcare prediction system, which might alleviate the data insufficiency problem and improve predictive performance on rare categories. To incorporate such domain knowledge, Gram, a recent graph attention model, represents a medical concept as a weighted sum of its ancestors' embeddings in the ontology using an attention mechanism. Although showing improved performance, Gram only considers the unordered ancestors of a concept, which does not fully leverage the hierarchy and thus has limited expressibility. In this paper, we propose Hierarchical Attention Propagation (HAP), a novel medical ontology embedding model that hierarchically propagates attention across the entire ontology structure, where a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors. We prove that HAP learns more expressive medical concept embeddings: from any medical concept embedding we are able to fully recover the entire ontology structure. Experimental results on two sequential procedure/diagnosis prediction tasks demonstrate HAP's better embedding quality compared to Gram and other baselines. Furthermore, we find that it is not always best to use the full ontology: sometimes using only the lower levels of the hierarchy outperforms using all levels.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Artificial intelligence; Knowledge representation and reasoning

30. SCE: Scalable Network Embedding from Sparsest Cut.

【Paper Link】 【Pages】:257-265

【Authors】: Shengzhong Zhang ; Zengfeng Huang ; Haicang Zhou ; Ziang Zhou

【Abstract】: Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner, capturing the inherent properties and structural information of the underlying graph. In this field, many popular approaches are influenced by the skip-gram model from natural language processing. Most of them use a contrastive objective to train an encoder, which forces the embeddings of similar pairs to be close and the embeddings of negative samples to be far apart. A key to the success of such contrastive learning methods is how to draw positive and negative samples. While negative samples generated by straightforward random sampling are often satisfying, methods for drawing positive examples remain a hot topic.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Feature selection; Machine learning approaches; Learning latent representations; Information systems; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Graph theory

31. Local Community Detection in Multiple Networks.

【Paper Link】 【Pages】:266-274

【Authors】: Dongsheng Luo ; Yuchen Bian ; Yaowei Yan ; Xiao Liu ; Jun Huan ; Xiang Zhang

【Abstract】: Local community detection aims to find a set of densely-connected nodes containing given query nodes. Most existing local community detection methods are designed for a single network. However, a single network can be noisy and incomplete. Multiple networks are more informative in real-world applications. There are multiple types of nodes and multiple types of node proximities. Complementary information from different networks helps to improve detection accuracy. In this paper, we propose a novel RWM (Random Walk in Multiple networks) model to find relevant local communities in all networks for a given query node set from one network. RWM sends a random walker in each network to obtain the local proximity w.r.t. the query nodes (i.e., node visiting probabilities).

【Keywords】: Information systems; Information systems applications; Data mining

32. A Block Decomposition Algorithm for Sparse Optimization.

【Paper Link】 【Pages】:275-285

【Authors】: Ganzhao Yuan ; Li Shen ; Wei-Shi Zheng

【Abstract】: Sparse optimization is a central problem in machine learning and computer vision. However, this problem is inherently NP-hard and thus difficult to solve in general. Combinatorial search methods find the global optimal solution but are confined to small-sized problems, while coordinate descent methods are efficient but often suffer from poor local minima. This paper considers a new block decomposition algorithm that combines the effectiveness of combinatorial search methods and the efficiency of coordinate descent methods. Specifically, we consider a random strategy or/and a greedy strategy to select a subset of coordinates as the working set, and then perform a global combinatorial search over the working set based on the original objective function. We show that our method finds stronger stationary points than Amir Beck et al.'s coordinate-wise optimization method. In addition, we establish the convergence rate of our algorithm. Our experiments on solving sparse regularized and sparsity constrained least squares optimization problems demonstrate that our method achieves state-of-the-art performance in terms of accuracy. For example, our method generally outperforms the well-known greedy pursuit method.

【Keywords】: Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization; Theory of computation; Design and analysis of algorithms; Mathematical optimization; Continuous optimization; Nonconvex optimization
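
An illustrative toy version of the block-decomposition idea from the paper above, applied to L0-regularized least squares: draw a small working set, do an exhaustive combinatorial search over its support patterns, and keep the remaining variables fixed. Working-set selection strategies, the greedy variant, and the convergence guarantees from the paper are omitted; all parameter values are placeholders.

```python
import itertools
import numpy as np

def block_decomposition_l0(A, b, lam=0.1, block_size=3, n_iters=50, seed=0):
    """Toy block-decomposition solver for  min ||Ax - b||^2 + lam * ||x||_0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        W = rng.choice(A.shape[1], size=block_size, replace=False)
        # Residual contributed by the coordinates that are NOT in the working set.
        r = b - A @ x + A[:, W] @ x[W]
        best_obj, best_xW = None, None
        for size in range(block_size + 1):
            for S in itertools.combinations(range(block_size), size):
                xW = np.zeros(block_size)
                if S:
                    sol, *_ = np.linalg.lstsq(A[:, W[list(S)]], r, rcond=None)
                    xW[list(S)] = sol
                obj = np.sum((r - A[:, W] @ xW) ** 2) + lam * np.count_nonzero(xW)
                if best_obj is None or obj < best_obj:
                    best_obj, best_xW = obj, xW
        x[W] = best_xW
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 12))
x_true = np.zeros(12)
x_true[[2, 7]] = [1.5, -2.0]
b = A @ x_true + 0.01 * rng.normal(size=40)
print(np.nonzero(np.round(block_decomposition_l0(A, b), 2))[0])
```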

33. Adversarial Infidelity Learning for Model Interpretation.

【Paper Link】 【Pages】:286-296

【Authors】: Jian Liang ; Bing Bai ; Yuren Cao ; Kun Bai ; Fei Wang

【Abstract】: Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. Also, we focus on the following setting: using selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we involve the output of the given model as an additional input to learn an explainer based on more accurate information. To learn the explainer, besides fidelity, we propose an Adversarial Infidelity Learning (AIL) mechanism to boost the explanation learning by screening relatively unimportant features. Through theoretical and experimental analysis, we show that our AIL mechanism can help learn the desired conditional distribution between selected features and targets. Moreover, we extend our framework by integrating efficient interpretation methods as proper priors to provide a warm start. Comprehensive empirical evaluation results are provided by quantitative metrics and human evaluation to demonstrate the effectiveness and superiority of our proposed method. Our code is publicly available online at https://github.com/langlrsw/MEED.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Feature selection; Machine learning approaches; Instance-based learning; Neural networks

34. Grounding Visual Concepts for Zero-Shot Event Detection and Event Captioning.

【Paper Link】 【Pages】:297-305

【Authors】: Zhihui Li ; Xiaojun Chang ; Lina Yao ; Shirui Pan ; Zongyuan Ge ; Huaxiang Zhang

【Abstract】: The flourishing of social media platforms requires techniques for understanding the content of media on a large scale. However, state-of-the-art video event understanding approaches remain very limited in terms of their ability to deal with data sparsity, semantically unrepresentative event names, and lack of coherence between visual and textual concepts. Accordingly, in this paper, we propose a method of grounding visual concepts for large-scale Multimedia Event Detection (MED) and Multimedia Event Captioning (MEC) in the zero-shot setting. More specifically, our framework comprises the following: (1) deriving novel semantic representations of events from their textual descriptions rather than event names; (2) aggregating the ranks of grounded concepts for MED tasks, where a statistical mean-shift outlier rejection model is proposed to remove outlying concepts that are incorrectly grounded; and (3) defining MEC tasks and augmenting the MEC training set with the videos detected in MED in a zero-shot setting. To the best of our knowledge, this work is the first to define and solve the MEC task, which is a further step towards understanding video events. We conduct extensive experiments and achieve state-of-the-art performance on the TRECVID MEDTest dataset, as well as on our newly proposed TRECVID-MEC dataset.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

35. How to Count Triangles, without Seeing the Whole Graph.

【Paper Link】 【Pages】:306-316

【Authors】: Suman K. Bera ; C. Seshadhri

【Abstract】: Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem, in varying streaming and distributed models, yet all these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph, and can only sample a small portion of the graph (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph?

【Keywords】: Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Probability and statistics; Probabilistic algorithms; Theory of computation; Design and analysis of algorithms; Streaming, sublinear and near linear time algorithms; Sketching and sampling; Randomness, geometry and discrete structures; Random walks and Markov chains
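
To make the "estimate without seeing the whole graph" setting of the paper above concrete, below is a textbook edge-sampling triangle estimator: each triangle survives sampling with probability p^3, so the sampled count is rescaled by 1/p^3. It is not the paper's algorithm, which works from crawled random walks and comes with different guarantees.

```python
import itertools
import random

def exact_triangles(adj):
    """Count triangles in an adjacency-set representation (brute force)."""
    return sum(1 for u, v, w in itertools.combinations(adj, 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

def sampled_triangle_estimate(edges, p=0.3, trials=200, seed=0):
    """Keep each edge independently with probability p and rescale the count by 1 / p^3."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        adj = {}
        for u, v in edges:
            if rng.random() < p:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        estimates.append(exact_triangles(adj) / p ** 3)
    return sum(estimates) / trials

# Toy graph: a 6-clique, which has C(6, 3) = 20 triangles.
nodes = range(6)
edges = list(itertools.combinations(nodes, 2))
full_adj = {u: {v for v in nodes if v != u} for u in nodes}
print(exact_triangles(full_adj), round(sampled_triangle_estimate(edges), 1))
```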

36. Incremental Lossless Graph Summarization.

【Paper Link】 【Pages】:317-327

【Authors】: Jihoon Ko ; Yunbum Kook ; Kijung Shin

【Abstract】: Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, concisely representing them is inevitable for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections which fix errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice.

【Keywords】: Computing methodologies; Modeling and simulation; Simulation theory; Network science; Information systems; Information retrieval; Retrieval tasks and goals; Summarization; Information systems applications; Data mining; Data stream mining; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

37. From Online to Non-i.i.d. Batch Learning.

【Paper Link】 【Pages】:328-337

【Authors】: Yufei Tao ; Shangqi Lu

【Abstract】: This paper initiates the study of online-to-batch conversion when the samples in batch learning are not i.i.d. Our motivation originated from two facts. First, sample sets in reality are seldom i.i.d., preventing the application of the existing conversions. Second, the online model of learning permits an adversarial stream of samples that almost surely violates the i.i.d. assumption, raising the possibility of adapting an online algorithm to learn effectively from a non-i.i.d. sample set. We present a set of techniques that utilize an online algorithm as a black box to perform batch learning in the absence of the i.i.d. assumption. Our techniques are generic and applicable to virtually any online algorithm for classification. This provides strong evidence that the great variety of known algorithms in the online-learning literature can indeed be harnessed to learn from sufficiently representative non-i.i.d. samples.

【Keywords】: Theory of computation; Theory and algorithms for application domains; Machine learning theory

38. Towards Deeper Graph Neural Networks.

【Paper Link】 【Pages】:338-348

【Authors】: Meng Liu ; Hongyang Gao ; Shuiwang Ji

【Abstract】: Graph neural networks have shown significant success in the field of graph representation learning. Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations. Nevertheless, one layer of these neighborhood aggregation methods only consider immediate neighbors, and the performance decreases when going deeper to enable larger receptive fields. Several recent studies attribute this performance deterioration to the over-smoothing issue, which states that repeated propagation makes node representations of different classes indistinguishable. In this work, we study this observation systematically and develop new insights towards deeper graph neural networks. First, we provide a systematical analysis on this issue and argue that the key factor compromising the performance significantly is the entanglement of representation transformation and propagation in current graph convolution operations. After decoupling these two operations, deeper graph neural networks can be used to learn graph node representations from larger receptive fields. We further provide a theoretical analysis of the above observation when building very deep models, which can serve as a rigorous and gentle description of the over-smoothing issue. Based on our theoretical and empirical analysis, we propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields. A set of experiments on citation, co-authorship, and co-purchase datasets have confirmed our analysis and insights and demonstrated the superiority of our proposed methods.

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Machine learning approaches; Neural networks; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

39. Laplacian Change Point Detection for Dynamic Graphs.

Paper Link】 【Pages】:349-358

【Authors】: Shenyang Huang ; Yasmeen Hitti ; Guillaume Rabusseau ; Reihaneh Rabbany

【Abstract】: Dynamic and temporal graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this paper, we focus on change point detection in dynamic graphs and address two main challenges associated with this problem: (i) how to compare graph snapshots across time, and (ii) how to capture temporal dependencies. To solve these challenges, we propose Laplacian Anomaly Detection (LAD), which uses the spectrum of the Laplacian matrix of the graph structure at each snapshot to obtain low-dimensional embeddings. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. In synthetic experiments, LAD outperforms the state-of-the-art method. We also evaluate our method on three real dynamic networks: the UCI message network, the US Senate co-sponsorship network, and the Canadian bill voting network. In all three datasets, we demonstrate that our method can more effectively identify anomalous time points that correspond to significant real-world events.
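
As a rough illustration of the spectral idea behind LAD, the sketch below scores each snapshot by how far its Laplacian eigenvalue signature drifts from the average signature of a short sliding window; the unnormalized Laplacian, the cosine-based score, and the window size are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def spectral_signature(adj: np.ndarray, k: int = 6) -> np.ndarray:
    """Return the k largest Laplacian eigenvalues of one graph snapshot."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    return eigvals[-k:]

def anomaly_scores(snapshots, k=6, window=3):
    """Score each snapshot by how far its spectrum drifts from the recent window."""
    sigs = [spectral_signature(a, k) for a in snapshots]
    scores = []
    for t, sig in enumerate(sigs):
        if t == 0:
            scores.append(0.0)
            continue
        context = np.mean(sigs[max(0, t - window):t], axis=0)
        cos = sig @ context / (np.linalg.norm(sig) * np.linalg.norm(context) + 1e-12)
        scores.append(1.0 - cos)  # larger score = sharper spectral change
    return scores
```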

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Temporal reasoning; Machine learning; Learning paradigms; Unsupervised learning; Anomaly detection; Machine learning algorithms; Spectral methods; Mathematics of computing; Discrete mathematics; Graph theory; Spectra of graphs; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis; Dynamic graph algorithms

40. Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling.

Paper Link】 【Pages】:359-367

【Authors】: Jianwen Yin ; Chenghao Liu ; Weiqing Wang ; Jianling Sun ; Steven C. H. Hoi

【Abstract】: Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of historical behaviors. However, the number of user behaviors inherently follows a long-tailed distribution, which has seldom been explored. In this work, we argue that focusing on tail users could bring more benefits, and we address the long-tail issue by learning transferrable parameters from both optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail. Such methods can also deal with the cold-start problem of new users. Moreover, the framework can be directly adapted to various well-established sequential models. Extensive experiments on four real-world datasets verify the superiority of our framework compared with state-of-the-art baselines.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

41. TranSlider: Transfer Ensemble Learning from Exploitation to Exploration.

Paper Link】 【Pages】:368-378

【Authors】: Kuo Zhong ; Ying Wei ; Chun Yuan ; Haoli Bai ; Junzhou Huang

【Abstract】: In transfer learning, what and where to transfer has been widely studied. Nevertheless, the learned transfer strategies are at high risk of over-fitting, especially when only a few annotated instances are available in the target domain. In this paper, we introduce the concept of transfer ensemble learning, a new direction to tackle the over-fitting of transfer strategies. Intuitively, models with different transfer strategies offer various perspectives on what and where to transfer. Therefore a core problem is to search these diversely transferred models for ensemble so as to achieve better generalization. Towards this end, we propose the Transferability Slider (TranSlider) for transfer ensemble learning. By decreasing the transferability, we obtain a spectrum of base models ranging from pure exploitation of the source model to unconstrained exploration for the target domain. Furthermore, the manner of decreasing transferability with parameter sharing guarantees fast optimization at no additional training cost. Finally, we conduct extensive experiments with various analyses, which demonstrate that TranSlider achieves the state-of-the-art on comprehensive benchmark datasets.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Transfer learning; Supervised learning; Supervised learning by classification; Machine learning algorithms; Ensemble methods

42. InFoRM: Individual Fairness on Graph Mining.

Paper Link】 【Pages】:379-389

【Authors】: Jian Kang ; Jingrui He ; Ross Maciejewski ; Hanghang Tong

【Abstract】: Algorithmic bias and fairness in the context of graph mining have largely remained nascent. The sparse literature on fair graph mining has almost exclusively focused on group-based fairness notions. However, the notion of individual fairness, which promises fairness at a much finer granularity, has not been well studied. This paper presents the first principled study of Individual Fairness on gRaph Mining (InFoRM). First, we present a generic definition of individual fairness for graph mining which naturally leads to a quantitative measure of the potential bias in graph mining results. Second, we propose three mutually complementary algorithmic frameworks to mitigate the proposed individual bias measure, namely debiasing the input graph, debiasing the mining model, and debiasing the mining results. Each algorithmic framework is formulated from the optimization perspective, using effective and efficient solvers, and is applicable to multiple graph mining tasks. Third, accommodating individual fairness is likely to change the graph mining results obtained without the fairness consideration. We conduct a thorough analysis to develop an upper bound that characterizes the cost (i.e., the difference between the graph mining results with and without the fairness consideration). We perform extensive experimental evaluations on real-world datasets to demonstrate the efficacy and generality of the proposed methods.
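
One natural quantitative individual-bias measure in the spirit described above is the smoothness of the mining results over a node-similarity matrix: similar nodes should receive similar results. The sketch below computes the Laplacian quadratic form Tr(Yᵀ L_S Y) as such a score; treating this exact form as the paper's measure is an assumption made here for illustration.

```python
import numpy as np

def individual_bias(Y: np.ndarray, S: np.ndarray) -> float:
    """Laplacian quadratic form Tr(Y^T L_S Y); equals one half of
    sum over node pairs (i, j) of S[i, j] * ||Y[i] - Y[j]||^2."""
    L = np.diag(S.sum(axis=1)) - S   # Laplacian of the similarity matrix
    return float(np.trace(Y.T @ L @ Y))
```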

【Keywords】: Applied computing; Law, social and behavioral sciences; Information systems; Information systems applications; Data mining

43. Local Motif Clustering on Time-Evolving Graphs.

Paper Link】 【Pages】:390-400

【Authors】: Dongqi Fu ; Dawei Zhou ; Jingrui He

【Abstract】: Graph motifs are subgraph patterns that occur in complex networks, which are of key importance for gaining deep insights into the structure and functionality of the graph. Motif clustering aims at finding clusters consisting of dense motif patterns. It is commonly used in various application domains, ranging from social networks to collaboration networks, from market-basket analysis to neuroscience applications. More recently, local clustering techniques have been proposed for motif-aware clustering, which focuses on a small neighborhood of the input seed node instead of the entire graph. However, most of these techniques are designed for static graphs and may render sub-optimal results when applied to large time-evolving graphs. To bridge this gap, in this paper, we propose a novel framework, Local Motif Clustering on Time-Evolving Graphs (L-MEGA), which provides the evolution pattern of the local motif cluster in an effective and efficient way. The core of L-MEGA is approximately tracking the temporal evolution of the local motif cluster via novel techniques such as edge filtering, motif push operation, and incremental sweep cut. Furthermore, we theoretically analyze the efficiency and effectiveness of these techniques on time-evolving graphs. Finally, we evaluate the L-MEGA framework via extensive experiments on both synthetic and real-world temporal networks.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Cluster analysis; Motif discovery; Information systems; Information systems applications; Data mining; Clustering; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

44. A Data-Driven Graph Generative Model for Temporal Interaction Networks.

Paper Link】 【Pages】:401-411

【Authors】: Dawei Zhou ; Lecheng Zheng ; Jiawei Han ; Jingrui He

【Abstract】: Deep graph generative models have recently received a surge of attention due to their success in modeling realistic graphs in a variety of domains, including biology, chemistry, and social science. Despite the initial success, most, if not all, of the existing works are designed for static networks. Nonetheless, many realistic networks are intrinsically dynamic and presented as a collection of system logs (i.e., timestamped interactions/edges between entities), which poses a new research direction for us: how can we synthesize realistic dynamic networks by directly learning from the system logs? In addition, how can we ensure the generated graphs preserve both the structural and temporal characteristics of the real data?

【Keywords】: Networks; Network properties; Network structure; Topology analysis and generation; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis; Dynamic graph algorithms

45. Recurrent Networks for Guided Multi-Attention Classification.

Paper Link】 【Pages】:412-420

【Authors】: Xin Dai ; Xiangnan Kong ; Tian Guo ; John Boaz Lee ; Xinyue Liu ; Constance M. Moore

【Abstract】: Attention-based image classification has gained increasing popularity in recent years. State-of-the-art methods for attention-based classification typically require a large training set and operate under the assumption that the label of an image depends solely on a single object (i.e. region of interest) in the image. However, in many real-world applications (e.g. medical imaging), it is very expensive to collect a large training set. Moreover, the label of each image is usually determined jointly by multiple regions of interest (ROIs). Fortunately, for such applications, it is often possible to collect the locations of the ROIs in each training image. In this paper, we study the problem of guided multi-attention classification, the goal of which is to achieve high accuracy under the dual constraints of (1) small sample size, and (2) multiple ROIs for each image. We propose a model, called Guided Attention Recurrent Network (GARN), for multi-attention classification. Different from existing attention-based methods, GARN utilizes guidance information regarding multiple ROIs thus allowing it to work well even when sample size is small. Empirical studies on three different visual tasks show that our guided attention approach can effectively boost model performance for multi-attention image classification.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning

46. Vulnerability vs. Reliability: Disentangled Adversarial Examples for Cross-Modal Learning.

Paper Link】 【Pages】:421-429

【Authors】: Chao Li ; Haoteng Tang ; Cheng Deng ; Liang Zhan ; Wei Liu

【Abstract】: The vulnerability of deep neural networks has attracted a surge of research attention: well-designed examples crafted by adding small perturbations can fool a well-trained network. Meanwhile, progress has been made in leveraging adversarial examples to boost the robustness of deep cross-modal networks. However, for cross-modal learning, both the causes of adversarial examples and their latent advantages in learning cross-modal correlations are under-explored. In this paper, we propose novel Disentangled Adversarial examples for Cross-Modal learning, dubbed DACM. Specifically, we first divide cross-modal data into two aspects, namely a modality-related component and a modality-unrelated counterpart, and then learn to improve the reliability of the network using the modality-related component. To achieve this goal, we apply the generation of adversarial perturbations to strengthen cross-modal correlations, wherein the modality-related component is acquired by gradually detaching the modality-unrelated component. Finally, the proposed DACM is employed to create modality-related examples for the application of cross-modal hashing retrieval. Extensive experiments carried out on two cross-modal benchmarks show that the adversarial examples learned by DACM are effective at fooling a target deep cross-modal hashing network. On the other hand, training this target model by merely leveraging our created modality-related examples in turn significantly promotes the robustness of the model itself.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Adversarial learning; Information systems; Information retrieval; Search engine architectures and scalability; Adversarial retrieval; Specialized information retrieval; Multimedia and multimodal retrieval; Image search; Security and privacy; Cryptography; Symmetric cryptography and hash functions; Hash functions and message authentication codes

47. XGNN: Towards Model-Level Explanations of Graph Neural Networks.

Paper Link】 【Pages】:430-438

【Authors】: Hao Yuan ; Jiliang Tang ; Xia Hu ; Shuiwang Ji

【Abstract】: Graph neural networks (GNNs) learn node features by aggregating and combining neighbor information, and have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human-intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel approach, known as XGNN, to interpret GNNs at the model level. Our approach can provide high-level insights and a generic understanding of how GNNs work. In particular, we propose to explain GNNs by training a graph generator so that the generated graph patterns maximize a certain prediction of the model. We formulate the graph generation as a reinforcement learning task, where for each step, the graph generator predicts how to add an edge into the current graph. The graph generator is trained via a policy gradient method based on information from the trained GNNs. In addition, we incorporate several graph rules to encourage the generated graphs to be valid. Experimental results on both synthetic and real-world datasets show that our proposed methods help understand and verify the trained GNNs. Furthermore, our experimental results indicate that the generated graphs can provide guidance on how to improve the trained GNNs.

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Machine learning approaches; Neural networks; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

48. CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data.

Paper Link】 【Pages】:439-449

【Authors】: Xiang Li ; Ben Kao ; Caihua Shan ; Dawei Yin ; Martin Ester

【Abstract】: We study the problem of applying spectral clustering to cluster multi-scale data, i.e., data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive a coefficient matrix over the objects. We propose the algorithm CAST, which applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Spectral methods; Information systems; Information systems applications; Data mining; Clustering

49. INPREM: An Interpretable and Trustworthy Predictive Model for Healthcare.

Paper Link】 【Pages】:450-460

【Authors】: Xianli Zhang ; Buyue Qian ; Shilei Cao ; Yang Li ; Hang Chen ; Yefeng Zheng ; Ian Davidson

【Abstract】: Building a predictive model based on historical Electronic Health Records (EHRs) for personalized healthcare has become an active research area. Benefiting from their powerful feature extraction ability, deep learning (DL) approaches have achieved promising performance in many clinical prediction tasks. However, due to the lack of interpretability and trustworthiness, it is difficult to apply DL in real clinical cases of decision making. To address this, in this paper, we propose an interpretable and trustworthy predictive model (INPREM) for healthcare. Firstly, INPREM is designed as a linear model for interpretability while encoding non-linear relationships into the learning weights for modeling the dependencies between and within each visit. This enables us to obtain the contribution matrix of the input variables, which serves as evidence for the prediction result(s) and helps physicians understand why the model gives such a prediction, thereby making the model more interpretable. Secondly, for trustworthiness, we place a random gate (which follows a Bernoulli distribution to turn on or off) over each weight of the model, as well as an additional branch to estimate data noise. With the help of Monte Carlo sampling and an objective function accounting for data noise, the model can capture the uncertainty of each prediction. The captured uncertainty, in turn, allows physicians to know how confident the model is, thus making the model more trustworthy. We empirically demonstrate that the proposed INPREM outperforms existing approaches by a significant margin. A case study is also presented to show how the contribution matrix and the captured uncertainty are used to assist physicians in making robust decisions.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Information systems; Information systems applications; Data mining

50. Policy-GNN: Aggregation Optimization for Graph Neural Networks.

Paper Link】 【Pages】:461-471

【Authors】: Kwei-Herng Lai ; Daochen Zha ; Kaixiong Zhou ; Xia Hu

【Abstract】: Graph data are pervasive in many real-world applications. Recently, increasing attention has been paid to graph neural networks (GNNs), which aim to model local graph structures and capture hierarchical patterns by aggregating information from neighbors with stackable network modules. Motivated by the observation that different nodes often require different numbers of aggregation iterations to fully capture their structural information, in this paper we propose to explicitly sample diverse iterations of aggregation for different nodes to boost the performance of GNNs. It is a challenging task to develop an effective aggregation strategy for each node, given complex graphs and sparse features. Moreover, it is not straightforward to derive an efficient algorithm, since we need to feed the sampled nodes into different numbers of network layers. To address the above challenges, we propose Policy-GNN, a meta-policy framework that models the sampling procedure and message passing of GNNs as a combined learning process. Specifically, Policy-GNN uses a meta-policy to adaptively determine the number of aggregations for each node. The meta-policy is trained with deep reinforcement learning (RL) by exploiting feedback from the model. We further introduce parameter sharing and a buffer mechanism to boost training efficiency. Experimental results on three real-world benchmark datasets suggest that Policy-GNN significantly outperforms state-of-the-art alternatives, showing its promise for aggregation optimization in GNNs.

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing design and evaluation methods; Social network analysis; Information systems; Information systems applications; Data mining; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Markov decision processes; Reinforcement learning; Semi-supervised learning

51. Malicious Attacks against Deep Reinforcement Learning Interpretations.

Paper Link】 【Pages】:472-482

【Authors】: Mengdi Huai ; Jianhui Sun ; Renqin Cai ; Liuyi Yao ; Aidong Zhang

【Abstract】: The past years have witnessed the rapid development of deep reinforcement learning (DRL), which is a combination of deep learning and reinforcement learning (RL). However, the adoption of deep neural networks makes the decision-making process of DRL opaque and lacking transparency. Motivated by this, various interpretation methods for DRL have been proposed. However, those interpretation methods make an implicit assumption that they are performed in a reliable and secure environment. In practice, sequential agent-environment interactions expose the DRL algorithms and their corresponding downstream interpretations to extra adversarial risk. In spite of the prevalence of malicious attacks, there is no existing work studying the possibility and feasibility of malicious attacks against DRL interpretations. To bridge this gap, in this paper, we investigate the vulnerability of DRL interpretation methods. Specifically, we introduce the first study of the adversarial attacks against DRL interpretations, and propose an optimization framework based on which the optimal adversarial attack strategy can be derived. In addition, we study the vulnerability of DRL interpretation methods to the model poisoning attacks, and present an algorithmic framework to rigorously formulate the proposed model poisoning attack. Finally, we conduct both theoretical analysis and extensive experiments to validate the effectiveness of the proposed malicious attacks against DRL interpretations.

【Keywords】: Computing methodologies; Machine learning; Information systems; Information systems applications; Data mining; Security and privacy

52. Disentangled Self-Supervision in Sequential Recommenders.

Paper Link】 【Pages】:483-491

【Authors】: Jianxin Ma ; Chang Zhou ; Hongxia Yang ; Peng Cui ; Xin Wang ; Wenwu Zhu

【Abstract】: To learn a sequential recommender, the existing methods typically adopt the sequence-to-item (seq2item) training strategy, which supervises a sequence model with a user's next behavior as the label and the user's past behaviors as the input. The seq2item strategy, however, is myopic and usually produces non-diverse recommendation lists. In this paper, we study the problem of mining extra signals for supervision by looking at the longer-term future. There exist two challenges: i) reconstructing a future sequence containing many behaviors is exponentially harder than reconstructing a single next behavior, which can lead to difficulty in convergence, and ii) the sequence of all future behaviors can involve many intentions, not all of which may be predictable from the sequence of earlier behaviors. To address these challenges, we propose a sequence-to-sequence (seq2seq) training strategy based on latent self-supervision and disentanglement. Specifically, we perform self-supervision in the latent space, i.e., reconstructing the representation of the future sequence as a whole, instead of reconstructing the items in the future sequence individually. We also disentangle the intentions behind any given sequence of behaviors and construct seq2seq training samples using only pairs of sub-sequences that involve a shared intention. Results on real-world benchmarks and synthetic data demonstrate the improvement brought by seq2seq training.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Learning to rank; Ranking; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; World Wide Web; Web searching and information discovery; Collaborative filtering

53. DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation.

Paper Link】 【Pages】:492-502

【Authors】: Limeng Cui ; Haeseung Seo ; Maryam Tabar ; Fenglong Ma ; Suhang Wang ; Dongwon Lee

【Abstract】: To provide accurate and explainable misinformation detection, it is often useful to take an auxiliary source (e.g., social context and knowledge bases) into consideration. Existing methods use social contexts such as users' engagements as complementary information to improve detection performance and derive explanations. However, due to the lack of sufficient professional knowledge, users seldom respond to healthcare information, which makes these methods less applicable. In this work, to address these shortcomings, we propose a novel knowledge-guided graph attention network for better detection of health misinformation. Our proposal, named DETERRENT, leverages additional information from a medical knowledge graph: it incorporates a Medical Knowledge Graph and an Article-Entity Bipartite Graph and propagates node embeddings through Knowledge Paths. In addition, an attention mechanism is applied to calculate the importance of entities to each article, and the knowledge-guided article embeddings are used for misinformation detection. DETERRENT addresses the limitation on social contexts in the healthcare domain and is capable of providing useful explanations for the detection results. Empirical validation on two real-world datasets demonstrates the effectiveness of DETERRENT. Compared with the best results of eight competing methods, DETERRENT outperforms all of them by at least 4.78% in F1 score on the diabetes dataset and 12.79% on the cancer dataset. We release the source code of DETERRENT at: https://github.com/cuilimeng/DETERRENT.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Security and privacy; Human and societal aspects of security and privacy; Social aspects of security and privacy

54. MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals.

Paper Link】 【Pages】:503-512

【Authors】: Namyong Park ; Andrey Kan ; Xin Luna Dong ; Tong Zhao ; Christos Faloutsos

【Abstract】: Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods have been developed to tackle this problem, their use of these external signals has been limited as they are not designed to consider multiple signals simultaneously. In this paper, we develop an end-to-end model MultiImport, which infers latent node importance from multiple, potentially overlapping, input signals. MultiImport is a latent variable model that captures the relation between node importance and input signals, and effectively learns from multiple signals with potential conflicts. Also, MultiImport provides an effective estimator based on attentive graph neural networks. We ran experiments on real-world KGs to show that MultiImport handles several challenges involved with inferring node importance from multiple input signals, and consistently outperforms existing methods, achieving up to 23.7% higher [email protected] than the state-of-the-art method.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining

55. Geodesic Forests.

Paper Link】 【Pages】:513-523

【Authors】: Meghana Madhyastha ; Gongkai Li ; Veronika Strnadová-Neeley ; James Browne ; Joshua T. Vogelstein ; Randal C. Burns ; Carey E. Priebe

【Abstract】: Together with the curse of dimensionality, nonlinear dependencies in large data sets persist as major challenges in data mining tasks. A reliable way to accurately preserve nonlinear structure is to compute geodesic distances between data points. Manifold learning methods, such as Isomap, aim to preserve geodesic distances in a Riemannian manifold. However, as manifold learning algorithms operate on the ambient dimensionality of the data, the essential step of geodesic distance computation is sensitive to high-dimensional noise. Therefore, a direct application of these algorithms to high-dimensional, noisy data often yields unsatisfactory results and does not accurately capture nonlinear structure.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Dimensionality reduction and manifold learning; Mathematics of computing; Probability and statistics; Probabilistic algorithms; Theory of computation; Randomness, geometry and discrete structures; Random projections and metric embeddings

56. Z-Miner: An Efficient Method for Mining Frequent Arrangements of Event Intervals.

Paper Link】 【Pages】:524-534

【Authors】: Zed Lee ; Tony Lindgren ; Panagiotis Papapetrou

【Abstract】: Mining frequent patterns of event intervals from a large collection of interval sequences is a problem that appears in several application domains. In this paper, we propose Z-Miner, a novel algorithm for solving this problem that addresses the deficiencies of existing competitors by employing two novel data structures: Z-Table, a hierarchical hash-based data structure for time-efficient candidate generation and support count, and Z-Arrangement, a data structure for efficient memory consumption. The proposed algorithm is able to handle patterns with repetitions of the same event label, allowing for gap and error tolerance constraints, as well as keeping track of the exact occurrences of the extracted frequent patterns. Our experimental evaluation on eight real-world and six synthetic datasets demonstrates the superiority of Z-Miner against four state-of-the-art competitors in terms of runtime efficiency and memory footprint.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Temporal reasoning; Machine learning; Machine learning approaches

57. Imputing Various Incomplete Attributes via Distance Likelihood Maximization.

Paper Link】 【Pages】:535-545

【Authors】: Shaoxu Song ; Yu Sun

【Abstract】: Missing values may appear in various attributes. By "various", we mean (1) different types of values in a tuple, such as numerical or categorical, and (2) different attributes in a tuple, either the dependent or determinant attributes of regression models or dependency rules. Such variety unfortunately hampers imputation. In this paper, we propose to study distance models that predict distances between tuples for missing data imputation. The immediate benefits are in two aspects: (1) uniformly processing and collaboratively utilizing the distances on all the attributes with various types of values, and (2) rather than enumerating the combinations of imputation candidates on various attributes, we can directly calculate the most likely distances of missing values to other complete ones and thus infer the corresponding imputations. Our major technical highlights include (1) introducing an imputation that is statistically explainable by the likelihood on distances, (2) proving the NP-hardness of finding the maximum likelihood imputation, and (3) devising an approximation algorithm with performance guarantees. Experiments over datasets with real missing values demonstrate the superiority of the proposed method compared to 11 existing approaches in 5 categories. Our proposal improves not only the imputation accuracy but also downstream applications such as classification, clustering and record matching.

【Keywords】: Information systems; Data management systems; Information integration; Data cleaning

58. WeightGrad: Geo-Distributed Data Analysis Using Quantization for Faster Convergence and Better Accuracy.

Paper Link】 【Pages】:546-556

【Authors】: Syeda Nahida Akter ; Muhammad Abdullah Adnan

【Abstract】: The high network communication cost of synchronizing weights and gradients in geo-distributed data analysis erodes the benefits of advances in computation and optimization techniques. Many quantization methods for weights, gradients, or both have been proposed in recent years, where weight-quantized models suffer from errors related to the weight dimension and gradient-quantized methods suffer from a slow convergence rate by a factor related to the gradient quantization resolution and gradient dimension. All of these methods have been shown to be infeasible for distributed training across multiple data centers around the world. Moreover, recent studies show that communicating over WANs can significantly degrade DNN model performance, by up to 53.7x, because of unstable and limited WAN bandwidth. Our goal in this work is to design a geo-distributed deep-learning system that (1) ensures efficient and faster communication over LAN and WAN and (2) maintains accuracy and convergence for complex DNNs with billions of parameters. In this paper, we introduce WeightGrad, which acknowledges the limitations of quantization and provides loss-aware weight-quantized networks with quantized gradients for local convergence; for global convergence it dynamically eliminates insignificant communication between data centers while still guaranteeing the correctness of DNN models. Our experiments on prototypes of WeightGrad running across 3 Amazon EC2 global regions and on a cluster that emulates EC2 WAN bandwidth show that WeightGrad provides a 1.06% gain in top-1 accuracy, 5.36x speedup over the baseline, and 1.4x-2.26x speedup over four state-of-the-art distributed ML systems.
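
For intuition about why quantization cuts WAN traffic, here is a generic uniform quantizer/dequantizer for gradient tensors; the bit width, scaling by the maximum absolute value, and deterministic rounding are generic choices and not the paper's specific loss-aware scheme.

```python
import numpy as np

def quantize(grad: np.ndarray, bits: int = 8):
    """Map a gradient tensor to `2**bits - 1` integer levels; return codes and the scale."""
    levels = 2 ** bits - 1
    scale = np.abs(grad).max() + 1e-12
    normalized = grad / scale                          # values now lie in [-1, 1]
    codes = np.round((normalized + 1.0) / 2.0 * levels).astype(np.uint8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float, bits: int = 8) -> np.ndarray:
    """Recover an approximate gradient tensor from its integer codes."""
    levels = 2 ** bits - 1
    return (codes.astype(np.float32) / levels * 2.0 - 1.0) * scale
```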

【Keywords】: Computing methodologies; Distributed computing methodologies; Distributed algorithms; Machine learning; Machine learning approaches; Neural networks

59. Feature-Induced Manifold Disambiguation for Multi-View Partial Multi-label Learning.

Paper Link】 【Pages】:557-565

【Authors】: Jing-Han Wu ; Xuan Wu ; Qing-Guo Chen ; Yao Hu ; Min-Ling Zhang

【Abstract】: In the conventional multi-label learning framework, each example is assumed to be represented by a single feature vector and associated with multiple valid labels simultaneously. Nonetheless, real-world objects usually exhibit complicated properties and can have multi-view feature representations as well as false positive labels. Accordingly, the problem of multi-view partial multi-label learning (MVPML) is studied in this paper, where each example is assumed to be represented by multiple feature vectors while associated with multiple candidate labels, only some of which are valid. To learn from MVPML examples, a novel approach named FIMAN is proposed which makes use of the multi-view feature representation to tackle the noisy labeling information. Firstly, an aggregate manifold structure over training examples is generated by adaptively fusing affinity information conveyed by the feature vectors of different views. Then, candidate labels of each training example are disambiguated by preserving the feature-induced manifold structure in the label space. Finally, the resulting predictive models are learned by fitting modeling outputs with the disambiguated labels. Extensive experiments on a number of real-world data sets show that FIMAN achieves highly competitive performance against state-of-the-art approaches in solving the MVPML problem.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Machine learning algorithms

Paper Link】 【Pages】:566-576

【Authors】: Haoyu Zhang ; Qin Zhang

【Abstract】: We study a fundamental problem in data analytics: similarity search under edit distance (or, edit similarity search for short). In this problem we try to build an index on a set of n strings S = s1, ..., sn, with the goal of answering the following two types of queries: (1) the threshold query: given a query string t and a threshold K, output all si ∈ S such that the edit distance between si and t is at most K; (2) the top-k query: given a query string t, output the k strings in S that are closest to t in terms of edit distance. Edit similarity search has numerous applications in bioinformatics, databases, data mining, information retrieval, etc., and has been studied extensively in the literature. In this paper we propose a novel algorithm for edit similarity search named MinSearch. The algorithm is randomized, and we show mathematically that it outputs the correct answer with high probability for both types of queries. We have conducted an extensive set of experiments on MinSearch and compared it with the best existing algorithms for edit similarity search. Our experiments show that MinSearch has a clear advantage (often by orders of magnitude) against the best previous algorithms in query time, and MinSearch is always among the best competitors in indexing time and space usage. Finally, MinSearch achieves perfect accuracy for both types of queries on all datasets that we have tested.
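
A brute-force reference for the two query types defined above (threshold and top-k under edit distance) can be useful for sanity-checking any index; the sketch below is such a baseline and is not the MinSearch algorithm itself.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution / match
        prev = cur
    return prev[-1]

def threshold_query(strings, t, K):
    """All strings within edit distance K of the query t."""
    return [s for s in strings if edit_distance(s, t) <= K]

def topk_query(strings, t, k):
    """The k strings closest to the query t under edit distance."""
    return sorted(strings, key=lambda s: edit_distance(s, t))[:k]
```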

【Keywords】: Information systems; Data management systems; Database management system engines; Database query processing; Query optimization; Information systems applications; Data mining; Nearest-neighbor search

61. Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods.

Paper Link】 【Pages】:577-587

【Authors】: Aritra Konar ; Nicholas D. Sidiropoulos

【Abstract】: Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to grow larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.
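
The simple procedure described above translates almost directly into code: scan vertex neighborhoods, score each by its local (triangle-based) clustering coefficient, and return the densest one as a quasi-clique candidate. The tie-breaking by neighborhood size in this sketch is an added assumption.

```python
import networkx as nx

def best_neighborhood_quasiclique(G: nx.Graph, min_size: int = 3):
    """Return the neighborhood subgraph with the highest clustering coefficient."""
    clustering = nx.clustering(G)            # local clustering coefficient per vertex
    best_nodes, best_score = None, (-1.0, -1)
    for v in G.nodes():
        nbrs = list(G.neighbors(v))
        if len(nbrs) + 1 < min_size:
            continue
        score = (clustering[v], len(nbrs))    # prefer dense, then large, neighborhoods
        if score > best_score:
            best_nodes, best_score = [v] + nbrs, score
    return G.subgraph(best_nodes).copy() if best_nodes else None
```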

【Keywords】: Mathematics of computing; Discrete mathematics; Combinatorics; Graph theory

62. Residual Correlation in Graph Neural Network Regression.

Paper Link】 【Pages】:588-598

【Authors】: Junteng Jia ; Austin R. Benson

【Abstract】: A graph neural network transforms features in each vertex's neighborhood into a vector representation of the vertex. Afterward, each vertex's representation is used independently for predicting its label. This standard pipeline implicitly assumes that vertex labels are conditionally independent given their neighborhood features. However, this is a strong assumption, and we show that it is far from true on many real-world graph datasets. Focusing on regression tasks, we find that this conditional independence assumption severely limits predictive power. This should not be that surprising, given that traditional graph-based semi-supervised learning methods such as label propagation work in the opposite fashion by explicitly modeling the correlation in predicted outcomes.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning in probabilistic graphical models; Maximum likelihood modeling; Neural networks; Mathematics of computing; Mathematical analysis; Numerical analysis; Computations on matrices; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Semi-supervised learning

63. Towards Fair Truth Discovery from Biased Crowdsourced Answers.

Paper Link】 【Pages】:599-607

【Authors】: Yanying Li ; Haipei Sun ; Wendy Hui Wang

【Abstract】: Crowdsourcing systems have gained considerable interest and adoption in recent years. One important research problem for crowdsourcing systems is truth discovery, which aims to aggregate noisy answers contributed by the workers to obtain the correct answer (truth) of each task. However, since the collected answers are highly prone to the workers' biases, aggregating these biased answers without proper treatment will unavoidably lead to discriminatory truth discovery results for particular race, gender and political groups. To address this challenge, in this paper, first, we define a new fairness notion named θ-disparity for truth discovery. Intuitively, θ-disparity bounds the difference in the probabilities that the truth of both protected and unprotected groups being predicted to be positive. Second, we design three fairness enhancing methods, namely Pre-TD, FairTD, and Post-TD, for truth discovery. Pre-TD is a pre-processing method that removes the bias in workers' answers before truth discovery. FairTD is an in-processing method that incorporates fairness into the truth discovery process. And Post-TD is a post-processing method that applies additional treatment on the discovered truth to make it satisfy θ-disparity. We perform an extensive set of experiments on both synthetic and real-world crowdsourcing datasets. Our results demonstrate that among the three fairness enhancing methods, FairTD produces the best accuracy with θ-disparity. In some settings, the accuracy of FairTD is even better than truth discovery without fairness, as it removes some low-quality answers as side effects.
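
A minimal sketch of the disparity check implied by the definition above: compare the positive-prediction rates of the protected and unprotected groups and test whether the gap stays within θ. The exact normalization used in the paper is assumed here for illustration.

```python
import numpy as np

def theta_disparity(truths: np.ndarray, protected: np.ndarray) -> float:
    """truths: 0/1 discovered truth per task; protected: 0/1 group indicator per task."""
    rate_protected = truths[protected == 1].mean()
    rate_unprotected = truths[protected == 0].mean()
    return abs(rate_protected - rate_unprotected)

def satisfies_theta(truths: np.ndarray, protected: np.ndarray, theta: float = 0.05) -> bool:
    """Check whether the discovered truths meet the theta-disparity bound."""
    return theta_disparity(truths, protected) <= theta
```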

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing theory, concepts and paradigms; Collaborative content creation; Information systems; World Wide Web; Web applications; Crowdsourcing

64. AutoShuffleNet: Learning Permutation Matrices via an Exact Lipschitz Continuous Penalty in Deep Convolutional Neural Networks.

Paper Link】 【Pages】:608-616

【Authors】: Jiancheng Lyu ; Shuai Zhang ; Yingyong Qi ; Jack Xin

【Abstract】: ShuffleNet is a state-of-the-art lightweight convolutional neural network architecture. Its basic operations include group, channel-wise convolution and channel shuffling. However, channel shuffling is manually designed on empirical grounds. Mathematically, shuffling is a multiplication by a permutation matrix. In this paper, we propose to automate channel shuffling by learning permutation matrices in network training. We introduce an exact Lipschitz-continuous non-convex penalty so that it can be incorporated into stochastic gradient descent to approximate permutations at high precision. Exact permutations are obtained by simple rounding at the end of training and are used in inference. The resulting network, referred to as AutoShuffleNet, achieves improved classification accuracies on CIFAR-10, CIFAR-100 and ImageNet while preserving the inference costs of ShuffleNet. In addition, we found experimentally that the standard convex relaxation of permutation matrices into stochastic matrices leads to poor performance. We prove theoretically the exactness (error bounds) of recovering permutation matrices when our penalty function is zero (very small). We present examples of permutation optimization through graph matching and two-layer neural network models where the loss functions are calculated in closed analytical form. In these examples, convex relaxation failed to capture permutations whereas our penalty succeeded.
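
The observation that channel shuffling is multiplication by a permutation matrix is easy to verify numerically; in the sketch below, the channel count and group size are arbitrary example values.

```python
import numpy as np

def shuffle_permutation(channels: int, groups: int) -> np.ndarray:
    """Permutation matrix P such that P @ x reorders channels group-wise."""
    idx = np.arange(channels).reshape(groups, channels // groups).T.reshape(-1)
    P = np.zeros((channels, channels))
    P[np.arange(channels), idx] = 1.0
    return P

x = np.arange(8.0)                        # toy per-channel activations, 8 channels
P = shuffle_permutation(8, groups=2)
# P @ x reproduces the classic reshape-transpose-flatten channel shuffle
assert np.allclose(P @ x, x.reshape(2, 4).T.reshape(-1))
```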

【Keywords】: Mathematics of computing; Discrete mathematics

65. MoFlow: An Invertible Flow Model for Generating Molecular Graphs.

Paper Link】 【Pages】:617-626

【Authors】: Chengxi Zang ; Fei Wang

【Abstract】: Generating molecular graphs with desired chemical properties using deep graph generative models provides a very promising way to accelerate the drug discovery process. Such graph generative models usually consist of two steps: learning latent representations and generating molecular graphs. However, generating novel and chemically valid molecular graphs from latent representations is very challenging because of the chemical constraints and combinatorial complexity of molecular graphs. In this paper, we propose MoFlow, a flow-based graph generative model that learns invertible mappings between molecular graphs and their latent representations. To generate molecular graphs, MoFlow first generates bonds (edges) through a Glow-based model, then generates atoms (nodes) given the bonds by a novel graph conditional flow, and finally assembles them into a chemically valid molecular graph with a post-hoc validity correction. MoFlow has merits including exact and tractable likelihood training, efficient one-pass embedding and generation, chemical validity guarantees, 100% reconstruction of training data, and good generalization ability. We validate our model on four tasks: molecular graph generation and reconstruction, visualization of the continuous latent space, property optimization, and constrained property optimization. MoFlow achieves state-of-the-art performance, which implies its potential efficiency and effectiveness in exploring large chemical spaces for drug discovery.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Machine learning approaches; Learning in probabilistic graphical models; Maximum likelihood modeling; Neural networks; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Theory of computation; Randomness, geometry and discrete structures; Generating random combinatorial structures

66. Parallel DNN Inference Framework Leveraging a Compact RISC-V ISA-based Multi-core System.

Paper Link】 【Pages】:627-635

【Authors】: Yipeng Zhang ; Bo Du ; Lefei Zhang ; Jia Wu

【Abstract】: RISC-V is an open-source instruction set that has been examined as a universal standard to unify heterogeneous platforms. However, current research focuses primarily on the design and fabrication of general-purpose processors based on RISC-V, despite the fact that in the era of IoT (Internet of Things), the fusion of heterogeneous platforms should also take application-specific processors into account. Accordingly, this paper proposes a collaborative RISC-V multi-core system for Deep Neural Network (DNN) accelerators. To the best of our knowledge, this is the first time that a multi-core scheduling architecture for DNN acceleration is formulated and RISC-V is explored as the ISA of a multi-core system to bridge the gap between the memory and the DNN processor in order to increase overall system throughput. The experiment realizes a four-stage design of the RISC-V core, and further reveals that a multi-core design along with an appropriate scheduling algorithm can efficiently decrease the runtime and elevate the throughput. Moreover, the experiment also provides a constructive suggestion regarding the ideal ratio of cores to Process Engines (PEs), which offers significant assistance in building highly efficient AI System-on-Chips (SoCs) in resource-aware situations.

【Keywords】: Hardware; Integrated circuits; Reconfigurable logic and FPGAs; Hardware accelerators

67. Missing Value Imputation for Mixed Data via Gaussian Copula.

Paper Link】 【Pages】:636-646

【Authors】: Yuxuan Zhao ; Madeleine Udell

【Abstract】: Missing data imputation forms the first critical step of many data analysis pipelines. The challenge is greatest for mixed data sets, including real, Boolean, and ordinal data, where standard techniques for imputation fail basic sanity checks: for example, the imputed values may not follow the same distributions as the data. This paper proposes a new semiparametric algorithm to impute missing values, with no tuning parameters. The algorithm models mixed data as a Gaussian copula. This model can fit arbitrary marginals for continuous variables and can handle ordinal variables with many levels, including Boolean variables as a special case. We develop an efficient approximate EM algorithm to estimate copula parameters from incomplete mixed data. The resulting model reveals the statistical associations among variables. Experimental results on several synthetic and real datasets show the superiority of our proposed algorithm to state-of-the-art imputation algorithms for mixed data.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Mathematics of computing; Probability and statistics; Probabilistic inference problems; Maximum likelihood estimation; Probabilistic reasoning algorithms; Expectation maximization

68. HiTANet: Hierarchical Time-Aware Attention Networks for Risk Prediction on Electronic Health Records.

Paper Link】 【Pages】:647-656

【Authors】: Junyu Luo ; Muchao Ye ; Cao Xiao ; Fenglong Ma

【Abstract】: Deep learning methods, especially recurrent neural network based models, have demonstrated early success in disease risk prediction on longitudinal patient data. Existing works follow a strong assumption that implicitly treats disease progression as stationary during each time period, and thus take a homogeneous way to decay the information from previous time steps for all patients. However, in reality, disease progression is non-stationary. Besides, the key time steps for a target disease vary among patients. To leverage time information for risk prediction in a more reasonable way, we propose a new hierarchical time-aware attention network, named HiTANet, which imitates the decision making process of doctors in risk prediction. Particularly, HiTANet models time information in local and global stages. The local evaluation stage has a time-aware Transformer that embeds time information into visit-level embeddings and generates a local attention weight for each visit. The global synthesis stage further adopts a time-aware key-query attention mechanism to assign global weights to different time steps. Finally, the two types of attention weights are dynamically combined to generate the patient representations for further risk prediction. We evaluate HiTANet on three real-world datasets. Compared with the best results among twelve competing baselines, HiTANet achieves over 7% higher F1 score on all datasets, which demonstrates the effectiveness of the proposed model and the necessity of modeling time information in the risk prediction task.

【Keywords】: Applied computing; Life and medical sciences; Consumer health; Health care information systems; Health informatics; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks

69. Personalized PageRank to a Target Node, Revisited.

Paper Link】 【Pages】:657-667

【Authors】: Hanzhi Wang ; Zhewei Wei ; Junhao Gan ; Sibo Wang ; Zengfeng Huang

【Abstract】: Personalized PageRank (PPR) is a widely used node proximity measure in graph mining and network analysis. Given a source node s and a target node t, the PPR value π(s,t) represents the probability that a random walk from s terminates at t, and thus indicates the bidirectional importance between s and t. The majority of the existing work focuses on single-source queries, which ask for the PPR value of a given source node s to every node t ∈ V. However, the single-source query only reflects the importance of each node t with respect to s. In this paper, we consider the single-target PPR query, which measures the opposite direction of importance for PPR. Given a target node t, the single-target PPR query asks for the PPR value of every node s ∈ V to the given target node t. We propose RBS, a novel algorithm that answers approximate single-target queries with optimal computational complexity. We show that RBS improves three concrete applications: heavy-hitter PPR queries, single-source SimRank computation, and scalable graph neural networks. We conduct experiments to demonstrate that RBS outperforms the state-of-the-art algorithms in terms of both efficiency and precision on real-world benchmark datasets.
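
The definition of π(s, t) above suggests a simple Monte Carlo baseline for intuition: run α-terminating random walks from s and count how often they stop at t. The sketch below is such a baseline (with restarts at s on dangling nodes, an added assumption) and is not the RBS algorithm.

```python
import random

def ppr_estimate(adj: dict, s, t, alpha: float = 0.15, walks: int = 20_000) -> float:
    """Estimate pi(s, t); adj maps each node to a list of its out-neighbors."""
    hits = 0
    for _ in range(walks):
        node = s
        while random.random() >= alpha:       # continue the walk with probability 1 - alpha
            nbrs = adj.get(node)
            if not nbrs:                       # dangling node: restart the walk at s
                node = s
                continue
            node = random.choice(nbrs)
        hits += node == t                      # the walk terminated; did it stop at t?
    return hits / walks
```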

【Keywords】: Information systems; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

70. Edge-consensus Learning: Deep Learning on P2P Networks with Nonhomogeneous Data.

Paper Link】 【Pages】:668-678

【Authors】: Kenta Niwa ; Noboru Harada ; Guoqiang Zhang ; W. Bastiaan Kleijn

【Abstract】: An effective Deep Neural Network (DNN) optimization algorithm that can use decentralized data sets over a peer-to-peer (P2P) network is proposed. In applications such as medical data analysis, the aggregation of data in one location may not be possible due to privacy issues. Hence, we formulate an algorithm to reach a global DNN model that does not require transmission of data among nodes. An existing solution for this issue is gossip stochastic gradient descent (SGD), which updates by averaging node models over a P2P network. However, in practical situations where the data are statistically heterogeneous across the nodes and/or where communication is asynchronous, gossip SGD often gets trapped in a local minimum since the model gradients are noticeably different. To overcome this issue, we solve a linearly constrained DNN cost minimization problem, which results in variable update rules that restrict differences among all node models. Our approach can be based on the Primal-Dual Method of Multipliers (PDMM) or the Alternating Direction Method of Multipliers (ADMM), but the cost function is linearized to be suitable for deep learning, which also facilitates asynchronous communication. The results of our numerical experiments using CIFAR-10 indicate that the proposed algorithms converge to a global recognition model even though statistically heterogeneous data sets are placed on the nodes.
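
For reference, the gossip-SGD baseline mentioned above can be sketched in a few lines: each node takes a local SGD step on its own data and then averages parameters with a randomly chosen peer; the pairing scheme and step size here are illustrative assumptions.

```python
import random
import numpy as np

def gossip_round(params: list, grads: list, lr: float = 0.01):
    """params[i], grads[i]: parameter and gradient vectors held by node i."""
    # local SGD step on each node's (possibly non-homogeneous) data
    params = [w - lr * g for w, g in zip(params, grads)]
    # one gossip exchange: a randomly chosen pair of nodes averages their models
    i, j = random.sample(range(len(params)), 2)
    avg = (params[i] + params[j]) / 2.0
    params[i] = params[j] = avg
    return params

# toy usage: five nodes with 4-dimensional parameter vectors
params = [np.zeros(4) for _ in range(5)]
grads = [np.random.randn(4) for _ in range(5)]
params = gossip_round(params, grads)
```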

【Keywords】: Theory of computation; Design and analysis of algorithms; Mathematical optimization; Continuous optimization; Nonconvex optimization; Parallel algorithms; Massively parallel algorithms

71. Deep Learning of High-Order Interactions for Protein Interface Prediction.

Paper Link】 【Pages】:679-687

【Authors】: Yi Liu ; Hao Yuan ; Lei Cai ; Shuiwang Ji

【Abstract】: Protein interactions are important in a broad range of biological processes. Traditionally, computational methods have been developed to automatically predict protein interface from hand-crafted features. Recent approaches employ deep neural networks and predict the interaction of each amino acid pair independently. However, these methods do not incorporate the important sequential information from amino acid chains and the high-order pairwise interactions. Intuitively, the prediction of an amino acid pair should depend on both their features and the information of other amino acid pairs. In this work, we propose to formulate the protein interface prediction as a 2D dense prediction problem. In addition, we propose a novel deep model to incorporate the sequential information and high-order pairwise interactions to perform interface predictions. We represent proteins as graphs and employ graph neural networks to learn node features. Then we propose the sequential modeling method to incorporate the sequential information and reorder the feature matrix. Next, we incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions. Finally, we employ convolutional neural networks to perform 2D dense predictions. Experimental results on multiple benchmarks demonstrate that our proposed method can consistently improve the protein interface prediction performance.

【Keywords】: Applied computing; Life and medical sciences; Bioinformatics

72. MAMO: Memory-Augmented Meta-Optimization for Cold-start Recommendation.

Paper Link】 【Pages】:688-697

【Authors】: Manqing Dong ; Feng Yuan ; Lina Yao ; Xiwei Xu ; Liming Zhu

【Abstract】: A common challenge for most current recommender systems is the cold-start problem. Due to the lack of user-item interactions, fine-tuned recommender systems are unable to handle situations with new users or new items. Recently, some works have introduced the meta-optimization idea into recommendation scenarios, i.e., predicting the user preference from only a few previously interacted items. The core idea is learning a globally shared initialization parameter for all users and then learning the local parameters for each user separately. However, most meta-learning based recommendation approaches adopt model-agnostic meta-learning for parameter initialization, where the globally shared parameter may lead the model into local optima for some users. In this paper, we design two memory matrices that store task-specific memories and feature-specific memories. Specifically, the feature-specific memories are used to guide the model with personalized parameter initialization, while the task-specific memories are used to guide the model in quickly predicting the user preference. We adopt a meta-optimization approach for optimizing the proposed method. We test the model on two widely used recommendation datasets and consider four cold-start situations. The experimental results show the effectiveness of the proposed methods.
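
An illustrative sketch of memory-guided personalized initialization, assuming a feature-specific memory matrix and a user profile embedding; this is a simplification of the idea rather than the exact MAMO update.

```python
# Personalized parameter initialization: attend over memory slots with the
# user's profile embedding and add the resulting offset to the shared init.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def personalized_init(global_init, memory, user_profile):
    """global_init: (d,) shared initialization; memory: (k, d) memory slots;
    user_profile: (d,) embedding of the user's profile features."""
    attention = softmax(memory @ user_profile)   # relevance of each memory slot
    bias = attention @ memory                    # user-specific offset
    return global_init + bias                    # personalized starting point

theta0 = np.zeros(8)
M = np.random.randn(5, 8)
u = np.random.randn(8)
theta_user = personalized_init(theta0, M, u)
```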

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

73. Finding Effective Geo-social Group for Impromptu Activities with Diverse Demands.

Paper Link】 【Pages】:698-708

【Authors】: Lu Chen ; Chengfei Liu ; Rui Zhou ; Jiajie Xu ; Jeffrey Xu Yu ; Jianxin Li

【Abstract】: Geo-social group search aims to find a group of people proximate to a location while socially related. One of the driving applications for geo-social group search is organizing an impromptu activity, because the social cohesiveness of a found geo-social group ensures a good communication atmosphere for the activity and its spatial closeness reduces the preparation time. Most existing works treat geo-social group search as a problem that finds a group satisfying a single social constraint while optimizing spatial proximity. However, since different impromptu activities have diverse demands on attendees, e.g., an activity could require (or prefer) the attendees to have skills (or interests) related to the activity, the existing works cannot find such geo-social groups effectively. In this paper, we propose a novel geo-social group model, equipped with elegant keyword constraints, to fill this gap. We propose a novel search framework which first significantly narrows down the search space with theoretical guarantees and then efficiently finds the optimum result. To evaluate the effectiveness, we conduct experiments on real datasets, demonstrating the superiority of our proposed model. We also conduct extensive experiments on large semi-synthetic datasets to justify the efficiency of the proposed search algorithms.

【Keywords】: Information systems; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

74. Representing Temporal Attributes for Schema Matching.

Paper Link】 【Pages】:709-719

【Authors】: Yinan Mei ; Shaoxu Song ; Yunsu Lee ; Jungho Park ; Soo-Hyung Kim ; Sungmin Yi

【Abstract】: Temporal data are prevalent, where one or several time attributes are present. It is challenging to identify the temporal attributes from heterogeneous sources, because the same attribute could contain distinct values in different time spans, whereas different attributes may have highly similar timestamps and similar values. Existing studies on schema matching seldom explore the temporal information for matching attributes. In this paper, we propose to order the values in an attribute A by some time attribute T as a time series. To learn deep temporal features in the attribute pair (T, A), we devise an auto-encoder to embed the transitions of values in the time series into a vector. Temporal attribute matching (TAM) thus evaluates the matching distance of two temporal attribute pairs by comparing their transition vectors. We show that computing the optimal matching distance is NP-hard, and present an approximation algorithm. Experiments on real datasets demonstrate the superiority of our proposal in matching temporal attributes compared to generic schema matching approaches.

【Keywords】: Information systems; Data management systems; Information integration; Mediators and data integration

75. Estimating Properties of Social Networks via Random Walk considering Private Nodes.

Paper Link】 【Pages】:720-730

【Authors】: Kazuki Nakajima ; Kazuyuki Shudo

【Abstract】: Accurately analyzing graph properties of social networks is a challenging task because of access limitations to the graph data. To address this challenge, several algorithms have been studied that obtain unbiased estimates of properties from a few samples via a random walk. However, existing algorithms do not consider private nodes, who hide their neighbors in real social networks, which leads to practical problems. Here we design random walk-based algorithms that accurately estimate properties without the issues caused by private nodes. First, we design a random walk-based sampling algorithm that comprises a neighbor-selection step, to obtain samples with the Markov property, and a weight calculation for each sample, to correct the sampling bias. Further, for two graph property estimators, we propose weighting methods that reduce not only the sampling bias but also estimation errors due to private nodes. The proposed algorithms improve the estimation accuracy of the existing algorithms by up to 92.6% on real-world datasets.
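
A minimal sketch of random-walk sampling with importance re-weighting: the stationary distribution of a simple random walk is proportional to node degree, so weighting each sample by 1/degree corrects that bias. The paper's handling of private (neighbor-hiding) nodes is more involved and is not shown here.

```python
# Re-weighted random walk estimate of the mean of a node-level quantity.
import random

def random_walk_estimate(graph, start, steps, node_value):
    """graph: dict node -> list of (visible) neighbors; node_value: f(node).
    Returns an importance-weighted estimate of the mean of f over nodes."""
    v = start
    num, den = 0.0, 0.0
    for _ in range(steps):
        w = 1.0 / len(graph[v])      # stationary prob is proportional to degree
        num += w * node_value(v)
        den += w
        v = random.choice(graph[v])  # move to a uniformly chosen neighbor
    return num / den

G = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
est = random_walk_estimate(G, start=0, steps=10000, node_value=lambda n: n)
```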

【Keywords】: General and reference; Cross-computing tools and techniques; Estimation; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

76. ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction.

Paper Link】 【Pages】:731-752

【Authors】: Zhongkai Hao ; Chengqiang Lu ; Zhenya Huang ; Hao Wang ; Zheyuan Hu ; Qi Liu ; Enhong Chen ; Cheekong Lee

【Abstract】: Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology. Unfortunately, many supervised learning methods usually suffer from the problem of scarce labeled molecules in the chemical space, where such property labels are generally obtained by Density Functional Theory (DFT) calculations, which are extremely computationally costly. An effective solution is to incorporate the unlabeled molecules in a semi-supervised fashion. However, learning semi-supervised representations for large amounts of molecules is challenging, including the joint representation issue of both molecular essence and structure, and the conflict between representation and property learning. Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules. Specifically, ASGN adopts a teacher-student framework. In the teacher model, we propose a novel semi-supervised learning method to learn general representations that jointly exploit information from molecular structure and molecular distribution. Then, in the student model, we target the property prediction task to deal with the learning loss conflict. Finally, we propose a novel active learning strategy in terms of molecular diversity to select informative data during the whole framework learning. We conduct extensive experiments on several public datasets. Experimental results show the remarkable performance of our ASGN framework.

【Keywords】: Computer systems organization; Architectures; Other architectures; Molecular computing; Neural networks; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Active learning; Semi-supervised learning

77. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks.

Paper Link】 【Pages】:753-763

【Authors】: Zonghan Wu ; Shirui Pan ; Guodong Long ; Jing Jiang ; Xiaojun Chang ; Chengqi Zhang

【Abstract】: Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another but, upon looking closely, it is fair to say that existing methods fail to fully exploit latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown high capability in handling relational dependencies. GNNs require well-defined graph structures for information propagation which means they cannot be applied directly for multivariate time series where the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge like variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets and achieves on-par performance with other approaches on two traffic datasets which provide extra structural information.
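
A hedged sketch of a graph learning module in the spirit described above: two learnable node-embedding tables produce a uni-directed adjacency matrix that is sparsified by keeping the top-k entries per row. The hyperparameters and the omission of additional linear transforms are simplifications, not the paper's exact layer.

```python
# Learn a uni-directional adjacency matrix from node embeddings.
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    def __init__(self, num_nodes, dim, k, alpha=3.0):
        super().__init__()
        self.e1 = nn.Embedding(num_nodes, dim)
        self.e2 = nn.Embedding(num_nodes, dim)
        self.k, self.alpha = k, alpha

    def forward(self):
        m1 = torch.tanh(self.alpha * self.e1.weight)
        m2 = torch.tanh(self.alpha * self.e2.weight)
        # anti-symmetric score makes the learned relations uni-directional
        a = torch.relu(torch.tanh(self.alpha * (m1 @ m2.T - m2 @ m1.T)))
        # keep only the k strongest outgoing edges per node
        topk = torch.topk(a, self.k, dim=1)
        return torch.zeros_like(a).scatter_(1, topk.indices, topk.values)

adj = GraphLearner(num_nodes=10, dim=16, k=3)()   # (10, 10) sparse-ish adjacency
```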

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Machine learning approaches; Neural networks

78. Learning Opinion Dynamics From Social Traces.

Paper Link】 【Pages】:764-773

【Authors】: Corrado Monti ; Gianmarco De Francisci Morales ; Francesco Bonchi

【Abstract】: Opinion dynamics, the research field dealing with how people's opinions form and evolve in a social context, traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, their predictive power is limited. Moreover, parameter calibration and model selection are manual and difficult tasks.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning in probabilistic graphical models; Modeling and simulation; Simulation types and techniques; Agent / discrete models; Human-centered computing; Collaborative and social computing; Collaborative and social computing design and evaluation methods; Social network analysis

79. Enterprise Cooperation and Competition Analysis with a Sign-Oriented Preference Network.

Paper Link】 【Pages】:774-782

【Authors】: Le Dai ; Yu Yin ; Chuan Qin ; Tong Xu ; Xiangnan He ; Enhong Chen ; Hui Xiong

【Abstract】: The development of effective cooperative and competitive strategies has been recognized as the key to the success of many companies in a globalized world. Therefore, many efforts have been made on the analysis of cooperation and competition among companies. However, existing studies either rely on labor intensive empirical analysis with specific cases or do not consider the heterogeneous company information when quantitatively measuring company relationships in a company network. More importantly, it is not clear how to generate a unified representation for cooperative and competitive strategies in a data driven way. To this end, in this paper, we provide a large-scale data driven analysis on the cooperative and competitive relationships among companies in a Sign-oriented Preference Network (SOPN). Specifically, we first exploit a Relational Graph Convolutional Network (RGCN) for generating a deep representation of the heterogeneous company features and a company relation network. Then, based on the representation, we generate two sets of preference vectors for each company by utilizing the attention mechanism to model the importance of different relations, representing their cooperative and competitive strategies respectively. Also, we design a sign constraint to model the dependency between cooperation and competition relations. Finally, we conduct extensive experiments on a real-world dataset, and verify the effectiveness of our approach. Moreover, we provide a case study to show some interesting patterns and their potential business value.

【Keywords】: Applied computing; Enterprise computing; Business process management; Business intelligence; Enterprise modeling; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

80. BLOB: A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals.

Paper Link】 【Pages】:783-793

【Authors】: Otmane Sakhi ; Stephen Bonner ; David Rohde ; Flavian Vasile

【Abstract】: A common task for recommender systems is to build a profile of the interests of a user from items in their browsing history and later to recommend items to the user from the same catalog. The users' behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequences of items recommended to them and their outcome (the bandit part).

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning in probabilistic graphical models; Bayesian network models

81. AutoST: Efficient Neural Architecture Search for Spatio-Temporal Prediction.

Paper Link】 【Pages】:794-802

【Authors】: Ting Li ; Junbo Zhang ; Kainan Bao ; Yuxuan Liang ; Yexin Li ; Yu Zheng

【Abstract】: Spatio-temporal (ST) prediction (e.g., crowd flow prediction) is of great importance in a wide range of smart city applications, from urban planning and intelligent transportation to public safety. Recently, many deep neural network models have been proposed to make accurate predictions. However, manually designing neural networks requires a substantial amount of expert effort and ST domain knowledge. How can we automatically construct a general neural network for diverse spatio-temporal prediction tasks in cities? In this paper, we study Neural Architecture Search (NAS) for spatio-temporal prediction and propose an efficient spatio-temporal neural architecture search method, entitled AutoST. To the best of our knowledge, the search space is an important human prior to the success of NAS in different applications, while current NAS models concentrate on optimizing the search strategy within a fixed search space. Thus, we design a novel search space tailored for the ST domain, which consists of two categories of components: (i) optional convolution operations at each layer to automatically extract multi-range spatio-temporal dependencies; (ii) learnable skip connections among layers to dynamically fuse low- and high-level ST features. We conduct extensive experiments on four real-world spatio-temporal prediction tasks, including taxi flow and crowd flow, showing that the learned network architectures can significantly improve the performance of representative ST neural network models. Furthermore, our proposed efficient NAS approach searches 8-10x faster than state-of-the-art NAS approaches, demonstrating the efficiency and effectiveness of AutoST.

【Keywords】: Information systems; Information systems applications; Spatial-temporal systems

82. COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching.

Paper Link】 【Pages】:803-812

【Authors】: Junyi Gao ; Cao Xiao ; Lucas M. Glass ; Jimeng Sun

【Abstract】: Clinical trials play important roles in drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The availability of massive electronic health records (EHR) data and trial eligibility criteria (EC) brings a new opportunity for data-driven patient recruitment. One key task, named patient-trial matching, is to find qualified patients for clinical trials given structured EHR and unstructured EC text (both inclusion and exclusion criteria). How can we match complex EC text with longitudinal patient EHRs? How can we embed many-to-many relationships between patients and trials? How can we explicitly handle the difference between inclusion and exclusion criteria? In this paper, we propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching. One path of the network encodes EC using a convolutional highway network. The other path processes EHR with a multi-granularity memory network that encodes structured patient records into multiple levels based on medical ontology. Using the EC embedding as a query, COMPOSE performs attentional record alignment and thus enables dynamic patient-trial matching. COMPOSE also introduces a composite loss term to maximize the similarity between patient records and inclusion criteria while minimizing the similarity to the exclusion criteria. Experimental results show that COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching, which leads to a 24.3% improvement over the best baseline on real-world patient-trial matching tasks.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Neural networks

83. Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity.

Paper Link】 【Pages】:813-823

【Authors】: Jonas Fischer ; Jilles Vreeken

【Abstract】: Pattern mining is one of the core topics of data mining. We consider the problem of mining a succinct set of patterns that together explain the data in terms of mutual exclusivity and co-occurrence. That is, we extend the traditional pattern languages beyond conjunctions, enabling us to capture more complex relationships, such as replaceable sub-components or antagonists in biological pathways.

【Keywords】: Information systems; Information systems applications; Data mining

84. TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations.

Paper Link】 【Pages】:824-832

【Authors】: Ang Li ; Yixiao Duan ; Huanrui Yang ; Yiran Chen ; Jianlei Yang

【Abstract】: The success of deep learning partially benefits from the availability of various large-scale datasets. These datasets are often crowdsourced from individual users and contain private information like gender, age, etc. The emerging privacy concerns from users on data sharing hinder the generation or use of crowdsourcing datasets and lead to a shortage of training data for new deep learning applications. One naive solution is to pre-process the raw data and extract features on the user side, so that only the extracted features are sent to the data collector. Unfortunately, attackers can still exploit these extracted features to train an adversary classifier to infer private attributes. Some prior works leverage game theory to protect private attributes. However, these defenses are designed for known primary learning tasks, and the extracted features work poorly for unknown learning tasks. To tackle the case where the learning task may be unknown or changing, we present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation. The goal of this framework is to learn a feature extractor that can hide the private information from the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks. We design a hybrid training method to learn the anonymized intermediate representation: (1) an adversarial training process for hiding private information from features; (2) maximally retaining the original information using a neural-network-based mutual information estimator. We extensively evaluate TIPRDC and compare it with existing methods using two image datasets and one text dataset. Our results show that TIPRDC substantially outperforms other existing methods. Our work is the first task-independent privacy-respecting data crowdsourcing framework.
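
An illustrative sketch of the hybrid objective: the extractor is trained so an adversary cannot recover the private attribute, while an information-retention term keeps the features useful. The reconstruction loss below is a stand-in for the paper's neural mutual-information estimator, and the network sizes are placeholders.

```python
# One alternating training step: fit the adversary, then update the extractor
# to hide the private attribute yet retain information about the input.
import torch
import torch.nn as nn

extractor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
adversary = nn.Linear(16, 2)   # tries to predict the private attribute
decoder = nn.Linear(16, 64)    # crude stand-in for the information-retention term

opt_f = torch.optim.Adam(list(extractor.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.randn(128, 64)
private = torch.randint(0, 2, (128,))

# step 1: train the adversary on the current (detached) features
feat = extractor(x).detach()
loss_a = nn.functional.cross_entropy(adversary(feat), private)
opt_a.zero_grad(); loss_a.backward(); opt_a.step()

# step 2: train the extractor to fool the adversary while reconstructing the input
feat = extractor(x)
hide = -nn.functional.cross_entropy(adversary(feat), private)  # maximize adversary loss
retain = nn.functional.mse_loss(decoder(feat), x)              # information-retention proxy
loss_f = hide + retain
opt_f.zero_grad(); loss_f.backward(); opt_f.step()
```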

【Keywords】: Computing methodologies; Machine learning; Security and privacy

85. AutoGrow: Automatic Layer Growing in Deep Convolutional Networks.

Paper Link】 【Pages】:833-841

【Authors】: Wei Wen ; Feng Yan ; Yiran Chen ; Hai Li

【Abstract】: Depth is a key component of Deep Neural Networks (DNNs); however, designing depth is heuristic and requires substantial human effort. We propose AutoGrow to automate depth discovery in DNNs: starting from a shallow seed architecture, AutoGrow grows new layers if the growth improves the accuracy; otherwise, it stops growing and thus discovers the depth. We propose robust growing and stopping policies that generalize to different network architectures and datasets. Our experiments show that by applying the same policy to different network architectures, AutoGrow can always discover near-optimal depth on various datasets, including MNIST, FashionMNIST, SVHN, CIFAR10, CIFAR100 and ImageNet. For example, in terms of the accuracy-computation trade-off, AutoGrow discovers a better depth combination in ResNets than human experts. AutoGrow is also efficient: it discovers depth within a similar time to training a single DNN. Our code is available at https://github.com/wenwei202/autogrow.
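
A schematic sketch of the grow-until-no-gain loop described above; build_model, train_and_evaluate, and the stopping threshold are placeholder names, not the released implementation.

```python
# Grow depth while each added layer still improves accuracy by at least min_gain.
def autogrow(build_model, train_and_evaluate, max_depth=50, min_gain=0.1):
    depth = 1
    best_acc = train_and_evaluate(build_model(depth))   # seed architecture
    while depth < max_depth:
        candidate_acc = train_and_evaluate(build_model(depth + 1))
        if candidate_acc - best_acc < min_gain:          # growth no longer helps: stop
            break
        depth, best_acc = depth + 1, candidate_acc
    return depth, best_acc
```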

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Search methodologies; Discrete space search; Machine learning; Learning paradigms; Multi-task learning; Lifelong machine learning; Machine learning approaches; Neural networks

86. Curb-GAN: Conditional Urban Traffic Estimation through Spatio-Temporal Generative Adversarial Networks.

Paper Link】 【Pages】:842-852

【Authors】: Yingxue Zhang ; Yanhua Li ; Xun Zhou ; Xiangnan Kong ; Jun Luo

【Abstract】: Given an urban development plan and the historical traffic observations over the road network, the Conditional Urban Traffic Estimation problem aims to estimate the resulting traffic status prior to the deployment of the plan. This problem is of great importance to urban development and transportation management, yet it is very challenging because the plan would change the local travel demands drastically and the new travel demand pattern might be unprecedented in the historical data. To tackle these challenges, we propose a novel Conditional Urban Traffic Generative Adversarial Network (Curb-GAN), which provides traffic estimations in consecutive time slots based on different (unprecedented) travel demands, thus enabling urban planners to accurately evaluate urban plans before deploying them. The proposed Curb-GAN adopts and advances the conditional GAN structure through a few novel ideas: (1) dealing with various travel demands as the "conditions" and generating corresponding traffic estimations, (2) integrating dynamic convolutional layers to capture the local spatial auto-correlations along the underlying road networks, (3) employing a self-attention mechanism to capture the temporal dependencies of the traffic across different time slots. Extensive experiments on two real-world spatio-temporal datasets demonstrate that our Curb-GAN outperforms major baseline methods in estimation accuracy under various conditions and can produce more meaningful estimations.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Spatial-temporal systems

87. Incremental Mobile User Profiling: Reinforcement Learning with Spatial Knowledge Graph for Modeling Event Streams.

Paper Link】 【Pages】:853-861

【Authors】: Pengyang Wang ; Kunpeng Liu ; Lu Jiang ; Xiaolin Li ; Yanjie Fu

【Abstract】: We study the integration of reinforcement learning and spatial knowledge graphs for incremental mobile user profiling, which aims to map mobile users to dynamically-updated profile vectors by incremental learning from a mixed-user event stream. After exploring many profiling methods, we identify a new imitation-based criterion to better evaluate and optimize profiling accuracy. Considering the objective of teaching an autonomous agent to imitate a mobile user and plan the next visit based on the user's profile, the user profile is the most accurate when the agent can perfectly mimic the activity patterns of the user. We propose to formulate the problem as a reinforcement learning task, where an agent is a next-visit planner, an action is a POI that a user will visit next, and the state of the environment is a fused representation of a user and spatial entities (e.g., POIs, activity types, functional zones). An event in which a user takes an action to visit a POI will change the environment, resulting in a new state of user profiles and spatial entities, which helps the agent predict the next visit more accurately. After analyzing such interactions among events, users, and spatial entities, we identify (1) semantic connectivity among spatial entities, and thus introduce a spatial knowledge graph (KG) to characterize the semantics of user visits over connected locations, activities, and zones. Besides, we identify (2) mutual influence between users and the spatial KG, and thus develop a mutual-updating strategy between users and the spatial KG, mixed with temporal context, to quantify the state representation that evolves over time. Along these lines, we develop a reinforcement learning framework integrated with the spatial KG. The proposed framework can achieve incremental learning in multi-user profiling given a mixed-user event stream. Finally, we apply our approach to human mobility activity prediction and present extensive experiments to demonstrate the improved performance.

【Keywords】: Information systems; Information retrieval; Users and interactive retrieval; Information systems applications; Data mining

88. Identifying Sepsis Subphenotypes via Time-Aware Multi-Modal Auto-Encoder.

Paper Link】 【Pages】:862-872

【Authors】: Changchang Yin ; Ruoqi Liu ; Dongdong Zhang ; Ping Zhang

【Abstract】: Sepsis is a heterogeneous clinical syndrome that is the leading cause of mortality in hospital intensive care units (ICUs). Identification of sepsis subphenotypes may allow for more precise treatments and lead to more targeted clinical interventions. Recently, sepsis subtyping on electronic health records (EHRs) has attracted interest from healthcare researchers. However, most sepsis subtyping studies ignore the temporality of EHR data and suffer from missing values. In this paper, we propose a new sepsis subtyping framework to address these two issues. Our subtyping framework consists of a novel Time-Aware Multi-modal auto-Encoder (TAME) model, which introduces a time-aware attention mechanism and incorporates multi-modal inputs (e.g., demographics, diagnoses, medications, lab tests and vital signs) to impute missing values, a dynamic time warping (DTW) method to measure patients' temporal similarity based on the imputed EHR data, and a weighted k-means algorithm to cluster patients. Comprehensive experiments on real-world datasets show TAME outperforms the baselines on imputation accuracy. After analyzing TAME-imputed EHR data, we identify four novel subphenotypes of sepsis patients, paving the way for improved personalization of sepsis management.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis; Social and professional topics; Computing / technology policy; Medical information policy; Medical records

89. A Causal Look at Statistical Definitions of Discrimination.

Paper Link】 【Pages】:873-881

【Authors】: Elias Chaibub Neto

【Abstract】: Predictive parity and error rate balance are both widely accepted and adopted criteria for assessing fairness of classifiers. The realization that these equally reasonable criteria can lead to contradictory results has, nonetheless, generated a lot of debate and controversy, and has motivated the development of mathematical results establishing the impossibility of concomitantly satisfying predictive parity and error rate balance. Here, we investigate these fairness criteria from a causality perspective. By taking into consideration the data generation process giving rise to the observed data, as well as the data generation process giving rise to the predictions, and assuming faithfulness, we prove that when the base rates differ across the protected groups and there is no perfect separation, then a standard classifier cannot achieve exact predictive parity. (By a standard classifier we mean a classifier trained in the usual way, without adopting pre-processing, in-processing, or post-processing fairness techniques.) This result holds in general, irrespective of the data generation process giving rise to the observed data. Furthermore, we show that the amount of disparate mistreatment for the positive predictive value metric is proportional to the difference between the base rates. For the error rate balance, as well as the closely related equalized odds and equality of opportunity criteria, we show that there are, nonetheless, data generation processes that can still satisfy these criteria when the base rates differ by protected group, and we characterize the conditions under which these criteria hold. We illustrate our results using synthetic data and with a re-analysis of the COMPAS data.
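
A small numeric illustration of the core tension: even with identical error rates, the positive predictive value PPV = TPR*p / (TPR*p + FPR*(1-p)) still depends on the base rate p, so groups with different base rates cannot have equal PPV.

```python
# Equal TPR/FPR across groups, different base rates -> different PPV.
def ppv(tpr, fpr, base_rate):
    p = base_rate
    return tpr * p / (tpr * p + fpr * (1 - p))

print(ppv(tpr=0.8, fpr=0.2, base_rate=0.5))   # 0.80
print(ppv(tpr=0.8, fpr=0.2, base_rate=0.2))   # 0.50
```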

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Mathematics of computing; Probability and statistics; Probabilistic representations; Causal networks

90. Targeted Data-driven Regularization for Out-of-Distribution Generalization.

Paper Link】 【Pages】:882-891

【Authors】: Mohammad Mahdi Kamani ; Sadegh Farhang ; Mehrdad Mahdavi ; James Z. Wang

【Abstract】: Due to biases introduced by large real-world datasets, deviations of deep learning models from their expected behavior on out-of-distribution test data are worrisome, especially when the data come from imbalanced or heavy-tailed label distributions or from minority groups of a sensitive feature. Classical approaches to addressing these biases are mostly data- or application-dependent, and hence burdensome to tune. Some meta-learning approaches, on the other hand, aim to learn hyperparameters in the learning process using different objective functions on training and validation data. However, these methods suffer from high computational complexity and are not scalable to large datasets. In this paper, we propose a unified data-driven regularization approach to learn a generalizable model from biased data. The proposed framework, named targeted data-driven regularization (TDR), is model- and dataset-agnostic, and employs a target dataset that resembles the desired nature of the test data in order to guide the learning process in a coupled manner. We cast the problem as a bilevel optimization and propose an efficient stochastic gradient descent based method to solve it. The framework can be utilized to alleviate various types of biases in real-world applications. We empirically show, on both synthetic and real-world datasets, the superior performance of TDR in resolving issues stemming from these biases.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Cost-sensitive learning; Machine learning algorithms; Regularization; Machine learning approaches; Neural networks

91. Neural Dynamics on Complex Networks.

Paper Link】 【Pages】:892-902

【Authors】: Chengxi Zang ; Fei Wang

【Abstract】: Learning continuous-time dynamics on complex networks is crucial for understanding, predicting, and controlling complex systems in science and engineering. However, this task is very challenging due to the combinatorial complexities in the structures of high dimensional systems, their elusive continuous-time nonlinear dynamics, and their structural-dynamic dependencies. To address these challenges, we propose to combine Ordinary Differential Equation Systems (ODEs) and Graph Neural Networks (GNNs) to learn continuous-time dynamics on complex networks in a data-driven manner. We model differential equation systems by GNNs. Instead of mapping through a discrete number of neural layers in the forward process, we integrate GNN layers over continuous time numerically, leading to capturing continuous-time dynamics on graphs. Our model can be interpreted as a Continuous-time GNN model or a Graph Neural ODEs model. Our model can be utilized for continuous-time network dynamics prediction, structured sequence prediction (a regularly-sampled case), and node semi-supervised classification tasks (a one-snapshot case) in a unified framework. We validate our model by extensive experiments in the above three scenarios. The promising experimental results demonstrate our model's capability of jointly capturing the structure and dynamics of complex systems in a unified framework.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Modeling and simulation; Simulation theory; Network science; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Mathematical analysis; Differential equations; Ordinary differential equations

92. Grammatically Recognizing Images with Tree Convolution.

Paper Link】 【Pages】:903-912

【Authors】: Guangrun Wang ; Guangcong Wang ; Keze Wang ; Xiaodan Liang ; Liang Lin

【Abstract】: Similar to language, understanding an image can be considered as a hierarchical decomposition process from scenes to objects, parts, pixels, and the corresponding spatial/contextual relations. However, the existing convolutional networks concentrate on stacking redundant convolutional layers with a large number of kernels in a hierarchical organization to implicitly approximate this decomposition. This may limit the network's ability to learn the semantic information conveyed in the internal feature maps that may reveal minor yet crucial differences for visual understanding. Attempting to tackle this problem, this paper proposes a simple yet effective tree convolution (TreeConv) operation for deep neural networks. Specifically, inspired by the image grammar techniques [73] that serve as a unified framework of object representation, learning, and recognition, our TreeConv designs a generative image grammar, i.e., a tree generation rule, to parse the hierarchy of internal feature maps by generating tree structures and implicitly learning the specific visual grammars for each object category. Extensive experiments on a variety of benchmarks, i.e., classification (ImageNet / CIFAR), detection & segmentation (COCO 2017), and person re-identification (CUHK03), demonstrate the superiority of our TreeConv in both boosting the accuracy and reducing the computational cost. The source code will be available at: https://github.com/wanggrun/TreeConv.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision representations; Hierarchical representations; Machine learning; Machine learning approaches; Neural networks

93. Generic Outlier Detection in Multi-Armed Bandit.

Paper Link】 【Pages】:913-923

【Authors】: Yikun Ban ; Jingrui He

【Abstract】: In this paper, we study the problem of outlier arm detection in multi-armed bandit settings, which finds plenty of applications in many high-impact domains such as finance, healthcare, and online advertising. For this problem, a learner aims to identify the arms whose expected rewards deviate significantly from most of the other arms. Different from existing work, we target the generic outlier arms or outlier arm groups whose expected rewards can be larger, smaller, or even in between those of normal arms. To this end, we start by providing a comprehensive definition of such generic outlier arms and outlier arm groups. Then we propose a novel pulling algorithm named GOLD to identify such generic outlier arms. It builds a real-time neighborhood graph based on upper confidence bounds and catches the behavior pattern of outliers from normal arms. We also analyze its performance from various aspects. In the experiments conducted on both synthetic and real-world data sets, the proposed algorithm achieves 98% accuracy while saving 83% exploration cost on average compared with state-of-the-art techniques.

【Keywords】: Theory of computation; Design and analysis of algorithms; Online algorithms; Online learning algorithms; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning; Sequential decision making

94. Robust Spammer Detection by Nash Reinforcement Learning.

Paper Link】 【Pages】:924-933

【Authors】: Yingtong Dou ; Guixiang Ma ; Philip S. Yu ; Sihong Xie

【Abstract】: Online reviews provide product evaluations for customers to make decisions. Unfortunately, the evaluations can be manipulated using fake reviews ("spams") by professional spammers, who have learned increasingly insidious and powerful spamming strategies by adapting to the deployed detectors. Spamming strategies are hard to capture, as they can be varying quickly along time, different across spammers and target products, and more critically, remained unknown in most cases. Furthermore, most existing detectors focus on detection accuracy, which is not well-aligned with the goal of maintaining the trustworthiness of product evaluations. To address the challenges, we formulate a minimax game where the spammers and spam detectors compete with each other on their practical goals that are not solely based on detection accuracy. Nash equilibria of the game lead to stable detectors that are agnostic to any mixed detection strategies. However, the game has no closed-form solution and is not differentiable to admit the typical gradient-based algorithms. We turn the game into two dependent Markov Decision Processes (MDPs) to allow efficient stochastic optimization based on multi-armed bandit and policy gradient. We experiment on three large review datasets using various state-of-the-art spamming and detection strategies and show that the optimization algorithm can reliably find an equilibrial detector that can robustly and effectively prevent spammers with any mixed spamming strategies from attaining their practical goal. Our code is available at https://github.com/YingtongDou/Nash-Detect.

【Keywords】: Information systems; World Wide Web; Web searching and information discovery; Web search engines; Spam detection; Security and privacy; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning; Adversarial learning

95. Mining Persistent Activity in Continually Evolving Networks.

Paper Link】 【Pages】:934-944

【Authors】: Caleb Belth ; Xinyi Zheng ; Danai Koutra

【Abstract】: Frequent pattern mining is a key area of study that gives insights into the structure and dynamics of evolving networks, such as social or road networks. However, not only does a network evolve, but often the way that it evolves, itself evolves. Thus, knowing, in addition to patterns' frequencies, for how long and how regularly they have occurred (i.e., their persistence) can add to our understanding of evolving networks. In this work, we propose the problem of mining activity that persists through time in continually evolving networks, i.e., activity that repeatedly and consistently occurs. We extend the notion of temporal motifs to capture activity among specific nodes, in what we call activity snippets, which are small sequences of edge-updates that reoccur. We propose axioms and properties that a measure of persistence should satisfy, and develop such a persistence measure. We also propose PENminer, an efficient framework for mining activity snippets' Persistence in Evolving Networks, and design both offline and streaming algorithms. We apply PENminer to numerous real, large-scale evolving networks and edge streams, and find activity that is surprisingly regular over a long period of time, but too infrequent to be discovered by aggregate count alone, and bursts of activity exposed by their lack of persistence. Our findings with PENminer include neighborhoods in NYC where taxi traffic persisted through Hurricane Sandy, the opening of new bike-stations, characteristics of social network users, and more. Moreover, we use PENminer towards identifying anomalies in multiple networks, outperforming baselines at identifying subtle anomalies by 9.8-48% in AUC.

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing design and evaluation methods; Social network analysis; Information systems; Information systems applications; Data mining; Data stream mining; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis; Dynamic graph algorithms

96. Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction.

Paper Link】 【Pages】:945-955

【Authors】: Qingquan Song ; Dehua Cheng ; Hanning Zhou ; Jiyan Yang ; Yuandong Tian ; Xia Hu

【Abstract】: Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems, driving personalized experiences for billions of consumers. Neural architecture search (NAS), as an emerging field, has demonstrated its capabilities in discovering powerful neural network architectures, which motivates us to explore its potential for CTR prediction. Due to 1) diverse unstructured feature interactions, 2) heterogeneous feature space, and 3) high data volume and intrinsic data randomness, it is challenging to construct, search, and compare different architectures effectively for recommendation models. To address these challenges, we propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR. Via modularizing simple yet representative interactions as virtual building blocks and wiring them into a space of directed acyclic graphs, AutoCTR performs evolutionary architecture exploration with learning-to-rank guidance at the architecture level and achieves acceleration using low-fidelity models. Empirical analysis demonstrates the effectiveness of AutoCTR on different datasets compared to human-crafted architectures. The discovered architecture also enjoys generalizability and transferability among different datasets.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; Theory of computation; Design and analysis of algorithms; Mathematical optimization; Discrete optimization; Optimization with randomized search heuristics; Evolutionary algorithms

97. High-Dimensional Similarity Search with Quantum-Assisted Variational Autoencoder.

Paper Link】 【Pages】:956-964

【Authors】: Nicholas Gao ; Max Wilson ; Thomas Vandal ; Walter Vinci ; Ramakrishna R. Nemani ; Eleanor Gilbert Rieffel

【Abstract】: Recent progress in quantum algorithms and hardware indicates the potential importance of quantum computing in the near future. However, finding suitable application areas remains an active area of research. Quantum machine learning is touted as a potential approach to demonstrate quantum advantage within both the gate-model and the adiabatic schemes. For instance, the Quantum-assisted Variational Autoencoder (QVAE) has been proposed as a quantum enhancement to the discrete VAE. We extend previous work and study the real-world applicability of a QVAE by presenting a proof-of-concept for similarity search in large-scale high-dimensional datasets. While exact and fast similarity search algorithms are available for low-dimensional datasets, scaling to high-dimensional data is non-trivial. We show how to construct a space-efficient search index based on the latent space representation of a QVAE. Our experiments show a correlation between the Hamming distance in the embedded space and the Euclidean distance in the original space on the Moderate Resolution Imaging Spectroradiometer (MODIS) dataset. Further, we find real-world speedups compared to linear search and demonstrate memory-efficient scaling to half a billion data points.
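
A small sketch of the kind of index the abstract describes: data points are stored as binary latent codes and queries are answered by Hamming distance. The codes here are random placeholders rather than QVAE embeddings.

```python
# Brute-force Hamming-distance search over a binary code index.
import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)   # binary index

def hamming_search(query, codes, k=5):
    """Return the indices of the k codes closest to `query` in Hamming distance."""
    dist = np.count_nonzero(codes != query, axis=1)
    return np.argsort(dist)[:k]

query = rng.integers(0, 2, size=64, dtype=np.uint8)
print(hamming_search(query, codes))
```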

【Keywords】: Applied computing; Physical sciences and engineering; Earth and atmospheric sciences; Hardware; Emerging technologies; Quantum technologies; Quantum computation; Theory of computation; Design and analysis of algorithms; Streaming, sublinear and near linear time algorithms; Nearest neighbor algorithms

98. Off-policy Bandits with Deficient Support.

Paper Link】 【Pages】:965-975

【Authors】: Noveen Sachdeva ; Yi Su ; Thorsten Joachims

【Abstract】: Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g. voice assistants, recommendation, search), since it enables the reuse of large amounts of log data. State-of-the-art methods for such off-policy learning, however, are based on inverse propensity score (IPS) weighting. A key theoretical requirement of IPS weighting is that the policy that logged the data has "full support", which typically translates into requiring non-zero probability for any action in any context. Unfortunately, many real-world systems produce support deficient data, especially when the action space is large, and we show how existing methods can fail catastrophically. To overcome this gap between theory and applications, we identify three approaches that provide various guarantees for IPS-based learning despite the inherent limitations of support-deficient data: restricting the action space, reward extrapolation, and restricting the policy space. We systematically analyze the statistical and computational properties of these three approaches, and we empirically evaluate their effectiveness. In addition to providing the first systematic analysis of support-deficiency in contextual-bandit learning, we conclude with recommendations that provide practical guidance.
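
A minimal sketch of the IPS estimator discussed above; when the logging policy gives an action zero probability in some context (deficient support), that action never appears in the log and the estimator silently ignores its reward.

```python
# Inverse propensity score (IPS) estimate of a target policy's value from logged data.
import numpy as np

def ips_value(logged, target_prob):
    """logged: list of (context, action, reward, logging_prob);
    target_prob: function (context, action) -> probability under the new policy."""
    weights = [target_prob(x, a) / p0 for (x, a, r, p0) in logged]
    rewards = [r for (_, _, r, _) in logged]
    return float(np.mean(np.array(weights) * np.array(rewards)))

# toy log: the logging policy never plays action 1 for context "b",
# so that (context, action) pair is invisible to the estimator
log = [("a", 0, 1.0, 0.5), ("a", 1, 0.0, 0.5), ("b", 0, 0.2, 1.0)]
target = lambda x, a: 0.5   # uniform target policy over two actions
print(ips_value(log, target))
```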

【Keywords】: Computing methodologies; Machine learning; Learning settings; Learning from implicit feedback; Information systems; Information retrieval; Retrieval models and ranking

99. Adaptive Graph Encoder for Attributed Graph Embedding.

Paper Link】 【Pages】:976-985

【Authors】: Ganqu Cui ; Jie Zhou ; Cheng Yang ; Zhiyuan Liu

【Abstract】: Attributed graph embedding, which learns vector representations from graph topology and node features, is a challenging task for graph analysis. Recently, methods based on graph convolutional networks (GCNs) have made great progress on this task. However, existing GCN-based methods have three major drawbacks. First, our experiments indicate that the entanglement of graph convolutional filters and weight matrices will harm both performance and robustness. Second, we show that the graph convolutional filters in these methods turn out to be special cases of generalized Laplacian smoothing filters, but they do not preserve optimal low-pass characteristics. Finally, the training objectives of existing algorithms usually involve recovering the adjacency matrix or feature matrix, which is not always consistent with real-world applications. To address these issues, we propose Adaptive Graph Encoder (AGE), a novel attributed graph embedding framework. AGE consists of two modules: (1) to better alleviate high-frequency noise in the node features, AGE first applies a carefully-designed Laplacian smoothing filter; (2) AGE employs an adaptive encoder that iteratively strengthens the filtered features for better node embeddings. We conduct experiments using four public benchmark datasets to validate AGE on node clustering and link prediction tasks. Experimental results show that AGE consistently and considerably outperforms state-of-the-art graph embedding methods on these tasks.
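
A sketch of a generalized Laplacian smoothing filter of the kind described above, applying H = I - k * L_sym to the feature matrix a few times; k and the number of passes are illustrative choices, not the paper's tuned values.

```python
# Low-pass Laplacian smoothing of node features before encoding.
import numpy as np

def smooth_features(adj, x, k=2.0 / 3.0, passes=4):
    adj = adj + np.eye(adj.shape[0])                   # add self-loops
    d = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    lap_sym = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    h = np.eye(adj.shape[0]) - k * lap_sym             # low-pass filter H = I - k*L_sym
    for _ in range(passes):
        x = h @ x                                      # attenuate high-frequency noise
    return x

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.random.randn(3, 5)
X_smoothed = smooth_features(A, X)
```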

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Information systems; Information systems applications; Data mining; Clustering

100. NetTrans: Neural Cross-Network Transformation.

Paper Link】 【Pages】:986-996

【Authors】: Si Zhang ; Hanghang Tong ; Yinglong Xia ; Liang Xiong ; Jiejun Xu

【Abstract】: Finding node associations across different networks is the cornerstone behind a wealth of high-impact data mining applications. Traditional approaches are often, explicitly or implicitly, built upon the linearity and/or consistency assumptions. On the other hand, the recent network embedding based methods promise a natural way to handle the non-linearity, yet they could suffer from the disparate node embedding spaces of different networks. In this paper, we address these limitations and tackle cross-network node associations from a new angle, i.e., cross-network transformation. We ask a generic question: given two different networks, how can we transform one network to another? We propose an end-to-end model that learns a composition of nonlinear operations so that one network can be transformed to another in a hierarchical manner. The proposed model bears three distinctive advantages. First (composite transformation), it goes beyond the linearity/consistency assumptions and performs the cross-network transformation through a composition of nonlinear computations. Second (representation power), it can learn the transformation of both network structures and node attributes at different resolutions while identifying the cross-network node associations. Third (generality), it can be applied to various tasks, including network alignment, recommendation, and cross-layer dependency inference. Extensive experiments on different tasks validate and verify the effectiveness of the proposed model.

【Keywords】: Information systems; Information systems applications; Data mining

101. Redundancy-Free Computation for Graph Neural Networks.

Paper Link】 【Pages】:997-1005

【Authors】: Zhihao Jia ; Sina Lin ; Rex Ying ; Jiaxuan You ; Jure Leskovec ; Alex Aiken

【Abstract】: Graph Neural Networks (GNNs) are based on repeated aggregations of information from nodes' neighbors in a graph. However, because nodes share many neighbors, a naive implementation leads to repeated and inefficient aggregations and represents significant computational overhead. Here we propose Hierarchically Aggregated computation Graphs (HAGs), a new GNN representation technique that explicitly avoids redundancy by managing intermediate aggregation results hierarchically, eliminating repeated computations and unnecessary data transfers in GNN training and inference. HAGs perform the same computations and give the same models/accuracy as traditional GNNs, but in a much shorter time due to optimized computations. To identify redundant computations, we introduce an accurate cost function and use a novel search algorithm to find optimized HAGs. Experiments show that the HAG representation significantly outperforms the standard GNN by increasing the end-to-end training throughput by up to 2.8× and reducing the aggregations and data transfers in GNN training by up to 6.3× and 5.6×, with only 0.1% memory overhead. Overall, our results represent an important advancement in speeding up and scaling up GNNs without any loss in model predictive performance.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

102. Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion.

Paper Link】 【Pages】:1006-1014

【Authors】: Kun Zhou ; Wayne Xin Zhao ; Shuqing Bian ; Yuanhang Zhou ; Ji-Rong Wen ; Jingsong Yu

【Abstract】: Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. Although several efforts have been made for CRS, two major issues still remain to be solved. First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences. Second, there is a semantic gap between natural language expression and item-level user preference.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Natural language generation; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

103. Sliding Sketches: A Framework using Time Zones for Data Stream Processing in Sliding Windows.

Paper Link】 【Pages】:1015-1025

【Authors】: Xiangyang Gou ; Long He ; Yinda Zhang ; Ke Wang ; Xilai Liu ; Tong Yang ; Yi Wang ; Bin Cui

【Abstract】: Data stream processing has become a hot issue in recent years due to the arrival of the big data era. There are three fundamental stream processing tasks: membership query, frequency query and heavy hitter query. While most existing solutions address these queries in fixed windows, this paper focuses on a more challenging task: answering these queries in sliding windows. Moreover, while most existing solutions address different kinds of queries with different algorithms, this paper focuses on a generic framework. We propose a generic framework, namely Sliding sketches, which can be applied to many existing solutions for the above three queries and enables them to support queries in sliding windows. We apply our framework to five state-of-the-art sketches for the above three kinds of queries. Theoretical analysis and extensive experimental results show that after using our framework, the accuracy of existing sketches that do not support sliding windows becomes much higher than that of the corresponding best prior art. We have released all the source code on GitHub.
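
A deliberately simplified illustration of the time-zone idea for sliding-window membership, with plain Python sets standing in for Bloom filters; it conveys only the rotation mechanism and is not the Sliding sketches framework itself.

```python
# Sliding-window membership: the window is split into fixed-size time zones,
# each backed by its own sub-structure; the oldest zone is dropped on rotation.
from collections import deque

class SlidingMembership:
    def __init__(self, num_zones, zone_length):
        self.zones = deque([set() for _ in range(num_zones)], maxlen=num_zones)
        self.zone_length, self.count = zone_length, 0

    def insert(self, item):
        self.zones[-1].add(item)               # newest zone receives new items
        self.count += 1
        if self.count % self.zone_length == 0:  # current time zone is full: rotate
            self.zones.append(set())            # deque drops the oldest zone

    def query(self, item):
        return any(item in zone for zone in self.zones)

w = SlidingMembership(num_zones=4, zone_length=100)
w.insert("flow-42")
print(w.query("flow-42"), w.query("flow-7"))
```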

【Keywords】: Information systems; Data management systems; Data structures; Information systems applications; Data mining; Data stream mining

104. STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths.

Paper Link】 【Pages】:1026-1035

【Authors】: Yue Yu ; Yinghao Li ; Jiaming Shen ; Hao Feng ; Jimeng Sun ; Chao Zhang

【Abstract】: Taxonomies are important knowledge ontologies that underpin numerous applications on a daily basis, but many taxonomies used in practice suffer from the low coverage issue. We study the taxonomy expansion problem, which aims to expand existing taxonomies with new concept terms. We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion. To generate natural self-supervision signals, STEAM samples mini-paths from the existing taxonomy, and formulates a node attachment prediction task between anchor mini-paths and query terms. To solve the node attachment task, it learns feature representations for query-anchor pairs from multiple views and performs multi-view co-training for prediction. Extensive experiments show that STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank on three public benchmarks. The code and data for STEAM can be found at https://github.com/yueyu1030/STEAM.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Information extraction

105. Probabilistic Metric Learning with Adaptive Margin for Top-K Recommendation.

Paper Link】 【Pages】:1036-1044

【Authors】: Chen Ma ; Liheng Ma ; Yingxue Zhang ; Ruiming Tang ; Xue Liu ; Mark Coates

【Abstract】: Personalized recommender systems are playing an increasingly important role as more content and services become available and users struggle to identify what might interest them. Although matrix factorization and deep learning based methods have proved effective in user preference modeling, they violate the triangle inequality and fail to capture fine-grained preference information. To tackle this, we develop a distance-based recommendation model with several novel aspects: (i) each user and item are parameterized by Gaussian distributions to capture the learning uncertainties; (ii) an adaptive margin generation scheme is proposed to generate the margins for different training triplets; (iii) explicit user-user/item-item similarity modeling is incorporated in the objective function. The Wasserstein distance is employed to determine preferences because it obeys the triangle inequality and can measure the distance between probabilistic distributions. Via a comparison with state-of-the-art methods on five real-world datasets, the proposed model outperforms the best existing models by 4-22% in terms of Recall@K on Top-K recommendation.
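
For diagonal Gaussian embeddings, the 2-Wasserstein distance used for preference scoring has a closed form, sketched below with illustrative shapes.

```python
# Closed-form 2-Wasserstein distance between two diagonal Gaussians:
# W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
import numpy as np

def wasserstein2_diag(mu1, sigma1, mu2, sigma2):
    """mu*, sigma*: (d,) mean and standard-deviation vectors of two Gaussians."""
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

user_mu, user_sigma = np.array([0.1, 0.4]), np.array([0.3, 0.2])
item_mu, item_sigma = np.array([0.0, 0.5]), np.array([0.25, 0.2])
print(wasserstein2_diag(user_mu, user_sigma, item_mu, item_sigma))
```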

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

106. Re-identification Attack to Privacy-Preserving Data Analysis with Noisy Sample-Mean.

Paper Link】 【Pages】:1045-1053

【Authors】: Du Su ; Hieu Tri Huynh ; Ziao Chen ; Yi Lu ; Wenmiao Lu

【Abstract】: In mining sensitive databases, access to sensitive class attributes of individual records is often prohibited by enforcing field-level security, while only aggregate class-specific statistics are allowed to be released. We consider a common privacy-preserving data analytics scenario where only a noisy sample mean of the class of interest can be queried. Such practice is widely found in medical research and business analytics settings. This paper studies the hazard of re-identification of an entire class caused by revealing a noisy sample mean of the class. With a novel formulation of the re-identification attack as a generalized positive-unlabeled (PU) learning problem, we prove that the risk function of the re-identification problem is closely related to that of learning with complete data. We demonstrate that with a one-sided noisy sample mean, an effective re-identification attack can be devised with existing PU learning algorithms. We then propose a novel algorithm, growPU, that exploits the unique property of the sample mean and consistently outperforms existing PU learning algorithms on the re-identification task. GrowPU achieves re-identification accuracy of 93.6% on the MNIST dataset and 88.1% on an online behavioral dataset with a noiseless sample mean. With noise that guarantees 0.01-differential privacy, growPU achieves 91.9% on the MNIST dataset and 84.6% on the online behavioral dataset.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Security and privacy; Human and societal aspects of security and privacy; Privacy protections

107. BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision.

Paper Link】 【Pages】:1054-1064

【Authors】: Chen Liang ; Yue Yu ; Haoming Jiang ; Siawpeng Er ; Ruijia Wang ; Tuo Zhao ; Chao Zhang

【Abstract】: We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision, though it does not require large amounts of manual annotation, yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework -- BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: in the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which can significantly improve recall and precision; in the second stage, we drop the distant labels and propose a self-training approach to further improve the model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Information extraction; Machine learning; Learning paradigms; Multi-task learning; Transfer learning

108. Graph Structural-topic Neural Network.

Paper Link】 【Pages】:1065-1073

【Authors】: Qingqing Long ; Yilun Jin ; Guojie Song ; Yi Li ; Wei Lin

【Abstract】: Graph Convolutional Networks (GCNs) have achieved tremendous success by effectively gathering local features for nodes. However, GCNs commonly focus more on node features and less on graph structures within the neighborhood, especially higher-order structural patterns, even though such local structural patterns have been shown to be indicative of node properties in numerous fields. Moreover, it is not just single patterns but the distribution over all these patterns that matters, because networks are complex and the neighborhood of each node consists of a mixture of various nodes and structural patterns. Correspondingly, in this paper we propose the Graph Structural-topic Neural Network, abbreviated GraphSTONE, a GCN model that utilizes topic models of graphs, such that the structural topics capture indicative graph structures broadly from a probabilistic perspective rather than merely a few structures. Specifically, we build topic models upon graphs using anonymous walks and Graph Anchor LDA, an LDA variant that selects significant structural patterns first, so as to alleviate the complexity and generate structural topics efficiently. In addition, we design multi-view GCNs to unify node features and structural topic features and utilize structural topics to guide the aggregation. We evaluate our model through both quantitative and qualitative experiments, where it exhibits promising performance, high efficiency, and clear interpretability.
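The anonymous walks mentioned above replace concrete node identities with first-occurrence indices, so walks over different nodes that follow the same structural pattern map to the same token. A minimal sketch of this encoding follows; the toy graph and walk length are hypothetical, and Graph Anchor LDA itself is not shown.

```python
import random
from collections import Counter

def anonymize(walk):
    """Map a node walk to its anonymous walk: each node is replaced by the
    index of its first occurrence, so (v3, v1, v3, v7) -> (0, 1, 0, 2)."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen)) for v in walk)

def random_walk(adj, start, length, rng):
    """Simple unweighted random walk on an adjacency-list graph."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph as adjacency lists (hypothetical example, not the paper's data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
counts = Counter(anonymize(random_walk(adj, rng.choice(list(adj)), 4, rng))
                 for _ in range(1000))
print(counts.most_common(3))  # distribution over local structural patterns
```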

【Keywords】: Information systems; Information systems applications; Data mining

109. Correlation Networks for Extreme Multi-label Text Classification.

Paper Link】 【Pages】:1074-1082

【Authors】: Guangxu Xun ; Kishlay Jha ; Jianhui Sun ; Aidong Zhang

【Abstract】: This paper develops the Correlation Networks (CorNet) architecture for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set. XMTC can be found in many real-world applications, such as document tagging and product annotation. Recently, deep learning models have achieved outstanding performances in XMTC tasks. However, these deep XMTC models ignore the useful correlation information among different labels. CorNet addresses this limitation by adding an extra CorNet module at the prediction layer of a deep model, which is able to learn label correlations, enhance raw label predictions with correlation knowledge and output augmented label predictions. We show that CorNet can be easily integrated with deep XMTC models and generalize effectively across different datasets. We further demonstrate that CorNet can bring significant improvements over the existing deep XMTC models in terms of both performance and convergence rate. The models and datasets are available at: https://github.com/XunGuangxu/CorNet.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Information systems; Information retrieval; Retrieval models and ranking

110. Predicting Temporal Sets with Deep Neural Networks.

Paper Link】 【Pages】:1083-1091

【Authors】: Le Yu ; Leilei Sun ; Bowen Du ; Chuanren Liu ; Hui Xiong ; Weifeng Lv

【Abstract】: Given a sequence of sets, where each set contains an arbitrary number of elements, the problem of temporal sets prediction aims to predict the elements in the subsequent set. In practice, temporal sets prediction is much more complex than predictive modelling of temporal events and time series, and it remains an open problem. Many existing methods, if adapted to temporal sets prediction, follow a two-step strategy: first projecting temporal sets into latent representations and then learning a predictive model on those representations. This two-step approach often leads to information loss and unsatisfactory prediction performance. In this paper, we propose an integrated solution based on deep neural networks for temporal sets prediction. A unique perspective of our approach is to learn element relationships by constructing set-level co-occurrence graphs and then performing graph convolutions on the dynamic relationship graphs. Moreover, we design an attention-based module to adaptively learn the temporal dependencies of elements and sets. Finally, we provide a gated updating mechanism to find the hidden shared patterns in different sequences and fuse both static and dynamic information to improve the prediction performance. Experiments on real-world data sets demonstrate that our approach achieves competitive performance even with only a portion of the training data and outperforms existing methods by a significant margin.
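As a small illustration of the set-level co-occurrence graph described above, one could weight an edge between two elements by the number of sets in which they co-occur. The basket data below is hypothetical, and the subsequent graph convolution step is not shown.

```python
from itertools import combinations
from collections import defaultdict

def cooccurrence_graph(sets):
    """Build a weighted element co-occurrence graph from a sequence of sets:
    edge weight = number of sets in which two elements appear together."""
    weights = defaultdict(int)
    for s in sets:
        for u, v in combinations(sorted(s), 2):
            weights[(u, v)] += 1
    return dict(weights)

# Hypothetical purchase baskets of one user over time.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"eggs", "milk"}]
print(cooccurrence_graph(baskets))
# e.g. ('bread', 'milk'): 2 -- such weights could feed a graph convolution.
```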

【Keywords】: Information systems; Information systems applications; Data mining

111. FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents.

Paper Link】 【Pages】:1092-1102

【Authors】: Bill Yuchen Lin ; Ying Sheng ; Nguyen Vo ; Sandeep Tata

【Abstract】: Extracting structured data from HTML documents is a long-studied problem with a broad range of applications like augmenting knowledge bases, supporting faceted search, and providing domain-specific experiences for key verticals like shopping and movies. Previous approaches have either required a small number of examples for each target site or relied on carefully handcrafted heuristics built over visual renderings of websites. In this paper, we present a novel two-stage neural approach, named FreeDOM, which overcomes both these limitations. The first stage learns a representation for each DOM node in the page by combining both the text and markup information. The second stage captures longer range distance and semantic relatedness using a relational neural network. By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page. Through experiments on a public dataset with 8 different verticals, we show that FreeDOM beats the previous state of the art by nearly 3.7 F1 points on average without requiring features over rendered pages or expensive hand-crafted features.

【Keywords】: Information systems; World Wide Web; Web mining; Data extraction and integration; Site wrapping

112. SEAL: Learning Heuristics for Community Detection with Generative Adversarial Networks.

Paper Link】 【Pages】:1103-1113

【Authors】: Yao Zhang ; Yun Xiong ; Yun Ye ; Tengfei Liu ; Weiqiang Wang ; Yangyong Zhu ; Philip S. Yu

【Abstract】: Community detection is an important task with many applications. However, there is no universal definition of communities, and a variety of algorithms have been proposed based on different assumptions. In this paper, we instead study the semi-supervised community detection problem where we are given several communities in a network as training data and aim to discover more communities. This setting makes it possible to learn concepts of communities from data without any prior knowledge. We propose the Seed Expansion with generative Adversarial Learning (SEAL), a framework for learning heuristics for community detection. SEAL contains a generative adversarial network, where the discriminator predicts whether a community is real or fake, and the generator generates communities that cheat the discriminator by implicitly fitting characteristics of real ones. The generator is a graph neural network specialized in sequential decision processes and gets trained by policy gradient. Moreover, a locator is proposed to avoid well-known free-rider effects by forming a dual learning task with the generator. Last but not least, a seed selector is utilized to provide promising seeds to the generator. We evaluate SEAL on 5 real-world networks and prove its effectiveness.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Learning settings; Semi-supervised learning settings; Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization

113. Matrix Profile XXI: A Geometric Approach to Time Series Chains Improves Robustness.

Paper Link】 【Pages】:1114-1122

【Authors】: Makoto Imamura ; Takaaki Nakamura ; Eamonn J. Keogh

【Abstract】: Time series motifs have become a fundamental tool to characterize repeated and conserved structure in systems, such as manufacturing telemetry, economic activities, and both human physiological and cultural behaviors. Recently, time series chains were introduced as a generalization of time series motifs to represent evolving patterns in time series, in order to characterize the evolution of systems. Time series chains are a very promising primitive; however, we have observed that the original definition can be brittle in the sense that a small fluctuation in a time series may "cut" a chain. Furthermore, the original definition does not provide a measure of the "significance" of a chain, and therefore cannot support top-k search for chains or provide a mechanism to discard spurious chains that might be discovered when searching large datasets. Inspired by observations from dynamical systems theory, this paper introduces two novel quality metrics for time series chains, directionality and graduality, to improve robustness and to enable top-k search. With extensive empirical work we show that our proposed definition is much more robust to the vagaries of real-world datasets and allows us to find unexpected regularities in time series datasets.

【Keywords】: Information systems; Information systems applications; Data mining

114. Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks.

Paper Link】 【Pages】:1123-1131

【Authors】: Surgan Jandial ; Ayush Chopra ; Mausoom Sarkar ; Piyush Gupta ; Balaji Krishnamurthy ; Vineeth Balasubramanian

【Abstract】: Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce a new retrospective loss to improve the training of deep neural network models by utilizing the prior experience available in past model states during training. Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at the current training step towards the optimal parameter state while pulling it away from the parameter state at a previous training step. Although the idea is simple, we analyze the method and conduct comprehensive sets of experiments across domains - images, speech, text, and graphs - to show that the proposed loss results in improved performance across input domains, tasks, and architectures.
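The abstract does not give the exact form of the retrospective term, so the following is only one plausible reading, sketched in plain Python: a margin-style penalty that is low when the current predictions are much closer to the target (used here as a stand-in for the optimal state) than to the predictions of a frozen earlier checkpoint. The function name and the margin parameter kappa are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def retrospective_term(pred_now, pred_past, target, kappa=2.0):
    """One plausible retrospective regularizer: small when the current
    predictions are much closer to the target than to the predictions of a
    frozen past checkpoint (margin controlled by kappa)."""
    to_target = np.linalg.norm(pred_now - target)
    to_past = np.linalg.norm(pred_now - pred_past)
    return max(0.0, kappa * to_target - to_past)

# Toy example: adding this term to a task loss penalizes staying close to an
# older model state while still being far from the target.
target = np.array([1.0, 0.0, 0.0])
pred_past = np.array([0.4, 0.3, 0.3])
pred_now = np.array([0.7, 0.2, 0.1])
print(retrospective_term(pred_now, pred_past, target))
```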

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

115. Average Sensitivity of Spectral Clustering.

Paper Link】 【Pages】:1132-1140

【Authors】: Pan Peng ; Yuichi Yoshida

【Abstract】: Spectral clustering is one of the most popular clustering methods for finding clusters in a graph, which has found many applications in data mining. However, the input graph in those applications may have many missing edges due to error in measurement, withholding for a privacy reason, or arbitrariness in data conversion. To make reliable and efficient decisions based on spectral clustering, we assess the stability of spectral clustering against edge perturbations in the input graph using the notion of average sensitivity, which is the expected size of the symmetric difference of the output clusters before and after we randomly remove edges. We first prove that the average sensitivity of spectral clustering is proportional to $\lambda_2/\lambda_3^2$, where $\lambda_i$ is the $i$-th smallest eigenvalue of the (normalized) Laplacian. We also prove an analogous bound for k-way spectral clustering, which partitions the graph into k clusters. Then, we empirically confirm our theoretical bounds by conducting experiments on synthetic and real networks. Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
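To make the quantity $\lambda_2/\lambda_3^2$ concrete, the sketch below computes it for a small graph with an obvious two-cluster structure; the toy graph is an illustrative choice, not the paper's experimental setup.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    return np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt

# Two triangles joined by a single bridge edge: a clear 2-cluster structure.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

lam = np.sort(np.linalg.eigvalsh(normalized_laplacian(adj)))
lam2, lam3 = lam[1], lam[2]
print(lam2 / lam3 ** 2)  # a small ratio suggests spectral clustering is stable here
```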

【Keywords】: General and reference; Cross-computing tools and techniques; Reliability; Information systems; Information systems applications; Data mining; Clustering

116. Semi-Supervised Multi-Label Learning from Crowds via Deep Sequential Generative Model.

Paper Link】 【Pages】:1141-1149

【Authors】: Wanli Shi ; Victor S. Sheng ; Xiang Li ; Bin Gu

【Abstract】: Multi-label classification (MLC) is pervasive in real-world applications. Conventional MLC algorithms assume that enough ground truth labels are available for training a classifier. While in reality, obtaining ground truth labels is expensive and time-consuming. In the field of data mining, it is more efficient to use crowdsourcing for label collection. In this setting, an MLC algorithm needs to deal with the noisiness of the crowdsourced labels as well as the remaining massive unlabeled data. In this paper, we propose a deep generative model to describe the label generation process for this semi-supervised multi-label learning problem. Although deep generative models are widely used for MLC problems, no previous work could address the noisy crowdsourced multi-labels and unlabeled data simultaneously. To address this challenging problem, our novel generative model incorporates latent variables to describe the labeled/unlabeled data as well as the labeling process of crowdsourcing. We introduce an efficient sequential inference model to approximate the model posterior and infer the ground truth labels. Our experimental results on various scales of datasets demonstrate the effectiveness of our proposed model. It performs favorably against four state-of-the-art deep generative models.

【Keywords】: Information systems; World Wide Web; Web applications; Crowdsourcing; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Semi-supervised learning

117. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training.

Paper Link】 【Pages】:1150-1160

【Authors】: Jiezhong Qiu ; Qibin Chen ; Yuxiao Dong ; Jing Zhang ; Hongxia Yang ; Ming Ding ; Kuansan Wang ; Jie Tang

【Abstract】: Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) --- a self-supervised graph neural network pre-training framework --- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Information systems; Information systems applications; Data mining; World Wide Web; Web applications; Social networks

118. HGCN: A Heterogeneous Graph Convolutional Network-Based Deep Learning Model Toward Collective Classification.

Paper Link】 【Pages】:1161-1171

【Authors】: Zhihua Zhu ; Xinxin Fan ; Xiaokai Chu ; Jingping Bi

【Abstract】: Collective classification, as an important technique to study networked data, aims to exploit the label autocorrelation of a group of inter-connected entities with complex dependencies. With the emergence of various heterogeneous information networks (HINs), collective classification now confronts several severe challenges stemming from the heterogeneity of HINs, such as complex relational hierarchies, potentially incompatible semantics, and node-context relational semantics. To address these challenges, in this paper we propose a novel heterogeneous graph convolutional network-based deep learning model, called HGCN, to collectively categorize the entities in HINs. Our work makes three primary contributions: i) HGCN not only learns the latent relations from relation-sophisticated HINs via multi-layer heterogeneous convolutions, but also captures the semantic incompatibility among relations with properly learned edge-level filter parameters; ii) to preserve the fine-grained relational semantics of different node types, we propose a heterogeneous graph convolution that directly handles the original HINs without transforming the network from heterogeneous to homogeneous in advance; iii) we perform extensive experiments on four real-world datasets to validate the proposed HGCN, and the multi-faceted results show that HGCN can significantly improve the performance of collective classification compared with state-of-the-art baseline methods.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

119. Handling Information Loss of Graph Neural Networks for Session-based Recommendation.

Paper Link】 【Pages】:1172-1180

【Authors】: Tianwen Chen ; Raymond Chi-Wing Wong

【Abstract】: Recently, graph neural networks (GNNs) have gained increasing popularity due to their convincing performance in various applications. Many previous studies have also applied GNNs to session-based recommendation and obtained promising results. However, we identify two information loss problems in these GNN-based methods for session-based recommendation: the lossy session encoding problem and the ineffective long-range dependency capturing problem. In the first problem, some sequential information about item transitions is ignored because of the lossy encoding from sessions to graphs and the permutation-invariant aggregation during message passing. In the second problem, some long-range dependencies within sessions cannot be captured due to the limited number of layers. To solve the first problem, we propose a lossless encoding scheme and an edge-order preserving aggregation layer based on the GRU that is dedicatedly designed to process the losslessly encoded graphs. To solve the second problem, we propose a shortcut graph attention layer that effectively captures long-range dependencies by propagating information along shortcut connections. By combining the two kinds of layers, we build a model that avoids both information loss problems and outperforms the state-of-the-art models on three public datasets.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

120. Ultrafast Local Outlier Detection from a Data Stream with Stationary Region Skipping.

Paper Link】 【Pages】:1181-1191

【Authors】: Susik Yoon ; Jae-Gil Lee ; Byung Suk Lee

【Abstract】: Real-time outlier detection from a data stream is an increasingly important problem, especially as sensor-generated data streams abound in many applications owing to the prevalence of IoT and the emergence of digital twins. Several density-based approaches have been proposed to address this problem, but arguably none of them is fast enough to meet the performance demands of real applications. This paper is founded upon a novel observation that, in many regions of the data space, data distributions hardly change across window slides. We propose a new algorithm, abbreviated STARE, which identifies local regions in which data distributions hardly change and then skips updating the densities in those regions, a notion called stationary region skipping. Two techniques, data distribution approximation and cumulative net-change-based skip, are employed to efficiently and effectively implement this notion. Extensive experiments using synthetic and real data streams, as well as a case study, show that STARE is several orders of magnitude faster than existing algorithms while achieving comparable or higher accuracy.
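The following toy sketch illustrates the notion of stationary region skipping only, not the actual STARE algorithm: a density estimate is recomputed solely for grid cells whose point counts changed since the previous window slide. The grid cells, the threshold eps, and the count-based density stand-in are all assumptions for illustration.

```python
from collections import Counter

def update_densities(window_points, prev_counts, densities, cell=1.0, eps=0):
    """Recompute a per-cell density estimate only for grid cells whose point
    counts changed by more than eps since the previous window slide."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in window_points)
    updated = 0
    for c in set(counts) | set(prev_counts):
        if abs(counts.get(c, 0) - prev_counts.get(c, 0)) > eps:
            densities[c] = counts.get(c, 0)   # stand-in for a real density update
            updated += 1
    return counts, updated

densities = {}
w1 = [(0.1, 0.2), (0.3, 0.1), (5.1, 5.2)]
w2 = [(0.2, 0.3), (0.4, 0.2), (5.1, 5.2), (5.3, 5.4)]   # only the cell near (5, 5) changed
c1, _ = update_densities(w1, {}, densities)
_, n = update_densities(w2, c1, densities)
print(n, densities)   # only one cell was recomputed on the second slide
```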

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Anomaly detection; Information systems; Information systems applications; Data mining; Data stream mining

121. LayoutLM: Pre-training of Text and Layout for Document Image Understanding.

Paper Link】 【Pages】:1192-1200

【Authors】: Yiheng Xu ; Minghao Li ; Lei Cui ; Shaohan Huang ; Furu Wei ; Ming Zhou

【Abstract】: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.

【Keywords】: Applied computing; Document management and text processing; Document capture; Document analysis; Computing methodologies; Artificial intelligence; Natural language processing; Information extraction; Machine learning; Learning paradigms; Multi-task learning; Transfer learning; Information systems; Information retrieval; Retrieval tasks and goals; Business intelligence

122. Block Model Guided Unsupervised Feature Selection.

Paper Link】 【Pages】:1201-1211

【Authors】: Zilong Bai ; Hoa Nguyen ; Ian Davidson

【Abstract】: Feature selection is a core area of data mining, with a recent innovation being graph-driven unsupervised feature selection for linked data. In this setting we have a dataset Y consisting of n instances, each with m features, and a corresponding n-node graph (whose adjacency matrix is A) with an edge indicating that the two instances are similar. Existing efforts for unsupervised feature selection on attributed networks have explored either directly regenerating the links by solving for f such that f(y_i, y_j) ≈ A_{i,j}, or finding community structure in A and using the features in Y to predict these communities. However, graph-driven unsupervised feature selection remains an understudied area with respect to exploring more complex guidance. Here we take the novel approach of first building a block model on the graph and then using the block model for feature selection. That is, we discover FMF^T ≈ A and then find a subset of features S that induces another graph to preserve both F and M. We call our approach Block Model Guided Unsupervised Feature Selection (BMGUFS). Experimental results show that our method outperforms the state of the art on several real-world public datasets in finding high-quality features for clustering.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Feature selection

123. Data Compression as a Comprehensive Framework for Graph Drawing and Representation Learning.

Paper Link】 【Pages】:1212-1222

【Authors】: Claudia Plant ; Sonja Biedermann ; Christian Böhm

【Abstract】: Embedding a graph into feature space is a promising approach to understanding its structure. Embedding into 2D or 3D space enables visualization; representation in higher-dimensional vector space (typically >100D) enables the application of data mining techniques. For the success of knowledge discovery it is essential that the distances between the embedded vertices truly reflect the structure of the graph. Our fundamental idea is to compress the adjacency matrix by predicting the existence of an edge from the Euclidean distance between the corresponding vertices in the embedding, and to use the achieved compression as a quality measure for the embedding. We call this quality measure Predictive Entropy (PE). PE uses a sigmoid function to define the edge probability, which is monotonically decreasing with the Euclidean distance. We use this sigmoid probability to compress the adjacency matrix of the graph by entropy coding. While PE could be used to assess the result of any graph drawing or representation learning method, we particularly use it as the objective function in our new method GEMPE (Graph Embedding by Minimizing the Predictive Entropy). We demonstrate in our experiments that GEMPE clearly outperforms comparison methods with respect to the quality of the visual result and the clustering and node-labeling accuracy on the discovered coordinates.
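A minimal sketch of the quality measure described above, assuming a logistic edge probability that decreases with Euclidean distance; the parameters a and b and the toy graph are illustrative assumptions, and GEMPE's optimization of this objective is not shown.

```python
import numpy as np

def predictive_entropy(adj, coords, a=1.0, b=0.0):
    """Cost, in bits, of encoding the adjacency matrix when the probability of
    an edge decreases sigmoidally with the Euclidean distance between the
    embedded endpoints; lower cost = the embedding explains the graph better."""
    n = len(adj)
    bits = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(coords[i] - coords[j])
            p = 1.0 / (1.0 + np.exp(a * d + b))       # monotonically decreasing in d
            p = np.clip(p, 1e-12, 1 - 1e-12)
            bits += -np.log2(p) if adj[i, j] else -np.log2(1.0 - p)
    return bits

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
good = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])   # neighbors placed close
bad = np.array([[0.0, 0.0], [5.0, 0.0], [0.1, 0.0]])    # neighbors placed far
print(predictive_entropy(adj, good), predictive_entropy(adj, bad))
```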

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations

124. Joint Policy-Value Learning for Recommendation.

Paper Link】 【Pages】:1223-1233

【Authors】: Olivier Jeunen ; David Rohde ; Flavian Vasile ; Martin Bompaire

【Abstract】: Conventional approaches to recommendation often do not explicitly take into account information on previously shown recommendations and their recorded responses. One reason is that, since we do not know the outcome of actions the system did not take, learning directly from such logs is not a straightforward task. Several methods for off-policy or counterfactual learning have been proposed in recent years, but their efficacy for the recommendation task remains understudied. Due to the limitations of offline datasets and the lack of access of most academic researchers to online experiments, this is a non-trivial task. Simulation environments can provide a reproducible solution to this problem.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Learning from implicit feedback; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

125. FedFast: Going Beyond Average for Faster Training of Federated Recommender Systems.

Paper Link】 【Pages】:1234-1242

【Authors】: Khalil Muhammad ; Qinqin Wang ; Diarmuid O'Reilly-Morgan ; Elias Z. Tragos ; Barry Smyth ; Neil Hurley ; James Geraci ; Aonghus Lawlor

【Abstract】: Federated learning (FL) is quickly becoming the de facto standard for the distributed training of deep recommendation models, using on-device user data and reducing server costs. In a typical FL process, a central server tasks end-users to train a shared recommendation model using their local data. The local models are trained over several rounds on the users' devices and the server combines them into a global model, which is sent to the devices for the purpose of providing recommendations. Standard FL approaches use randomly selected users for training at each round, and simply average their local models to compute the global model. The resulting federated recommendation models require significant client effort to train and many communication rounds before they converge to a satisfactory accuracy. Users are left with poor quality recommendations until the late stages of training. We present a novel technique, FedFast, to accelerate distributed learning which achieves good accuracy for all users very early in the training process. We achieve this by sampling from a diverse set of participating clients in each training round and applying an active aggregation method that propagates the updated model to the other clients. Consequently, with FedFast the users benefit from far lower communication costs and more accurate models that can be consumed anytime during the training process even at the very early stages. We demonstrate the efficacy of our approach across a variety of benchmark datasets and in comparison to state-of-the-art recommendation techniques.
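For context, the plain federated averaging step that FedFast improves upon can be sketched as a data-size-weighted average of the clients' parameters. FedFast's clustered client sampling and active aggregation are not shown here, and the client data below is hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Standard federated averaging: the global model is the average of the
    clients' local models, weighted by the amount of local data."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)                 # (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three hypothetical clients with flattened model parameters.
clients = [np.array([0.1, 0.2]), np.array([0.3, 0.0]), np.array([0.2, 0.4])]
print(fedavg(clients, client_sizes=[100, 50, 50]))
```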

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

126. AM-GCN: Adaptive Multi-channel Graph Convolutional Networks.

Paper Link】 【Pages】:1243-1253

【Authors】: Xiao Wang ; Meiqi Zhu ; Deyu Bo ; Peng Cui ; Chuan Shi ; Jian Pei

【Abstract】: Graph Convolutional Networks (GCNs) have gained great popularity in tackling various analytics tasks on graph and network data. However, some recent studies raise concerns about whether GCNs can optimally integrate node features and topological structures in a complex graph with rich information. In this paper, we first present an experimental investigation. Surprisingly, our experimental results clearly show that the capability of state-of-the-art GCNs in fusing node features and topological structures is far from optimal or even satisfactory. This weakness may severely hinder the capability of GCNs in some classification tasks, since GCNs may not be able to adaptively learn some deep correlation information between topological structures and node features. Can we remedy the weakness and design a new type of GCN that retains the advantages of the state-of-the-art GCNs and, at the same time, substantially enhances the capability of fusing topological structures and node features? We tackle this challenge and propose adaptive multi-channel graph convolutional networks for semi-supervised classification (AM-GCN). The central idea is that we extract the specific and common embeddings from node features, topological structures, and their combinations simultaneously, and use an attention mechanism to learn adaptive importance weights for the embeddings. Our extensive experiments on benchmark data sets clearly show that AM-GCN extracts the most correlated information from both node features and topological structures, and improves the classification accuracy by a clear margin.

【Keywords】: Computing methodologies; Machine learning; Networks; Network algorithms

127. Discovering Approximate Functional Dependencies using Smoothed Mutual Information.

Paper Link】 【Pages】:1254-1264

【Authors】: Frédéric Pennerath ; Panagiotis Mandros ; Jilles Vreeken

【Abstract】: We consider the task of discovering the top-K reliable approximate functional dependencies X -> Y from high-dimensional data. While naively maximizing mutual information involving high-dimensional entropies over empirical data is subject to false discoveries, correcting the empirical estimator for data sparsity can lead to efficient exact algorithms for robust dependency discovery. Previous approaches focused on correcting by subtracting the expected values of different null hypothesis models. In this paper, we consider a different correction strategy and counter data sparsity using uniform priors and smoothing techniques, which leads to an efficient and robust estimation process. In addition, we derive an admissible and tight bounding function for the smoothed estimator that allows us to efficiently solve the hard search problem for the top-K dependencies via branch-and-bound. Our experiments show that our approach is much faster than previous proposals and leads to the discovery of sparse and informative functional dependencies.
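As a simple illustration of smoothing against data sparsity (not the paper's exact estimator or its bounding function), one can add uniform pseudo-counts before computing a plug-in mutual information estimate; the smoothing strength alpha and the toy data are assumptions for illustration.

```python
import numpy as np

def smoothed_mutual_information(x, y, alpha=1.0):
    """Plug-in mutual information estimate with additive (uniform-prior)
    smoothing of the joint counts, one simple way to counter data sparsity."""
    xs, ys = np.unique(x), np.unique(y)
    counts = np.full((len(xs), len(ys)), alpha)        # pseudo-counts everywhere
    for xi, yi in zip(x, y):
        counts[np.searchsorted(xs, xi), np.searchsorted(ys, yi)] += 1
    pxy = counts / counts.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    return float((pxy * np.log2(pxy / (px * py))).sum())

x = [0, 0, 1, 1, 0, 1]
y = [0, 0, 1, 1, 0, 0]       # y mostly follows x -> clearly positive estimate
print(smoothed_mutual_information(x, y))
```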

【Keywords】: Information systems; Information systems applications; Data mining

128. Competitive Analysis for Points of Interest.

Paper Link】 【Pages】:1265-1274

【Authors】: Shuangli Li ; Jingbo Zhou ; Tong Xu ; Hao Liu ; Xinjiang Lu ; Hui Xiong

【Abstract】: The competitive relationship of Points of Interest (POIs) refers to the degree of competition between two POIs for business opportunities from third parties in an urban area. Existing studies on competitive analysis usually focus on mining competitive relationships of entities, such as companies or products, from textual data, and few studies focus on competitive analysis for POIs. Indeed, the growing availability of user behavior data about POIs, such as POI reviews and human mobility data, enables a new paradigm for understanding the competitive relationships among POIs. To this end, in this paper we study how to predict POI competitive relationships. Along this line, the very first challenge is how to integrate heterogeneous user behavior data with the spatial features of POIs. As a solution, we first build a heterogeneous POI information network (HPIN) from POI reviews and map search data. Then, we develop a graph neural network-based deep learning framework, named DeepR, for POI competitive relationship prediction based on HPIN. Specifically, DeepR contains two components: a spatial adaptive graph neural network (SA-GNN) and a POI pairwise knowledge extraction learning (PKE) model. SA-GNN is a novel GNN architecture that incorporates POIs' spatial information and location distribution through a specially designed spatial-oriented aggregation layer and a spatial-dependency attentive propagation mechanism. In addition, PKE is devised to distill the POI pairwise knowledge in HPIN that is useful for relationship prediction into condensed vectors with relational graph convolution and cross attention. Finally, extensive experiments on two real-world datasets demonstrate the effectiveness of our method.

【Keywords】: Information systems; Information systems applications; Data mining; Spatial-temporal systems; Location based services

129. HOPS: Probabilistic Subtree Mining for Small and Large Graphs.

Paper Link】 【Pages】:1275-1284

【Authors】: Pascal Welke ; Florian Seiffarth ; Michael Kamp ; Stefan Wrobel

【Abstract】: Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-sized graphs or a single large graph. By restricting to tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximate counting of embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium-sized graphs.

【Keywords】: Mathematics of computing; Discrete mathematics; Graph theory; Matchings and factors; Trees

130. The NodeHopper: Enabling Low Latency Ranking with Constraints via a Fast Dual Solver.

Paper Link】 【Pages】:1285-1294

【Authors】: Anton Zhernov ; Krishnamurthy (Dj) Dvijotham ; Ivan Lobov ; Dan A. Calian ; Michelle Gong ; Natarajan Chandrashekar ; Timothy A. Mann

【Abstract】: Modern recommender systems need to deal with multiple objectives like balancing user engagement with recommending diverse and fresh content. An appealing way to optimally trade these off is by imposing constraints on the ranking according to which items are presented to a user. This results in a constrained ranking optimization problem that can be solved as a linear program (LP). However, off-the-shelf LP solvers are unable to meet the severe latency constraints in systems that serve live traffic. To address this challenge, we exploit the structure of the dual optimization problem to develop a fast solver. We analyze theoretical properties of our solver and show experimentally that it is able to solve constrained ranking problems on synthetic and real-world recommendation datasets an order of magnitude faster than off-the-shelf solvers, thereby enabling their deployment under severe latency constraints.

【Keywords】: Information systems; World Wide Web; Web searching and information discovery; Content ranking; Mathematics of computing; Mathematical analysis; Mathematical optimization; Continuous optimization; Linear programming

131. HGMF: Heterogeneous Graph-based Fusion for Multimodal Data with Incompleteness.

Paper Link】 【Pages】:1295-1305

【Authors】: Jiayi Chen ; Aidong Zhang

【Abstract】: With the advances in data collection techniques, large amounts of multimodal data collected from multiple sources are becoming available. Such multimodal data can provide complementary information that can reveal fundamental characteristics of real-world subjects. Thus, multimodal machine learning has become an active research area. Extensive works have been developed to exploit multimodal interactions and integrate multi-source information. However, multimodal data in the real world usually comes with missing modalities due to various reasons, such as sensor damage, data corruption, and human mistakes in recording. Effectively integrating and analyzing multimodal data with incompleteness remains a challenging problem. We propose a Heterogeneous Graph-based Multimodal Fusion (HGMF) approach to enable multimodal fusion of incomplete data within a heterogeneous graph structure. The proposed approach develops a unique strategy for learning on incomplete multimodal data without data deletion or data imputation. More specifically, we construct a heterogeneous hypernode graph to model the multimodal data having different combinations of missing modalities, and then we formulate a graph neural network based transductive learning framework to project the heterogeneous incomplete data onto a unified embedding space, and multi-modalities are fused along the way. The learning framework captures modality interactions from available data, and leverages the relationships between different incompleteness patterns. Our experimental results demonstrate that the proposed method outperforms existing graph-based as well as non-graph based baselines on three different datasets.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Classification and regression trees; Neural networks; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Semi-supervised learning

132. ST-SiameseNet: Spatio-Temporal Siamese Networks for Human Mobility Signature Identification.

Paper Link】 【Pages】:1306-1315

【Authors】: Huimin Ren ; Menghai Pan ; Yanhua Li ; Xun Zhou ; Jun Luo

【Abstract】: Given the historical movement trajectories of a set of individual human agents (e.g., pedestrians, taxi drivers) and a set of new trajectories claimed to be generated by a specific agent, the Human Mobility Signature Identification (HuMID) problem aims at validating whether the incoming trajectories were indeed generated by the claimed agent. This problem is important in many real-world applications such as driver verification in ride-sharing services, risk analysis for auto insurance companies, and criminal identification. Prior work on identifying human mobility behaviors requires additional data from other sources besides the trajectories, e.g., sensor readings in the vehicle for driving behavior identification. However, such data might not be universally available and are costly to obtain. To deal with this challenge, in this work we make the first attempt to match identities of human agents from the observed location trajectory data alone by proposing a novel and efficient framework named Spatio-temporal Siamese Networks (ST-SiameseNet). For each human agent, we extract a set of profile and online features from his/her trajectories. We train ST-SiameseNet to predict the mobility signature similarity between each pair of agents, where each agent is represented by his/her trajectories and the extracted features. Experimental results on a real-world taxi trajectory dataset show that our proposed ST-SiameseNet achieves an $F_1$ score of $0.8508$, which significantly outperforms the state-of-the-art techniques.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Unsupervised learning; Anomaly detection; Machine learning approaches; Neural networks

133. A Novel Deep Learning Model by Stacking Conditional Restricted Boltzmann Machine and Deep Neural Network.

Paper Link】 【Pages】:1316-1324

【Authors】: Tianyu Kang ; Ping Chen ; John Quackenbush ; Wei Ding

【Abstract】: A real-world system often exhibits complex dynamics arising from interaction among its subunits. In machine learning and data mining, these interactions are usually formulated as dependency and correlation among system variables. Similar to Convolutional Neural Networks dealing with spatially correlated features and Recurrent Neural Networks with temporally correlated features, in this paper we present a novel deep learning model to tackle functionally interactive features by stacking a Conditional Restricted Boltzmann Machine and a Deep Neural Network (CRBM-DNN). Variables and their dependency relationships are organized into a bipartite graph, which is further converted into a Restricted Boltzmann Machine conditioned by domain knowledge. We integrate this CRBM and a DNN into one deep learning model constrained by a single overall cost function. CRBM-DNN can solve both supervised and unsupervised learning problems. Compared to a regular neural network of the same size, CRBM-DNN has fewer parameters and therefore requires fewer training samples. We perform extensive comparative studies with a large number of supervised and unsupervised learning methods on several challenging real-world datasets and achieve significantly superior performance.

【Keywords】: Theory of computation; Theory and algorithms for application domains; Machine learning theory

134. InfiniteWalk: Deep Network Embeddings as Laplacian Embeddings with a Nonlinearity.

Paper Link】 【Pages】:1325-1333

【Authors】: Sudhanshu Chanpuriya ; Cameron Musco

【Abstract】: The skip-gram model for learning word embeddings (Mikolov et al. 2013) has been widely popular, and DeepWalk (Perozzi et al. 2014), among other methods, has extended the model to learning node representations from networks. Recent work of Qiu et al. (2018) provides a closed-form expression for the DeepWalk objective, obviating the need for sampling for small datasets and improving accuracy. In these methods, the "window size" T within which words or nodes are considered to co-occur is a key hyperparameter. We study the objective in the limit as T goes to infinity, which allows us to simplify the expression of Qiu et al. We prove that this limiting objective corresponds to factoring a simple transformation of the pseudoinverse of the graph Laplacian, linking DeepWalk to extensive prior work in spectral graph embeddings. Further, we show that by applying a simple nonlinear entrywise transformation to this pseudoinverse, we recover a good approximation of the finite-T objective and embeddings that are competitive with those from DeepWalk and other skip-gram methods in multi-label classification. Surprisingly, we find that even simple binary thresholding of the Laplacian pseudoinverse is often competitive, suggesting that the core advancement of recent methods is a nonlinearity on top of the classical spectral embedding approach.
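The surprisingly competitive baseline mentioned above, binary thresholding of the Laplacian pseudoinverse followed by a low-rank factorization, can be sketched in a few lines. The threshold value, the use of the unnormalized Laplacian, and the toy graph are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def thresholded_laplacian_embedding(adj, dim=2, tau=0.0):
    """Embed nodes by factoring an entrywise-thresholded pseudoinverse of the
    graph Laplacian (a simple stand-in for the nonlinearity discussed above)."""
    lap = np.diag(adj.sum(axis=1)) - adj               # unnormalized Laplacian
    m = (np.linalg.pinv(lap) > tau).astype(float)      # binary entrywise threshold
    u, s, _ = np.linalg.svd(m)
    return u[:, :dim] * np.sqrt(s[:dim])               # rank-`dim` factorization

# Two triangles joined by one edge; nodes of the same triangle embed close together.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
print(np.round(thresholded_laplacian_embedding(adj), 3))
```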

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Dimensionality reduction and manifold learning; Information systems; Information systems applications; Data mining

135. xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis.

Paper Link】 【Pages】:1334-1343

【Authors】: Menghai Pan ; Weixiao Huang ; Yanhua Li ; Xun Zhou ; Jun Luo

【Abstract】: To make daily decisions, human agents devise their own "strategies" governing their mobility dynamics (e.g., taxi drivers have preferred working regions and times, and urban commuters have preferred routes and transit modes). Recent research such as generative adversarial imitation learning (GAIL) demonstrates success in learning human decision-making strategies from behavior data using deep neural networks (DNNs), which can accurately mimic how humans behave in various scenarios, e.g., playing video games. However, such DNN-based models are "black box" models in nature, making it hard to explain what knowledge the models have learned from humans and how the models make their decisions, which has not been addressed in the imitation learning literature. This paper addresses this research gap by proposing xGAIL, the first explainable generative adversarial imitation learning framework. The proposed xGAIL framework consists of two novel components, Spatial Activation Maximization (SpatialAM) and Spatial Randomized Input Sampling Explanation (SpatialRISE), to extract both global and local knowledge from a well-trained GAIL model that explain how a human agent makes decisions. In particular, we take taxi drivers' passenger-seeking strategy as an example to validate the effectiveness of the proposed xGAIL framework. Our analysis on a large-scale real-world taxi trajectory dataset shows promising results from two aspects: i) global explainable knowledge of what nearby traffic conditions impel a taxi driver to choose a particular direction to find the next passenger, and ii) local explainable knowledge of what key (sometimes hidden) factors a taxi driver considers when making a particular decision.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Inverse reinforcement learning; Machine learning approaches; Markov decision processes; Neural networks

136. Catalysis Clustering with GAN by Incorporating Domain Knowledge.

Paper Link】 【Pages】:1344-1352

【Authors】: Olga Andreeva ; Wei Li ; Wei Ding ; Marieke L. Kuijjer ; John Quackenbush ; Ping Chen

【Abstract】: Clustering is an important unsupervised learning method with serious challenges when data is sparse and high-dimensional. Generated clusters are often evaluated with general measures, which may not be meaningful or useful for practical applications and domains. Using a distance metric, a clustering algorithm searches through the data space, groups close items into one cluster, and assigns far away samples to different clusters. In many real-world applications, the number of dimensions is high and data space becomes very sparse. Selection of a suitable distance metric is very difficult and becomes even harder when categorical data is involved. Moreover, existing distance metrics are mostly generic, and clusters created based on them will not necessarily make sense to domain-specific applications. One option to address these challenges is to integrate domain-defined rules and guidelines into the clustering process. In this work we propose a GAN-based approach called Catalysis Clustering to incorporate domain knowledge into the clustering process. With GANs we generate catalysts, which are special synthetic points drawn from the original data distribution and verified to improve clustering quality when measured by a domain-specific metric. We then perform clustering analysis using both catalysts and real data. Final clusters are produced after catalyst points are removed. Experiments on two challenging real-world datasets clearly show that our approach is effective and can generate clusters that are meaningful and useful for real-world applications.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Cluster analysis; Machine learning approaches; Neural networks

137. Prediction and Profiling of Audience Competition for Online Television Series.

Paper Link】 【Pages】:1353-1361

【Authors】: Peng Zhang ; Chuanren Liu ; Kefeng Ning ; Wenxiang Zhu ; Yu Zhang

【Abstract】: Understanding the target audience for popular television series is valuable for online video platforms in managing advertising sales, purchasing video copyrights, and competing with other video service platforms. Existing studies in this domain generally focus on using data mining and machine learning techniques to recommend television series to individual users or to predict the popularity of television series. Knowing only the popularity of television series may, however, limit our ability to answer more in-depth questions and develop more intelligent applications. In this paper, we develop a data-driven framework to model and predict audience competition patterns for popular online television series. Specifically, we first construct a sequence of dynamic competition networks of television series by mining detailed viewership records. Then, we design the Dynamic Deep Network Factorization (DDNF), a hybrid modeling framework for predicting the future competition networks. Our framework adopts a deep neural network (DNN) and knowledge-base (KB) embedding to incorporate static features, and integrates a Long Short-Term Memory (LSTM) network to learn dynamic features of the television series. Finally, extensive experiments on real-world data sets validate the effectiveness of our approach compared with state-of-the-art baselines in predicting the audience competition for existing and new television series.

【Keywords】: Information systems; Information systems applications; Data mining; Decision support systems; Data analytics

138. Multi-Class Data Description for Out-of-distribution Detection.

Paper Link】 【Pages】:1362-1370

【Authors】: Dongha Lee ; Sehun Yu ; Hwanjo Yu

【Abstract】: The capability of reliably detecting out-of-distribution samples is one of the key factors in deploying a good classifier, as the test distribution does not match the training distribution in most real-world applications. In this work, we present a deep multi-class data description, termed Deep-MCDD, which is effective for detecting out-of-distribution (OOD) samples as well as classifying in-distribution (ID) samples. Unlike the softmax classifier, which only focuses on the linear decision boundary partitioning its latent space into multiple regions, our Deep-MCDD aims to find a spherical decision boundary for each class which determines whether a test sample belongs to the class or not. By integrating the concept of Gaussian discriminant analysis into deep neural networks, we propose a deep learning objective to learn class-conditional distributions that are explicitly modeled as separable Gaussian distributions. Thereby, we can define the confidence score as the distance of a test sample from each class-conditional distribution and utilize it for identifying OOD samples. Our empirical evaluation on multi-class tabular and image datasets demonstrates that Deep-MCDD achieves the best performance in distinguishing OOD samples while showing classification accuracy as high as that of the other competitors.
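A toy illustration of the distance-based confidence score described above, using plain Euclidean distance to fixed class centers; Deep-MCDD instead learns separable Gaussian class-conditional distributions end-to-end, which is not shown here, and the feature vectors below are hypothetical.

```python
import numpy as np

def fit_class_centers(feats, labels):
    """Per-class centers of (already learned) feature vectors."""
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def ood_score(x, centers):
    """Distance-based confidence: distance to the nearest class center.
    Large values suggest the sample is out-of-distribution."""
    return min(np.linalg.norm(x - mu) for mu in centers.values())

# Hypothetical 2-D features for two in-distribution classes.
feats = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
labels = np.array([0, 0, 1, 1])
centers = fit_class_centers(feats, labels)
print(ood_score(np.array([0.1, 0.0]), centers))   # small -> in-distribution
print(ood_score(np.array([10.0, -8.0]), centers)) # large -> likely OOD
```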

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Unsupervised learning; Anomaly detection; Machine learning approaches; Neural networks

139. In and Out: Optimizing Overall Interaction in Probabilistic Graphs under Clustering Constraints.

Paper Link】 【Pages】:1371-1381

【Authors】: Domenico Mandaglio ; Andrea Tagarelli ; Francesco Gullo

【Abstract】: We study two novel clustering problems in which the pairwise interactions between entities are characterized by probability distributions and conditioned by external factors within the environment where the entities interact. This covers any scenario where a set of actions can alter the entities' interaction behavior. In particular, we consider the case where the interaction conditioning factors can be modeled as cluster memberships of entities in a graph, and the goal is to partition a set of entities so as to maximize the overall vertex interactions or, equivalently, minimize the loss of interactions in the graph. We show that both problems are NP-hard and that they are equivalent in terms of optimality. However, we focus on the minimization formulation as it enables the possibility of devising both practical and efficient approximation algorithms and heuristics. Experimental evaluation of our algorithms, on both synthetic and real network datasets, shows evidence of their meaningfulness as well as their superiority with respect to competing methods, both in terms of effectiveness and efficiency.

【Keywords】: Information systems; World Wide Web; Web searching and information discovery; Mathematics of computing; Discrete mathematics; Graph theory

140. Recurrent Halting Chain for Early Multi-label Classification.

Paper Link】 【Pages】:1382-1392

【Authors】: Thomas Hartvigsen ; Cansu Sen ; Xiangnan Kong ; Elke A. Rundensteiner

【Abstract】: Early multi-label classification of time series, the assignment of a label set to a time series before the series is entirely observed, is critical for time-sensitive domains such as healthcare. In such cases, waiting too long to classify can render predictions useless, regardless of their accuracy, while predicting prematurely can result in potentially costly erroneous results. When predicting multiple labels (for example, types of infections), dependencies between labels can be learned and leveraged to improve overall accuracy. Together, reliably predicting the correct label set of a time series while observing as few timesteps as possible is challenging because these goals are contradictory in that fewer timesteps often means worse accuracy. To achieve early yet sufficiently accurate predictions, correlations between labels must be accounted for since direct evidence of some labels may only appear late in the series. We design an effective solution to this open problem, the Recurrent Halting Chain (RHC), that for the first time integrates key innovations in both Early and Multi-label Classification into one multi-objective model. RHC uses a recurrent neural network to jointly model raw time series as well as correlations between labels, resulting in a novel order-free classifier chain that tackles this time-sensitive multi-label learning task. Further, RHC employs a reinforcement learning-based halting network to decide at each timestep which, if any, classes should be predicted, learning to build the label set over time. Using two real-world time-sensitive datasets and popular multi-label metrics, we show that RHC outperforms recent alternatives by predicting more-accurate label sets earlier.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks

141. Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks.

Paper Link】 【Pages】:1393-1403

【Authors】: Weilin Cong ; Rana Forsati ; Mahmut T. Kandemir ; Mehrdad Mahdavi

【Abstract】: Sampling methods (e.g., node-wise, layer-wise, or subgraph) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into embedding approximation variance in the forward stage and stochastic gradient variance in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization compared to existing methods.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations

142. Discovering Functional Dependencies from Mixed-Type Data.

Paper Link】 【Pages】:1404-1414

【Authors】: Panagiotis Mandros ; David Kaltenpoth ; Mario Boley ; Jilles Vreeken

【Abstract】: Given complex data collections, practitioners can perform non-parametric functional dependency discovery (FDD) to uncover relationships between variables that were previously unknown. However, known FDD methods are applicable to nominal data, and in practice non-nominal variables are discretized, e.g., in a pre-processing step. This is problematic because, as soon as a mix of discrete and continuous variables is involved, the interaction of discretization with the various dependency measures from the literature is poorly understood. In particular, it is unclear whether a given discretization method even leads to a consistent dependency estimate. In this paper, we analyze these fundamental questions and derive formal criteria as to when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information. With these insights, we derive an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data. Last, using this framework, we extend a previously proposed FDD approach for reliable dependencies. Experimental evaluation shows that the derived reliable estimator is both computationally and statistically efficient, and leads to effective FDD algorithms for mixed-type data.

【Keywords】: Information systems; Information systems applications; Data mining

143. Attackability Characterization of Adversarial Evasion Attack on Discrete Data.

Paper Link】 【Pages】:1415-1425

【Authors】: Yutong Wang ; Yufei Han ; Hongyan Bao ; Yun Shen ; Fenglong Ma ; Jin Li ; Xiangliang Zhang

【Abstract】: Evasion attack on discrete data is a challenging yet practically interesting research topic. It is intrinsically an NP-hard combinatorial optimization problem. Characterizing the conditions guaranteeing the solvability of an evasion attack task thus becomes the key to understanding the adversarial threat. Our study is inspired by the theory of weak submodularity. We characterize the attackability of a targeted classifier on discrete data in evasion attacks by bridging the attackability measurement and the regularity of the targeted classifier. Based on our attackability analysis, we propose a computationally efficient orthogonal matching pursuit-guided attack method for evasion attack on discrete data. It provides provable guarantees on computational efficiency and attack performance. Substantial experimental results on real-world datasets validate the proposed attackability conditions and the effectiveness of the proposed attack method.

【Keywords】: Computing methodologies; Artificial intelligence; Search methodologies; Discrete space search; Machine learning; Machine learning approaches; Neural networks; Symbolic and algebraic manipulation; Symbolic and algebraic algorithms; Optimization algorithms; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning; Adversarial learning

144. The Spectral Zoo of Networks: Embedding and Visualizing Networks with Spectral Moments.

Paper Link】 【Pages】:1426-1434

【Authors】: Shengmin Jin ; Reza Zafarani

【Abstract】: Network embedding methods have been widely and successfully used in network-based applications such as node classification and link prediction. However, an ideal network embedding should not only be useful for machine learning, but also interpretable. We introduce a spectral embedding method for a network, its Spectral Point, which is basically the first few spectral moments of a network. Spectral moments are interpretable, and we prove their close relationships to network structure (e.g., number of triangles and squares) and various network properties (e.g., degree distribution, clustering coefficient, and network connectivity). Using spectral points, we introduce a visualizable and bounded 3D embedding space for all possible graphs, in which one can characterize various types of graphs (e.g., cycles), or real-world networks from different categories (e.g., social or biological networks). We demonstrate that spectral points can be used for network identification (i.e., what network is this subgraph sampled from?) and that by using just the first few moments one does not lose much predictive power.
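
A small numpy sketch of spectral moments in their standard form: the k-th moment is the (normalized) trace of the k-th power of the adjacency matrix, i.e., the average number of closed k-walks per node. The exact normalization used by the paper is not reproduced here and is an assumption of this sketch.

```python
import numpy as np

def spectral_moments(A, k_max=4):
    n = A.shape[0]
    eigvals = np.linalg.eigvalsh(A)                      # symmetric adjacency matrix
    return [np.sum(eigvals ** k) / n for k in range(1, k_max + 1)]

# Triangle graph: trace(A^3) = 6, i.e., each of the 3 nodes lies on 2 closed 3-walks.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(spectral_moments(A))   # [0.0, 2.0, 2.0, 6.0]
```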

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Spectral methods; Human-centered computing; Visualization; Visualization techniques

145. Unsupervised Differentiable Multi-aspect Network Embedding.

Paper Link】 【Pages】:1435-1445

【Authors】: Chanyoung Park ; Carl Yang ; Qi Zhu ; Donghyun Kim ; Hwanjo Yu ; Jiawei Han

【Abstract】: Network embedding is an influential graph mining technique for representing nodes in a graph as distributed vectors. However, the majority of network embedding methods focus on learning a single vector representation for each node, which has been recently criticized for not being capable of modeling multiple aspects of a node. To capture the multiple aspects of each node, existing studies mainly rely on offline graph clustering performed prior to the actual embedding, which results in the cluster membership of each node (i.e., node aspect distribution) being fixed throughout training of the embedding model. We argue that this not only makes each node always have the same aspect distribution regardless of its dynamic context, but also hinders the end-to-end training of the model, which eventually leaves the final embedding quality largely dependent on the clustering. In this paper, we propose a novel end-to-end framework for multi-aspect network embedding, called asp2vec, in which the aspects of each node are dynamically assigned based on its local context. More precisely, among multiple aspects, we dynamically assign a single aspect to each node based on its current context, and our aspect selection module is end-to-end differentiable via the Gumbel-Softmax trick. We also introduce the aspect regularization framework to capture the interactions among the multiple aspects in terms of relatedness and diversity. We further demonstrate that our proposed framework can be readily extended to heterogeneous networks. Extensive experiments towards various downstream tasks on various types of homogeneous networks and a heterogeneous network demonstrate the superiority of asp2vec.
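
A minimal PyTorch sketch of differentiable aspect selection with the Gumbel-Softmax trick: pick one of K aspect embeddings per node from a context vector. All names and dimensions are illustrative assumptions, not the asp2vec implementation.

```python
import torch
import torch.nn.functional as F

K, dim, n_nodes = 4, 16, 100
aspect_emb = torch.nn.Parameter(torch.randn(n_nodes, K, dim))  # per-node aspect embeddings
selector = torch.nn.Linear(dim, K)                             # context -> aspect logits

def select_aspect(node_ids, context):
    logits = selector(context)                              # (batch, K)
    onehot = F.gumbel_softmax(logits, tau=0.5, hard=True)   # differentiable one-hot choice
    return (onehot.unsqueeze(-1) * aspect_emb[node_ids]).sum(dim=1)

ctx = torch.randn(8, dim)
emb = select_aspect(torch.randint(0, n_nodes, (8,)), ctx)    # (8, dim)
```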

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning

146. AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space.

Paper Link】 【Pages】:1446-1456

【Authors】: Chengrun Yang ; Jicong Fan ; Ziyang Wu ; Madeleine Udell

【Abstract】: Data scientists seeking a good supervised learning model on a dataset have many choices to make: they must preprocess the data, select features, possibly reduce the dimension, select an estimation algorithm, and choose hyperparameters for each of these pipeline components. With new pipeline components comes a combinatorial explosion in the number of choices! In this work, we design a new AutoML system TensorOboe to address this challenge: an automated system to design a supervised learning pipeline. TensorOboe uses low rank tensor decomposition as a surrogate model for efficient pipeline search. We also develop a new greedy experiment design protocol to gather information about a new dataset efficiently. Experiments on large corpora of real-world classification problems demonstrate the effectiveness of our approach.

【Keywords】: Computing methodologies; Artificial intelligence; Search methodologies; Continuous space search; Discrete space search; Machine learning; Learning settings; Active learning settings; Machine learning approaches; Factorization methods; Principal component analysis; Learning latent representations

147. Towards Physics-informed Deep Learning for Turbulent Flow Prediction.

Paper Link】 【Pages】:1457-1466

【Authors】: Rui Wang ; Karthik Kashinath ; Mustafa Mustafa ; Adrian Albert ; Rose Yu

【Abstract】: While deep learning has shown tremendous success in a wide range of domains, it remains a grand challenge to incorporate physical principles in a systematic manner to the design, training, and inference of such models. In this paper, we aim to predict turbulent flow by learning its highly nonlinear dynamics from spatiotemporal velocity fields of large-scale fluid flow simulations of relevance to turbulence modeling and climate modeling. We adopt a hybrid approach by marrying two well-established turbulent flow simulation techniques with deep learning. Specifically, we introduce trainable spectral filters in a coupled model of Reynolds-averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES), followed by a specialized U-net for prediction. Our approach, which we call Turbulent-Flow Net, is grounded in a principled physics model, yet offers the flexibility of learned representations. We compare our model with state-of-the-art baselines and observe significant reductions in error for predictions 60 frames ahead. Most importantly, our method predicts physical fields that obey desirable physical characteristics, such as conservation of mass, whilst faithfully emulating the turbulent kinetic energy field and spectrum, which are critical for accurate prediction of turbulent flows.

【Keywords】: Applied computing; Physical sciences and engineering; Physics; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by regression; Machine learning approaches; Neural networks; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

148. Evaluating Fairness Using Permutation Tests.

Paper Link】 【Pages】:1467-1477

【Authors】: Cyrus DiCiccio ; Sriram Vasudevan ; Kinjal Basu ; Krishnaram Kenthapadi ; Deepak Agarwal

【Abstract】: Machine learning models are central to people's lives and impact society in ways as fundamental as determining how people access information. The gravity of these models imparts a responsibility to model developers to ensure that they are treating users in a fair and equitable manner. Before deploying a model into production, it is crucial to examine the extent to which its predictions demonstrate biases. This paper deals with the detection of bias exhibited by a machine learning model through statistical hypothesis testing. We propose a permutation testing methodology that performs a hypothesis test that a model is fair across two groups with respect to any given metric. There are increasingly many notions of fairness that can speak to different aspects of model fairness. Our aim is to provide a flexible framework that empowers practitioners to identify significant biases in any metric they wish to study. We provide a formal testing mechanism as well as extensive experiments to show how this method works in practice.
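
A minimal numpy sketch of a permutation test for a fairness gap: shuffle the group labels to build the null distribution of the metric difference between two groups and report a p-value. The metric here (accuracy gap) is only an example; the paper's framework allows any group metric, and the data below is synthetic.

```python
import numpy as np

def permutation_test(metric, y_true, y_pred, group, n_perm=10000, seed=0):
    rng = np.random.default_rng(seed)
    def gap(g):
        return abs(metric(y_true[g == 0], y_pred[g == 0]) -
                   metric(y_true[g == 1], y_pred[g == 1]))
    observed = gap(group)
    null = np.array([gap(rng.permutation(group)) for _ in range(n_perm)])
    return observed, (null >= observed).mean()   # observed gap and permutation p-value

accuracy = lambda yt, yp: np.mean(yt == yp)
y_true = np.random.default_rng(1).integers(0, 2, 400)
y_pred = y_true.copy()
y_pred[:60] = 1 - y_pred[:60]                    # errors concentrated in one group
group = (np.arange(400) < 200).astype(int)
print(permutation_test(accuracy, y_true, y_pred, group))   # large gap, tiny p-value
```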

【Keywords】: Information systems; World Wide Web; Web applications; Crowdsourcing; Trust; Mathematics of computing; Probability and statistics; Probabilistic inference problems; Hypothesis testing and confidence interval computation; Probabilistic reasoning algorithms; Resampling methods

149. Leveraging Model Inherent Variable Importance for Stable Online Feature Selection.

Paper Link】 【Pages】:1478-1502

【Authors】: Johannes Haug ; Martin Pawelczyk ; Klaus Broelemann ; Gjergji Kasneci

【Abstract】: Feature selection can be a crucial factor in obtaining robust and accurate predictions. Online feature selection models, however, operate under considerable restrictions; they need to efficiently extract salient input features based on a bounded set of observations, while enabling robust and accurate predictions. In this work, we introduce FIRES, a novel framework for online feature selection. The proposed feature weighting mechanism leverages the importance information inherent in the parameters of a predictive model. By treating model parameters as random variables, we can penalize features with high uncertainty and thus generate more stable feature sets. Our framework is generic in that it leaves the choice of the underlying model to the user. Strikingly, experiments suggest that the model complexity has only a minor effect on the discriminative power and stability of the selected feature sets. In fact, using a simple linear model, FIRES obtains feature sets that compete with state-of-the-art methods, while dramatically reducing computation time. In addition, experiments show that the proposed framework is clearly superior in terms of feature selection stability.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Online learning settings; Machine learning algorithms; Feature selection; Information systems; Data management systems; Database design and models; Data model extensions; Data streams; Uncertainty; Mathematics of computing; Probability and statistics; Statistical paradigms; Dimensionality reduction

150. Multi-level Graph Convolutional Networks for Cross-platform Anchor Link Prediction.

Paper Link】 【Pages】:1503-1511

【Authors】: Hongxu Chen ; Hongzhi Yin ; Xiangguo Sun ; Tong Chen ; Bogdan Gabrys ; Katarzyna Musial

【Abstract】: Cross-platform account matching plays a significant role in social network analytics, and is beneficial for a wide range of applications. However, existing methods either heavily rely on high-quality user generated content (including user profiles) or suffer from a data insufficiency problem if only focusing on network topology, which brings researchers into an insoluble dilemma of model selection. In this paper, to address this problem, we propose a novel framework that considers multi-level graph convolutions on both local network structure and hypergraph structure in a unified manner. The proposed method overcomes the data insufficiency problem of existing work and does not necessarily rely on user demographic information. Moreover, to adapt the proposed method to handle large-scale social networks, we propose a two-phase space reconciliation mechanism to align the embedding spaces in both network partitioning based parallel training and account matching across different social networks. Extensive experiments have been conducted on two large-scale real-life social networks. The experimental results demonstrate that the proposed method outperforms the state-of-the-art models by a large margin.

【Keywords】: Information systems; Information systems applications; Data mining

151. Evaluating Conversational Recommender Systems via User Simulation.

Paper Link】 【Pages】:1512-1520

【Authors】: Shuo Zhang ; Krisztian Balog

【Abstract】: Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both very time and resource intensive at scale, and thus becomes a bottleneck of progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations, and can help achieve high correlation between automatic evaluation measures and manual human assessments.

【Keywords】: Human-centered computing; Human computer interaction (HCI); HCI design and evaluation methods; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; Users and interactive retrieval

152. Measuring Model Complexity of Neural Networks with Curve Activation Functions.

Paper Link】 【Pages】:1521-1531

【Authors】: Xia Hu ; Weiqing Liu ; Jiang Bian ; Jian Pei

【Abstract】: It is fundamental to measure model complexity of deep neural networks. A good model complexity measure can help to tackle many challenging problems, such as overfitting detection, model selection, and performance improvement. The existing literature on model complexity mainly focuses on neural networks with piecewise linear activation functions. Model complexity of neural networks with general curve activation functions remains an open problem. To tackle the challenge, in this paper, we first propose linear approximation neural network (LANN for short), a piecewise linear framework to approximate a given deep model with curve activation function. LANN constructs individual piecewise linear approximation for the activation function of each neuron, and minimizes the number of linear regions to satisfy a required approximation degree. Then, we analyze the upper bound of the number of linear regions formed by LANNs, and derive the complexity measure based on the upper bound. To examine the usefulness of the complexity measure, we experimentally explore the training process of neural networks and detect overfitting. Our results demonstrate that the occurrence of overfitting is positively correlated with the increase of model complexity during training. We find that the L1 and L2 regularizations suppress the increase of model complexity. Finally, we propose two approaches to prevent overfitting by directly constraining model complexity, namely neuron pruning and customized L1 regularization.
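
A small numpy sketch of the building block behind this complexity measure: approximate a curve activation (tanh here) with a piecewise linear function over a set of breakpoints and check the worst-case approximation error. How LANN chooses breakpoints and minimizes the number of linear regions is not reproduced; the uniform breakpoints below are an assumption for illustration.

```python
import numpy as np

def piecewise_linear_error(f, breakpoints, grid):
    approx = np.interp(grid, breakpoints, f(breakpoints))   # linear interpolation between breakpoints
    return np.max(np.abs(f(grid) - approx))                 # worst-case approximation error

grid = np.linspace(-4, 4, 2001)
for n_pieces in (3, 5, 9, 17):
    bp = np.linspace(-4, 4, n_pieces + 1)
    print(n_pieces, piecewise_linear_error(np.tanh, bp, grid))
# More linear pieces => smaller error; the paper's measure is built on counting such regions.
```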

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Machine learning approaches; Neural networks; General and reference; Document types; General literature

153. Diverse Rule Sets.

Paper Link】 【Pages】:1532-1541

【Authors】: Guangyi Zhang ; Aristides Gionis

【Abstract】: While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, interpretability of models has been recognized as an equally important quality as their predictive power. In particular, rule-based systems are experiencing a renaissance owing to their intuitive if-then representation.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Rule learning

154. Vamsa: Automated Provenance Tracking in Data Science Scripts.

Paper Link】 【Pages】:1542-1551

【Authors】: Mohammad Hossein Namaki ; Avrilia Floratou ; Fotis Psallidas ; Subru Krishnan ; Ashvin Agrawal ; Yinghui Wu ; Yiwen Zhu ; Markus Weimer

【Abstract】: There has recently been a lot of ongoing research in the areas of fairness, bias and explainability of machine learning (ML) models due to the self-evident or regulatory requirements of various ML applications. We make the following observation: All of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges in capturing such information in the context of Python, the most common language used by data scientists.

【Keywords】: Computing methodologies; Machine learning; Information systems; Data management systems; Database design and models; Data model extensions; Data provenance

155. Deep State-Space Generative Model For Correlated Time-to-Event Predictions.

Paper Link】 【Pages】:1552-1562

【Authors】: Yuan Xue ; Denny Zhou ; Nan Du ; Andrew M. Dai ; Zhen Xu ; Kun Zhang ; Claire Cui

【Abstract】: Capturing the inter-dependencies among multiple types of clinically-critical events is critical not only to accurate future event prediction, but also to better treatment planning. In this work, we propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events (e.g., kidney failure, mortality) by explicitly modeling the temporal dynamics of patients' latent states. Based on these learned patient states, we further develop a new general discrete-time formulation of the hazard rate function to estimate the survival distribution of patients with significantly improved accuracy. Extensive evaluations over real EMR data show that our proposed model compares favorably to various state-of-the-art baselines. Furthermore, our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
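
A minimal numpy illustration of the discrete-time hazard formulation the abstract refers to: given a per-step hazard h_t = P(event at t | survived to t), the survival curve is the running product of (1 - h_t). The paper learns the hazards from latent patient states; the hazard values below are made up.

```python
import numpy as np

hazard = np.array([0.01, 0.02, 0.05, 0.10, 0.15])             # illustrative per-step hazards
survival = np.cumprod(1.0 - hazard)                            # S(t) = prod_{k<=t} (1 - h_k)
event_prob = hazard * np.concatenate([[1.0], survival[:-1]])   # P(event exactly at step t)
print(survival)                                                # monotone non-increasing
print(event_prob.sum() + survival[-1])                         # = 1.0, so the distribution is consistent
```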

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning

156. Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation.

Paper Link】 【Pages】:1563-1573

【Authors】: Yuanfu Lu ; Yuan Fang ; Chuan Shi

【Abstract】: Cold-start recommendation has been a challenging problem due to sparse user-item interactions for new users or items. Existing efforts have alleviated the cold-start issue to some extent, most of which approach the problem at the data level. Earlier methods often incorporate auxiliary data as user or item features, while more recent methods leverage heterogeneous information networks (HIN) to capture richer semantics via higher-order graph structures. On the other hand, the recent meta-learning paradigm sheds light on addressing cold-start recommendation at the model level, given its ability to rapidly adapt to new tasks with scarce labeled data, or in the context of cold-start recommendation, new users and items with very few interactions. Thus, we are inspired to develop a novel meta-learning approach named MetaHIN to address cold-start recommendation on HINs, to exploit the power of meta-learning at the model level and HINs at the data level simultaneously. The solution is non-trivial, for how to capture HIN-based semantics in the meta-learning setting, and how to learn the general knowledge that can be easily adapted to multifaceted semantics, remain open questions. In MetaHIN, we propose a novel semantic-enhanced task constructor and a co-adaptation meta-learner to address the two questions. Extensive experiments demonstrate that MetaHIN significantly outperforms the state of the art in various cold-start scenarios. (Code and dataset are available at https://github.com/rootlu/MetaHIN.)

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; Information systems applications; Data mining

157. WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams.

Paper Link】 【Pages】:1574-1584

【Authors】: Jizhou Li ; Zikun Li ; Yifei Xu ; Shiqi Jiang ; Tong Yang ; Bin Cui ; Yafei Dai ; Gong Zhang

【Abstract】: Finding top-k items in data streams is a fundamental problem in data mining. Existing algorithms that can achieve unbiased estimation suffer from poor accuracy. In this paper, we propose a new sketch, WavingSketch, which is much more accurate than existing unbiased algorithms. WavingSketch is generic, and we show how it can be applied to four applications: finding top-k frequent items, finding top-k heavy changes, finding top-k persistent items, and finding top-k Super-Spreaders. We theoretically prove that WavingSketch can provide unbiased estimation, and then give an error bound of our algorithm. Our experimental results show that, compared with the state-of-the-art, WavingSketch has 4.50 times higher insertion speed and up to 9×10^6 times (2×10^4 times on average) lower error rate in finding frequent items when memory size is tight. For other applications, WavingSketch can also achieve up to 286 times lower error rate. All related code is open-sourced and available anonymously on GitHub.

【Keywords】: Information systems; Data management systems; Data structures; Information systems applications; Data mining; Data stream mining

158. Dynamic Knowledge Graph based Multi-Event Forecasting.

Paper Link】 【Pages】:1585-1595

【Authors】: Songgaojun Deng ; Huzefa Rangwala ; Yue Ning

【Abstract】: Modeling concurrent events of multiple types and their involved actors from open-source social sensors is an important task for many domains such as health care, disaster relief, and financial analysis. Forecasting events in the future can help human analysts better understand global social dynamics and make quick and accurate decisions. Anticipating participants or actors who may be involved in these activities can also help stakeholders to better respond to unexpected events. However, achieving these goals is challenging due to several factors: (i) it is hard to filter relevant information from large-scale input, (ii) the input data is usually high dimensional, unstructured, and Non-IID (Non-independent and identically distributed), and (iii) associated text features are dynamic and vary over time. Recently, graph neural networks have demonstrated strengths in learning complex and relational data. In this paper, we study a temporal graph learning method with heterogeneous data fusion for predicting concurrent events of multiple types and inferring multiple candidate actors simultaneously. In order to capture temporal information from historical data, we propose Glean, a graph learning framework based on event knowledge graphs to incorporate both relational and word contexts. We present a context-aware embedding fusion module to enrich hidden features for event actors. We conducted extensive experiments on multiple real-world datasets and show that the proposed method is competitive against various state-of-the-art methods for social event prediction and also provides much-needed interpretation capabilities.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Temporal reasoning; Information systems; Information systems applications; Data mining

159. A Geometric Approach to Predicting Bounds of Downstream Model Performance.

Paper Link】 【Pages】:1596-1604

【Authors】: Brian J. Goode ; Debanjan Datta

【Abstract】: This paper presents the motivation and methodology for including model application criteria into baseline analysis. We will focus on detailing the interplay between the common measures of mean square error (MSE) and accuracy as it relates to perceived model performance. MSE is a common aggregate measure for the performance of predictive regression models. The advantages are numerous. MSE is agnostic to the choice of model given that the set of possible outcome values is defined on the appropriate metric space. In practice, decisions on how to subsequently use a trained model are based on predictive performance, relative to a baseline where input features are not used - colloquially a "random model". However, relative performance gains of a model over the baseline in terms of MSE do not guarantee commensurate gains when the model is deployed in downstream applications, systems, or processes. This paper demonstrates one derivation of a distribution that qualifies MSE performance for multi-class decision-making systems requiring a certain level of accuracy. The model error is qualified through comparison to relevant baselines that are tied to the application and suited to evaluating individual outcome performance criteria.

【Keywords】: Applied computing; Law, social and behavioral sciences; Sociology; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Mathematics of computing; Probability and statistics; Distribution functions; Probabilistic representations

160. Context-to-Session Matching: Utilizing Whole Session for Response Selection in Information-Seeking Dialogue Systems.

Paper Link】 【Pages】:1605-1613

【Authors】: Zhenxin Fu ; Shaobo Cui ; Mingyue Shang ; Feng Ji ; Dongyan Zhao ; Haiqing Chen ; Rui Yan

【Abstract】: We study the retrieval-based multi-turn information-seeking dialogue systems, which are widely used in many scenarios. Most of the previous works select the response according to the matching degree between the query's context and the candidate responses. Though great progress has been made, existing works ignore the contexts of the responses, which could provide rich information for selecting the most appropriate response. The more similar the query's context and certain response's context are, the more likely they are to indicate the same question, and thus, the more likely this response is to answer the query. In this paper, we consider the response and its context as a whole session and explore the task of matching the query's context with the sessions. More specifically, we propose to match between the query's context and response's context and integrate the context-to-context matching with context-to-response matching. Experiment results prove that our proposed context-to-session method outperforms the strong baselines significantly.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Discourse, dialogue and pragmatics; Information systems; Information retrieval; Retrieval tasks and goals; Question answering

161. HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units.

Paper Link】 【Pages】:1614-1624

【Authors】: Shenda Hong ; Yanbo Xu ; Alind Khare ; Satria Priambada ; Kevin O. Maher ; Alaa Aljiffry ; Jimeng Sun ; Alexey Tumanov

【Abstract】: Deep learning models have achieved expert-level performance in healthcare with an exclusive focus on training accurate models. However, in many clinical environments such as intensive care unit (ICU), real-time model serving is equally if not more important than accuracy, because in ICU patient care is simultaneously more urgent and more expensive. Clinical decisions and their timeliness, therefore, directly affect both the patient outcome and the cost of care. To make timely decisions, we argue the underlying serving system must be latency-aware. To compound the challenge, health analytic applications often require a combination of models instead of a single model, to better specialize individual models for different targets, multi-modal data, different prediction windows, and potentially personalized predictions. To address these challenges, we propose HOLMES---an online model ensemble serving framework for healthcare applications. HOLMES dynamically identifies the best performing set of models to ensemble for highest accuracy, while also satisfying sub-second latency constraints on end-to-end prediction. We demonstrate that HOLMES is able to navigate the accuracy/latency tradeoff efficiently, compose the ensemble, and serve the model ensemble pipeline, scaling to simultaneously stream data from 100 patients, each producing waveform data at 250 Hz. HOLMES outperforms the conventional offline batch-processed inference for the same clinical task in terms of accuracy and latency (by an order of magnitude). HOLMES is tested on a risk prediction task on pediatric cardio ICU data with above 95% prediction accuracy and sub-second latency on a 64-bed simulation.

【Keywords】:

162. LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

Paper Link】 【Pages】:1625-1635

【Authors】: Kejing Yin ; Ardavan Afshar ; Joyce C. Ho ; William K. Cheung ; Chao Zhang ; Jimeng Sun

【Abstract】: Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where value one means presence of a feature while zero means unknown (i.e., either presence or absence of a feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors take the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar) by modeling the binary irregular tensor with Bernoulli distribution parameterized by an underlying real-valued tensor. Then we approximate the underlying tensor with a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance the interpretability. Extensive experiments using large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For the irregular tensor completion, LogPar achieves up to 26% relative improvement compared to the best baseline. Besides, LogPar obtains relative improvement of 13.2% for heart failure prediction and 14% for mortality prediction on average compared to the state-of-the-art PARAFAC2 models.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Machine learning; Machine learning approaches; Factorization methods; Information systems; Information systems applications; Data mining

163. RECORD: Resource Constrained Semi-Supervised Learning under Distribution Shift.

Paper Link】 【Pages】:1636-1644

【Authors】: Lan-Zhe Guo ; Zhi Zhou ; Yu-Feng Li

【Abstract】: Semi-supervised learning (SSL) tries to improve performance with the use of massive unlabeled data, which typically works in an offline manner with two assumptions: i) data distribution is static; ii) data storage overhead is unlimited. In many online tasks, however, neither of the above assumptions is valid. For example, in online image classification, the amount of unlabeled images grows sharply, which makes it difficult to store them in full; meanwhile, the content of unlabeled images changes constantly, and it is no longer suitable to assume a fixed distribution. We call such a novel setting Resource Constrained SSL under Distribution Shift (or Record for short); to the best of our knowledge, it has not been thoroughly studied yet. This paper presents a systematic solution, Record, consisting of three sub-steps, that is, distribution tracking, sample selection, and model updating. Specifically, we propose an effective method to track the distribution changes and locate distribution-shifted samples. A novel influence-based approach is used to select the samples most influential for the distribution change under resource constraints. Finally, we free up memory to store the latest unlabeled data with its pseudo-labels for the next round of distribution tracking. Extensive empirical results confirm the effectiveness of our scheme. In the case of diverse and unknown distribution shifts, our solution is consistently and clearly better than many baseline and SOTA methods across memory budgets, and in some cases it can even approximate the performance of an oracle.

【Keywords】: Computing methodologies; Machine learning; Information systems; Information systems applications; Data mining

164. Statistically Significant Pattern Mining with Ordinal Utility.

Paper Link】 【Pages】:1645-1655

【Authors】: Thien Q. Tran ; Kazuto Fukuchi ; Youhei Akimoto ; Jun Sakuma

【Abstract】: Statistically significant pattern mining (SSPM) is an essential and challenging data mining task in the field of knowledge discovery in databases (KDD), in which each pattern is evaluated via a hypothesis test. Our study aims to introduce a preference relation into patterns and to discover the most preferred patterns under the constraint of statistical significance, which has never been considered in existing SSPM problems. We propose an iterative multiple testing procedure that can alternately reject a hypothesis and safely ignore the hypotheses that are less useful than the rejected hypothesis. One advantage of filtering out patterns with low utility is that it avoids consumption of the significance budget by rejection of useless (that is, uninteresting) patterns. This allows the significance budget to be focused on useful patterns, leading to more useful discoveries. We show that the proposed method can control the familywise error rate (FWER) under certain assumptions that can be satisfied by a realistic problem class in SSPM. We also show that the proposed method always discovers a set of patterns that is at least equally or more useful than those discovered using the standard Tarone-Bonferroni SSPM method. Finally, we conducted several experiments with both synthetic and real-world data to evaluate the performance of our method. As a result, in the experiments with real-world datasets, the proposed method discovered a larger number of more useful patterns than the existing method for all five conducted tasks.

【Keywords】: Information systems; Information systems applications; Data mining; Association rules; Mathematics of computing; Probability and statistics; Probabilistic inference problems; Hypothesis testing and confidence interval computation

165. Certifiable Robustness of Graph Convolutional Networks under Structure Perturbations.

Paper Link】 【Pages】:1656-1665

【Authors】: Daniel Zügner ; Stephan Günnemann

【Abstract】: Recent works show that message-passing neural networks (MPNNs) can be fooled by adversarial attacks on both the node attributes and the graph structure. Since MPNNs are currently being rapidly adopted in real-world applications, it is thus crucial to improve their reliability and robustness. While there has been progress on robustness certification of MPNNs under perturbation of the node attributes, no existing method can handle structural perturbations. These perturbations are especially challenging because they alter the message passing scheme itself. In this work we close this gap and propose the first method to certify robustness of Graph Convolutional Networks (GCNs) under perturbations of the graph structure. We show how this problem can be expressed as a jointly constrained bilinear program - a challenging, yet well-studied class of problems - and propose a novel branch-and-bound algorithm to obtain lower bounds on the global optimum. These lower bounds are significantly tighter and can certify up to twice as many nodes compared to a standard linear relaxation.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning approaches; Neural networks

166. Understanding Negative Sampling in Graph Representation Learning.

Paper Link】 【Pages】:1666-1676

【Authors】: Zhen Yang ; Ming Ding ; Chang Zhou ; Hongxia Yang ; Jingren Zhou ; Jie Tang

【Abstract】: Graph representation learning has been extensively studied in recent years, in which sampling is a critical point. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge the gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that a good negative sampling distribution is p_n(u|v) ∝ p_d(u|v)^α, 0 < α < 1. With the guidance of the theory, we propose MCNS, approximating the positive distribution with self-contrast approximation and accelerating negative sampling by Metropolis-Hastings. We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and recommendation, on a total of 19 experimental settings. These relatively comprehensive experimental results demonstrate its robustness and superiority.
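
A small numpy sketch of drawing negatives from a distribution proportional to p_d(u)^α with Metropolis-Hastings, so the normalized target never has to be computed explicitly. This only illustrates the sampling idea; the full MCNS procedure (which conditions on the current positive node v and uses a self-contrast approximation of p_d) is not reproduced, and the toy distribution below is made up.

```python
import numpy as np

def mh_negative_sampler(p_d, alpha=0.75, n_steps=10000, seed=0):
    rng = np.random.default_rng(seed)
    n, current, samples = len(p_d), rng.integers(len(p_d)), []
    for _ in range(n_steps):
        proposal = rng.integers(n)                            # symmetric (uniform) proposal
        accept = (p_d[proposal] / p_d[current]) ** alpha      # MH acceptance ratio for target ∝ p_d^alpha
        if rng.random() < accept:
            current = proposal
        samples.append(current)
    return np.array(samples)

p_d = np.array([0.5, 0.25, 0.15, 0.10])
counts = np.bincount(mh_negative_sampler(p_d), minlength=4) / 10000
print(counts, p_d ** 0.75 / np.sum(p_d ** 0.75))              # empirical vs target frequencies
```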

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

167. Aligning Superhuman AI with Human Behavior: Chess as a Model System.

Paper Link】 【Pages】:1677-1687

【Authors】: Reid McIlroy-Young ; Siddhartha Sen ; Jon M. Kleinberg ; Ashton Anderson

【Abstract】: As artificial intelligence becomes increasingly intelligent---in some cases, achieving superhuman performance---there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance. We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well. We develop and introduce Maia, a customized version of AlphaZero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.

【Keywords】: Human-centered computing; Collaborative and social computing; Empirical studies in collaborative and social computing

168. Heidegger: Interpretable Temporal Causal Discovery.

Paper Link】 【Pages】:1688-1696

【Authors】: Mehrdad Mansouri ; Ali Arab ; Zahra Zohrevand ; Martin Ester

【Abstract】: Temporal causal discovery aims to find cause-effect relationships between time-series. However, none of the existing techniques is able to identify the causal profile, the temporal pattern that the causal variable needs to follow in order to trigger the most significant change in the outcome. Toward a new horizon, this study introduces the novel problem of Causal Profile Discovery, which is crucial for many applications such as adverse drug reaction and cyber-attack detection. This work correspondingly proposes Heidegger to discover causal profiles, comprising a flexible randomized block design for hypothesis evaluation and an efficient profile search via on-the-fly graph construction and entropy-based pruning. Heidegger's performance is demonstrated and evaluated extensively on both synthetic and real-world data. The experimental results show the proposed method is robust to noise and flexible at detecting complex patterns.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Causal reasoning and diagnostics; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

169. Interpretable Deep Graph Generation with Node-edge Co-disentanglement.

Paper Link】 【Pages】:1697-1707

【Authors】: Xiaojie Guo ; Liang Zhao ; Zhao Qin ; Lingfei Wu ; Amarda Shehu ; Yanfang Ye

【Abstract】: Disentangled representation learning has recently attracted a significant amount of attention, particularly in the field of image representation learning. However, learning the disentangled representations behind a graph remains largely unexplored, especially for the attributed graph with both node and edge features. Disentanglement learning for graph generation has substantial new challenges including 1) the lack of graph deconvolution operations to jointly decode node and edge attributes; and 2) the difficulty in enforcing the disentanglement among latent factors that respectively influence: i) only nodes, ii) only edges, and iii) joint patterns between them. To address these challenges, we propose a new disentanglement enhancement framework for deep generative models for attributed graphs. In particular, a novel variational objective is proposed to disentangle the above three types of latent factors, with novel architecture for node and edge deconvolutions. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed model and its extensions.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Machine learning approaches; Bio-inspired approaches; Generative and developmental approaches; Neural networks; Information systems; Information systems applications; Data mining; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Networks; Network properties; Network structure; Topology analysis and generation

170. Minimizing Localized Ratio Cut Objectives in Hypergraphs.

Paper Link】 【Pages】:1708-1718

【Authors】: Nate Veldt ; Austin R. Benson ; Jon M. Kleinberg

【Abstract】: Hypergraphs are a useful abstraction for modeling multiway relationships in data, and hypergraph clustering is the task of detecting groups of closely related nodes in such data. Graph clustering has been studied extensively, and there are numerous methods for detecting small, localized clusters without having to explore an entire input graph. However, there are only a few specialized approaches for localized clustering in hypergraphs. Here we present a framework for local hypergraph clustering based on minimizing localized ratio cut objectives. Our framework takes an input set of reference nodes in a hypergraph and solves a sequence of hypergraph minimum s-t cut problems in order to identify a nearby well-connected cluster of nodes that overlaps substantially with the input set.

【Keywords】: Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization; Graph theory; Graph algorithms; Hypergraphs; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis; Network flows; Mathematical optimization; Discrete optimization

171. RECIPTOR: An Effective Pretrained Model for Recipe Representation Learning.

Paper Link】 【Pages】:1719-1727

【Authors】: Diya Li ; Mohammed J. Zaki

【Abstract】: Recipe representation plays an important role in food computing for perception, recognition, recommendation and other applications. Learning pretrained recipe embeddings is a challenging task, as there is a lack of high quality annotated food datasets. In this paper, we provide a joint approach for learning effective pretrained recipe embeddings using both the ingredients and cooking instructions. We present RECIPTOR, a novel set transformer-based joint model to learn recipe representations, that preserves permutation-invariance for the ingredient set and uses a novel knowledge graph (KG) derived triplet sampling approach to optimize the learned embeddings so that related recipes are closer in the latent semantic space. The embeddings are further jointly optimized by combining similarity among cooking instructions with a KG based triplet loss. We experimentally show that RECIPTOR's recipe embeddings outperform state-of-the-art baselines on two newly designed downstream classification tasks by a wide margin.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Information systems; Information systems applications; Data mining

172. Hyperbolic Distance Matrices.

Paper Link】 【Pages】:1728-1738

【Authors】: Puoya Tabaghi ; Ivan Dokmanic

【Abstract】: Hyperbolic space is a natural setting for mining and visualizing data with hierarchical structure. In order to compute a hyperbolic embedding from comparison or similarity information, one has to solve a hyperbolic distance geometry problem. In this paper, we propose a unified framework to compute hyperbolic embeddings from an arbitrary mix of noisy metric and non-metric data. Our algorithms are based on semidefinite programming and the notion of a hyperbolic distance matrix, in many ways parallel to its famous Euclidean counterpart. A central ingredient we put forward is a semidefinite characterization of the hyperbolic Gramian---a matrix of Lorentzian inner products. This characterization allows us to formulate a semidefinite relaxation to efficiently compute hyperbolic embeddings in two stages: first, we complete and denoise the observed hyperbolic distance matrix; second, we propose a spectral factorization method to estimate the embedded points from the hyperbolic distance matrix. We show through numerical experiments how the flexibility to mix metric and non-metric constraints allows us to efficiently compute embeddings from arbitrary data.
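
A minimal numpy sketch of the quantity behind a hyperbolic distance matrix entry: for points on the hyperboloid model (Lorentzian self-inner product -1), the distance is d(x, y) = arccosh(-<x, y>_L) with the Lorentzian inner product. The signature convention and the lifting map below are standard assumptions of this sketch, not the paper's full semidefinite machinery.

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product with signature (-, +, ..., +).
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lift(u):
    # Embed a Euclidean point u onto the hyperboloid: x0 = sqrt(1 + ||u||^2).
    return np.concatenate([[np.sqrt(1.0 + np.dot(u, u))], u])

def hyperbolic_distance(x, y):
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

x, y = lift(np.array([0.3, -0.1])), lift(np.array([-1.2, 0.8]))
print(lorentz_inner(x, x), hyperbolic_distance(x, y))   # -1.0 (on the hyperboloid), positive distance
```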

【Keywords】: Computing methodologies; Machine learning; Human-centered computing; Visualization; Visualization techniques; Hyperbolic trees; Networks

173. RayS: A Ray Searching Method for Hard-label Adversarial Attack.

Paper Link】 【Pages】:1739-1747

【Authors】: Jinghui Chen ; Quanquan Gu

【Abstract】: Deep neural networks are vulnerable to adversarial attacks. Among different attack settings, the most challenging yet the most practical one is the hard-label setting where the attacker only has access to the hard-label output (prediction label) of the target model. Previous attempts are neither effective enough in terms of attack success rate nor efficient enough in terms of query complexity under the widely used L∞ norm threat model. In this paper, we present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency. Unlike previous works, we reformulate the continuous problem of finding the closest decision boundary into a discrete problem that does not require any zeroth-order gradient estimation. In the meantime, all unnecessary searches are eliminated via a fast check step. This significantly reduces the number of queries needed for our hard-label attack. Moreover, interestingly, we found that the proposed RayS attack can also be used as a sanity check for possible "falsely robust" models. On several recently proposed defenses that claim to achieve the state-of-the-art robust accuracy, our attack method demonstrates that the current white-box/black-box attacks could still give a false sense of security and the robust accuracy drop between the most popular PGD attack and RayS attack could be as large as 28%. We believe that our proposed RayS attack could help identify falsely robust models that beat most white-box/black-box attacks.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Object recognition; Search methodologies; Discrete space search

174. On Sampled Metrics for Item Recommendation.

Paper Link】 【Pages】:1748-1757

【Authors】: Walid Krichene ; Steffen Rendle

【Abstract】: The task of item recommendation requires ranking a large catalogue of items given a context. Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items. To speed up the computation of metrics, recent work often uses sampled metrics where only a smaller set of random items and the relevant items are ranked. This paper investigates sampled metrics in more detail and shows that they are inconsistent with their exact version, in the sense that they do not persist relative statements, e.g., recommender A is better than B, not even in expectation. Moreover, the smaller the sampling size, the less difference there is between metrics, and for very small sampling size, all metrics collapse to the AUC metric. We show that it is possible to improve the quality of the sampled metrics by applying a correction, obtained by minimizing different criteria such as bias or mean squared error. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. To summarize, our work suggests that sampling should be avoided for metric calculation, however if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate.
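
A small numpy experiment in the spirit of the paper's observation: compare the exact rank of a relevant item against the full catalogue with its rank against a random sample of negatives. The catalogue size, score distribution, and relevant-item score below are made up; the point is only that the sampled rank (and any metric built on it) looks systematically optimistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_sampled = 10000, 100
scores = rng.normal(size=n_items)            # model scores for the full catalogue
rel_score = np.quantile(scores, 0.99)        # the relevant item scores in the top 1%

exact_rank = 1 + np.sum(scores > rel_score)
sampled = rng.choice(scores, size=n_sampled, replace=False)
sampled_rank = 1 + np.sum(sampled > rel_score)

print(exact_rank, sampled_rank)              # the sampled rank looks far better
print(1.0 / exact_rank, 1.0 / sampled_rank)  # e.g., reciprocal rank inflates under sampling
```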

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Ranking; Information systems; Information retrieval; Evaluation of retrieval results; Retrieval tasks and goals; Recommender systems

175. ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization.

Paper Link】 【Pages】:1758-1767

【Authors】: Gordon E. Moon ; J. Austin Ellis ; Aravind Sukumaran-Rajam ; Srinivasan Parthasarathy ; P. Sadayappan

【Abstract】: Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including graph mining, recommender systems and natural language processing. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we present a novel optimization method for parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating a new Accelerated Locality-Optimized NMF (ALO-NMF) that obtains up to 2.29x lower data movement cost and up to 4.45x speedup over existing state-of-the-art parallel NMF algorithms.
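
A compact numpy reference for the HALS (Hierarchical Alternating Least Squares) updates that ALO-NMF accelerates: each rank-one factor is updated in turn with a non-negativity clamp. This is the textbook serial algorithm for X ≈ WH, without the paper's locality optimizations or parallelism.

```python
import numpy as np

def hals_nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, rank)), rng.random((rank, n))
    for _ in range(n_iter):
        XHt, HHt = X @ H.T, H @ H.T
        for k in range(rank):    # column-wise update of W
            W[:, k] = np.maximum(eps, W[:, k] + (XHt[:, k] - W @ HHt[:, k]) / HHt[k, k])
        WtX, WtW = W.T @ X, W.T @ W
        for k in range(rank):    # row-wise update of H
            H[k, :] = np.maximum(eps, H[k, :] + (WtX[k, :] - WtW[k, :] @ H) / WtW[k, k])
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(50, 40)))
W, H = hals_nmf(X, rank=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # relative reconstruction error
```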

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Factorization methods; Non-negative matrix factorization; Parallel computing methodologies; Parallel algorithms; Shared memory algorithms

176. Multi-Source Deep Domain Adaptation with Weak Supervision for Time-Series Sensor Data.

Paper Link】 【Pages】:1768-1778

【Authors】: Garrett Wilson ; Janardhan Rao Doppa ; Diane J. Cook

【Abstract】: Domain adaptation (DA) offers a valuable means to reuse data and models for new problem domains. However, robust techniques have not yet been considered for time series data with varying amounts of data availability. In this paper, we make three main contributions to fill this gap. First, we propose a novel Convolutional deep Domain Adaptation model for Time Series data (CoDATS) that significantly improves accuracy and training time over state-of-the-art DA strategies on real-world sensor data benchmarks. By utilizing data from multiple source domains, we increase the usefulness of CoDATS to further improve accuracy over prior single-source methods, particularly on complex time series datasets that have high variability between domains. Second, we propose a novel Domain Adaptation with Weak Supervision (DA-WS) method by utilizing weak supervision in the form of target-domain label distributions, which may be easier to collect than additional data labels. Third, we perform comprehensive experiments on diverse real-world datasets to evaluate the effectiveness of our domain adaptation and weak supervision methods. Results show that CoDATS for single-source DA significantly improves over the state-of-the-art methods, and we achieve additional improvements in accuracy using data from multiple source domains and weakly supervised signals.
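
To make the weak-supervision idea concrete, here is a minimal PyTorch-style sketch, assuming the only target-domain signal is a class-proportion prior: a standard source-domain cross-entropy is combined with a KL term pulling the model's average target-domain prediction toward that prior. The function name `daws_loss` and the toy shapes are hypothetical; this is not the paper's exact objective.

```python
# Minimal sketch of a weak-supervision term based on a target label distribution.
import torch
import torch.nn.functional as F

def daws_loss(source_logits, source_labels, target_logits, target_label_prior):
    # Ordinary supervised loss on the labeled source domain.
    sup = F.cross_entropy(source_logits, source_labels)
    # Average predicted class distribution over the unlabeled target batch.
    avg_pred = F.softmax(target_logits, dim=1).mean(dim=0)
    # KL(prior || avg_pred): match the known target label proportions.
    weak = torch.sum(target_label_prior * (torch.log(target_label_prior + 1e-8)
                                           - torch.log(avg_pred + 1e-8)))
    return sup + weak

# Toy shapes: 32 source examples, 64 target examples, 6 activity classes.
src_logits, src_y = torch.randn(32, 6), torch.randint(0, 6, (32,))
tgt_logits = torch.randn(64, 6)
prior = torch.tensor([0.3, 0.2, 0.2, 0.1, 0.1, 0.1])
print(daws_loss(src_logits, src_y, tgt_logits, prior))
```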

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Transfer learning; Reinforcement learning; Adversarial learning; Unsupervised learning; Machine learning approaches; Neural networks; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

177. Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions.

Paper Link】 【Pages】:1779-1788

【Authors】: James McInerney ; Brian Brost ; Praveen Chandar ; Rishabh Mehrotra ; Benjamin A. Carterette

【Abstract】: Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions in the rewards with lower variance in an asymptotically unbiased manner. Our method uses graphical assumptions about the causal relationships of the slate to reweight the rewards in the logging policy in a way that approximates the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency for the sequential track recommendations problem.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Learning to rank; Information systems; Information retrieval; Evaluation of retrieval results; Retrieval effectiveness; Retrieval models and ranking; Learning to rank; Retrieval tasks and goals; Recommender systems; Mathematics of computing; Probability and statistics; Probabilistic representations; Causal networks; Theory of computation; Design and analysis of algorithms; Online algorithms; Online learning algorithms

178. TAdaNet: Task-Adaptive Network for Graph-Enriched Meta-Learning.

Paper Link】 【Pages】:1789-1799

【Authors】: Qiuling Suo ; Jingyuan Chou ; Weida Zhong ; Aidong Zhang

【Abstract】: Annotated data samples in real-world applications are often limited. Meta-learning, which utilizes prior knowledge learned from related tasks and generalizes to new tasks of limited supervised experience, is an effective approach for few-shot learning. However, standard meta-learning with globally shared knowledge cannot handle the task heterogeneity problem well, i.e., tasks lie in different distributions. Recent advances have explored several ways to trigger task-dependent initial parameters or metrics, in order to customize task-specific information. These approaches learn task contextual information from data, but ignore external domain knowledge that can help in the learning process. In this paper, we propose a task-adaptive network (TAdaNet) that makes use of a domain-knowledge graph to enrich data representations and provide task-specific customization. Specifically, we learn a task embedding that characterizes task relationships and tailors task-specific parameters, resulting in a task-adaptive metric space for classification. Experimental results on a few-shot image classification problem show the effectiveness of the proposed method. We also apply it on a real-world disease classification problem, and show promising results for clinical decision support.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining

179. Unsupervised Paraphrasing via Deep Reinforcement Learning.

Paper Link】 【Pages】:1800-1809

【Authors】: A. B. Siddique ; Samet Oymak ; Vagelis Hristidis

【Abstract】: Paraphrasing is expressing the meaning of an input sentence in different wording while maintaining fluency (i.e., grammatical and syntactical correctness). Most existing work on paraphrasing uses supervised models that are limited to specific domains (e.g., image captions). Such models can neither be straightforwardly transferred to other domains nor generalize well, and creating labeled training data for new domains is expensive and laborious. The need for paraphrasing across different domains and the scarcity of labeled training data in many such domains call for exploring unsupervised paraphrase generation methods. We propose Progressive Unsupervised Paraphrasing (PUP): a novel unsupervised paraphrase generation method based on deep reinforcement learning (DRL). PUP uses a variational autoencoder (trained on a non-parallel corpus) to generate a seed paraphrase that warm-starts the DRL model. Then, PUP progressively tunes the seed paraphrase guided by our novel reward function, which combines semantic adequacy, language fluency, and expression diversity measures to quantify the quality of the generated paraphrases in each iteration without needing parallel sentences. Our extensive experimental evaluation shows that PUP outperforms unsupervised state-of-the-art paraphrasing techniques in terms of both automatic metrics and user studies on four real datasets. We also show that PUP outperforms domain-adapted supervised algorithms on several datasets. Our evaluation also shows that PUP achieves a good trade-off between semantic similarity and diversity of expression.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Natural language generation; Search methodologies; Discrete space search; Machine learning; Learning paradigms; Reinforcement learning; Unsupervised learning

180. CICLAD: A Fast and Memory-efficient Closed Itemset Miner for Streams.

Paper Link】 【Pages】:1810-1818

【Authors】: Tomas Martin ; Guy Francoeur ; Petko Valtchev

【Abstract】: Mining association rules from data streams is a challenging task due to the (typically) limited resources available vs. the large size of the result. Frequent closed itemsets (FCI) enable an efficient first step, yet current FCI stream miners are not optimal in resource consumption, e.g., they store a large number of extra itemsets at an additional cost. In search of a better storage-efficiency trade-off, we designed Ciclad, an intersection-based sliding-window FCI miner. Leveraging in-depth insights into FCI evolution, it combines minimal storage with quick access. Experimental results indicate that Ciclad's memory footprint is much lower and its performance globally better than those of competing methods.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Information systems; Information systems applications; Data mining; Association rules; Data stream mining; Decision support systems; Data analytics; Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial algorithms

181. Graph Attention Networks over Edge Content-Based Channels.

Paper Link】 【Pages】:1819-1827

【Authors】: Lu Lin ; Hongning Wang

【Abstract】: Edges play a crucial role in passing information on a graph, especially when they carry textual content reflecting semantics behind how nodes are linked and interacting with each other. In this paper, we propose a channel-aware attention mechanism enabled by edge text content when aggregating information from neighboring nodes; and we realize this mechanism in a graph autoencoder framework. Edge text content is encoded as low-dimensional mixtures of latent topics, which serve as semantic channels for topic-level information passing on edges. We embed nodes and topics in the same latent space to capture their mutual dependency when decoding the structural and textual information on graph. We evaluated the proposed model on Yelp user-item bipartite graph and StackOverflow user-user interaction graph. The proposed model outperformed a set of baselines on link prediction and content prediction tasks. Qualitative evaluations also demonstrated the descriptive power of the learnt node embeddings, showing its potential as an interpretable representation of graphs.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Factorization methods; Latent Dirichlet allocation; Learning in probabilistic graphical models; Latent variable models; Learning latent representations; Neural networks

182. Multimodal Learning with Incomplete Modalities by Knowledge Distillation.

Paper Link】 【Pages】:1828-1838

【Authors】: Qi Wang ; Liang Zhan ; Paul M. Thompson ; Jiayu Zhou

【Abstract】: Multimodal learning aims at utilizing information from a variety of data modalities to improve generalization performance. One common approach is to seek the common information that is shared among different modalities for learning, whereas we can also fuse the supplementary information to leverage modality-specific information. Though the supplementary information is often desired, most existing multimodal approaches can only learn from samples with complete modalities, which wastes a considerable amount of the collected data. Otherwise, model-based imputation needs to be used to complete the missing values, which may introduce undesired noise, especially when the sample size is limited. In this paper, we propose a framework based on knowledge distillation that utilizes the supplementary information from all modalities and avoids imputation and the noise associated with it. Specifically, we first train models on each modality independently using all the available data. Then the trained models are used as teachers to teach the student model, which is trained with the samples having complete modalities. We demonstrate the effectiveness of the proposed method in extensive empirical studies on both synthetic and real-world datasets.

【Keywords】: Computing methodologies; Machine learning; Information systems; Information systems applications; Data mining

183. Estimating the Percolation Centrality of Large Networks through Pseudo-dimension Theory.

Paper Link】 【Pages】:1839-1847

【Authors】: Alane M. de Lima ; Murilo V. G. da Silva ; André Luís Vignatti

【Abstract】: In this work we investigate the problem of estimating the percolation centrality of every vertex in a graph. This centrality measure quantifies the importance of each vertex in a graph going through a contagious process. It is an open problem whether the percolation centrality can be computed in O(n^{3-c}) time, for any constant c > 0. In this paper we present an Õ(m) randomized approximation algorithm for the percolation centrality of every vertex of G, generalizing techniques developed by Riondato, Upfal and Kornaropoulos. The estimate obtained by the algorithm is within ε of the exact value with probability 1 - δ, for fixed constants 0 < ε, δ < 1. In fact, we show in our experimental analysis that, in the case of real-world complex networks, the output produced by our algorithm is significantly closer to the exact values than its guarantee in terms of theoretical worst-case analysis.
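
The sampling idea that the paper generalizes can be illustrated on plain betweenness centrality: sample vertex pairs, pick one shortest path per pair uniformly at random, and credit its internal vertices. The sketch below is a minimal, hypothetical implementation of that estimator (using networkx); it is not the paper's percolation-centrality algorithm or its pseudo-dimension analysis.

```python
# Minimal sketch of a pair-sampling estimator for betweenness-style centrality.
import random
import networkx as nx

def sampled_betweenness(G, r, seed=0):
    rng = random.Random(seed)
    nodes = list(G.nodes())
    est = {v: 0.0 for v in nodes}
    done = 0
    while done < r:
        s, t = rng.sample(nodes, 2)
        if not nx.has_path(G, s, t):
            continue
        path = rng.choice(list(nx.all_shortest_paths(G, s, t)))
        for v in path[1:-1]:              # credit internal vertices only
            est[v] += 1.0 / r
        done += 1
    return est

G = nx.erdos_renyi_graph(200, 0.05, seed=1)
approx = sampled_betweenness(G, r=2000)
exact = nx.betweenness_centrality(G, normalized=True)
print("top-5 by sampled estimate:", sorted(approx, key=approx.get, reverse=True)[:5])
print("top-5 by exact value:     ", sorted(exact, key=exact.get, reverse=True)[:5])
```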

【Keywords】: Theory of computation; Design and analysis of algorithms; Approximation algorithms analysis; Graph algorithms analysis; Shortest paths; Theory and algorithms for application domains; Machine learning theory; Sample complexity and generalization bounds

184. TinyGNN: Learning Efficient Graph Neural Networks.

Paper Link】 【Pages】:1848-1856

【Authors】: Bencheng Yan ; Chaokun Wang ; Gaoyang Guo ; Yunkai Lou

【Abstract】: Recently, Graph Neural Networks (GNNs) have attracted a lot of research interest and achieved great success in dealing with graph-based data. The basic idea of GNNs is to aggregate neighbor information iteratively. After k iterations, a k-layer GNN can capture nodes' k-hop local structure. In this way, a deeper GNN can access much more neighbor information, leading to better performance. However, when a GNN goes deeper, the exponential expansion of neighborhoods incurs expensive computations in batched training and inference. This puts deeper GNNs out of reach for many applications, e.g., real-time systems. In this paper, we try to learn a small GNN (called TinyGNN), which can achieve high performance and infer node representations in a short time. However, since a small GNN cannot explore as much local structure as a deeper GNN does, there exists a neighbor-information gap between the deeper GNN and the small GNN. To address this problem, we leverage peer node information to model the local structure explicitly and adopt a neighbor distillation strategy to learn local structure knowledge from a deeper GNN implicitly. Extensive experimental results demonstrate that TinyGNN is empirically effective and achieves similar or even better performance compared with the deeper GNNs. Meanwhile, TinyGNN gains a 7.73x--126.59x speed-up on inference over all data sets.

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing design and evaluation methods; Social network analysis; Collaborative and social computing theory, concepts and paradigms; Social networks; Information systems; Information systems applications; Data mining; Decision support systems; Data analytics

185. GPT-GNN: Generative Pre-Training of Graph Neural Networks.

Paper Link】 【Pages】:1857-1867

【Authors】: Ziniu Hu ; Yuxiao Dong ; Kuansan Wang ; Kai-Wei Chang ; Yizhou Sun

【Abstract】: Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, training GNNs requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to reduce the labeling effort is to pre-train an expressive GNN model on unlabelled data with self-supervision and then transfer the learned model to downstream tasks with only a few labels. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph. We factorize the likelihood of graph generation into two components: 1) attribute generation and 2) edge generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale open academic graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Machine learning approaches; Learning latent representations

186. Parameterized Correlation Clustering in Hypergraphs and Bipartite Graphs.

Paper Link】 【Pages】:1868-1876

【Authors】: Nate Veldt ; Anthony Wirth ; David F. Gleich

【Abstract】: Motivated by applications in community detection and dense subgraph discovery, we consider new clustering objectives in hypergraphs and bipartite graphs. These objectives are parameterized by one or more resolution parameters in order to enable diverse knowledge discovery in complex data.

【Keywords】: Mathematics of computing; Discrete mathematics; Graph theory; Approximation algorithms; Hypergraphs; Theory of computation; Design and analysis of algorithms

187. Prioritized Restreaming Algorithms for Balanced Graph Partitioning.

Paper Link】 【Pages】:1877-1887

【Authors】: Amel Awadelkarim ; Johan Ugander

【Abstract】: Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared to meet varied demands across different domains. As the starting point for the present work, we observe that two recently introduced families of iterative partitioners---those based on restreaming and those based on balanced label propagation (including Facebook's Social Hash Partitioner)---can be viewed through a common modular framework of design decisions. With the help of this modular perspective, we find that a key combination of design decisions leads to a novel family of algorithms with notably better empirical performance than any existing highly-scalable algorithm on a broad range of real-world graphs. The resulting prioritized restreaming algorithms employ a constraint management strategy based on multiplicative weights, borrowed from the restreaming literature, while adopting notions of priority from balanced label propagation to optimize the ordering of the streaming process. Our experimental results consider a range of stream orders, where a dynamic ordering based on what we call ambivalence is broadly the most performative in terms of the cut quality of the resulting balanced partitions, with a static ordering based on degree being nearly as good.
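
As a minimal sketch of the restreaming family this work builds on, the code below runs several passes of linear deterministic greedy (LDG) assignment under a soft balance constraint. The `restream_ldg` function, the slack factor and the fixed stream order are illustrative choices; the paper's prioritized ordering and multiplicative-weight constraint management are not reproduced.

```python
# Minimal sketch of a restreaming balanced partitioner (LDG-style scoring).
import networkx as nx

def restream_ldg(G, n_parts=4, passes=5, slack=1.05):
    cap = slack * G.number_of_nodes() / n_parts
    assign = {}                                    # vertex -> partition from the previous pass
    for _ in range(passes):
        sizes = [0] * n_parts
        new_assign = {}
        for v in G.nodes():                        # fixed stream order in this sketch
            def score(p):
                nbrs = sum(1 for u in G.neighbors(v)
                           if new_assign.get(u, assign.get(u)) == p)
                return nbrs * (1.0 - sizes[p] / cap)   # neighbors, discounted by fullness
            best = max(range(n_parts), key=score)
            new_assign[v] = best
            sizes[best] += 1
        assign = new_assign
    cut = sum(1 for u, v in G.edges() if assign[u] != assign[v])
    return assign, cut

G = nx.karate_club_graph()
assign, cut = restream_ldg(G, n_parts=2)
print("edges cut:", cut, "of", G.number_of_edges())
```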

【Keywords】: Information systems; Information storage systems; Storage architectures; Distributed storage; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis

188. A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components.

Paper Link】 【Pages】:1888-1898

【Authors】: Yuantong Li ; Qi Ma ; Sujit K. Ghosh

【Abstract】: Estimating the parameters of mixture models has wide applications, ranging from classification problems to the estimation of complex distributions. Most of the current literature on estimating the parameters of mixture densities is based on iterative Expectation Maximization (EM) type algorithms, which require either taking expectations over the latent label variables or generating samples from the conditional distribution of such latent labels using the Bayes rule. Moreover, when the number of components is unknown, the problem becomes computationally more demanding due to well-known label switching issues [28]. In this paper, we propose a robust and quick approach based on change-point methods to determine the number of mixture components that works for almost any location-scale family, even when the components are heavy tailed (e.g., Cauchy). We present several numerical illustrations comparing our method with some popular methods in the literature, using simulated data and real case studies. The proposed method is shown to be as much as 500 times faster than some of the competing methods and is also shown to be more accurate in estimating the mixture distributions according to goodness-of-fit tests.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Mixture modeling

189. AdvMind: Inferring Adversary Intent of Black-Box Attacks.

Paper Link】 【Pages】:1899-1907

【Authors】: Ren Pang ; Xinyang Zhang ; Shouling Ji ; Xiapu Luo ; Ting Wang

【Abstract】: Deep neural networks (DNNs) are inherently susceptible to adversarial attacks even under black-box settings, in which the adversary only has query access to the target models. In practice, while it may be possible to effectively detect such attacks (e.g., observing massive similar but non-identical queries), it is often challenging to exactly infer the adversary intent (e.g., the target class of the adversarial example the adversary attempts to craft) especially during early stages of the attacks, which is crucial for performing effective deterrence and remediation of the threats in many scenarios.

【Keywords】: Computing methodologies; Machine learning; Security and privacy

190. Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding.

Paper Link】 【Pages】:1908-1917

【Authors】: Yu Meng ; Yunyi Zhang ; Jiaxin Huang ; Yu Zhang ; Chao Zhang ; Jiawei Han

【Abstract】: Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users' particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help a user comprehend the topics of interest. We develop a novel joint tree and text embedding method, along with a principled optimization procedure, that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.

【Keywords】: Information systems; Information systems applications; Data mining

191. Combinatorial Black-Box Optimization with Expert Advice.

Paper Link】 【Pages】:1918-1927

【Authors】: Hamid Dadkhahi ; Karthikeyan Shanmugam ; Jesus Rios ; Payel Das ; Samuel C. Hoffman ; Troy David Loeffler ; Subramanian Sankaranarayanan

【Abstract】: We consider the problem of black-box function optimization over the Boolean hypercube. Despite the vast literature on black-box function optimization over continuous domains, not much attention has been paid to learning models for optimization over combinatorial domains until recently. However, the computational complexity of the recently devised algorithms is prohibitive even for moderate numbers of variables; drawing one sample using the existing algorithms is more expensive than a function evaluation for many black-box functions of interest. To address this problem, we propose a computationally efficient model learning algorithm based on multilinear polynomials and exponential weight updates. In the proposed algorithm, we alternate between simulated annealing with respect to the current polynomial representation and updating the weights using monomial experts' advice. Numerical experiments on various datasets in both unconstrained and sum-constrained Boolean optimization indicate the competitive performance of the proposed algorithm, while improving the computational time up to several orders of magnitude compared to state-of-the-art algorithms in the literature.

【Keywords】: Theory of computation; Design and analysis of algorithms; Online algorithms; Online learning algorithms; Theory and algorithms for application domains; Machine learning theory; Active learning; Boolean function learning; Online learning theory; Reinforcement learning; Sequential decision making

192. CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring.

Paper Link】 【Pages】:1928-1936

【Authors】: Jiaxin Huang ; Yiqing Xie ; Yu Meng ; Yunyi Zhang ; Jiawei Han

【Abstract】: Taxonomy is not only a fundamental form of knowledge representation, but is also crucial to vast knowledge-rich applications, such as question answering and web search. Most existing taxonomy construction methods extract hypernym-hyponym entity pairs to organize a "universal" taxonomy. However, these generic taxonomies cannot satisfy a user's specific interest in certain areas and relations. Moreover, instance taxonomies treat each node as a single word, which has low semantic coverage and is hard for people to fully understand. In this paper, we propose a method for seed-guided topical taxonomy construction, which takes a corpus and a seed taxonomy described by concept names as input, and constructs a more complete taxonomy based on the user's interest, wherein each node is represented by a cluster of coherent terms. Our framework, CoRel, has two modules to fulfill this goal. A relation transferring module learns and transfers the relation of interest to the user along multiple paths to expand the seed taxonomy structure in width and depth. A concept learning module enriches the semantics of each concept node by jointly embedding the taxonomy and text. Comprehensive experiments conducted on real-world datasets show that CoRel generates high-quality topical taxonomies and outperforms all the baselines significantly.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Ontology engineering; Natural language processing; Information extraction; Information systems; Information retrieval; Retrieval tasks and goals; Clustering and classification; Information systems applications; Data mining

193. Treatment Policy Learning in Multiobjective Settings with Fully Observed Outcomes.

Paper Link】 【Pages】:1937-1947

【Authors】: Soorajnath Boominathan ; Michael Oberst ; Helen Zhou ; Sanjat Kanjilal ; David A. Sontag

【Abstract】: In several medical decision-making problems, such as antibiotic prescription, laboratory testing can provide precise indications for how a patient will respond to different treatment options. This enables us to "fully observe" all potential treatment outcomes, but while present in historical data, these results are infeasible to produce in real-time at the point of the initial treatment decision. Moreover, treatment policies in these settings often need to trade off between multiple competing objectives, such as effectiveness of treatment and harmful side effects. We present, compare, and evaluate three approaches for learning individualized treatment policies in this setting: First, we consider two indirect approaches, which use predictive models of treatment response to construct policies optimal for different trade-offs between objectives. Second, we consider a direct approach that constructs such a set of policies without intermediate models of outcomes. Using a medical dataset of Urinary Tract Infection (UTI) patients, we show that all approaches learn policies that achieve strictly better performance on all outcomes than clinicians, while also trading off between different objectives. We demonstrate additional benefits of the direct approach, including flexibly incorporating other goals such as deferral to physicians on simple cases.

【Keywords】: Applied computing; Life and medical sciences; Health care information systems; Computing methodologies; Machine learning; Learning paradigms; Supervised learning

194. List-wise Fairness Criterion for Point Processes.

Paper Link】 【Pages】:1948-1958

【Authors】: Jin Shang ; Mingxuan Sun ; Nina S.-N. Lam

【Abstract】: Many types of event sequence data exhibit triggering and clustering properties in space and time. Point processes are widely used in modeling such event data with applications such as predictive policing and disaster event forecasting. Although current algorithms can achieve significant event prediction accuracy, the historic data or the self-excitation property can introduce biased prediction. For example, hotspots ranked by event hazard rates can make the visibility of a disadvantaged group (e.g., racial minorities or the communities of lower social economic status) more apparent. Existing methods have explored ways to achieve parity between the groups by penalizing the objective function with several group fairness metrics. However, these metrics fail to measure the fairness on every prefix of the ranking. In this paper, we propose a novel list-wise fairness criterion for point processes, which can efficiently evaluate the ranking fairness in event prediction. We also present a strict definition of the unfairness consistency property of a fairness metric and prove that our list-wise fairness criterion satisfies this property. Experiments on several real-world spatial-temporal sequence datasets demonstrate the effectiveness of our list-wise fairness criterion.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Learning to rank; Information systems; Data management systems; Database design and models; Data model extensions; Temporal data; Information systems applications; Spatial-temporal systems; Geographic information systems

195. Neural Subgraph Isomorphism Counting.

Paper Link】 【Pages】:1959-1969

【Authors】: Xin Liu ; Haojie Pan ; Mutian He ; Yangqiu Song ; Xin Jiang ; Lifeng Shang

【Abstract】: In this paper, we study a new graph learning problem: learning to count subgraph isomorphisms. Different from other traditional graph learning problems such as node classification and link prediction, subgraph isomorphism counting is NP-complete and requires more global inference to oversee the whole graph. To make it scalable for large-scale graphs and patterns, we propose a learning framework that augments different representation learning architectures and iteratively attends to pattern and target data graphs to memorize intermediate states of the subgraph isomorphism search for global counting. We construct both a small-graph dataset (at most 1,024 subgraph isomorphisms per instance) and a large-graph dataset (at most 4,096 subgraph isomorphisms per instance) to evaluate different representation and interaction modules. A mutagenic compound dataset, MUTAG, is also used to evaluate neural models and demonstrate the success of transfer learning. While the learning-based approach is inexact, it generalizes to counting large patterns and data graphs in linear time, compared to the exponential time of the original NP-complete problem. Experimental results show that learning-based subgraph isomorphism counting can speed up the traditional VF2 algorithm by 10-1,000 times with acceptable errors. Domain adaptation based on fine-tuning also shows the usefulness of our approach in real-world applications.

【Keywords】: Information systems; Data management systems; Information integration; Theory of computation; Design and analysis of algorithms; Data structures design and analysis; Pattern matching

196. Hypergraph Clustering Based on PageRank.

Paper Link】 【Pages】:1970-1978

【Authors】: Yuuki Takai ; Atsushi Miyauchi ; Masahiro Ikeda ; Yuichi Yoshida

【Abstract】: A hypergraph is a useful combinatorial object to model ternary or higher-order relations among entities. Clustering hypergraphs is a fundamental task in network analysis. In this study, we develop two clustering algorithms based on personalized PageRank on hypergraphs. The first one is local in the sense that its goal is to find a tightly connected vertex set with a bounded volume including a specified vertex. The second one is global in the sense that its goal is to find a tightly connected vertex set. For both algorithms, we discuss theoretical guarantees on the conductance of the output vertex set. Also, we experimentally demonstrate that our clustering algorithms outperform existing methods in terms of both the solution quality and running time. To the best of our knowledge, ours are the first practical algorithms for hypergraphs with theoretical guarantees on the conductance of the output set.

【Keywords】: Information systems; World Wide Web; Web mining; Mathematics of computing; Discrete mathematics; Graph theory; Spectra of graphs

197. DeepSinger: Singing Voice Synthesis with Data Mined From the Web.

Paper Link】 【Pages】:1979-1989

【Authors】: Yi Ren ; Xu Tan ; Tao Qin ; Jian Luan ; Zhou Zhao ; Tie-Yan Liu

【Abstract】: In this paper, we develop DeepSinger, a multi-lingual multi-singer singing voice synthesis (SVS) system, which is built from scratch using singing training data mined from music websites. The pipeline of DeepSinger consists of several steps, including data crawling, singing and accompaniment separation, lyrics-to-singing alignment, data filtration, and singing modeling. Specifically, we design a lyrics-to-singing alignment model to automatically extract the duration of each phoneme in the lyrics, starting from the coarse-grained sentence level and moving to the fine-grained phoneme level, and further design a multi-lingual multi-singer singing model based on a feed-forward Transformer to directly generate linear spectrograms from lyrics and synthesize voices using Griffin-Lim. DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites; 2) the lyrics-to-singing alignment model avoids any human effort for alignment labeling and greatly reduces labeling cost; 3) the singing model based on a feed-forward Transformer is simple and efficient, removing the complicated acoustic feature modeling of parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data; and 4) it can synthesize singing voices in multiple languages and for multiple singers. We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages (Chinese, Cantonese and English). The results demonstrate that, with singing data purely mined from the Web, DeepSinger can synthesize high-quality singing voices in terms of both pitch accuracy and voice naturalness. Our audio samples are available at https://speechresearch.github.io/deepsinger/.
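
The final vocoding step mentioned above, recovering a waveform from a linear magnitude spectrogram with Griffin-Lim, can be sketched with librosa as below. The synthetic tone is only a stand-in for a spectrogram that the singing model would predict, and the frame parameters are arbitrary choices rather than the paper's settings.

```python
# Minimal sketch: Griffin-Lim phase reconstruction from a linear magnitude spectrogram.
import numpy as np
import librosa

sr, dur = 22050, 2.0
t = np.linspace(0, dur, int(sr * dur), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)   # stand-in "singing" signal

n_fft, hop = 1024, 256
S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))     # linear magnitude spectrogram

# Griffin-Lim iteratively estimates the missing phase from the magnitudes.
y_hat = librosa.griffinlim(S, n_iter=60, hop_length=hop, win_length=n_fft)

print("original samples:", y.shape[0], "reconstructed samples:", y_hat.shape[0])
```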

【Keywords】: Applied computing; Arts and humanities; Sound and music computing; Computing methodologies; Artificial intelligence; Natural language processing

198. Scaling Choice Models of Relational Social Data.

Paper Link】 【Pages】:1990-1998

【Authors】: Jan Overgoor ; George Pakapol Supaniratisai ; Johan Ugander

【Abstract】: Many prediction problems on social networks, from recommendations to anomaly detection, can be approached by modeling network data as a sequence of relational events and then leveraging the resulting model for prediction. Conditional logit models of discrete choice are a natural approach to modeling relational events as "choices'' in a framework that envelops and extends many long-studied models of network formation. The conditional logit model is simplistic, but it is particularly attractive because it allows for efficient consistent likelihood maximization via negative sampling, something that isn't true for mixed logit and many other richer models. The value of negative sampling is particularly pronounced because choice sets in relational data are often enormous. Given the importance of negative sampling, in this work we introduce a model simplification technique for mixed logit models that we call "de-mixing'', whereby standard mixture models of network formation---particularly models that mix local and global link formation---are reformulated to operate their modes over disjoint choice sets. This reformulation reduces mixed logit models to conditional logit models, opening the door to negative sampling while also circumventing other standard challenges with maximizing mixture model likelihoods. To further improve scalability, we also study importance sampling for more efficiently selecting negative samples, finding that it can greatly speed up inference in both standard and de-mixed models. Together, these steps make it possible to much more realistically model network formation in very large graphs. We illustrate the relative gains of our improvements on synthetic datasets with known ground truth as well as a large-scale dataset of public transactions on the Venmo platform.

【Keywords】: Applied computing; Law, social and behavioral sciences; Economics; Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Mixture modeling; Information systems; Information systems applications; Data mining; World Wide Web; Web applications; Social networks

199. Deep Exogenous and Endogenous Influence Combination for Social Chatter Intensity Prediction.

Paper Link】 【Pages】:1999-2008

【Authors】: Subhabrata Dutta ; Sarah Masud ; Soumen Chakrabarti ; Tanmoy Chakraborty

【Abstract】: Modeling user engagement dynamics on social media has compelling applications in market trend analysis, user-persona detection, and political discourse mining. Most existing approaches depend heavily on knowledge of the underlying user network. However, a large number of discussions happen on platforms that either lack any reliable social network (news portal, blogs, Buzzfeed) or reveal only partially the inter-user ties (Reddit, Stackoverflow). Many approaches require observing a discussion for some considerable period before they can make useful predictions. In real-time streaming scenarios, observations incur costs. Lastly, most models do not capture complex interactions between exogenous events (such as news articles published externally) and in-network effects (such as follow-up discussions on Reddit) to determine engagement levels. To address the three limitations noted above, we propose a novel framework, ChatterNet, which, to our knowledge, is the first that can model and predict user engagement without considering the underlying user network. Given streams of timestamped news articles and discussions, the task is to observe the streams for a short period leading up to a time horizon, then predict chatter: the volume of discussions through a specified period after the horizon. ChatterNet processes text from news and discussions using a novel time-evolving recurrent network architecture that captures both temporal properties within news and discussions, as well as influence of news on discussions. We report on extensive experiments using a two-month-long discussion corpus of Reddit, and a contemporaneous corpus of online news articles from the Common Crawl. ChatterNet shows considerable improvements beyond recent state-of-the-art models of engagement prediction. Detailed studies controlling observation and prediction windows, over 43 different subreddits, yield further useful insights.

【Keywords】: Information systems; Information systems applications; Data mining

200. Geography-Aware Sequential Location Recommendation.

Paper Link】 【Pages】:2009-2019

【Authors】: Defu Lian ; Yongji Wu ; Yong Ge ; Xing Xie ; Enhong Chen

【Abstract】: Sequential location recommendation plays an important role in many applications such as mobility prediction, route planning and location-based advertisements. Despite evolving from tensor factorization to RNN-based neural networks, existing methods do not make effective use of geographical information and suffer from the sparsity issue. To this end, we propose a Geography-aware sequential recommender based on the Self-Attention Network (GeoSAN for short) for location recommendation. On the one hand, we propose a new loss function based on importance sampling for optimization, to address the sparsity issue by emphasizing the use of informative negative samples. On the other hand, to make better use of geographical information, GeoSAN represents the hierarchical gridding of each GPS point with a self-attention based geography encoder. Moreover, we put forward geography-aware negative samplers to promote the informativeness of negative samples. We evaluate the proposed algorithm with three real-world LBSN datasets, and show that GeoSAN outperforms the state-of-the-art sequential location recommenders by 34.9%. The experimental results further verify the effectiveness of the new loss function, geography encoder, and geography-aware negative samplers.
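
One common way to realize the "hierarchical gridding of each GPS point" that such a geography encoder consumes is Web-Mercator map tiling, turning a coordinate into one discrete token per zoom level. The sketch below uses this standard tiling as an assumed stand-in; the paper's exact gridding scheme may differ.

```python
# Minimal sketch: hierarchical gridding of a GPS point via Web-Mercator tiles.
import math

def latlon_to_tile(lat, lon, zoom):
    """Map a WGS84 point to integer tile coordinates at a given zoom level."""
    lat = max(min(lat, 85.05112878), -85.05112878)   # Mercator latitude clamp
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.log(math.tan(math.radians(lat)) +
                            1.0 / math.cos(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def hierarchical_grid_tokens(lat, lon, max_zoom=12):
    """One discrete token per level, coarse to fine, ready for an embedding layer."""
    tokens = []
    for z in range(1, max_zoom + 1):
        x, y = latlon_to_tile(lat, lon, z)
        tokens.append(f"z{z}_x{x}_y{y}")
    return tokens

print(hierarchical_grid_tokens(40.7580, -73.9855, max_zoom=6))  # Times Square, illustrative point
```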

【Keywords】: Information systems; Information systems applications; Data mining; Collaborative filtering; Spatial-temporal systems; Location based services

201. Dual Channel Hypergraph Collaborative Filtering.

Paper Link】 【Pages】:2020-2029

【Authors】: Shuyi Ji ; Yifan Feng ; Rongrong Ji ; Xibin Zhao ; Wanwan Tang ; Yue Gao

【Abstract】: Collaborative filtering (CF) is one of the most popular and important recommendation methodologies at the heart of numerous recommender systems today. Although widely adopted, existing CF-based methods, ranging from matrix factorization to the emerging graph-based methods, suffer from inferior performance, especially when the training data is very limited. In this paper, we first pinpoint the root causes of this deficiency and observe two main disadvantages that stem from the inherent designs of existing CF-based methods, i.e., 1) inflexible modeling of users and items and 2) insufficient modeling of high-order correlations among the subjects. Under such circumstances, we propose a dual channel hypergraph collaborative filtering (DHCF) framework to tackle the above issues. First, a dual channel learning strategy, which holistically leverages the divide-and-conquer strategy, is introduced to learn the representation of users and items so that these two types of data can be elegantly interconnected while still maintaining their specific properties. Second, the hypergraph structure is employed for modeling users and items with explicit hybrid high-order correlations. The jump hypergraph convolution (JHConv) method is proposed to support the explicit and efficient embedding propagation of high-order correlations. Comprehensive experiments on two public benchmarks and two new real-world datasets demonstrate that DHCF can achieve significant and consistent improvements against other state-of-the-art methods.

【Keywords】: Information systems; Information retrieval; Retrieval models and ranking; Retrieval tasks and goals; Recommender systems

202. A Framework for Recommending Accurate and Diverse Items Using Bayesian Graph Convolutional Neural Networks.

Paper Link】 【Pages】:2030-2039

【Authors】: Jianing Sun ; Wei Guo ; Dengcheng Zhang ; Yingxue Zhang ; Florence Regol ; Yaochen Hu ; Huifeng Guo ; Ruiming Tang ; Han Yuan ; Xiuqiang He ; Mark Coates

【Abstract】: Personalized recommender systems are playing an increasingly important role for online consumption platforms. Because of the multitude of relationships existing in recommender systems, Graph Neural Networks (GNNs) based approaches have been proposed to better characterize the various relationships between a user and items while modeling a user's preferences. Previous graph-based recommendation approaches process the observed user-item interaction graph as a ground-truth depiction of the relationships between users and items. However, especially in the implicit recommendation setting, all the unobserved user-item interactions are usually assumed to be negative samples. There are missing links that represent a user's future actions. In addition, there may be spurious or misleading positive interactions. To alleviate the above issue, in this work, we take a first step to introduce a principled way to model the uncertainty in the user-item interaction graph using the Bayesian Graph Convolutional Neural Network framework. We discuss how inference can be performed under our framework and provide a concrete formulation using the Bayesian Probabilistic Ranking training loss. We demonstrate the effectiveness of our proposed framework on four benchmark recommendation datasets. The proposed method outperforms state-of-the-art graph-based recommendation models. Furthermore, we conducted an offline evaluation on one industrial large-scale dataset. It shows that our proposed method outperforms the baselines, with the potential gain being more significant for cold-start users. This illustrates the potential practical benefit in real-world recommender systems.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

203. Learning Based Distributed Tracking.

Paper Link】 【Pages】:2040-2050

【Authors】: Hao Wu ; Junhao Gan ; Rui Zhang

【Abstract】: Inspired by the great success of machine learning in the past decade, people have been thinking about the possibility of improving theoretical results by exploiting the data distribution. In this paper, we revisit a fundamental problem called Distributed Tracking (DT) under the assumption that the data follows a certain (known or unknown) distribution, and propose a number of data-dependent algorithms with improved theoretical bounds. Informally, in the DT problem there is a coordinator and k players, where the coordinator holds a threshold N and each player has a counter. At each time stamp, at most one counter can be increased by one. The job of the coordinator is to capture the exact moment when the sum of all these k counters reaches N. The goal is to minimise the communication cost. While our first type of algorithm assumes the concrete data distribution is known in advance, our second type can learn the distribution on the fly. Both types of algorithms achieve a communication cost bounded by O(k log log N) with high probability, improving the state-of-the-art data-independent bound O(k log N/k). We further propose a number of implementation optimisation heuristics to improve both the efficiency and the robustness of the algorithms. Finally, we conduct extensive experiments on three real datasets and four synthetic datasets. The experimental results show that the communication cost of our algorithms is at most 20% of that of the state-of-the-art algorithms.
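
To make the problem setup and the communication-cost metric concrete, the sketch below simulates the classic round-based, data-independent protocol behind the O(k log N/k) baseline mentioned above: players signal the coordinator every time a per-round local threshold is crossed, and the coordinator polls everyone after k signals, halving the remaining slack each round. The `simulate_round_based_dt` function and its thresholds are an illustrative reconstruction, not the paper's learning-based algorithms.

```python
# Minimal sketch: simulating the Distributed Tracking problem with a round-based protocol.
import random

def simulate_round_based_dt(stream, k, N):
    """`stream` is a sequence of player ids; returns (#messages, step at which the sum hit N)."""
    counters = [0] * k
    confirmed = [0] * k              # counts known to the coordinator
    messages, signals = 0, 0
    t = max((N - sum(confirmed)) // (2 * k), 1)   # per-player signal threshold this round
    since_signal = [0] * k

    for step, p in enumerate(stream, start=1):
        counters[p] += 1
        since_signal[p] += 1
        if since_signal[p] >= t:
            since_signal[p] = 0
            messages += 1            # player -> coordinator signal
            signals += 1
        if signals >= k or t == 1:
            messages += 2 * k if t > 1 else 0     # coordinator polls every player
            confirmed = counters[:]
            if sum(confirmed) >= N:
                return messages, step
            t = max((N - sum(confirmed)) // (2 * k), 1)
            since_signal = [0] * k
            signals = 0
    return messages, None

random.seed(0)
k, N = 20, 100000
stream = [random.randrange(k) for _ in range(N)]
msgs, when = simulate_round_based_dt(stream, k, N)
print(f"k={k}, N={N}: detected at step {when} using {msgs} messages")
```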

【Keywords】: Mathematics of computing; Probability and statistics; Probabilistic algorithms

204. Tight Sensitivity Bounds For Smaller Coresets.

Paper Link】 【Pages】:2051-2061

【Authors】: Alaa Maalouf ; Adiel Statman ; Dan Feldman

【Abstract】: An ε-coreset to the dimensionality reduction problem for a (possibly very large) matrix A ∈ Rn x d is a small scaled subset of its n rows that approximates their sum of squared distances to every affine k-dimensional subspace of Rd, up to a factor of 1±ε. Such a coreset is useful for boosting the running time of computing a low-rank approximation (k-SVD/k-PCA) while using small memory. Coresets are also useful for handling streaming, dynamic and distributed data in parallel. With high probability, non-uniform sampling based on the so called leverage score or sensitivity of each row in A yields a coreset. The size of the (sampled) coreset is then near-linear in the total sum of these sensitivity bounds. We provide algorithms that compute provably tight bounds for the sensitivity of each input row. It is based on two ingredients: (i) iterative algorithm that computes the exact sensitivity of each row up to arbitrary small precision for (non-affine) k-subspaces, and (ii) a general reduction for computing a coreset for affine subspaces, given a coreset for (non-affine) subspaces in Rd. Experimental results on real-world datasets, including the English Wikipedia documents-term matrix, show that our bounds provide significantly smaller and data-dependent coresets also in practice. Full open source code is also provided.
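
A minimal sketch of the classic sensitivity/leverage-score sampling underlying such coresets is given below (synthetic matrix, hypothetical sizes): rows are sampled proportionally to their leverage scores and re-weighted so that a small weighted sample approximates A^T A. The paper's tighter sensitivity bounds and its affine-subspace reduction are not reproduced here.

```python
# Minimal sketch: leverage-score row sampling as a coreset for A^T A.
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 20000, 30, 600                         # rows, columns, coreset size

A = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # matrix with correlated columns

U, _, _ = np.linalg.svd(A, full_matrices=False)
leverage = np.sum(U ** 2, axis=1)                # row sensitivities for this problem
probs = leverage / leverage.sum()

idx = rng.choice(n, size=r, replace=True, p=probs)
weights = 1.0 / (r * probs[idx])                 # re-weight so the estimator is unbiased
C = A[idx] * np.sqrt(weights)[:, None]           # weighted coreset rows

rel_err = np.linalg.norm(C.T @ C - A.T @ A) / np.linalg.norm(A.T @ A)
print(f"coreset of {r} rows out of {n}; relative error of A^T A: {rel_err:.3f}")
```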

【Keywords】: Theory of computation; Design and analysis of algorithms; Approximation algorithms analysis; Theory and algorithms for application domains; Machine learning theory

205. GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases.

Paper Link】 【Pages】:2062-2072

【Authors】: Zongyue Qin ; Yunsheng Bai ; Yizhou Sun

【Abstract】: Graph similarity search aims to find the graphs most similar to a query in a graph database in terms of a given proximity measure, say Graph Edit Distance (GED). It is a widely studied yet still challenging problem. Most of the studies are based on the pruning-verification framework, which first prunes non-promising graphs and then conducts verification on the small candidate set. Existing methods are capable of managing databases with thousands or tens of thousands of graphs, but fail to scale to even larger databases due to their exact pruning strategy. Inspired by the recent success of deep-learning-based semantic hashing in image and document retrieval, we propose a novel graph neural network (GNN) based semantic hashing, i.e., GHashing, for approximate pruning. We first train a GNN with ground-truth GED results so that it learns to generate embeddings and hash codes that preserve GED between graphs. Then a hash index is built to enable graph lookup in constant time. To answer a query, we use the hash codes and the continuous embeddings as two-level pruning to retrieve the most promising candidates, which are sent to the exact solver for final verification. Due to the approximate pruning strategy leveraged by our graph hashing technique, our approach achieves significantly faster query time compared to state-of-the-art methods while maintaining a high recall. Experiments show that our approach is on average 20x faster than the only baseline that works on million-scale databases, which demonstrates that GHashing successfully provides a new direction for addressing the graph search problem in large-scale graph databases.
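
The query-time pattern described above, a coarse Hamming-distance filter on hash codes followed by embedding-based shortlisting before exact verification, can be sketched as follows. The random codes and embeddings, the index 4217 and the radius are made-up stand-ins for what the trained GNN and GED solver would provide.

```python
# Minimal sketch: two-level pruning with hash codes and continuous embeddings.
import numpy as np

rng = np.random.default_rng(0)
n_db, code_bits, emb_dim = 100000, 32, 64

db_codes = rng.integers(0, 2, size=(n_db, code_bits), dtype=np.uint8)   # stand-in GNN hash codes
db_embs = rng.normal(size=(n_db, emb_dim)).astype(np.float32)           # stand-in GNN embeddings

def query(q_code, q_emb, hamming_radius=4, shortlist=50):
    # Level 1: cheap Hamming filter on the binary codes.
    hamming = np.count_nonzero(db_codes != q_code, axis=1)
    cand = np.flatnonzero(hamming <= hamming_radius)
    # Level 2: rank survivors by embedding distance, keep a shortlist for exact verification.
    d = np.linalg.norm(db_embs[cand] - q_emb, axis=1)
    return cand[np.argsort(d)[:shortlist]]        # -> would be sent to the exact GED solver

# Query: a slightly perturbed copy of database graph 4217 (hypothetical index).
q_idx = 4217
q_code = db_codes[q_idx].copy(); q_code[:2] ^= 1                  # flip two bits
q_emb = db_embs[q_idx] + 0.05 * rng.normal(size=emb_dim).astype(np.float32)
print("true index retrieved first?", query(q_code, q_emb)[0] == q_idx)
```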

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by regression; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining; Nearest-neighbor search

206. Interactive Path Reasoning on Graph for Conversational Recommendation.

Paper Link】 【Pages】:2073-2083

【Authors】: Wenqiang Lei ; Gangyi Zhang ; Xiangnan He ; Yisong Miao ; Xiang Wang ; Liang Chen ; Tat-Seng Chua

【Abstract】: Traditional recommendation systems estimate user preference for items from past interaction history, and thus struggle to capture fine-grained and dynamic user preferences. Conversational recommendation systems (CRS) overcome this limitation by enabling the system to directly ask users about their preferred attributes of items. However, existing CRS methods do not make full use of this advantage --- they only use the attribute feedback in rather implicit ways, such as updating the latent user representation. In this paper, we propose Conversational Path Reasoning (CPR), a generic framework that models conversational recommendation as an interactive path reasoning problem on a graph. It walks through the attribute vertices by following user feedback, utilizing the user's preferred attributes in an explicit way. By leveraging the graph structure, CPR is able to prune off many irrelevant candidate attributes, leading to a better chance of hitting user-preferred attributes. To demonstrate how CPR works, we propose a simple yet effective instantiation named SCPR (Simple CPR). We perform empirical studies on the multi-round conversational recommendation scenario, the most realistic CRS setting so far, which considers multiple rounds of asking attributes and recommending items. Through extensive experiments on two datasets, Yelp and LastFM, we validate the effectiveness of our SCPR, which significantly outperforms the state-of-the-art CRS methods EAR and CRM. In particular, we find that the more attributes there are, the more advantages our method can achieve.

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing theory, concepts and paradigms; Social recommendation

207. Algorithmic Aspects of Temporal Betweenness.

Paper Link】 【Pages】:2084-2092

【Authors】: Sebastian Buß ; Hendrik Molter ; Rolf Niedermeier ; Maciej Rymar

【Abstract】: The betweenness centrality of a graph vertex measures how often this vertex is visited on shortest paths between other vertices of the graph. In the analysis of many real-world graphs or networks, betweenness centrality of a vertex is used as an indicator for its relative importance in the network. In recent years, a growing number of real-world networks is modeled as temporal graphs instead of conventional (static) graphs. In a temporal graph, we have a fixed set of vertices and there is a finite discrete set of time steps and every edge might be present only at some time steps. While shortest paths are straightforward to define in static graphs, temporal paths can be considered "optimal" with respect to many different criteria, including length, arrival time, and overall travel time (shortest, foremost, and fastest paths). This leads to different concepts of temporal betweenness centrality, posing new challenges on the algorithmic side. We provide a systematic study of temporal betweenness variants based on various concepts of optimal temporal paths both on a theoretical and empirical level.

【Keywords】: Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial algorithms; Graph theory; Graph algorithms; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis

208. Non-Linear Mining of Social Activities in Tensor Streams.

Paper Link】 【Pages】:2093-2102

【Authors】: Koki Kawabata ; Yasuko Matsubara ; Takato Honda ; Yasushi Sakurai

【Abstract】: Given a large time-evolving event series such as Google web-search logs, which are collected according to various aspects, i.e., timestamps, locations and keywords, how accurately can we forecast their future activities? How can we reveal significant patterns that allow us to long-term forecast from such complex tensor streams? In this paper, we propose a streaming method, namely, CubeCast, that is designed to capture basic trends and seasonality in tensor streams and extract temporal and multi-dimensional relationships between such dynamics. Our proposed method has the following properties: (a) it is effective: it finds both trends and seasonality and summarizes their dynamics into simultaneous non-linear latent space. (b) it is automatic: it automatically recognizes and models such structural patterns without any parameter tuning or prior information. (c) it is scalable: it incrementally and adaptively detects shifting points of patterns for a semi-infinite collection of tensor streams. Extensive experiments that we conducted on real datasets demonstrate that our algorithm can effectively and efficiently find meaningful patterns for generating future values, and outperforms the state-of-the-art algorithms for time series forecasting in terms of forecasting accuracy and computational time.

【Keywords】: Information systems; Information systems applications; Data mining

209. DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering.

Paper Link】 【Pages】:2103-2113

【Authors】: Yuval Heffetz ; Roman Vainshtein ; Gilad Katz ; Lior Rokach

【Abstract】: Automatic Machine Learning (AutoML) is an area of research aimed at automating Machine Learning (ML) activities that currently require the involvement of human experts. One of the most challenging tasks in this field is the automatic generation of end-to-end ML pipelines: combining multiple types of ML algorithms into a single architecture used for analysis of previously-unseen data. This task has two challenging aspects: the first is the need to explore a large search space of algorithms and pipeline architectures. The second challenge is the computational cost of training and evaluating multiple pipelines. In this study we present DeepLine, a reinforcement learning-based approach for automatic pipeline generation. Our proposed approach utilizes an efficient representation of the search space together with a novel method for operating in environments with large and dynamic action spaces. By leveraging past knowledge gained from previously-analyzed datasets, our approach only needs to generate and evaluate a few dozen pipelines to reach comparable or better performance than current state-of-the-art AutoML systems that evaluate hundreds and even thousands of pipelines in their optimization process. Evaluation on 56 classification datasets demonstrates the merits of our approach.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Sequential decision making; Supervised learning; Supervised learning by classification; Machine learning algorithms; Dynamic programming for Markov decision processes; Q-learning

210. On Sampling Top-K Recommendation Evaluation.

Paper Link】 【Pages】:2114-2124

【Authors】: Dong Li ; Ruoming Jin ; Jing Gao ; Zhi Liu

【Abstract】: Recently, Rendle has warned that the use of sampling-based top-k metrics might not suffice. This throws into jeopardy a number of recent studies on deep learning-based recommendation algorithms, as well as classic non-deep-learning algorithms, that use such metrics. In this work, we thoroughly investigate the relationship between the sampling and global top-K Hit-Ratio (HR, or Recall), originally proposed by Koren[2] and extensively used by others. By formulating the problem of aligning the sampling top-k and the global top-K Hit-Ratios through a mapping function f, so that the sampling Hit-Ratio at k approximates the global Hit-Ratio at f(k), we demonstrate both theoretically and experimentally that the sampling top-k Hit-Ratio provides an accurate approximation of its global (exact) counterpart, and can consistently predict the correct winners (the same as indicated by their corresponding global Hit-Ratios).
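
As a concrete illustration of the two metrics being aligned (not the authors' mapping function f), the sketch below contrasts the global Hit-Ratio@K, which ranks each held-out item against all items, with a sampled Hit-Ratio@k that ranks it against a small set of random negatives; all variable names and sizes are illustrative.

```python
# Illustrative contrast of global vs. sampled top-K Hit-Ratio (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K, k, n_neg = 200, 1000, 50, 10, 100
scores = rng.normal(size=(n_users, n_items))     # model scores for every user-item pair
target = rng.integers(0, n_items, size=n_users)  # one held-out positive item per user

def global_hr(scores, target, K):
    # rank of each user's target among all items (0 = best)
    target_scores = scores[np.arange(len(target)), target][:, None]
    ranks = (scores > target_scores).sum(axis=1)
    return float((ranks < K).mean())

def sampled_hr(scores, target, k, n_neg):
    hits = 0
    for u, t in enumerate(target):
        negs = rng.choice(np.setdiff1d(np.arange(n_items), [t]), size=n_neg, replace=False)
        rank = int((scores[u, negs] > scores[u, t]).sum())
        hits += rank < k
    return hits / len(target)

print(global_hr(scores, target, K), sampled_hr(scores, target, k, n_neg))
```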

【Keywords】: Information systems; Information retrieval; Evaluation of retrieval results; Retrieval effectiveness; Retrieval tasks and goals; Recommender systems; Information systems applications; Data mining; Collaborative filtering

211. Algorithmic Decision Making with Conditional Fairness.

Paper Link】 【Pages】:2125-2135

【Authors】: Renzhe Xu ; Peng Cui ; Kun Kuang ; Bo Li ; Linjun Zhou ; Zheyan Shen ; Wei Cui

【Abstract】: Nowadays fairness issues have raised great concerns in decision-making systems. Various fairness notions have been proposed to measure the degree to which an algorithm is unfair. In practice, there frequently exist a certain set of variables we term as fair variables, which are pre-decision covariates such as users' choices. The effects of fair variables are irrelevant in assessing the fairness of the decision support algorithm. We thus define conditional fairness as a more sound fairness metric by conditioning on the fair variables. Given different prior knowledge of fair variables, we demonstrate that traditional fairness notions, such as demographic parity and equalized odds, are special cases of our conditional fairness notion. Moreover, we propose a Derivable Conditional Fairness Regularizer (DCFR), which can be integrated into any decision-making model, to track the trade-off between precision and fairness of algorithmic decision making. Specifically, an adversarial representation based conditional independence loss is proposed in our DCFR to measure the degree of unfairness. With extensive experiments on three real-world datasets, we demonstrate the advantages of our conditional fairness notion and DCFR.
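
To make the notion concrete, conditional demographic parity compares positive-decision rates across sensitive groups within each stratum of a fair variable. Below is a toy check of that quantity on synthetic data; it is illustrative only, not DCFR, and the column names and data-generating process are ours.

```python
# Toy check of conditional demographic parity on synthetic decisions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "sensitive": rng.integers(0, 2, size=n),   # protected attribute (two groups)
    "fair_var": rng.integers(0, 3, size=n),    # pre-decision covariate, e.g. the user's own choice
})
# Decisions depend only on the fair variable here, so conditional parity should (approximately) hold.
df["decision"] = (rng.random(n) < 0.3 + 0.05 * df["fair_var"]).astype(int)

rates = df.groupby(["fair_var", "sensitive"])["decision"].mean().unstack()
gap = (rates[1] - rates[0]).abs()
print(rates)
print("max conditional demographic parity gap:", float(gap.max()))
```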

【Keywords】: Computing methodologies; Machine learning

212. Semi-supervised Collaborative Filtering by Text-enhanced Domain Adaptation.

Paper Link】 【Pages】:2136-2144

【Authors】: Wenhui Yu ; Xiao Lin ; Junfeng Ge ; Wenwu Ou ; Zheng Qin

【Abstract】: Data sparsity is an inherent challenge in recommender systems, where most of the data is collected from the implicit feedback of users. This causes two difficulties in designing effective algorithms: first, the majority of users have only a few interactions with the system and there is not enough data for learning; second, there are no negative samples in the implicit feedback, and it is common practice to perform negative sampling to generate negative samples. However, this leads to the consequence that many potential positive samples are mislabeled as negative ones, and data sparsity would exacerbate the mislabeling problem.

【Keywords】: Human-centered computing; Collaborative and social computing; Collaborative and social computing theory, concepts and paradigms; Social recommendation; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; World Wide Web; Web searching and information discovery; Collaborative filtering; Social recommendation

213. Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC.

Paper Link】 【Pages】:2145-2153

【Authors】: Yuichiro Ueno ; Kazuki Osawa ; Yohei Tsuji ; Akira Naruse ; Rio Yokota

【Abstract】: Rich information matrices from first and second-order derivatives have many potential applications in both theoretical and practical problems in deep learning. However, computing these information matrices is extremely expensive, and this enormous cost is currently limiting their application to important problems regarding generalization, hyperparameter tuning, and optimization of deep neural networks. One of the most challenging use cases of information matrices is their use as a preconditioner for optimizers, since the information matrices need to be updated at every step. In this work, we conduct a step-by-step performance analysis when computing the Fisher information matrix during training of ResNet-50 on ImageNet, and show that the overhead can be reduced to the same amount as the cost of performing a single SGD step. We also show that the resulting Fisher preconditioned optimizer can converge in 1/3 the number of epochs compared to SGD, while achieving the same Top-1 validation accuracy. This is the first work to achieve such accuracy with K-FAC while reducing the training time to match that of SGD.
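
For readers unfamiliar with K-FAC, the toy NumPy sketch below shows the single-layer Kronecker-factored preconditioning step that makes the Fisher affordable; it is a simplified illustration under our own shapes and damping choice, not the authors' distributed implementation.

```python
# Toy K-FAC-style preconditioning for one fully connected layer (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 64, 32, 16
a = rng.normal(size=(batch, d_in))    # layer inputs (activations)
g = rng.normal(size=(batch, d_out))   # back-propagated gradients w.r.t. the layer outputs
grad_W = a.T @ g / batch              # gradient of the loss w.r.t. the (d_in, d_out) weight matrix

# Kronecker factors of the layer's (empirical) Fisher block, with damping for invertibility.
damping = 1e-2
A = a.T @ a / batch + damping * np.eye(d_in)
G = g.T @ g / batch + damping * np.eye(d_out)

# K-FAC update: precondition the gradient with the factored inverse A^{-1} grad_W G^{-1},
# far cheaper than inverting the full (d_in*d_out) x (d_in*d_out) Fisher block.
precond_grad = np.linalg.solve(A, grad_W) @ np.linalg.inv(G)
lr = 0.1
W_update = -lr * precond_grad
print(W_update.shape)  # (32, 16)
```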

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks

214. Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation.

Paper Link】 【Pages】:2154-2164

【Authors】: Vladislav Polianskii ; Florian T. Pokorny

【Abstract】: Voronoi diagrams and their dual, the Delaunay complex, are two fundamental geometric concepts that lie at the foundation of many machine learning algorithms and play a role in particular in classical piecewise linear interpolation and regression methods. More recently, they are also crucial for the construction of a common class of simplicial complexes such as Alpha and Delaunay-Čech complexes in topological data analysis. We propose a randomized approximation approach that mitigates the prohibitive cost of exact computation of Voronoi diagrams in high dimensions for machine learning applications. In experiments with data in up to 50 dimensions, we show that this allows us to significantly extend the use of Voronoi-based simplicial complexes in Topological Data Analysis (TDA) to higher dimensions. We confirm prior TDA results on image patches that previously had to rely on sub-sampled data with increased resolution and demonstrate the scalability of our approach by performing a TDA analysis on synthetic data as well as on filters of a ResNet neural network architecture. Secondly, we propose an application of our approach to piecewise linear interpolation of high dimensional data that avoids explicit complete computation of an associated Delaunay triangulation.
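
For context, exact Delaunay-based piecewise linear interpolation is readily available in low dimensions, e.g. via SciPy as sketched below; the paper's contribution is making this kind of construction tractable in much higher dimensions, which this snippet does not attempt.

```python
# Exact Delaunay-based piecewise linear interpolation in 2-D with SciPy
# (the low-dimensional baseline; the paper targets much higher dimensions).
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))               # scattered sample sites
vals = np.sin(4 * pts[:, 0]) + pts[:, 1] ** 2  # function values at the sites

interp = LinearNDInterpolator(pts, vals)       # builds a Delaunay triangulation internally
query = np.array([[0.5, 0.5], [0.25, 0.75]])
print(interp(query))                           # linear interpolation within each simplex
```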

【Keywords】: Mathematics of computing; Discrete mathematics; Graph theory; Approximation algorithms; Mathematical analysis; Numerical analysis; Interpolation; Theory of computation; Randomness, geometry and discrete structures; Computational geometry; Random walks and Markov chains

215. MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining.

Paper Link】 【Pages】:2165-2174

【Authors】: Leonardo Pellegrina ; Cyrus Cousins ; Fabio Vandin ; Matteo Riondato

【Abstract】: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This feature is a strong improvement over previously proposed solutions that could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks.
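
To make the central quantity concrete, the n-trial Monte-Carlo Empirical Rademacher Average of a finite family is a supremum of sign-weighted sample means, as in the naive sketch below. This enumerates the family explicitly, which is exactly what MCRapper avoids for exponentially large pattern families; all sizes and names are illustrative.

```python
# Naive n-trial MCERA for a small, explicitly enumerated family of binary functions.
import numpy as np

rng = np.random.default_rng(0)
m, n_funcs, n_trials = 500, 40, 10
F = rng.integers(0, 2, size=(n_funcs, m))        # f_i(x_j) in {0, 1}, e.g. "pattern i occurs in transaction j"
sigma = rng.choice([-1, 1], size=(n_trials, m))  # Rademacher signs, one vector per trial

# For each trial, take the sup over the family of the sign-weighted sample mean,
# then average the suprema over the trials.
per_trial_sup = (sigma @ F.T / m).max(axis=1)
mcera = per_trial_sup.mean()
print(mcera)
```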

【Keywords】: Information systems; Information systems applications; Data mining; Mathematics of computing; Probability and statistics; Probabilistic algorithms; Theory of computation; Design and analysis of algorithms; Streaming, sublinear and near linear time algorithms; Sketching and sampling

216. REA: Robust Cross-lingual Entity Alignment Between Knowledge Graphs.

Paper Link】 【Pages】:2175-2184

【Authors】: Shichao Pei ; Lu Yu ; Guoxian Yu ; Xiangliang Zhang

【Abstract】: Cross-lingual entity alignment aims at associating semantically similar entities in knowledge graphs with different languages. It has been an essential research problem for knowledge integration and knowledge graph connection, and has been studied with supervised or semi-supervised machine learning methods under the assumption of clean labeled data. However, labels from human annotations often include errors, which can largely affect the alignment results. We thus aim to formulate and explore the robust entity alignment problem, which is non-trivial due to the noise in the labels. Our proposed method named REA (Robust Entity Alignment) consists of two components: noise detection and noise-aware entity alignment. The noise detection is designed by following the adversarial training principle. The noise-aware entity alignment is devised by leveraging a graph neural network based knowledge graph encoder as its core. In order to mutually boost the performance of the two components, we propose a unified reinforced training strategy to combine them. To evaluate our REA method, we conduct extensive experiments on several real-world datasets. The experimental results demonstrate the effectiveness of our proposed method and also show that our model consistently outperforms the state-of-the-art methods with a significant improvement in alignment accuracy in the noise-involved scenario.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Information systems; Data management systems; Information integration

217. Stable Learning via Differentiated Variable Decorrelation.

Paper Link】 【Pages】:2185-2193

【Authors】: Zheyan Shen ; Peng Cui ; Jiashuo Liu ; Tong Zhang ; Bo Li ; Zhitang Chen

【Abstract】: Recently, as the applications of artificial intelligence gradually seep into some risk-sensitive areas such as justice, healthcare and autonomous driving, an upsurge of research interest on model stability and robustness has arisen in the field of machine learning. Rather than purely fitting the observed training data, stable learning tries to learn a model with uniformly good performance under non-stationary and agnostic testing data. The key challenge of stable learning in practice is that we do not have any knowledge about the true model and the test data distribution a priori. Under such conditions, we cannot expect a faithful estimation of model parameters and their stability over wildly changing environments. Previous methods resort to a reweighting scheme to remove the correlations between all the variables through a set of new sample weights. However, we argue that such aggressive decorrelation between all the variables may cause an over-reduced sample size, which leads to variance inflation and possible underperformance. In this paper, we incorporate the unlabeled data from multiple environments into the variable decorrelation framework and propose a Differentiated Variable Decorrelation (DVD) algorithm based on the clustering of variables. Specifically, the variables are clustered according to the stability of their correlations and the variable decorrelation module learns a set of sample weights to remove the correlations merely between the variables of different clusters. Empirical studies on both synthetic and real world datasets clearly demonstrate the efficacy of our DVD algorithm on improving the model parameter estimation and the prediction stability over changing distributions.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning approaches; Learning linear models

218. Learning Stable Graphs from Multiple Environments with Selection Bias.

Paper Link】 【Pages】:2194-2202

【Authors】: Yue He ; Peng Cui ; Jianxin Ma ; Hao Zou ; Xiaowei Wang ; Hongxia Yang ; Philip S. Yu

【Abstract】: Nowadays graph has become a general and powerful representation to describe the rich relationships among different kinds of entities via the underlying patterns encoded in its structure. The knowledge (more generally) accumulated in a graph is expected to be able to transfer across populations and from the past to the future. However, the data collection process of graph generation is full of known or unknown sample selection biases, leading to spurious correlations among entities, especially in non-stationary and heterogeneous environments. In this paper, we target the problem of learning stable graphs from multiple environments with selection bias. We propose a Stable Graph Learning (SGL) framework to learn a graph that can capture general relational patterns which are irrelevant to the selection bias in an unsupervised way. Extensive experimental results from both simulation and real data demonstrate that our method could significantly benefit the generalization capacity of graph structure.

【Keywords】: Computing methodologies; Machine learning

219. Fast RobustSTL: Efficient and Robust Seasonal-Trend Decomposition for Time Series with Complex Patterns.

Paper Link】 【Pages】:2203-2213

【Authors】: Qingsong Wen ; Zhe Zhang ; Yan Li ; Liang Sun

【Abstract】: Many real-world time series data exhibit complex patterns with trend, seasonality, outliers and noise. Robustly and accurately decomposing these components would greatly facilitate time series tasks including anomaly detection, forecasting and classification. RobustSTL is an effective seasonal-trend decomposition for time series data with complicated patterns. However, it cannot handle multiple seasonal components properly, and it suffers from high computational complexity, which limits its usage in practice. In this paper, we extend RobustSTL to handle multiple seasonality. To speed up the computation, we propose a special generalized ADMM algorithm to perform the decomposition efficiently. We rigorously prove that the proposed algorithm converges similarly to standard ADMM while reducing the complexity from O(N²) to O(N log N) for each iteration. We empirically compare our proposed algorithm with other state-of-the-art seasonal-trend decomposition methods, including MSTL, STR, and TBATS, on both synthetic and real-world datasets with single and multiple seasonality. The experimental results demonstrate the superior performance of our decomposition algorithm in terms of both effectiveness and efficiency.
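
Fast RobustSTL itself is not packaged in common libraries, but the kind of decomposition it targets can be previewed with the classical STL routine in statsmodels, shown below for a single seasonality; this is only a baseline illustration, not the paper's algorithm, and the synthetic series is ours.

```python
# Baseline seasonal-trend decomposition with statsmodels' STL (not Fast RobustSTL).
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(0)
n, period = 400, 24
t = np.arange(n)
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / period) + rng.normal(scale=0.3, size=n)

res = STL(series, period=period, robust=True).fit()
print(res.trend[:3], res.seasonal[:3], res.resid[:3])  # trend / seasonal / remainder components
```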

【Keywords】: Information systems; Information systems applications; Data mining; Data stream mining; Spatial-temporal systems; Data streaming; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

220. CurvaNet: Geometric Deep Learning based on Directional Curvature for 3D Shape Analysis.

Paper Link】 【Pages】:2214-2224

【Authors】: Wenchong He ; Zhe Jiang ; Chengming Zhang ; Arpan Man Sainju

【Abstract】: Over the last decade, deep learning research has achieved tremendous success in computer vision and natural language processing. The current widely successful deep learning models are largely based on convolution and pooling operations on a Euclidean plane with a regular grid (e.g., image and video data) and thus cannot be directly applied to the non-Euclidean surface. Geometric deep learning aims to fill the gap by generalizing deep learning models from a 2D Euclidean plane to a 3D geometric surface. The problem has important applications in human-computer interaction, biochemistry, and mechanical engineering, but is uniquely challenging due to the lack of a regular grid framework and the difficulties in learning geometric features on a non-Euclidean manifold. Existing works focus on generalizing deep learning models from 2D image to graphs (e.g., graph neural networks) or 3D mesh surfaces but without fully learning geometric features from a differential geometry perspective. In contrast, this paper proposes a novel geometric deep learning model called CurvaNet that integrates differential geometry with graph neural networks. The key idea is to learn direction sensitive 3D shape features through directional curvature filters. We design a U-Net like architecture with downsampling and upsampling paths based on mesh pooling and unpooling operations. Evaluation on real-world datasets shows that the proposed model outperforms several baseline methods in classification accuracy.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer graphics; Shape modeling; Shape analysis; Machine learning; Machine learning algorithms

221. Attentional Multi-graph Convolutional Network for Regional Economy Prediction with Open Migration Data.

Paper Link】 【Pages】:2225-2233

【Authors】: Fengli Xu ; Yong Li ; Shusheng Xu

【Abstract】: We study the problem of predicting the regional economy of U.S. counties with open migration data collected from U.S. Internal Revenue Service (IRS) records. To capture the complicated correlations between migration and regional economy, we design a novel Attentional Multi-graph Convolutional Network (AMCN), which models the migration behavior as a multi-graph with different types of edges denoting the migration flows collected from heterogeneous sources of different years and different demographics. AMCN extracts high quality features from the migration multi-graph by first applying customized aggregator functions on the induced subgraphs, and then fusing the aggregated features with a higher-order attentional aggregator function. In addition, we address the data sparsity problem with an important neighbor discovery algorithm that can automatically supplement important neighbors that are absent in the empirical data. Experiment results show our AMCN model significantly outperforms all baselines, reducing the relative mean square error by 43.8% against the classic regression model and by 12.7% against the state-of-the-art deep learning baselines. In-depth model analysis shows our proposed AMCN model reveals insightful correlations between regional economy and migration data.

【Keywords】: Applied computing; Law, social and behavioral sciences; Economics; Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Information systems; Information systems applications; Data mining

Applied Data Science Track Papers 121

222. Octet: Online Catalog Taxonomy Enrichment with Self-Supervision.

Paper Link】 【Pages】:2247-2257

【Authors】: Yuning Mao ; Tong Zhao ; Andrey Kan ; Chenwei Zhang ; Xin Luna Dong ; Christos Faloutsos ; Jiawei Han

【Abstract】: Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment not only requires the robustness to deal with emerging terms but also the consistency between existing taxonomy structure and new term attachment. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to 2 times larger in the open-world evaluation.

【Keywords】: Applied computing; Electronic commerce; E-commerce infrastructure; Enterprise computing; Enterprise ontologies, taxonomies and vocabularies; Computing methodologies; Artificial intelligence; Natural language processing; Information extraction; Lexical semantics; Information systems; World Wide Web; Web applications; Electronic commerce

223. TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding.

Paper Link】 【Pages】:2258-2268

【Authors】: Zhiping Xiao ; Weiping Song ; Haoyan Xu ; Zhicheng Ren ; Yizhou Sun

【Abstract】: We aim at solving the problem of predicting people's ideology, or political tendency. We estimate it using Twitter data, and formalize it as a classification problem. Ideology detection has long been a challenging yet important problem. Certain groups, such as policy makers, rely on it to make wise decisions. Back in the old days when labor-intensive survey studies were needed to collect public opinions, analyzing ordinary citizens' political tendencies was difficult. The rise of social media, such as Twitter, has enabled us to gather ordinary citizens' data easily. However, the incompleteness of the labels and the features in social network datasets is tricky, not to mention the enormous data size and the heterogeneity. These data differ dramatically from many commonly-used datasets and thus bring unique challenges. In our work, we first built our own datasets from Twitter. Next, we proposed TIMME, a multi-task multi-relational embedding model that works efficiently on sparsely-labeled, heterogeneous real-world datasets. It can also handle the incompleteness of the input features. Experimental results showed that TIMME is overall better than the state-of-the-art models for ideology detection on Twitter. Our findings include: links can lead to good classification outcomes without text; conservative voices are under-represented on Twitter; follow is the most important relation for predicting ideology; retweet and mention increase the chance of like; etc. Last but not least, TIMME could be extended to other datasets and tasks in theory.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Machine learning approaches; Neural networks

224. Knowing your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps.

Paper Link】 【Pages】:2269-2279

【Authors】: Xianfeng Tang ; Yozen Liu ; Neil Shah ; Xiaolin Shi ; Prasenjit Mitra ; Suhang Wang

【Abstract】: With the rapid growth and prevalence of social network applications (Apps) in recent years, understanding user engagement has become increasingly important, to provide useful insights for future App design and development. While several promising neural modeling approaches were recently pioneered for accurate user engagement prediction, their black-box designs are unfortunately limited in model explainability. In this paper, we study a novel problem of explainable user engagement prediction for social network Apps. First, we propose a flexible definition of user engagement for various business scenarios, based on future metric expectations. Next, we design an end-to-end neural framework, FATE, which incorporates three key factors that we identify to influence user engagement, namely friendships, user actions, and temporal dynamics, to achieve explainable engagement predictions. FATE is based on a tensor-based graph neural network (GNN), LSTM and a mixture attention mechanism, which allows for (a) predictive explanations based on learned weights across different feature categories, (b) reduced network complexity, and (c) improved performance in both prediction accuracy and training/inference time. We conduct extensive experiments on two large-scale datasets from Snapchat, where FATE outperforms state-of-the-art approaches with a 10% reduction in error and a 20% reduction in runtime. We also evaluate explanations from FATE, showing strong quantitative and qualitative performance.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; World Wide Web; Web applications; Social networks; Networks; Network types; Overlay and other logical network structures; Online social networks; Social media networks

225. Sub-Matrix Factorization for Real-Time Vote Prediction.

Paper Link】 【Pages】:2280-2290

【Authors】: Alexander Immer ; Victor Kristof ; Matthias Grossglauser ; Patrick Thiran

【Abstract】: We address the problem of predicting aggregate vote outcomes (e.g., national) from partial outcomes (e.g., regional) that are revealed sequentially. We combine matrix factorization techniques and generalized linear models (GLMs) to obtain a flexible, efficient, and accurate algorithm. This algorithm works in two stages: First, it learns representations of the regions from high-dimensional historical data. Second, it uses these representations to fit a GLM to the partially observed results and to predict unobserved results. We show experimentally that our algorithm is able to accurately predict the outcomes of Swiss referenda, U.S. presidential elections, and German legislative elections. We also explore the regional representations in terms of ideological and cultural patterns. Finally, we deploy an online Web platform (www.predikon.ch) to provide real-time vote predictions in Switzerland and a data visualization tool to explore voting behavior. A by-product is a dataset of sequential vote results for 330 referenda and 2196 Swiss municipalities.
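
A stripped-down version of the two-stage recipe described above (learn low-rank regional representations from historical results, then fit a linear model on the regions revealed so far) fits in a few lines; the sketch below is an illustration under simplified assumptions, not the authors' model, and all sizes are synthetic.

```python
# Two-stage sketch: (1) low-rank regional representations from historical vote results,
# (2) ridge regression on the partially observed results of a new vote.
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_past_votes, rank = 300, 120, 10
true_factors = rng.normal(size=(n_regions, rank))
history = true_factors @ rng.normal(size=(rank, n_past_votes)) + rng.normal(scale=0.1, size=(n_regions, n_past_votes))

# Stage 1: regional representations via truncated SVD of the historical matrix.
U, s, _ = np.linalg.svd(history, full_matrices=False)
region_repr = U[:, :rank] * s[:rank]

# Stage 2: a new vote is revealed for a subset of regions; fit ridge regression on
# the revealed regions and predict the remaining ones.
new_vote = true_factors @ rng.normal(size=rank) + rng.normal(scale=0.1, size=n_regions)
observed = rng.choice(n_regions, size=60, replace=False)
X, y = region_repr[observed], new_vote[observed]
w = np.linalg.solve(X.T @ X + 1e-2 * np.eye(rank), X.T @ y)
pred = region_repr @ w
print(float(np.abs(pred - new_vote).mean()))  # mean absolute error over all regions
```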

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Factorization methods; Information systems; Information systems applications; Data mining; Collaborative filtering; World Wide Web; Web applications

226. Temporal-Contextual Recommendation in Real-Time.

Paper Link】 【Pages】:2291-2299

【Authors】: Yifei Ma ; Balakrishnan (Murali) Narayanaswamy ; Haibin Lin ; Hao Ding

【Abstract】: Personalized real-time recommendation has had a profound impact on retail, media, entertainment and other industries. However, developing recommender systems for every use case is costly, time consuming and resource-intensive. To fill this gap, we present a black-box recommender system that can adapt to a diverse set of scenarios without the need for manual tuning. We build on techniques that go beyond simple matrix factorization to incorporate important new sources of information: the temporal order of events [Hidasi et al., 2015], contextual information to bootstrap cold-start users, metadata information about items [Rendle 2012] and the additional information surrounding each event. Additionally, we address two fundamental challenges when putting recommender systems in the real-world: how to efficiently train them with even millions of unique items and how to cope with changing item popularity trends [Wu et al., 2017]. We introduce a compact model, which we call hierarchical recurrent network with meta data (HRNN-meta) to address the real-time and diverse metadata needs; we further provide efficient training techniques via importance sampling that can scale to millions of items with little loss in performance. We report significant improvements on a wide range of real-world datasets and provide intuition into model capabilities with synthetic experiments. Parts of HRNN-meta have been deployed in production at scale for customers to use at Amazon Web Services and serves as the underlying recommender engine for thousands of websites.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Sequential decision making; Information systems; Information retrieval; Retrieval models and ranking; Learning to rank; Retrieval tasks and goals; Recommender systems; Information systems applications; Enterprise information systems; Enterprise applications; World Wide Web; Web searching and information discovery; Personalization

227. OptMatch: Optimized Matchmaking via Modeling the High-Order Interactions on the Arena.

Paper Link】 【Pages】:2300-2310

【Authors】: Linxia Gong ; Xiaochuan Feng ; Dezhi Ye ; Hao Li ; Runze Wu ; Jianrong Tao ; Changjie Fan ; Peng Cui

【Abstract】: Matchmaking is a core problem for e-sports and online games, which determines player satisfaction and further influences the life cycle of the gaming products. Most matchmaking systems take the form of grouping the queuing players into two opposing teams by following certain rules. The design and implementation of matchmaking systems are usually product-specific and labor-intensive.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Information systems; Information systems applications; Data mining; Multimedia information systems; Massively multiplayer online games

228. PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest.

Paper Link】 【Pages】:2311-2320

【Authors】: Aditya Pal ; Chantat Eksombatchai ; Yitong Zhou ; Bo Zhao ; Charles Rosenberg ; Jure Leskovec

【Abstract】: Latent user representations are widely adopted in the tech industry for powering personalized recommender systems. Most prior work infers a single high dimensional embedding to represent a user, which is a good starting point but falls short of delivering a full understanding of the user's interests. In this work, we introduce PinnerSage, an end-to-end recommender system that represents each user via multi-modal embeddings and leverages this rich representation of users to provide high quality personalized recommendations. PinnerSage achieves this by clustering users' actions into conceptually coherent clusters with the help of a hierarchical clustering method (Ward) and summarizes the clusters via representative pins (Medoids) for efficiency and interpretability. PinnerSage is deployed in production at Pinterest and we outline the several design decisions that make it run seamlessly at a very large scale. We conduct several offline and online A/B experiments to show that our method significantly outperforms single embedding methods.
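
The clustering-plus-medoid idea is easy to prototype with off-the-shelf tools. The sketch below uses scikit-learn's Ward linkage on one user's action embeddings and picks one medoid per cluster; it is an illustrative prototype, not Pinterest's production pipeline, and the embedding matrix is random.

```python
# Ward clustering of a user's action embeddings, summarized by one medoid per cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
action_embs = rng.normal(size=(80, 64))   # embeddings of one user's recent actions (e.g. pins)

labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(action_embs)

medoids = []
for c in np.unique(labels):
    members = action_embs[labels == c]
    # medoid = the member minimizing total distance to the other members of its cluster
    total_dist = pairwise_distances(members).sum(axis=1)
    medoids.append(members[total_dist.argmin()])

user_embeddings = np.stack(medoids)  # multi-modal user representation: one vector per interest cluster
print(user_embeddings.shape)         # (5, 64)
```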

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; Information systems applications; Data mining; Clustering; Nearest-neighbor search; World Wide Web; Web searching and information discovery; Personalization

229. Polestar: An Intelligent, Efficient and National-Wide Public Transportation Routing Engine.

Paper Link】 【Pages】:2321-2329

【Authors】: Hao Liu ; Ying Li ; Yanjie Fu ; Huaibo Mei ; Jingbo Zhou ; Xu Ma ; Hui Xiong

【Abstract】: Public transportation plays a critical role in people's daily life. It has been proven that public transportation is more environmentally sustainable, efficient, and economical than any other form of travel. However, due to the increasing expansion of transportation networks and more complex travel situations, people are having difficulties in efficiently finding the most preferred route from one place to another through public transportation systems. To this end, in this paper, we present Polestar, a data-driven engine for intelligent and efficient public transportation routing. Specifically, we first propose a novel Public Transportation Graph (PTG) to model the public transportation system in terms of various travel costs, such as time or distance. Then, we introduce a general route search algorithm coupled with an efficient station binding method for efficient route candidate generation. After that, we propose a two-pass route candidate ranking module to capture user preferences under dynamic travel situations. Finally, experiments on two real-world data sets demonstrate the advantages of Polestar in terms of both efficiency and effectiveness. Indeed, in early 2019, Polestar was deployed on Baidu Maps, one of the world's largest map services. To date, Polestar is servicing over 330 cities, answers over a hundred million queries each day, and achieves a substantial improvement in user click ratio.

【Keywords】: Information systems; Information systems applications; Spatial-temporal systems

230. Context-Aware Attentive Knowledge Tracing.

Paper Link】 【Pages】:2330-2339

【Authors】: Aritra Ghosh ; Neil T. Heffernan ; Andrew S. Lan

【Abstract】: Knowledge tracing (KT) refers to the problem of predicting future learner performance given their past performance in educational applications. Recent developments in KT using flexible deep neural network-based models excel at this task. However, these models often offer limited interpretability, thus making them insufficient for personalized learning, which requires using interpretable feedback and actionable recommendations to help learners achieve better learning outcomes. In this paper, we propose attentive knowledge tracing (AKT), which couples flexible attention-based neural network models with a series of novel, interpretable model components inspired by cognitive and psychometric models. AKT uses a novel monotonic attention mechanism that relates a learner's future responses to assessment questions to their past responses; attention weights are computed using exponential decay and a context-aware relative distance measure, in addition to the similarity between questions. Moreover, we use the Rasch model to regularize the concept and question embeddings; these embeddings are able to capture individual differences among questions on the same concept without using an excessive number of parameters. We conduct experiments on several real-world benchmark datasets and show that AKT outperforms existing KT methods (by up to 6% in AUC in some cases) on predicting future learner responses. We also conduct several case studies and show that AKT exhibits excellent interpretability and thus has potential for automated feedback and personalization in real-world educational settings.
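
The core of the monotonic attention mechanism, attention scores damped by an exponentially decaying function of temporal distance, can be illustrated in a few lines. The sketch below is a toy version with a fixed decay rate and plain index distance, not the AKT implementation or its context-aware distance measure.

```python
# Toy monotonic attention with exponential distance decay (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16
q = rng.normal(size=d)          # query for the current question
keys = rng.normal(size=(T, d))  # keys for the learner's past interactions
theta = 0.5                     # decay rate (learned in AKT, fixed here)
dist = np.arange(T, 0, -1)      # time distance of each past step from the current one

scores = keys @ q / np.sqrt(d)
weights = np.exp(scores - theta * dist)  # decayed softmax: older interactions get less weight
weights /= weights.sum()
print(weights.round(3))
```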

【Keywords】: Applied computing; Education; Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining

231. Improving Movement Predictions of Traffic Actors in Bird's-Eye View Models using GANs and Differentiable Trajectory Rasterization.

Paper Link】 【Pages】:2340-2348

【Authors】: Eason Wang ; Henggang Cui ; Sai Yalamanchi ; Mohana Moorthy ; Nemanja Djuric

【Abstract】: One of the most critical pieces of the self-driving puzzle is the task of predicting future movement of surrounding traffic actors, which allows the autonomous vehicle to safely and effectively plan its future route in a complex world. Recently, a number of algorithms have been proposed to address this important problem, spurred by a growing interest of researchers from both industry and academia. Methods based on top-down scene rasterization on one side and Generative Adversarial Networks (GANs) on the other have been shown to be particularly successful, obtaining state-of-the-art accuracies on the task of traffic movement prediction. In this paper we build upon these two directions and propose a raster-based conditional GAN architecture, powered by a novel differentiable rasterizer module at the input of the conditional discriminator that maps generated trajectories into the raster space in a differentiable manner. This simplifies the task for the discriminator as trajectories that are not scene-compliant are easier to discern, and allows the gradients to flow back, forcing the generator to output better, more realistic trajectories. We evaluated the proposed method on a large-scale, real-world data set, showing that it outperforms state-of-the-art GAN-based baselines.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Object detection; Computer vision tasks; Scene understanding; Vision for robotics; Machine learning; Machine learning approaches; Neural networks

232. M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems.

Paper Link】 【Pages】:2349-2358

【Authors】: Menghan Wang ; Yujie Lin ; Guli Lin ; Keping Yang ; Xiao-Ming Wu

【Abstract】: Combining graph representation learning with multi-view data (side information) for recommendation is a trend in industry. Most existing methods can be categorized as multi-view representation fusion; they first build one graph and then integrate multi-view data into a single compact representation for each node in the graph. However, these methods are raising concerns in both engineering and algorithm aspects: 1) multi-view data are abundant and informative in industry and may exceed the capacity of one single vector, and 2) inductive bias may be introduced as multi-view data are often from different distributions. In this paper, we use a multi-view representation alignment approach to address this issue. Particularly, we propose a multi-task multi-view graph representation learning framework (M2GRL) to learn node representations from multi-view graphs for web-scale recommender systems. M2GRL constructs one graph for each single-view data, learns multiple separate representations from multiple graphs, and performs alignment to model cross-view relations. M2GRL chooses a multi-task learning paradigm to learn intra-view representations and cross-view relations jointly. Besides, M2GRL applies homoscedastic uncertainty to adaptively tune the loss weights of tasks during training. We deploy M2GRL at Taobao and train it on 57 billion examples. According to offline metrics and online A/B tests, M2GRL significantly outperforms other state-of-the-art algorithms. Further exploration on diversity recommendation in Taobao shows the effectiveness of utilizing multiple representations produced by M2GRL, which we argue is a promising direction for various industrial recommendation tasks of different focus.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Machine learning approaches; Learning latent representations; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

233. Attribute-based Propensity for Unbiased Learning in Recommender Systems: Algorithm and Case Studies.

Paper Link】 【Pages】:2359-2367

【Authors】: Zhen Qin ; Suming J. Chen ; Donald Metzler ; Yongwoo Noh ; Jingzheng Qin ; Xuanhui Wang

【Abstract】: Many modern recommender systems train their models based on a large amount of implicit user feedback data. Due to the inherent bias in this data (e.g., position bias), learning from it directly can lead to suboptimal models. Recently, unbiased learning was proposed to address such problems by leveraging counterfactual techniques like inverse propensity weighting (IPW). In these methods, propensity score estimation is usually limited to an item's display position in a single user interface (UI).
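
For readers unfamiliar with IPW, the estimator reweights each observed (clicked) example by the inverse of its estimated probability of being observed, as in the toy position-bias sketch below; this is illustrative only, with made-up propensities by position rather than the attribute-based propensity model the paper proposes.

```python
# Toy inverse propensity weighting (IPW) for position-biased click data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
position = rng.integers(1, 11, size=n)             # display position 1..10 of each logged impression
propensity = 1.0 / position                        # assumed probability the user examined that position
relevant = rng.random(n) < 0.3                     # (unobserved) true relevance of each impression
clicked = relevant & (rng.random(n) < propensity)  # click = relevant AND examined -> position-biased labels

# Estimate the fraction of relevant impressions from clicks alone.
naive = clicked.mean()                   # biased low: items at low positions are rarely examined
ipw = (clicked / propensity).sum() / n   # inverse propensity weighting corrects the bias in expectation
print(relevant.mean(), naive, ipw)
```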

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

234. Predicting Individual Treatment Effects of Large-scale Team Competitions in a Ride-sharing Economy.

Paper Link】 【Pages】:2368-2377

【Authors】: Teng Ye ; Wei Ai ; Lingyu Zhang ; Ning Luo ; Lulu Zhang ; Jieping Ye ; Qiaozhu Mei

【Abstract】: Millions of drivers worldwide have enjoyed financial benefits and work schedule flexibility through a ride-sharing economy, but meanwhile they have suffered from the lack of a sense of identity and career achievement. Equipped with social identity and contest theories, financially incentivized team competitions have been an effective instrument to increase drivers' productivity, job satisfaction, and retention, and to improve revenue over cost for ride-sharing platforms. While these competitions are overall effective, the decisive factors behind the treatment effects and how they affect the outcomes of individual drivers have been largely mysterious. In this study, we analyze data collected from more than 500 large-scale team competitions organized by a leading ride-sharing platform, building machine learning models to predict individual treatment effects. Through a careful investigation of features and predictors, we are able to reduce out-of-sample prediction error by more than 24%. Through interpreting the best-performing models, we discover many novel and actionable insights regarding how to optimize the design and the execution of team competitions on ride-sharing platforms. A simulated analysis demonstrates that by simply changing a few contest design options, the average treatment effect of a real competition is expected to increase by as much as 26%. Our procedure and findings shed light on how to analyze and optimize large-scale online field experiments in general.
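
One common reference point for predicting individual treatment effects is the simple "T-learner": fit separate outcome models for treated and control units and take the difference of their predictions. The sketch below applies it to synthetic data; it is not the paper's feature set or model, and the randomized treatment assumption is ours.

```python
# Toy T-learner for individual treatment effects on synthetic, randomized data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 2000, 8
X = rng.normal(size=(n, d))                  # driver / competition features
treat = rng.integers(0, 2, size=n)           # randomly assigned to a team competition or not
y = X[:, 0] + treat * (0.5 + X[:, 1]) + rng.normal(scale=0.5, size=n)  # outcome, e.g. productivity

m1 = GradientBoostingRegressor().fit(X[treat == 1], y[treat == 1])
m0 = GradientBoostingRegressor().fit(X[treat == 0], y[treat == 0])
ite_hat = m1.predict(X) - m0.predict(X)      # predicted individual treatment effect per driver
print(float(ite_hat.mean()))                 # roughly recovers the true average effect (0.5 here)
```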

【Keywords】: General and reference; Cross-computing tools and techniques; Experimentation

235. Cellular Network Radio Propagation Modeling with Deep Convolutional Neural Networks.

Paper Link】 【Pages】:2378-2386

【Authors】: Xin Zhang ; Xiujun Shu ; Bingwen Zhang ; Jie Ren ; Lizhou Zhou ; Xin Chen

【Abstract】: Radio propagation modeling and prediction is fundamental for modern cellular network planning and optimization. Conventional radio propagation models fall into two categories. Empirical models, based on coarse statistics, are simple and computationally efficient, but are inaccurate due to oversimplification. Deterministic models, such as ray tracing based on physical laws of wave propagation, are more accurate and site specific. But they have higher computational complexity and are inflexible in utilizing site information other than traditional geographic information system (GIS) maps.

【Keywords】: General and reference; Document types; General conference proceedings; Networks; Network components; Wireless access points, base stations and infrastructure; Network types; Mobile networks

236. Neural Input Search for Large Scale Recommendation Models.

Paper Link】 【Pages】:2387-2397

【Authors】: Manas R. Joglekar ; Cong Li ; Mei Chen ; Taibai Xu ; Xiaoming Wang ; Jay K. Adams ; Pranav Khaitan ; Jiahui Liu ; Quoc V. Le

【Abstract】: Recommendation problems with large numbers of discrete items, such as products, webpages, or videos, are ubiquitous in the technology industry. Deep neural networks are being increasingly used for these recommendation problems. These models use embeddings to represent discrete items as continuous vectors, and the vocabulary sizes and embedding dimensions, despite their heavy influence on the model's accuracy, are often manually selected in a heuristic manner.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

237. Easy Perturbation EEG Algorithm for Spectral Importance (easyPEASI): A Simple Method to Identify Important Spectral Features of EEG in Deep Learning Models.

Paper Link】 【Pages】:2398-2406

【Authors】: David O. Nahmias ; Kimberly L. Kontson

【Abstract】: Understanding neurological differences between populations is an active area of research. Deep learning has recently shown promising results using EEG as input to distinguish recordings of subjects based on neurological activity. However, only about one quarter of these studies investigate the underlying neurophysiological implications. This work proposes and validates a method to investigate frequency bands important to EEG-driven deep learning models. Easy perturbation EEG algorithm for spectral importance (easyPEASI) is simpler than previous methods and requires only perturbations to input data. We validate easyPEASI on EEG pathology classification using the Temple University Health EEG Corpus. easyPEASI is further applied to characterize the effects of patients' medications on brain rhythms. We investigate classifications of patients taking one of two anticonvulsant medications, Dilantin (phenytoin) and Keppra (levetiracetam), and subjects taking no medications. We find that for recordings of subjects with clinically-determined normal EEG, these medications affect the Theta and Alpha bands most significantly. For recordings with clinically-determined abnormal EEG, these medications affected the Delta, Theta, and Alpha bands most significantly. We also find the Beta band to be affected differently by the two medications. Results found here show promise for obtaining explainable artificial intelligence and interpretable models from EEG-driven deep learning through a simpler, more accessible method that perturbs only the input data. Overall, this work provides a fast, easy, and reproducible method to automatically determine salient spectral features of neural activity that have been learned by machine learning models, such as deep learning.

【Keywords】: Applied computing; Life and medical sciences; Computing methodologies; Machine learning; Learning paradigms; Supervised learning

238. Building Continuous Integration Services for Machine Learning.

Paper Link】 【Pages】:2407-2415

【Authors】: Bojan Karlas ; Matteo Interlandi ; Cédric Renggli ; Wentao Wu ; Ce Zhang ; Deepak Mukunthu Iyappan Babu ; Jordan Edwards ; Chris Lauren ; Andy Xu ; Markus Weimer

【Abstract】: Continuous integration (CI) has been a de facto standard for building industrial-strength software. Yet, little attention has been paid to applying CI to the development of machine learning (ML) applications until the very recent efforts on the theoretical side. In this paper, we take a step forward to bring the theory into practice.

【Keywords】: General and reference; Document types; General conference proceedings; Social and professional topics; Professional topics; Management of computing and information systems; Software management; Software maintenance; Software and its engineering; Software creation and management; Software development process management; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Sample complexity and generalization bounds

239. Learning to Cluster Documents into Workspaces Using Large Scale Activity Logs.

Paper Link】 【Pages】:2416-2424

【Authors】: Weize Kong ; Michael Bendersky ; Marc Najork ; Brandon Vargo ; Mike Colagrosso

【Abstract】: Google Drive is widely used for managing personal and work-related documents in the cloud. To help users organize their documents in Google Drive, we develop a new feature, called workspace, that allows users to create a set of working files for ongoing easy access. A workspace is a cluster of documents, but unlike a typical document cluster, it contains documents that are not only topically coherent, but are also useful in the ongoing user tasks.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Clustering and classification; Information systems applications; Data mining; Clustering; World Wide Web; Web mining; Web log analysis; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Unsupervised learning and clustering

240. What is that Building?: An End-to-end System for Building Recognition from Streetside Images.

Paper Link】 【Pages】:2425-2433

【Authors】: Chiqun Zhang ; Dragomir Yankov ; Chun-Ting Wu ; Simon Shapiro ; Jason Hong ; Wei Wu

【Abstract】: The paper describes Streetside Building Search-Retrieve System (SBSRS) - a system for recognizing buildings from streetside images. SBSRS powers several distinct applications: 1) it improves map-search by enriching its streetview service with semantic information, such as location, business name, open hours, etc.; 2) it enables search by image and location - a novel form of visual image search where both visual and location signals are used to identify the most relevant result to a query image of a building.

【Keywords】: Information systems; Information retrieval; Document representation; Content analysis and feature selection; Retrieval models and ranking

241. MultiSage: Empowering GCN with Contextualized Multi-Embeddings on Web-Scale Multipartite Networks.

Paper Link】 【Pages】:2434-2443

【Authors】: Carl Yang ; Aditya Pal ; Andrew Zhai ; Nikil Pancha ; Jiawei Han ; Charles Rosenberg ; Jure Leskovec

【Abstract】: Graph convolutional networks (GCNs) are a powerful class of graph neural networks. Trained in a semi-supervised end-to-end fashion, GCNs can learn to integrate node features and graph structures to generate high-quality embeddings that can be used for various downstream tasks like search and recommendation. However, existing GCNs mostly work on homogeneous graphs and consider a single embedding for each node, which do not sufficiently model the multi-facet nature and complex interaction of nodes in real-world networks. Here, we present a contextualized GCN engine by modeling the multipartite networks of target nodes and their intermediate context nodes that specify the contexts of their interactions. Towards the neighborhood aggregation process, we devise a contextual masking operation at the feature level and a contextual attention mechanism at the node level to achieve interaction contextualization by treating neighboring target nodes based on intermediate context nodes. Consequently, we compute multiple embeddings for target nodes that capture their diverse facets and different interactions during graph convolution, which is useful for fine-grained downstream applications. To enable efficient web-scale training, we build a parallel random walk engine to pre-sample contextualized neighbors, and a Hadoop2-based data provider pipeline to pre-join training data, dynamically reduce multi-GPU training time, and avoid high memory cost. Extensive experiments on the bipartite Pinterest graph and tripartite OAG graph corroborate the advantage of the proposed system.

【Keywords】: Information systems; Data management systems; Data structures; Data access methods; Proximity search; Database design and models; Entity relationship models; Graph-based database models; Hierarchical data models; Network data models; Database management system engines; Parallel and distributed DBMSs; MapReduce-based systems; Information retrieval; Retrieval models and ranking; Novelty in information retrieval; Specialized information retrieval; Environment-specific retrieval; Web and social media search; Information systems applications; Collaborative and social computing systems and tools; Social networking sites; Data mining; Collaborative filtering; World Wide Web; Web applications; Electronic commerce; E-commerce infrastructure; Web searching and information discovery; Social recommendation

242. HetETA: Heterogeneous Information Network Embedding for Estimating Time of Arrival.

Paper Link】 【Pages】:2444-2454

【Authors】: Huiting Hong ; Yucheng Lin ; Xiaoqing Yang ; Zang Li ; Kung Fu ; Zheng Wang ; Xiaohu Qie ; Jieping Ye

【Abstract】: The estimated time of arrival (ETA) is a critical task in intelligent transportation systems, which involves spatiotemporal data. Despite a significant amount of prior effort to design efficient and accurate systems for the ETA task, few take structural graph data into account, much less heterogeneous information networks. In this paper, we propose HetETA to leverage heterogeneous information graphs in the ETA task. Specifically, we translate the road map into a multi-relational network and introduce a vehicle-trajectories based network to jointly consider traffic behavior patterns. Moreover, we employ three components to model temporal information from recent periods, daily periods and weekly periods respectively. Each component comprises temporal convolutions and graph convolutions to learn representations of the spatiotemporal heterogeneous information for the ETA task. Experiments on large-scale datasets illustrate the effectiveness of the proposed HetETA over state-of-the-art methods, and show the importance of representation learning of heterogeneous information networks for the ETA task.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining; Spatial-temporal systems

243. Hubble: An Industrial System for Audience Expansion in Mobile Marketing.

Paper Link】 【Pages】:2455-2463

【Authors】: Chenyi Zhuang ; Ziqi Liu ; Zhiqiang Zhang ; Yize Tan ; Zhengwei Wu ; Zhining Liu ; Jianping Wei ; Jinjie Gu ; Guannan Zhang ; Jun Zhou ; Yuan Qi

【Abstract】: Recently, in order to take a preemptive opportunity in the mobile economy, Internet companies conduct thousands of marketing campaigns every day to promote their mobile products and services. In the mobile marketing scenario, one of the fundamental issues is the audience expansion task for marketing campaigns. Given a set of seed users, audience expansion aims to seek more users (audiences) who are similar to the seeds and will fulfill the business goal of the targeted campaign (i.e., convert). However, the problem is challenging in three aspects. First, a company will run hundreds of campaigns to serve massive numbers of users every day. The requirements of scalability and timeliness make training a model for each campaign extremely resource-consuming and thus impractical. Therefore, we propose to solve the problem in a two-stage manner, in which the offline stage employs heavyweight user representation learning and the online stage performs embedding-based lightweight audience expansion. Second, conventional two-stage audience expansion systems neglect the high-order user-campaign interactions and usually generate entangled user embeddings, thus failing to achieve high-quality user representation. Third, the seeds, which are usually provided by experts or collected from users' feedback, could be noisy and cannot cover the entire set of actual audiences, thus introducing coverage bias. Unfortunately, to the best of our knowledge, none of the related literature tackles this crucial issue of audience expansion.

【Keywords】: Applied computing; Operations research; Marketing; Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Neural networks; Information systems; Information systems applications; Data mining; Collaborative filtering

244. Scaling Graph Neural Networks with Approximate PageRank.

Paper Link】 【Pages】:2464-2473

【Authors】: Aleksandar Bojchevski ; Johannes Klicpera ; Bryan Perozzi ; Amol Kapoor ; Martin Blais ; Benedek Rózemberczki ; Michal Lukasik ; Stephan Günnemann

【Abstract】: Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge -- many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs resulting in significant speed gains while maintaining state-of-the-art prediction performance. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
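
The efficient approximation that PPRGo builds on is personalized PageRank computed with a local "push" procedure instead of repeated message passing over the whole graph. Below is a compact sketch of that classic push algorithm for a single seed node; it is illustrative only, not the PPRGo codebase, and the tiny adjacency list is ours.

```python
# Push-based approximate personalized PageRank for one seed node (illustrative sketch).
from collections import defaultdict

def approx_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """adj: dict node -> list of neighbours (undirected, unweighted for simplicity)."""
    p = defaultdict(float)   # approximate PPR vector
    r = defaultdict(float)   # residual probability mass still to be pushed
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        deg = len(adj[u])
        if r[u] < eps * deg:
            continue
        p[u] += alpha * r[u]             # keep an alpha fraction of the residual at u
        push = (1 - alpha) * r[u] / deg  # spread the rest evenly to u's neighbours
        r[u] = 0.0
        for v in adj[u]:
            r[v] += push
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return dict(p)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(approx_ppr(adj, seed=0))
```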

【Keywords】: Computing methodologies; Machine learning

245. Combo-Attention Network for Baidu Video Advertising.

Paper Link】 【Pages】:2474-2482

【Authors】: Tan Yu ; Yi Yang ; Yi Li ; Xiaodong Chen ; Mingming Sun ; Ping Li

【Abstract】: With the progress of communication technology and the popularity of smart phones, video has grown to be the dominant medium. Since videos can grab a customer's attention quickly and leave a strong impression, video ads can gain more trust than traditional ads. Thus advertisers have started to pour more resources into making creative video ads to build connections with potential customers. Baidu, as the leading search engine company in China, receives billions of search queries per day. In this paper, we introduce a technique used in Baidu video advertising for serving relevant video ads according to the user's query. Note that retrieving relevant videos using a text query is a cross-modal problem. Due to the modality gap, text-to-video search is more challenging than the well-explored text-to-text and image-to-image search. To tackle this challenge, we propose a Combo-Attention Network (CAN) and launch it in Baidu video advertising. In the proposed CAN model, we represent a video as a set of bounding-box features and a sentence as a set of word features, and formulate sentence-to-video search as a set-to-set matching problem. The proposed CAN is built upon the proposed combo-attention module, which exploits cross-modal attention in addition to self-attention to effectively capture the relevance between words and bounding boxes. To verify the effectiveness of the proposed CAN offline, we built the Daily700K dataset collected from the HaoKan app. Systematic experiments on Daily700K as well as a public dataset, VATEX, demonstrate the effectiveness of our CAN. After launching the proposed CAN in Baidu's dynamic video advertising (DVA), we achieved a 5.47% increase in Conversion Rate (CVR) and an 11.69% increase in advertisement impression rate.
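
As a rough illustration of the set-to-set matching idea, the snippet below runs one direction of cross-modal attention: query words attend over bounding-box features and the attended contexts are scored against the words. Feature dimensions, the scaling, and the final scoring rule are assumptions for the example, not the CAN architecture.

```python
# Minimal sketch of cross-modal attention (one direction) as used conceptually in CAN:
# each query word attends over the bounding-box features of a candidate video, and the
# attended visual context is scored against the word. Dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(2)
d = 32
words = rng.normal(size=(6, d))     # 6 query-word features
boxes = rng.normal(size=(20, d))    # 20 bounding-box features from the video

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

attn = softmax(words @ boxes.T / np.sqrt(d))    # (6, 20) word-to-box attention weights
context = attn @ boxes                          # attended visual context per word
# Relevance of the video to the query: mean cosine similarity of words and contexts.
cos = np.sum(words * context, axis=1) / (
    np.linalg.norm(words, axis=1) * np.linalg.norm(context, axis=1))
print(float(cos.mean()))
```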

【Keywords】: Information systems; Information systems applications; Computational advertising; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Models of learning

246. Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data.

Paper Link】 【Pages】:2483-2493

【Authors】: Bin Gu ; Zhiyuan Dang ; Xiang Li ; Heng Huang

【Abstract】: In many real-world data mining and machine learning applications, data are provided by multiple providers, each of which maintains private records of different feature sets about common entities. It is challenging for traditional data mining and machine learning algorithms to train on such vertically partitioned data effectively and efficiently while preserving data privacy. In this paper, we focus on nonlinear learning with kernels, and propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data. Specifically, we use random features to approximate the kernel mapping function and use doubly stochastic gradients to update the solutions, all of which are computed in a federated manner without disclosing the data. Importantly, we prove that FDSKL has a sublinear convergence rate, and can guarantee data security under the semi-honest assumption. Extensive experimental results on a variety of benchmark datasets show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels, while retaining similar generalization performance.
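
The random-feature kernel approximation the abstract refers to can be sketched as below for a single party: random Fourier features approximate an RBF kernel and a plain stochastic gradient step is taken in the random-feature space. This is only the non-federated, non-private core; the data, kernel width, and learning-rate schedule are made up for the example.

```python
# Sketch of the random Fourier feature (RFF) approximation of an RBF kernel that FDSKL
# builds on, with a plain (non-federated, non-private) stochastic gradient step for a
# squared-loss model in the random-feature space. Data and sizes are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n, d, D, gamma = 200, 10, 256, 0.5            # samples, input dim, #random features, RBF width
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))   # spectral samples of the RBF kernel
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    """Random feature map: k(x, x') is approximated by phi(x) . phi(x')."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

w, lr = np.zeros(D), 0.1
for step in range(2000):                      # doubly stochastic: one random sample per step
    i = rng.integers(n)                       # (the random features were sampled above)
    z = phi(X[i])
    grad = (z @ w - y[i]) * z
    w -= lr / (1 + 0.01 * step) * grad

print(float(np.mean((phi(X) @ w - y) ** 2)))  # training MSE of the kernel model
```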

【Keywords】: Theory of computation; Design and analysis of algorithms; Parallel algorithms; Massively parallel algorithms

Paper Link】 【Pages】:2494-2504

【Authors】: Ayat Fekry ; Lucian Carata ; Thomas F. J.-M. Pasquier ; Andrew Rice ; Andy Hopper

【Abstract】: This experimental study presents a number of issues that pose a challenge for practical configuration tuning and its deployment in data analytics frameworks. These issues include: 1) the assumption of a static workload or environment, which ignores the dynamic characteristics of the analytics environment (e.g., increases in input data size, changes in resource allocation); 2) the amortization of tuning costs and how this influences which workloads can be tuned cost-effectively in practice; and 3) the need for a comprehensive incremental tuning solution for a diverse set of workloads. We adapt different ML techniques in order to obtain efficient incremental tuning in our problem domain, and propose Tuneful, a configuration tuning framework. We show how it is designed to overcome the above issues and illustrate its applicability by running a wide array of experiments in cloud environments provided by two different service providers.
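
One round of surrogate-guided configuration tuning, in the Gaussian-process flavor suggested by the paper's keywords, might look like the sketch below. This is not the Tuneful implementation: the `run_workload` cost function, the two tuned parameters, and the lower-confidence-bound acquisition rule are assumptions for illustration.

```python
# Sketch of one surrogate-guided tuning round (not the Tuneful code): already-observed
# (configuration, runtime) pairs fit a Gaussian-process model, and the next configuration
# to try is the candidate with the lowest predicted-runtime lower confidence bound.
# The "workload" below is a synthetic function standing in for a real analytics job.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(4)

def run_workload(cfg):
    """Synthetic runtime: cfg = (executor_memory_gb, shuffle_partitions / 100)."""
    mem, parts = cfg
    return (mem - 8) ** 2 + 0.5 * (parts - 2) ** 2 + rng.normal(scale=0.2)

observed_cfgs = rng.uniform([1, 0.5], [16, 10], size=(8, 2))      # past tuning trials
observed_cost = np.array([run_workload(c) for c in observed_cfgs])

gp = GaussianProcessRegressor(normalize_y=True).fit(observed_cfgs, observed_cost)

candidates = rng.uniform([1, 0.5], [16, 10], size=(500, 2))
mean, std = gp.predict(candidates, return_std=True)
next_cfg = candidates[np.argmin(mean - 1.0 * std)]                # lower confidence bound
print("next configuration to try:", next_cfg)
```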

【Keywords】: Theory of computation; Design and analysis of algorithms; Mathematical optimization; Non-parametric optimization; Online algorithms; Online learning algorithms; Theory and algorithms for application domains; Machine learning theory; Kernel methods; Gaussian processes

248. Reconstruction and Decomposition of High-Dimensional Landscapes via Unsupervised Learning.

Paper Link】 【Pages】:2505-2513

【Authors】: Jing Lei ; Nasrin Akhter ; Wanli Qiao ; Amarda Shehu

【Abstract】: Uncovering the organization of a landscape that encapsulates all states of a dynamic system is a central task in many domains, as it promises to reveal, in an unsupervised manner, a system's inner workings. One domain where this task is crucial is bioinformatics, where the energy landscape that organizes three-dimensional structures of a molecule by their energetics is a powerful construct. The landscape can be leveraged, among other things, to reveal macrostates where a molecule is biologically active. This is a daunting task, as landscapes of complex actuated systems, such as molecules, are inherently high-dimensional. Nonetheless, our laboratories have made some progress in recent years via topological and statistical analysis of spatial data. We have proposed what is essentially a dichotomy of methods: those more pertinent to visualization-driven discovery, and those more pertinent to discovering the biologically-active macrostates but not amenable to visualization. In this paper, we present a novel, hybrid method that combines the strengths of these methods, allowing both visualization of the landscape and discovery of macrostates. We demonstrate what the method is capable of uncovering in comparison with existing methods over structure spaces sampled with conformational sampling algorithms. Though the direct evaluation in this paper is on protein energy landscapes, the proposed method is of broad interest in cross-cutting problems that necessitate characterization of fitness and optimization landscapes.

【Keywords】: Applied computing; Life and medical sciences; Computational biology; Molecular structural biology

249. Map Generation from Large Scale Incomplete and Inaccurate Data Labels.

Paper Link】 【Pages】:2514-2522

【Authors】: Rui Zhang ; Conrad M. Albrecht ; Wei Zhang ; Xiaodong Cui ; Ulrich Finkler ; David S. Kung ; Siyuan Lu

【Abstract】: Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, and natural disaster response management, among others. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high-resolution aerial images. Unlike previous studies, most of which use datasets that are available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly more accurate and more complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopt an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Image segmentation; Machine learning; Learning paradigms; Reinforcement learning; Adversarial learning; Learning settings; Semi-supervised learning settings

250. Grale: Designing Networks for Graph Learning.

Paper Link】 【Pages】:2523-2532

【Authors】: Jonathan Halcrow ; Alexandru Mosoi ; Sam Ruth ; Bryan Perozzi

【Abstract】: How can we find the right graph for semi-supervised learning? In real world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Learning settings; Semi-supervised learning settings; Machine learning approaches; Instance-based learning

251. Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data.

Paper Link】 【Pages】:2533-2541

【Authors】: Yaqing Wang ; Yifan Ethan Xu ; Xian Li ; Xin Luna Dong ; Jing Gao

【Abstract】: Product catalogs are valuable resources for eCommerce websites. In the catalog, a product is associated with multiple attributes whose values are short texts, such as product name, brand, functionality, and flavor. Usually individual retailers self-report these key values, and thus the catalog information unavoidably contains noisy facts. It is very important to validate the correctness of these values in order to improve shopper experience and enable more effective product recommendation. Due to the huge volume of products, an effective automatic validation approach is needed. In this paper, we propose to develop an automatic validation approach that verifies the correctness of textual attribute values for products. This can be formulated as the task of cross-checking a textual attribute value against the product profile, which is a short textual description of the product on the eCommerce website. Although existing deep neural network models have shown success in cross-checking two pieces of text, their success depends on a large set of high-quality labeled data, which is hard to obtain in this validation task: products span a wide variety of categories, and because of the differences among categories, annotation would have to be done for all of them, which is impossible to achieve in practice.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Natural language processing; Machine learning; Machine learning approaches; Neural networks; Information systems; Data management systems; Information integration

252. CLARA: Confidence of Labels and Raters.

Paper Link】 【Pages】:2542-2552

【Authors】: Viet-An Nguyen ; Peibei Shi ; Jagdish Ramakrishnan ; Udi Weinsberg ; Henry C. Lin ; Steve Metz ; Neil Chandra ; Jane Jing ; Dimitris Kalimeris

【Abstract】: Large online services employ thousands of people to label content for applications such as video understanding, natural language processing, and content policy enforcement. While labelers typically reach their decisions by following a well-defined "protocol", humans may still make mistakes. A common countermeasure is to have multiple people review the same content; however, this process is often time-intensive and requires accurate aggregation of potentially noisy decisions.
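
To give a sense of what "accurate aggregation of potentially noisy decisions" involves, the sketch below combines independent rater votes into a calibrated probability under the simplifying assumption that each rater's accuracy is known. This is only an illustration of confidence-weighted aggregation, not CLARA's latent-variable model; all numbers are invented.

```python
# Not CLARA's model: a simple Bayesian aggregation of independent rater decisions,
# assuming each rater's accuracy is known, to illustrate how noisy votes can be turned
# into a calibrated probability that content violates policy. Numbers are made up.
import numpy as np

def posterior_violating(votes, accuracies, prior=0.3):
    """votes[i] is 1 if rater i labeled the item 'violating'; accuracies[i] in (0.5, 1)."""
    log_odds = np.log(prior / (1 - prior))
    for v, a in zip(votes, accuracies):
        # Likelihood ratio of this vote under 'violating' vs. 'benign'.
        log_odds += np.log(a / (1 - a)) if v == 1 else np.log((1 - a) / a)
    return 1 / (1 + np.exp(-log_odds))

# Three raters disagree; the more accurate raters dominate the aggregated confidence.
print(posterior_violating(votes=[1, 1, 0], accuracies=[0.9, 0.7, 0.6]))
```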

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning in probabilistic graphical models; Latent variable models; Information systems; World Wide Web; Web applications; Crowdsourcing

253. Embedding-based Retrieval in Facebook Search.

Paper Link】 【Pages】:2553-2561

【Authors】: Jui-Ting Huang ; Ashish Sharma ; Shuying Sun ; Li Xia ; David Zhang ; Philip Pronin ; Janani Padmanabhan ; Giuseppe Ottaviano ; Linjun Yang

【Abstract】: Search in social networks such as Facebook poses different challenges than classical web search: besides the query text, it is important to take into account the searcher's context to provide relevant results. The searcher's social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced modeling topics. We evaluated EBR on verticals of Facebook Search, with significant metric gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people develop embedding-based retrieval systems in search engines.
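
At its core, embedding-based retrieval means scoring candidates by similarity in a shared embedding space, as in the minimal sketch below. The embeddings are random placeholders and the exact top-k search stands in for the ANN index and inverted-index serving stack the paper describes; none of this is Facebook's code.

```python
# Minimal sketch of embedding-based retrieval (not Facebook's system): queries and
# documents live in one embedding space and candidates are the top-k documents by inner
# product. Production systems replace the exact search below with an ANN index inside an
# inverted-index serving stack; embeddings here are random placeholders.
import numpy as np

rng = np.random.default_rng(5)
d = 128
doc_emb = rng.normal(size=(100_000, d)).astype(np.float32)   # offline document embeddings
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

def retrieve(query_emb, k=10):
    query_emb = query_emb / np.linalg.norm(query_emb)
    scores = doc_emb @ query_emb                              # exact inner-product search
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]                      # k best ids, best first

print(retrieve(rng.normal(size=d).astype(np.float32)))
```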

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Information systems; Information retrieval; Retrieval models and ranking; Search engine architectures and scalability

254. Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications.

Paper Link】 【Pages】:2562-2570

【Authors】: Jamie Pool ; Ebrahim Beyrami ; Vishak Gopal ; Ashkan Aazami ; Jayant Gupchup ; Jeff Rowland ; Binlong Li ; Pritesh Kanani ; Ross Cutler ; Johannes Gehrke

【Abstract】: Web-scale applications can ship code on a daily to weekly cadence. These applications rely on online metrics to monitor the health of new releases. Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners. Regressions in metrics can surface due to a variety of reasons: genuine product regressions, changes in user population, and bias due to telemetry loss (or processing) are among the common causes. Diagnosing the cause of these metric regressions is costly for engineering teams, as they need to invest time in finding the root cause of the issue as soon as possible. We present Lumos, a Python library built using the principles of A/B testing to systematically diagnose metric regressions and automate such analysis. Lumos has been deployed across the component teams in Microsoft's Real-Time Communication (RTC) applications Skype and Microsoft Teams. It has enabled engineering teams to detect hundreds of real changes in metrics and reject thousands of false alarms raised by anomaly detectors. The application of Lumos has resulted in freeing up as much as 95% of the time allocated to metric-based investigations. In this work, we open source Lumos and present our results from applying it to two different components within the RTC group over millions of sessions. This general library can be coupled with any production system to manage the volume of alerting efficiently.
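
The first check such a tool automates is a statistical comparison of a metric between a baseline and a candidate release; a sketch of a two-proportion z-test on a success-rate metric is shown below. This is an illustration of the statistical step, not the Lumos library or its API, and the counts are invented.

```python
# Not the Lumos library: a sketch of the kind of check such a tool automates, a
# two-proportion z-test on a success-rate metric between a baseline and a new release.
# A significant drop would trigger deeper analysis (population shifts, telemetry loss)
# before declaring a genuine product regression. Counts below are invented.
import math

def two_proportion_ztest(success_a, total_a, success_b, total_b):
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_ztest(success_a=96_500, total_a=100_000,   # baseline release
                            success_b=95_900, total_b=100_000)   # candidate release
print(f"z = {z:.2f}, p = {p:.4f}")   # a small p-value suggests a real metric regression
```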

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Anomaly detection; General and reference; Cross-computing tools and techniques; Experimentation; Metrics; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

255. Order Fulfillment Cycle Time Estimation for On-Demand Food Delivery.

Paper Link】 【Pages】:2571-2580

【Authors】: Lin Zhu ; Wei Yu ; Kairong Zhou ; Xing Wang ; Wenxing Feng ; Pengyu Wang ; Ning Chen ; Pei Lee

【Abstract】: By providing customers with conveniences such as easy access to an extensive variety of restaurants, effortless food ordering, and fast delivery, on-demand food delivery (OFD) platforms have achieved explosive growth in recent years. A crucial machine learning task performed at OFD platforms is prediction of the Order Fulfillment Cycle Time (OFCT), which refers to the amount of time elapsed between the moment a customer places an order and the moment he/she receives the meal. The accuracy of the predicted OFCT is important for customer satisfaction, as it needs to be communicated to a customer before he/she places the order, and is considered a service promise that should be fulfilled as well as possible. As a result, the estimated OFCT also heavily influences planning decisions such as dispatching and routing.

【Keywords】: Applied computing; Operations research; Forecasting; Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining

256. Calendar Graph Neural Networks for Modeling Time Structures in Spatiotemporal User Behaviors.

Paper Link】 【Pages】:2581-2589

【Authors】: Daheng Wang ; Meng Jiang ; Munira Syed ; Oliver Conway ; Vishal Juneja ; Sriram Subramanian ; Nitesh V. Chawla

【Abstract】: User behavior modeling is important for industrial applications such as demographic attribute prediction, content recommendation, and targeted advertising. Existing methods represent a behavior log as a sequence of adopted items and find sequential patterns; however, concrete location and time information in the behavior log, reflecting dynamic and periodic patterns, together with the spatial dimension, can be useful for modeling users and predicting their characteristics. In this work, we propose a novel model based on graph neural networks for learning user representations from spatiotemporal behavior data. Our model's architecture incorporates two networked structures. One is a tripartite network of items, sessions, and locations. The other is a hierarchical calendar network of hour, week, and weekday nodes. It first aggregates embeddings of locations and items into session embeddings via the tripartite network, and then generates user embeddings from the session embeddings via the calendar structure. The user embeddings preserve spatial patterns and temporal patterns of a variety of periodicities (e.g., hourly, weekly, and weekday patterns). It adopts the attention mechanism to model complex interactions among the multiple patterns in user behaviors. Experiments on real datasets (i.e., clicks on news articles in a mobile app) show our approach outperforms strong baselines for predicting missing demographic attributes.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining

257. Privileged Features Distillation at Taobao Recommendations.

Paper Link】 【Pages】:2590-2598

【Authors】: Chen Xu ; Quan Li ; Junfeng Ge ; Jinyang Gao ; Xiaoyong Yang ; Changhua Pei ; Fei Sun ; Jian Wu ; Hanxiao Sun ; Wenwu Ou

【Abstract】: Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee the consistency of offline training and online serving, we usually utilize only the features that are available in both environments. However, this consistency in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detail page are informative. However, CVR prediction should be conducted for online ranking before the click happens. Thus we cannot obtain such post-event features during serving.
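
The generic privileged-feature distillation pattern this sets up can be sketched as follows: a teacher sees the regular plus the privileged (post-event) features, while a student sees only the serving-time features and is trained on both the label and the teacher's soft prediction. The network sizes, loss mixture, and data are assumptions for illustration, not the production Taobao model.

```python
# Sketch of privileged features distillation in the generic sense the abstract describes
# (not the production Taobao model): the teacher consumes regular + privileged features,
# the student only regular (serving-time) features, and the student's loss mixes the
# ground-truth label with the teacher's soft prediction. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

reg_dim, priv_dim, batch = 16, 4, 32
teacher = nn.Sequential(nn.Linear(reg_dim + priv_dim, 32), nn.ReLU(), nn.Linear(32, 1))
student = nn.Sequential(nn.Linear(reg_dim, 32), nn.ReLU(), nn.Linear(32, 1))

x_reg = torch.randn(batch, reg_dim)            # features available at serving time
x_priv = torch.randn(batch, priv_dim)          # post-event features (e.g. dwell time)
y = torch.randint(0, 2, (batch, 1)).float()    # conversion label

teacher_logit = teacher(torch.cat([x_reg, x_priv], dim=1))
student_logit = student(x_reg)

lam = 0.5                                      # weight of the distillation term
hard_loss = F.binary_cross_entropy_with_logits(student_logit, y)
soft_loss = F.binary_cross_entropy_with_logits(student_logit,
                                               torch.sigmoid(teacher_logit).detach())
loss = hard_loss + lam * soft_loss
loss.backward()                                # in practice teacher and student co-train
print(float(loss))
```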

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by regression; Machine learning approaches; Neural networks

258. Cracking Tabular Presentation Diversity for Automatic Cross-Checking over Numerical Facts.

Paper Link】 【Pages】:2599-2607

【Authors】: Hongwei Li ; Qingping Yang ; Yixuan Cao ; Jiaquan Yao ; Ping Luo

【Abstract】: Tabular forms of numerical facts widely exist in the disclosure documents of vertical domains, especially the financial field. It is also quite common that the same fact is mentioned multiple times in different tables with diverse tabular presentations. Firms' disclosure documents are the main source of accounting information for individual investors. Their authenticity is crucial for both firms' development and investors' investment decisions. However, due to the large volume of tables, frequent updates during editing, and limited time for manual cross-checking, these facts might be inconsistent with each other even after official publishing. Such errors may bring about huge reputational risk and even economic losses, even if the mistakes are made unintentionally rather than deliberately. Hence, there is an opportunity for Automatic Numerical Cross-Checking over Tables. This paper introduces the key module of such a system, which aims to identify whether a pair of table cells are semantically equivalent, namely referring to the same fact. We observed that, due to tabular presentation diversity, the facts in tabular form are difficult to parse into relational tuples. Thus, we present an end-to-end solution of binary classification over each pair of table cells, which does not involve explicit semantic parsing over tables. Also, we discuss the design of this neural model to balance prediction accuracy and inference time for a large number of table cell pairs, and propose some practical techniques to address the issue of extreme class imbalance among pairs. Experiments show that our model achieves macro F1 = 0.8297 in linking semantically equivalent table cells from IPO prospectuses. Finally, an auditing tool is built to support guided cross-checking over financial documents, reducing work hours by 52% to 68%. This system has received wide recognition in the Chinese financial community. Nine of the top ten Chinese security brokers have adopted this system to support their business of investment banking.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Business intelligence; Social and professional topics; Professional topics; Computing and business; Automation

259. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce.

Paper Link】 【Pages】:2608-2616

【Authors】: Sean Bell ; Yiqun Liu ; Sami Alsheikh ; Yina Tang ; Edward Pizzi ; M. Henning ; Karun Singh ; Omkar Parkhi ; Fedor Borisyuk

【Abstract】: In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision representations; Image representations; Machine learning; Learning paradigms; Multi-task learning; Information systems; Information retrieval; Retrieval tasks and goals; Clustering and classification; Specialized information retrieval; Multimedia and multimodal retrieval; Image search

260. Learning Instrument Invariant Characteristics for Generating High-resolution Global Coral Reef Maps.

Paper Link】 【Pages】:2617-2624

【Authors】: Ata Akbari Asanjan ; Kamalika Das ; Alan Li ; Ved Chirayath ; Juan Torres-Perez ; Soroosh Sorooshian

【Abstract】: Coral reefs are one of the most biologically complex and diverse ecosystems within the shallow marine environment. Unfortunately, these underwater ecosystems are threatened by a number of anthropogenic challenges, including ocean acidification and warming, overfishing, and the continued increase of marine debris in oceans. This requires a comprehensive assessment of the world's coastal environments, including a quantitative analysis on the health and extent of coral reefs and other associated marine species, as a vital Earth Science measurement. However, limitations in observational and technological capabilities inhibit global sustained imaging of the marine environment. Harmonizing multimodal data sets acquired using different remote sensing instruments presents additional challenges, thereby limiting the availability of good quality labeled data for analysis. In this work, we develop a deep learning model for extracting domain invariant features from multimodal remote sensing imagery and creating high-resolution global maps of coral reefs by combining various sources of imagery and limited hand-labeled data available for certain regions. This framework allows us to generate, for the first time, coral reef segmentation maps at 2-meter resolution, which is a significant improvement over the kilometer-scale state-of-the-art maps. Additionally, this framework doubles accuracy and IoU metrics over baselines that do not account for domain invariance.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Image segmentation; Machine learning; Machine learning algorithms; Feature selection; Machine learning approaches; Neural networks

261. Causal Meta-Mediation Analysis: Inferring Dose-Response Function From Summary Statistics of Many Randomized Experiments.

Paper Link】 【Pages】:2625-2635

【Authors】: Zenan Wang ; Xuan Yin ; Tianbo Li ; Liangjie Hong

【Abstract】: It is common in the internet industry to use offline-developed algorithms to power online products that contribute to the success of a business. Offline-developed algorithms are guided by offline evaluation metrics, which are often different from online business key performance indicators (KPIs). To maximize business KPIs, it is important to pick a north star among all available offline evaluation metrics. By noting that online products can be measured by online evaluation metrics, the online counterparts of offline evaluation metrics, we decompose the problem into two parts. As the offline A/B test literature works out the first part: counterfactual estimators of offline evaluation metrics that move the same way as their online counterparts, we focus on the second part: causal effects of online evaluation metrics on business KPIs. The north star of offline evaluation metrics should be the one whose online counterpart causes the most significant lift in the business KPI. We model the online evaluation metric as a mediator and formalize its causality with the business KPI as dose-response function (DRF). Our novel approach, causal meta-mediation analysis, leverages summary statistics of many existing randomized experiments to identify, estimate, and test the mediator DRF. It is easy to implement and to scale up, and has many advantages over the literature of mediation analysis and meta-analysis. We demonstrate its effectiveness by simulation and implementation on real data.

【Keywords】: General and reference; Cross-computing tools and techniques; Empirical studies; Evaluation; Measurement; Metrics

262. AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction.

Paper Link】 【Pages】:2636-2645

【Authors】: Bin Liu ; Chenxu Zhu ; Guilin Li ; Weinan Zhang ; Jincai Lai ; Ruiming Tang ; Xiuqiang He ; Zhenguo Li ; Yong Yu

【Abstract】: Learning feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. In most existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions brings large memory and computation cost. Even worse, useless interactions may introduce noise and complicate the training process. In this work, we propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS). AutoFIS can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence. In the search stage, instead of searching over a discrete set of candidate feature interactions, we relax the choices to be continuous by introducing the architecture parameters. By implementing a regularized optimizer over the architecture parameters, the model can automatically identify and remove the redundant feature interactions during the training process of the model. In the re-train stage, we keep the architecture parameters serving as an attention unit to further boost the performance. Offline experiments on three large-scale datasets (two public benchmarks, one private) demonstrate that AutoFIS can significantly improve various FM based models. AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service, where a 10-day online A/B test demonstrated that AutoFIS improved the DeepFM model by 20.3% and 20.1% in terms of CTR and CVR respectively.
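The core idea of gating feature interactions with continuous architecture parameters can be sketched as below: each pairwise field interaction in an FM-style layer is multiplied by a learnable gate, and gates driven toward zero mark interactions to prune before re-training. This is only the gating skeleton; the regularized optimizer, field counts, and embedding sizes in the paper are not reproduced here.

```python
# Sketch of the core AutoFIS idea (not the full model or its regularized optimizer): an
# FM-style layer where every pairwise field interaction is multiplied by a learnable
# architecture gate alpha; gates driven toward zero mark interactions to prune before
# the re-train stage. Field count and embedding size are illustrative.
import itertools
import torch
import torch.nn as nn

class GatedFM(nn.Module):
    def __init__(self, num_fields=6, emb_dim=8):
        super().__init__()
        self.pairs = list(itertools.combinations(range(num_fields), 2))
        self.alpha = nn.Parameter(torch.ones(len(self.pairs)))   # one gate per interaction

    def forward(self, field_emb):
        # field_emb: (batch, num_fields, emb_dim)
        terms = []
        for k, (i, j) in enumerate(self.pairs):
            inner = (field_emb[:, i] * field_emb[:, j]).sum(dim=1)   # <v_i, v_j>
            terms.append(self.alpha[k] * inner)
        return torch.stack(terms, dim=1).sum(dim=1)                  # gated 2nd-order score

model = GatedFM()
score = model(torch.randn(4, 6, 8))
print(score.shape, model.alpha.shape)   # torch.Size([4]) torch.Size([15])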

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

263. City Metro Network Expansion with Reinforcement Learning.

Paper Link】 【Pages】:2646-2656

【Authors】: Yu Wei ; Minjia Mao ; Xi Zhao ; Jianhua Zou ; Ping An

【Abstract】: City metro network expansion, a part of transportation network design, aims to design new lines based on the existing metro network. Existing methods in the field of transportation network design either (i) can hardly formulate this problem efficiently, (ii) depend on expert guidance to produce solutions, or (iii) appeal to problem-specific heuristics which are difficult to design. To address these limitations, we propose a reinforcement learning based method for the city metro network expansion problem. In this method, we formulate metro line expansion as a Markov decision process (MDP), which characterizes the problem as a process of sequential station selection. Then, we train an actor-critic model to design the next metro line on the basis of the existing metro network. The actor is an encoder-decoder network with an attention mechanism that generates the parameterized policy used to select the stations. The critic estimates the expected cumulative reward to assist the training of the actor by reducing training variance. The proposed method does not require expert guidance during design, since the learning procedure relies only on the reward calculation to tune the policy for better station selection. It also avoids the difficulty of designing heuristics, since the policy formalizes the station selection. Considering origin-destination (OD) trips and social equity, we expand the current metro network in Xi'an, China, based on the real mobility information of 24,770,715 mobile phone users in the whole city. The results demonstrate the advantages of our method compared with existing approaches.

【Keywords】: Applied computing; Operations research; Transportation; Computing methodologies; Artificial intelligence; Planning and scheduling; Machine learning; Learning paradigms; Reinforcement learning

264. Game Action Modeling for Fine Grained Analyses of Player Behavior in Multi-player Card Games (Rummy as Case Study).

Paper Link】 【Pages】:2657-2665

【Authors】: Sharanya Eswaran ; Mridul Sachdeva ; Vikram Vimal ; Deepanshi Seth ; Suhaas Kalpam ; Sanjay Agarwal ; Tridib Mukherjee ; Samrat Dattagupta

【Abstract】: We present a deep learning framework for game action modeling, which enables fine-grained analyses of player behavior. We develop CNN-based supervised models that effectively learn the critical game play decisions from skilled players, and use these models to assess player characteristics in the system, such as their retention, engagement, deposit buckets, etc. We show that with a carefully constructed input format, that efficiently represents the game state and history as a multi-dimensional image, along with a custom architecture the model learns the strategies of the game accurately. It is further enhanced with look-ahead achieved by self-play simulation to better estimate the game state, and this information is used in a new loss function. Next, we show that analyzing the players with these models as reference has immense benefit in understanding player potential in terms of engagement and revenue. We also use the model to understand the various contexts under which players tend to make mistakes, and use these insights to up-skill players.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks; Partially-observable Markov decision processes; Stochastic games; Human-centered computing; Human computer interaction (HCI); HCI design and evaluation methods; User models

265. Cascade-LSTM: A Tree-Structured Neural Classifier for Detecting Misinformation Cascades.

Paper Link】 【Pages】:2666-2676

【Authors】: Francesco Ducci ; Mathias Kraus ; Stefan Feuerriegel

【Abstract】: Misinformation in social media - such as fake news, rumors, or other forms of deceptive content - poses a significant threat to society and, hence, scalable strategies for an early detection of online cascades with misinformation are in dire need. The prominent approach in detecting online cascades with misinformation builds upon neural networks based on sequences of simple structural features of the propagation dynamics (e.g., cascade size, average retweeting time). However, these structural features neglect large parts of the information in the cascade. As a remedy, we propose a novel tree-structured neural network named Cascade-LSTM.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Information systems; World Wide Web; Web applications; Social networks

Paper Link】 【Pages】:2677-2685

【Authors】: Jizhou Huang ; Haifeng Wang ; Miao Fan ; An Zhuo ; Ying Li

【Abstract】: Point of interest auto-completion (POI-AC) is a featured function in the search engines of many Web mapping services. This function keeps suggesting a dynamic list of POIs as a user types each character, and it can dramatically save typing effort, which is quite useful on mobile devices. Existing approaches to POI-AC for industrial use mainly adopt various learning to rank (LTR) models with handcrafted features, and even historically clicked POIs are taken into account for personalization. However, these prior arts tend to reach performance bottlenecks, as neither heuristic features nor users' search histories can directly model personal input habits. In this paper, we present an end-to-end neural framework for POI-AC, which has recently been deployed in the search engine of Baidu Maps, one of the largest Web mapping applications with hundreds of millions of monthly active users worldwide. In order to establish connections among users, their personal input habits, and the POIs they are interested in, the proposed framework (abbr. P3AC) is composed of three components, i.e., a multi-layer Bi-LSTM network to adapt to personalized prefixes, a CNN-based network to model multi-sourced information on POIs, and a triplet ranking loss function to optimize both personalized prefix embeddings and distributed representations of POIs. We first use large-scale real-world search logs of Baidu Maps to assess the performance of P3AC offline, measured by multiple metrics including Mean Reciprocal Rank (MRR), Success Rate (SR), and normalized Discounted Cumulative Gain (nDCG). Extensive experimental results demonstrate that it can achieve substantial improvements. We then launched it online and observed that other critical indicators of user satisfaction, such as the average number of keystrokes and the average typing speed at keystrokes in a POI-AC session, decrease significantly as well. In addition, we have released both the source code of P3AC and the experimental data to the public for reproducibility tests.
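
The triplet ranking loss mentioned above can be written out explicitly as below: a prefix embedding should be closer to the clicked POI than to a sampled non-clicked POI by at least a margin. The encoders producing these embeddings are omitted, and the vectors, margin, and batch size are placeholders, not P3AC's settings.

```python
# Sketch of a triplet ranking loss as referenced in the abstract (the Bi-LSTM/CNN
# encoders producing the embeddings are omitted): the personalized prefix embedding
# should be closer to the clicked POI than to a non-clicked POI by a margin.
import torch
import torch.nn.functional as F

prefix = torch.randn(32, 64, requires_grad=True)   # personalized prefix embeddings
pos_poi = torch.randn(32, 64)                      # embeddings of clicked POIs
neg_poi = torch.randn(32, 64)                      # embeddings of sampled non-clicked POIs

d_pos = F.pairwise_distance(prefix, pos_poi)
d_neg = F.pairwise_distance(prefix, neg_poi)
loss = torch.clamp(d_pos - d_neg + 1.0, min=0).mean()   # margin = 1.0
loss.backward()
print(float(loss))
```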

【Keywords】: Information systems; Information retrieval; Information retrieval query processing; Information systems applications; Mobile information processing systems

267. Category-Specific CNN for Visual-aware CTR Prediction at JD.com.

Paper Link】 【Pages】:2686-2696

【Authors】: Hu Liu ; Jing Lu ; Hao Yang ; Xiwei Zhao ; Sulong Xu ; Hao Peng ; Zehua Zhang ; Wenjie Niu ; Xiaokun Zhu ; Yongjun Bao ; Weipeng Yan

【Abstract】: As one of the largest B2C e-commerce platforms in China, JD.com also powers a leading advertising system, serving millions of advertisers with fingertip connection to hundreds of millions of customers. In our system, as in most e-commerce scenarios, ads are displayed with images. This makes visual-aware Click Through Rate (CTR) prediction of crucial importance to both business effectiveness and user experience. Existing algorithms usually extract visual features using off-the-shelf Convolutional Neural Networks (CNNs) and late-fuse the visual and non-visual features to produce the final CTR prediction. Despite being extensively studied, this field still faces two key challenges. First, although encouraging progress has been made in offline studies, applying CNNs in real systems remains non-trivial, due to the strict requirements for efficient end-to-end training and low-latency online serving. Second, the off-the-shelf CNNs and late-fusion architectures are suboptimal. Specifically, off-the-shelf CNNs were designed for classification and thus never take categories as input features, whereas in e-commerce, categories are precisely labeled and contain abundant visual priors that can help visual modeling. Unaware of the ad category, these CNNs may extract unnecessary category-unrelated features, wasting the CNN's limited expressive capacity. To overcome these two challenges, we propose Category-specific CNN (CSCNN), designed specifically for CTR prediction. CSCNN incorporates the category knowledge early, with a light-weight attention module on each convolutional layer. This enables CSCNN to extract expressive category-specific visual patterns that benefit CTR prediction. Offline experiments on a benchmark dataset and a 10-billion-scale real production dataset from JD, together with an online A/B test, show that CSCNN outperforms all compared state-of-the-art algorithms. We also build a highly efficient infrastructure to accomplish end-to-end training with a CNN on the 10-billion-scale real production dataset within 24 hours, and meet the low-latency requirements of the online system (20ms on CPU). CSCNN is now deployed in the search advertising system of JD, serving the main traffic of hundreds of millions of active users.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision representations; Image representations

268. ConSTGAT: Contextual Spatial-Temporal Graph Attention Network for Travel Time Estimation at Baidu Maps.

Paper Link】 【Pages】:2697-2705

【Authors】: Xiaomin Fang ; Jizhou Huang ; Fan Wang ; Lingke Zeng ; Haijin Liang ; Haifeng Wang

【Abstract】: The task of travel time estimation (TTE), which estimates the travel time for a given route and departure time, plays an important role in intelligent transportation systems such as navigation, route planning, and ride-hailing services. This task is challenging because of many essential aspects, such as traffic prediction and contextual information. First, the accuracy of traffic prediction is strongly correlated with the traffic speed of the road segments in a route. Existing work mainly adopts spatial-temporal graph neural networks to improve the accuracy of traffic prediction, where spatial and temporal information is used separately. However, one drawback is that the spatial and temporal correlations are not fully exploited to obtain better accuracy. Second, contextual information of a route, i.e., the connections of adjacent road segments in the route, is an essential factor that impacts the driving speed. Previous work mainly uses sequential encoding models to address this issue. However, it is difficult to scale up sequential models to large-scale real-world services. In this paper, we propose an end-to-end neural framework named ConSTGAT, which integrates traffic prediction and contextual information to address these two problems. Specifically, we first propose a spatial-temporal graph neural network that adopts a novel graph attention mechanism, which is designed to fully exploit the joint relations of spatial and temporal information. Then, in order to efficiently take advantage of the contextual information, we design a computationally efficient model that applies convolutions over local windows to capture a route's contextual information and further employs multi-task learning to improve the performance. In this way, the travel time of each road segment can be computed in parallel and in advance. Extensive experiments conducted on large-scale real-world datasets demonstrate the superiority of ConSTGAT. In addition, ConSTGAT has already been deployed in production at Baidu Maps, and it successfully keeps serving tens of billions of requests every day. This confirms that ConSTGAT is a practical and robust solution for large-scale real-world TTE services.

【Keywords】: Applied computing; Operations research; Transportation; Information systems; Information systems applications; Data mining; Spatial-temporal systems

269. Faster Secure Data Mining via Distributed Homomorphic Encryption.

Paper Link】 【Pages】:2706-2714

【Authors】: Junyi Li ; Heng Huang

【Abstract】: Due to the rising demand for privacy in data mining, Homomorphic Encryption (HE) has recently received more and more attention for its ability to perform computations over encrypted data. By using the HE technique, it is possible to securely outsource model learning to powerful but not fully trusted public cloud computing environments. However, HE-based training scales badly because of its high computational complexity. It is still an open problem whether it is possible to apply HE to large-scale problems. In this paper, we propose a novel general distributed HE-based data mining framework as a step towards solving the scaling problem. The main idea of our approach is to accept slightly more communication overhead in exchange for a shallower computational circuit in HE, so as to reduce the overall complexity. We verify the efficiency and effectiveness of our new framework by testing various data mining algorithms and benchmark datasets. For example, we successfully train a logistic regression model to recognize the digits 3 and 8 within around 5 minutes, while a centralized counterpart needs almost 2 hours.
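
For readers unfamiliar with computing over ciphertexts, the snippet below shows the basic pattern with an additively homomorphic Paillier scheme via the third-party `phe` (python-paillier) package: the data owner encrypts values, an untrusted worker computes a weighted sum on ciphertexts, and only the key holder decrypts. This is an illustration of the HE concept only, not the specific scheme, circuit depth, or distributed protocol used in the paper.

```python
# Illustration of computing on encrypted data with an additively homomorphic scheme
# (Paillier via the `phe` package), not this paper's HE scheme or distributed protocol:
# a client encrypts its feature values, an untrusted worker computes a weighted sum over
# ciphertexts, and only the key holder can decrypt the result.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

features = [0.7, -1.2, 3.4]
weights = [0.5, 0.25, -1.0]

enc_features = [public_key.encrypt(x) for x in features]   # done by the data owner
enc_score = weights[0] * enc_features[0]                    # done by the worker,
for w, ex in zip(weights[1:], enc_features[1:]):            # without ever seeing the data
    enc_score = enc_score + w * ex

print(private_key.decrypt(enc_score))                       # expected value is about -3.35
```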

【Keywords】: Security and privacy; Cryptography; Software and application security; Domain-specific security and privacy architectures

270. Contagious Chain Risk Rating for Networked-guarantee Loans.

Paper Link】 【Pages】:2715-2723

【Authors】: Dawei Cheng ; Zhibin Niu ; Yiyi Zhang

【Abstract】: The small and medium-sized enterprises (SMEs) are allowed to guarantee each other and form complex loan networks to receive loans from banks during the economic expansion stage. However, external shocks may weaken the robustness, and an accidental default may spread across the network and lead to large-scale defaults, even systemic crisis. Thus, predicting and rating the default contagion chains in the guarantee network in order to reduce or prevent potential systemic financial risk, attracts a grave concern from the Regulatory Authority and the banks. Existing credit risk models in the banking industry utilize machine learning methods to generate a credit score for each customer. Such approaches dismiss the contagion risk from guarantee chains and need extensive feature engineering with deep domain expertise. To this end, we propose a novel approach to rate the risk of contagion chains in the bank industry with the deep neural network. We employed the temporal inter-chain attention network on graph-structured loan behavior data to compute risk scores for the contagion chains. We show that our approach is significantly better than the state-of-the-art baselines on the dataset from a major financial institution in Asia. Besides, we conducted empirical studies on the real-world loan dataset for risk assessment. The proposed approach enabled loan managers to monitor risks in a boarder view and avoid significant financial losses for the financial institution.

【Keywords】: Applied computing; Law, social and behavioral sciences; Economics; Information systems; Information systems applications; Data mining

271. AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types.

Paper Link】 【Pages】:2724-2734

【Authors】: Xin Luna Dong ; Xiang He ; Andrey Kan ; Xian Li ; Yan Liang ; Jun Ma ; Yifan Ethan Xu ; Chenwei Zhang ; Tong Zhao ; Gabriel Blanco Saldana ; Saurabh Deshpande ; Alexandre Michetti Manduca ; Jay Ren ; Surender Pal Singh ; Fan Xiao ; Haw-Shiuan Chang ; Giannis Karamanolakis ; Yuning Mao ; Yaqing Wang ; Christos Faloutsos ; Andrew McCallum ; Jiawei Han

【Abstract】: Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across large number of categories, as well as large and constantly growing number of products.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Supervised learning; Unsupervised learning; Anomaly detection; Information systems; Data management systems; Database design and models; Graph-based database models; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Semi-supervised learning

272. Personalized Image Retrieval with Sparse Graph Representation Learning.

Paper Link】 【Pages】:2735-2743

【Authors】: Xiaowei Jia ; Handong Zhao ; Zhe Lin ; Ajinkya Kale ; Vipin Kumar

【Abstract】: Personalization is essential for enhancing the customer experience in retrieval tasks. In this paper, we develop a novel method, CA-GCN, for personalized image retrieval in the Adobe Stock image system. The proposed method CA-GCN leverages user behavior data in a Graph Convolutional Neural Network (GCN) model to learn user and image embeddings simultaneously. Standard GCNs perform poorly on sparse user-image interaction graphs due to the limited knowledge gained from less representative neighbors. To address this challenge, we propose to augment the sparse user-image interaction data by considering the similarities among images. Specifically, we detect clusters of similar images and introduce a set of hidden super-nodes in the graph to represent the clusters. We show that such an augmented graph structure can significantly improve retrieval performance on real-world data collected from the Adobe Stock service. In particular, when testing the proposed method on real users' stock image retrieval sessions, the average click position improves from 70 to 51.

【Keywords】: Information systems; Information retrieval; Users and interactive retrieval; Personalization

273. Comprehensive Information Integration Modeling Framework for Video Titling.

Paper Link】 【Pages】:2744-2754

【Authors】: Shengyu Zhang ; Ziqi Tan ; Zhou Zhao ; Jin Yu ; Kun Kuang ; Tan Jiang ; Jingren Zhou ; Hongxia Yang ; Fei Wu

【Abstract】: In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos seldom come with appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and in demand, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs and recognizes the intra-actions in each graph through Graph Neural Networks (GNN). Then a global-local aggregation module is proposed to model inter-actions across graphs and aggregate the heterogeneous graphs into a holistic graph representation. The abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to exploit the interactions between products and backgrounds, and generates the story-line topic of the video. We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform, and will make the desensitized version publicly available to nourish further development of the research community. Extensive experiments on various datasets demonstrate the efficacy of the proposed method.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision representations; Hierarchical representations; Natural language processing; Natural language generation

274. Acoustic Measures for Real-Time Voice Coaching.

Paper Link】 【Pages】:2755-2763

【Authors】: Ying Li ; Abraham Miller ; Arthur Liu ; Kyle Coburn ; Luis J. Salazar

【Abstract】: Our voices can convey many different types of thoughts and intent; how our voices carry them is often not consciously controlled and as a consequence, unintended effects may arise that negatively impact our relationships. How we say things is as important as what we say. This paper presents methodologies for computing a set of physical properties from sound waves of a speaker's voice directly, referred to as acoustic measures. Experiments are designed and conducted to establish the correlations between physical properties and auditory measures for human perception of sound waves. Based on these correlations, a voice coaching app can guide users, in real-time or deferred retrospective, to modify their speech's auditory measures, such as rate of speech, energy level, and intonation, to achieve their intended communication goals.
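
Two of the simplest acoustic measures one can compute directly from a waveform are sketched below: a short-time RMS energy contour and a rough fundamental-frequency estimate via autocorrelation. This is only a toy illustration on a synthetic signal; the paper's actual measures, frame settings, and their calibration against human perception are more involved.

```python
# Sketch of two simple acoustic measures computable directly from sound waves (not the
# paper's full measure set): short-time RMS energy and a rough fundamental-frequency
# estimate via autocorrelation. The signal is synthetic; real use would load speech audio.
import numpy as np

sr = 16_000                                     # sample rate (Hz)
t = np.arange(sr) / sr
signal = 0.4 * np.sin(2 * np.pi * 140 * t)      # 1 s of a 140 Hz "voice"

frame, hop = 400, 160                           # 25 ms frames, 10 ms hop
frames = np.stack([signal[i:i + frame] for i in range(0, len(signal) - frame, hop)])
rms_energy = np.sqrt((frames ** 2).mean(axis=1))            # loudness contour

def estimate_f0(x, sr, fmin=60, fmax=400):
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]       # autocorrelation (lags >= 0)
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

print(round(float(rms_energy.mean()), 3), round(estimate_f0(frames[0], sr), 1))
```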

【Keywords】: Applied computing; Arts and humanities; Sound and music computing; Hardware; Communication hardware, interfaces and storage; Sound-based input / output; Human-centered computing; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms; Mobile computing; Information systems; Information systems applications; Data mining

275. Geodemographic Influence Maximization.

Paper Link】 【Pages】:2764-2774

【Authors】: Kaichen Zhang ; Jingbo Zhou ; Donglai Tao ; Panagiotis Karras ; Qing Li ; Hui Xiong

【Abstract】: Given a set of locations in a city, on which ones should we place ads so as to reach as many people as possible within a limited budget? Past research has addressed this question under the assumption that dense trajectory data are available to determine the reach of each ad. However, the data that are available in most industrial settings do not consist of dense, long-range trajectories; instead, they consist of statistics on people's short-range point-to-point movements. In this paper, we address the natural problem that arises from such data: given a distribution of population and point-to-point movement statistics over a network, find a set of locations within a budget that achieves maximum expected reach. We call this problem geodemographic influence maximization (GIM). We show that the problem is NP-hard, but its objective function is monotone and submodular, and thus admits a greedy algorithm with a 1/2 (1 - 1/e) approximation ratio. Still, this algorithm is inapplicable to large-scale data for high-frequency digital signage ads. We develop an efficient deterministic algorithm, Lazy-Sower, exploiting a novel, tight double-bounding scheme of marginal influence gain as well as the locality properties of the problem; a learning-based variant, NN-Sower, utilizes randomization and deep learning to further improve efficiency, with a slight loss of quality. Our exhaustive experimental study on two real-world urban datasets demonstrates the efficacy and efficiency of our solutions compared to baselines.
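
The algorithmic backbone the approximation guarantee relies on, greedy selection with lazy re-evaluation of marginal gains, is sketched below on a toy coverage objective. The reach sets and budget are invented, and this is the textbook lazy-greedy pattern rather than the paper's Lazy-Sower bounding scheme.

```python
# Sketch of lazy ("CELF"-style) greedy selection for a monotone submodular objective,
# the algorithmic backbone the abstract refers to; the objective here is a toy coverage
# function (people reached by each location), not the paper's geodemographic reach model.
import heapq

reach = {                      # location -> set of people it reaches (toy data)
    "A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 7, 8, 9}, "E": {2, 9},
}

def lazy_greedy(budget):
    covered, chosen = set(), []
    # Max-heap of (negated) marginal gains. By submodularity gains only shrink, so a
    # stale top entry can be re-evaluated lazily instead of rescoring every location.
    heap = [(-len(people), loc) for loc, people in reach.items()]
    heapq.heapify(heap)
    while heap and len(chosen) < budget:
        _, loc = heapq.heappop(heap)
        fresh_gain = len(reach[loc] - covered)
        best_other = -heap[0][0] if heap else 0
        if fresh_gain >= best_other:           # still the best even with its fresh gain
            chosen.append(loc)
            covered |= reach[loc]
        else:
            heapq.heappush(heap, (-fresh_gain, loc))
    return chosen, covered

print(lazy_greedy(budget=2))   # (['A', 'D'], {1, 2, 3, 4, 7, 8, 9}) for this toy instance
```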

【Keywords】: Computing methodologies; Modeling and simulation; Simulation theory; Network science; Information systems; Information systems applications; Data mining; Spatial-temporal systems; Geographic information systems

276. A Self-Evolving Mutually-Operative Recurrent Network-based Model for Online Tool Condition Monitoring in Delay Scenario.

Paper Link】 【Pages】:2775-2783

【Authors】: Monidipa Das ; Mahardhika Pratama ; Tegoeh Tjahjowidodo

【Abstract】: With the increasing demand for product supply, manufacturers are in urgent need of online tool condition monitoring (TCM) that does not compromise on maintenance cost in terms of time and man-power requirements. However, the existing machine learning models for TCM are mostly offline and not suitable for the non-stationary environment of machining settings. Moreover, accessing the ground truth always imposes a shutdown of the machining process, and the existing models are severely affected by such delays in receiving labelled samples. In order to tackle these issues, we propose SERMON, a novel learning model based on a pair of self-evolving mutually-operative recurrent neural networks. The proposed SERMON is well-equipped with features for automated and real-time monitoring of machine fault status even in the finite/infinite label delay scenario. The experimental evaluation of SERMON using a real-world dataset on the 3D-printing process demonstrates its effectiveness in online fault detection under non-stationary as well as delayed-label conditions of the machining process. An additional comparative study on large-scale benchmark streaming datasets further demonstrates the scalability of SERMON.

【Keywords】: Applied computing; Computing methodologies; Machine learning; Machine learning algorithms; Ensemble methods; Regularization

277. Maximizing Cumulative User Engagement in Sequential Recommendation: An Online Optimization Perspective.

Paper Link】 【Pages】:2784-2792

【Authors】: Yifei Zhao ; Yu-Hang Zhou ; Mingdong Ou ; Huan Xu ; Nan Li

【Abstract】: To maximize cumulative user engagement (e.g., cumulative clicks) in sequential recommendation, it is often necessary to trade off two potentially conflicting objectives: pursuing higher immediate user engagement (e.g., click-through rate) and encouraging user browsing (i.e., more items exposed). Existing works often study these two tasks separately, and thus tend to produce sub-optimal results. In this paper, we study this problem from an online optimization perspective, and propose a flexible and practical framework to explicitly trade off longer user browsing length against high immediate user engagement. Specifically, by considering items as actions, user's requests as states, and user leaving as an absorbing state, we formulate each user's behavior as a personalized Markov decision process (MDP), and the problem of maximizing cumulative user engagement is reduced to a stochastic shortest path (SSP) problem. Meanwhile, with immediate user engagement and quit probability estimation, it is shown that the SSP problem can be efficiently solved via dynamic programming. Experiments on real-world datasets demonstrate the effectiveness of the proposed approach. Moreover, this approach has been deployed on a large e-commerce platform, achieving an improvement of over 7% in cumulative clicks.
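
A toy dynamic program for this kind of absorbing-state formulation is sketched below: at each remaining slot, pick the item maximizing immediate click probability plus the continuation value weighted by the probability the user keeps browsing. The items, probabilities, and fixed horizon are invented; the production system estimates these quantities per user and request.

```python
# Toy dynamic program for the absorbing-state MDP the abstract describes (not the
# production system): at each remaining position, choose the item that maximizes
# immediate click probability plus the expected future clicks, discounted by the
# probability the user keeps browsing after seeing that item. All numbers are invented.
items = {            # item -> (click probability, probability the user continues browsing)
    "hot":   (0.30, 0.60),
    "safe":  (0.20, 0.90),
    "niche": (0.10, 0.95),
}

def plan(horizon):
    value, policy = 0.0, []
    for _ in range(horizon):              # backward induction from the last slot
        best_item, best_value = max(
            ((name, p_click + p_stay * value) for name, (p_click, p_stay) in items.items()),
            key=lambda t: t[1])
        value, policy = best_value, [best_item] + policy
    return value, policy

v, pi = plan(horizon=5)
print(round(v, 3), pi)   # expected cumulative clicks and the item chosen at each slot
```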

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; World Wide Web; Web searching and information discovery; Personalization; Social recommendation

278. Domain Specific Knowledge Graphs as a Service to the Public: Powering Social-Impact Funding in the US.

Paper Link】 【Pages】:2793-2801

【Authors】: Ying Li ; Vitalii Zakhozhyi ; Daniel Zhu ; Luis J. Salazar

【Abstract】: Web and mobile technologies enable ubiquitous access to information. Yet, it is getting harder, even for subject matter experts, to quickly identify quality, trustworthy, and reliable content available online through search engines powered by advanced knowledge graphs. This paper explores the practical applications of Domain Specific Knowledge Graphs that allow for the extraction of information from trusted published and unpublished sources, to map the extracted information to an ontology defined in collaboration with sector experts, and to enable the public to go from single queries into ongoing conversations meeting their knowledge needs reliably. We focused on Social-Impact Funding, an area of need for over one million nonprofit organizations, foundations, government entities, social entrepreneurs, impact investors, and academic institutions in the US.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Ontology engineering; Natural language processing; Information extraction; Information systems; Data management systems; Database design and models; Graph-based database models; Information systems applications; Data mining

279. LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition.

Paper Link】 【Pages】:2802-2812

【Authors】: Jin Xu ; Xu Tan ; Yi Ren ; Tao Qin ; Jian Li ; Sheng Zhao ; Tie-Yan Liu

【Abstract】: Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks, and require a large amount of text and speech pairs for model training. However, there are more than 6,000 languages in the world and most languages lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system under the extremely low-resource setting, which can support rare languages with low data cost. LRSpeech consists of three key techniques: 1) pre-training on rich-resource languages and fine-tuning on low-resource languages; 2) dual transformation between TTS and ASR to iteratively boost the accuracy of each other; 3) knowledge distillation to customize the TTS model on a high-quality target-speaker voice and improve the ASR model on multiple voices. We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech. Experimental results show that LRSpeech 1) achieves high quality for TTS in terms of both intelligibility (a more than 98% intelligibility rate) and naturalness (a mean opinion score (MOS) above 3.5) of the synthesized speech, which satisfy the requirements for industrial deployment, 2) achieves promising recognition accuracy for ASR, and 3) last but not least, uses extremely low-resource training data. We also conduct comprehensive analyses on LRSpeech with different amounts of data resources, and provide valuable insights and guidance for industrial deployment. We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.

【Keywords】: Applied computing; Arts and humanities; Sound and music computing; Computing methodologies; Artificial intelligence; Natural language processing; Speech recognition; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning approaches; Neural networks

280. Doing in One Go: Delivery Time Inference Based on Couriers' Trajectories.

Paper Link】 【Pages】:2813-2821

【Authors】: Sijie Ruan ; Zi Xiong ; Cheng Long ; Yiheng Chen ; Jie Bao ; Tianfu He ; Ruiyuan Li ; Shengnan Wu ; Zhongyuan Jiang ; Yu Zheng

【Abstract】: The rapid development of e-commerce requires efficient and reliable logistics services. Nowadays, couriers are still the main solution to the "last mile" problem in logistics. They are usually required to record the accurate delivery time of each parcel manually, which provides vital information for applications like delivery insurance, delivery performance evaluation, and customer available time discovery. Couriers' trajectories generated by their PDAs provide a chance to infer the delivery time automatically and ease the burden on the couriers. However, directly using the nearest stay point to infer the delivery time is unsatisfactory due to two challenges: 1) inaccurate delivery locations, and 2) various stay scenarios. To this end, we propose Delivery Time Inference (DTInf), to automatically infer the delivery time of waybills based on couriers' trajectories. Our solution is composed of three steps: 1) Data Pre-processing, which detects stay points from trajectories, and separates stay points and waybills by delivery trips, 2) Delivery Location Correction, which infers the true delivery locations of waybills by mining historical deliveries, and 3) Delivery Event-based Matching, which selects the best-matched stay point for waybills in the same delivery location to infer the delivery time. Extensive experiments and case studies based on large-scale real-world waybill and trajectory data from JD Logistics confirm the effectiveness of our approach. Finally, we introduce a system based on DTInf, which is deployed and used internally in JD Logistics.
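
The first step of the pipeline relies on stay-point detection over courier trajectories. The sketch below shows the classic distance/duration-threshold version of that step, with illustrative thresholds and timestamp units (seconds) that are assumptions here, not the settings used by DTInf.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon, ts) points."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def stay_points(traj, dist_m=50, min_dur_s=180):
    """Classic stay-point detection over a trajectory of (lat, lon, ts) tuples.

    A stay point is emitted when consecutive points remain within dist_m
    meters of an anchor point for at least min_dur_s seconds.
    """
    stays, i, n = [], 0, len(traj)
    while i < n:
        j = i + 1
        while j < n and haversine_m(traj[i], traj[j]) <= dist_m:
            j += 1
        if traj[j - 1][2] - traj[i][2] >= min_dur_s:
            pts = traj[i:j]
            lat = sum(p[0] for p in pts) / len(pts)
            lon = sum(p[1] for p in pts) / len(pts)
            stays.append((lat, lon, pts[0][2], pts[-1][2]))  # centroid + time interval
        i = j
    return stays
```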

【Keywords】: Information systems; Information systems applications; Spatial-temporal systems

281. Improving Deep Learning for Airbnb Search.

Paper Link】 【Pages】:2822-2830

【Authors】: Malay Haldar ; Prashant Ramanathan ; Tyler Sax ; Mustafa Abdool ; Lanbo Zhang ; Aamir Mansawala ; Shulin Yang ; Bradley C. Turnbull ; Junshuo Liao

【Abstract】: The application of deep learning to search ranking was one of the most impactful product improvements at Airbnb. But what comes next after you launch a deep learning model? In this paper we describe the journey beyond, discussing what we refer to as the ABCs of improving search: A for architecture, ℬ for bias and ℂ for cold start. For architecture, we describe a new ranking neural network, focusing on the process that evolved our existing DNN beyond a fully connected two layer network. On handling positional bias in ranking, we describe a novel approach that led to one of the most significant improvements in tackling inventory that the DNN historically found challenging. To solve cold start, we describe our perspective on the problem and changes we made to improve the treatment of new listings on the platform. We hope ranking teams transitioning to deep learning will find this a practical case study of how to iterate on DNNs.

【Keywords】: Applied computing; Electronic commerce; E-commerce infrastructure; Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Learning to rank; Information systems; Information retrieval; Retrieval models and ranking; Learning to rank

282. General-Purpose User Embeddings based on Mobile App Usage.

Paper Link】 【Pages】:2831-2840

【Authors】: Junqi Zhang ; Bing Bai ; Ye Lin ; Jian Liang ; Kun Bai ; Fei Wang

【Abstract】: In this paper, we report our recent practice at Tencent for user modeling based on mobile app usage. User behaviors on mobile app usage, including retention, installation, and uninstallation, can be a good indicator for both long-term and short-term interests of users. For example, if a user installs Snapseed recently, she might have a growing interest in photographing. Such information is valuable for numerous downstream applications, including advertising, recommendations, etc. Traditionally, user modeling from mobile app usage heavily relies on handcrafted feature engineering, which requires onerous human work for different downstream applications, and could be sub-optimal without domain experts. However, automatic user modeling based on mobile app usage faces unique challenges, including (1) retention, installation, and uninstallation are heterogeneous but need to be modeled collectively, (2) user behaviors are distributed unevenly over time, and (3) many long-tailed apps suffer from serious sparsity. In this paper, we present a tailored Auto Encoder-coupled Transformer Network (AETN), by which we overcome these challenges and achieve the goals of reducing manual efforts and boosting performance. We have deployed the model at Tencent, and both online/offline experiments from multiple domains of downstream applications have demonstrated the effectiveness of the output user embeddings.

【Keywords】: Information systems; Information systems applications; Data mining

283. Unsupervised Translation via Hierarchical Anchoring: Functional Mapping of Places across Cities.

Paper Link】 【Pages】:2841-2851

【Authors】: Takahiro Yabe ; Kota Tsubouchi ; Toru Shimizu ; Yoshihide Sekimoto ; Satish V. Ukkusuri

【Abstract】: Unsupervised translation has become a popular task in natural language processing (NLP) due to difficulties in collecting large-scale parallel datasets. In the urban computing field, place embeddings generated from human mobility patterns via recurrent neural networks are used to understand the functionality of urban areas. Translating place embeddings across cities allows us to transfer knowledge across cities, which may be used for various downstream tasks such as planning new store locations. Despite such advances, current methods fail to translate place embeddings across domains with different scales (e.g. Tokyo to Niigata), due to the straightforward adoption of neural machine translation (NMT) methods from NLP, where vocabulary sizes are similar across languages. We refer to this issue as the domain imbalance problem in unsupervised translation tasks. We address this problem by proposing an unsupervised translation method that translates embeddings by exploiting common hierarchical structures that exist across imbalanced domains. The effectiveness of our method is tested using place embeddings generated from mobile phone data in 6 Japanese cities of heterogeneous sizes. Validation using land-use data confirms that using hierarchical anchors improves translation accuracy across imbalanced domains. Our method is agnostic to the input data type, and thus could be applied to unsupervised translation tasks in various fields beyond linguistics and urban computing.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Spatial and physical reasoning; Natural language processing; Machine translation

284. Debiasing Grid-based Product Search in E-commerce.

Paper Link】 【Pages】:2852-2860

【Authors】: Ruocheng Guo ; Xiaoting Zhao ; Adam Henderson ; Liangjie Hong ; Huan Liu

【Abstract】: The widespread usage of e-commerce websites in daily life and the resulting wealth of implicit feedback data form the foundation for systems that train and test e-commerce search ranking algorithms. While convenient to collect, implicit feedback data inherently suffers from various types of bias since user feedback is limited to products they are exposed to by existing search ranking algorithms and impacted by how the products are displayed. In the literature, a vast majority of existing methods have been proposed towards unbiased learning to rank for list-based web search scenarios. However, such methods cannot be directly adopted by e-commerce websites mainly for two reasons. First, in e-commerce websites, search engine results pages (SERPs) are displayed in 2-dimensional grids. The existing methods have not considered the difference in user behavior between list-based web search and grid-based product search. Second, there can be multiple types of implicit feedback (e.g., clicks and purchases) on e-commerce websites. We aim to utilize all types of implicit feedback as the supervision signals. In this work, we extend unbiased learning to rank to the world of e-commerce search via considering a grid-based product search scenario. We propose a novel framework which (1) forms the theoretical foundations to allow multiple types of implicit feedback in unbiased learning to rank and (2) incorporates the row skipping and slower decay click models to capture unique user behavior patterns in grid-based product search for inverse propensity scoring. Through extensive experiments on real-world e-commerce search log datasets across browsing devices and product taxonomies, we show that the proposed framework outperforms the state of the art unbiased learning to rank algorithms. These results also reveal important insights on how user behavior patterns vary in e-commerce SERPs across browsing devices and product taxonomies.
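
To make the inverse propensity scoring idea concrete, here is a minimal IPS-weighted listwise loss for a single SERP, assuming a hypothetical `examine_prob(position)` helper that returns the probability a grid cell was examined (e.g., from a row-skipping or slower-decay click model); the paper's estimator additionally unifies multiple feedback types, which this sketch does not.

```python
import numpy as np

def ips_listwise_loss(scores, feedback, positions, examine_prob):
    """IPS-weighted softmax loss for one grid SERP (illustrative sketch).

    scores:       model scores of the displayed products.
    feedback:     1 if the product received feedback (click/purchase), else 0.
    positions:    (row, col) grid cell of each product.
    examine_prob: hypothetical helper mapping a grid cell to its examination
                  probability under an assumed click model.
    """
    scores = np.asarray(scores, dtype=float)
    feedback = np.asarray(feedback, dtype=float)
    s = scores - scores.max()                       # numerical stability
    log_softmax = s - np.log(np.exp(s).sum())
    w = np.array([1.0 / max(examine_prob(p), 1e-3) for p in positions])  # clipped IPS
    # Only observed feedback contributes; each item is re-weighted by 1/propensity.
    return float(-(w * feedback * log_softmax).sum())
```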

【Keywords】: Information systems; Information retrieval; Retrieval models and ranking; Learning to rank

285. Forecasting the Evolution of Hydropower Generation.

Paper Link】 【Pages】:2861-2870

【Authors】: Fan Zhou ; Liang Li ; Kunpeng Zhang ; Goce Trajcevski ; Fuming Yao ; Ying Huang ; Ting Zhong ; Jiahao Wang ; Qiao Liu

【Abstract】: Hydropower is the largest renewable energy source for electricity generation in the world, with numerous benefits in terms of environmental protection (near-zero air pollution and climate impact), cost-effectiveness (long-term use, without significant impact from market fluctuation), and reliability (quick response to surges in demand). However, the effectiveness of hydropower plants is affected by multiple factors such as reservoir capacity, rainfall, temperature and fluctuating electricity demand, and particularly their complicated relationships, which make the prediction/recommendation of station operational output a difficult challenge. In this paper, we present DeepHydro, a novel stochastic method for modeling multivariate time series (e.g., water inflow/outflow and temperature) and forecasting the power generation of hydropower stations. DeepHydro captures temporal dependencies in co-evolving time series with a new conditioned latent recurrent neural network, which not only considers the hidden states of observations but also preserves the uncertainty of latent variables. We introduce a generative network parameterized on a continuous normalizing flow to approximate the complex posterior distribution of multivariate time series data, and further use neural ordinary differential equations to estimate the continuous-time dynamics of the latent variables constituting the observable data. This allows our model to deal with discrete observations in the context of continuous dynamic systems, while being robust to noise. We conduct extensive experiments on real-world datasets from a large power generation company consisting of cascade hydropower stations. The experimental results demonstrate that the proposed method can effectively predict power production and significantly outperform candidate baseline approaches.

【Keywords】: Applied computing; Enterprise computing; Enterprise information systems; Operations research; Decision analysis; Forecasting; Industry and manufacturing; Computing methodologies; Artificial intelligence

286. Salience and Market-aware Skill Extraction for Job Targeting.

Paper Link】 【Pages】:2871-2879

【Authors】: Baoxu Shi ; Jaewon Yang ; Feng Guo ; Qi He

【Abstract】: At LinkedIn, we want to create economic opportunity for everyone in the global workforce. To make this happen, LinkedIn offers a reactive Job Search system and a proactive Jobs You May Be Interested In (JYMBII) system to match the best candidates with their dream jobs. One of the most challenging tasks for developing these systems is to properly extract important skill entities from job postings and then target members with matched attributes. In this work, we show that the commonly used text-based salience and market-agnostic skill extraction approach is sub-optimal because it only considers skill mentions and ignores the salience level of a skill and its market dynamics, i.e., the influence of market supply and demand on the importance of skills. To address the above drawbacks, we present Job2Skills, our deployed salience and market-aware skill extraction system. The proposed Job2Skills shows promising results in improving the online performance of job recommendation (JYMBII) (+1.92% job apply) and skill suggestions for job posters (-37% suggestion rejection rate). Lastly, we present case studies showing interesting insights that contrast the traditional skill recognition method with the proposed Job2Skills at the occupation, industry, country, and individual-skill levels. Based on the above promising results, we deployed Job2Skills online to extract job-targeting skills for all 20M job postings served at LinkedIn.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Natural language processing; Information extraction; Information systems; Information retrieval; Retrieval tasks and goals; Information extraction; Recommender systems

287. DATE: Dual Attentive Tree-aware Embedding for Customs Fraud Detection.

Paper Link】 【Pages】:2880-2890

【Authors】: Sundong Kim ; Yu-Che Tsai ; Karandeep Singh ; Yeonsoo Choi ; Etim Ibok ; Cheng-Te Li ; Meeyoung Cha

【Abstract】: Intentional manipulation of invoices that leads to undervaluation of trade goods is the most common type of customs fraud to avoid ad valorem duties and taxes. To secure government revenue without interrupting legitimate trade flows, customs administrations around the world strive to develop ways to detect illicit trades. This paper proposes DATE, a model of Dual-task Attentive Tree-aware Embedding, to classify and rank illegal trade flows that contribute the most to the overall customs revenue when caught. The strength of DATE comes from combining a tree-based model for interpretability and transaction-level embeddings with dual attention mechanisms. To accurately identify illicit transactions and predict tax revenue, DATE learns simultaneously from the illicitness and surtax of each transaction. On five years of customs import data with a test illicit ratio of 2.24%, DATE shows a remarkable precision of 92.7% on illegal cases and a recall of 49.3% on revenue after inspecting only 1% of all trade flows. We also discuss issues in deploying DATE in the Nigeria Customs Service, in collaboration with the World Customs Organization.

【Keywords】: Applied computing; Computers in other domains; Computing in government; E-government; Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Machine learning approaches; Neural networks; Social and professional topics; Computing / technology policy; Commerce policy; Taxation

288. User Sentiment as a Success Metric: Persistent Biases Under Full Randomization.

Paper Link】 【Pages】:2891-2899

【Authors】: Ercan Yildiz ; Joshua Safyan ; Marc Harper

【Abstract】: We study user sentiment (reported via optional surveys) as a metric for fully randomized A/B tests. Both user-level covariates and treatment assignment can impact response propensity. We show that a simple mean comparison produces biased population level estimates and propose a set of consistent estimators for the average and local treatment effects on treated and respondent users. We show that our problem can be mapped onto the intersection of the missing data problem and observational causal inference, and we identify conditions under which consistent estimators exist. Finally, we evaluate the performance of estimators and find that more complicated models do not necessarily provide superior performance as long as models satisfy consistency criteria.
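
A minimal sketch of the kind of estimator the paper studies: weight survey respondents by the inverse of a learned response propensity before comparing arms. The logistic propensity model, clipping constant, and feature layout below are illustrative assumptions, not the paper's exact estimators or consistency conditions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_sentiment_effect(X, treat, responded, sentiment):
    """Response-propensity-weighted difference in mean survey sentiment.

    X:         user covariates, shape (n, d).
    treat:     0/1 treatment assignment from the fully randomized experiment.
    responded: 1 if the user answered the optional survey, else 0.
    sentiment: survey score; ignored for non-respondents.
    """
    X = np.asarray(X, dtype=float)
    treat = np.asarray(treat, dtype=int)
    responded = np.asarray(responded, dtype=int)
    sentiment = np.asarray(sentiment, dtype=float)

    feats = np.column_stack([X, treat])               # propensity may depend on arm
    model = LogisticRegression(max_iter=1000).fit(feats, responded)
    p_resp = np.clip(model.predict_proba(feats)[:, 1], 1e-3, 1.0)

    def weighted_mean(mask):
        w = responded[mask] / p_resp[mask]            # zero weight if no response
        s = np.where(responded[mask] == 1, sentiment[mask], 0.0)
        return np.sum(w * s) / np.sum(w)

    return weighted_mean(treat == 1) - weighted_mean(treat == 0)
```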

【Keywords】: General and reference; Cross-computing tools and techniques; Estimation; Evaluation; Experimentation; Metrics; Software and its engineering; Software organization and properties; Software functional properties; Correctness; Consistency

289. Improving Recommendation Quality in Google Drive.

Paper Link】 【Pages】:2900-2908

【Authors】: Suming J. Chen ; Zhen Qin ; Zac Wilson ; Brian Calaci ; Michael Rose ; Ryan Evans ; Sean Abraham ; Donald Metzler ; Sandeep Tata ; Mike Colagrosso

【Abstract】: Quick Access is a machine-learned system in Google Drive that predicts which files a user wants to open. Adding Quick Access recommendations to the Drive homepage cut the amount of time that users spend locating their files in half. Aggregated over the ~1 billion users of Drive, the time saved adds up to ~1,000 work weeks every day. In this paper, we discuss both the challenges of iteratively improving the quality of a personal recommendation system and the variety of approaches that we took in order to improve this feature. We explored different deep network architectures, novel modeling techniques, additional data sources, and the effects of latency and biases in the UX. We share both pitfalls and successes in our attempts to improve this product, and also discuss how we scaled and managed the complexity of the system. We believe that these insights will be especially useful to those who are working with private corpora as well as those who are building a large-scale production recommendation system.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

290. Large-Scale Training System for 100-Million Classification at Alibaba.

Paper Link】 【Pages】:2909-2930

【Authors】: Liuyihan Song ; Pan Pan ; Kang Zhao ; Hao Yang ; Yiming Chen ; Yingya Zhang ; Yinghui Xu ; Rong Jin

【Abstract】: In the last decades, extreme classification has become an essential topic for deep learning. It has achieved great success in many areas, especially in computer vision and natural language processing (NLP). However, it is very challenging to train a deep model with millions of classes due to the memory and computation explosion in the last output layer. In this paper, we propose a large-scale training system to address these challenges. First, we build a hybrid parallel training framework to make the training process feasible. Second, we propose a novel softmax variation named KNN softmax, which reduces both the GPU memory consumption and computation costs and improves the throughput of training. Then, to eliminate the communication overhead, we propose a new overlapping pipeline and a gradient sparsification method. Furthermore, we design a fast continuous convergence strategy to reduce total training iterations by adaptively adjusting learning rate and updating model parameters. With the help of all the proposed methods, we gain 3.9× throughput of our training system and reduce almost 60% of training iterations. The experimental results show that using an in-house 256 GPUs cluster, we could train a classifier of 100 million classes on Alibaba Retail Product Dataset in about five days while achieving a comparable accuracy with the naive softmax training process.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision tasks; Information systems; Information retrieval; Retrieval tasks and goals; Clustering and classification

291. Mining Implicit Relevance Feedback from User Behavior for Web Question Answering.

Paper Link】 【Pages】:2931-2941

【Authors】: Linjun Shou ; Shining Bo ; Feixiang Cheng ; Ming Gong ; Jian Pei ; Daxin Jiang

【Abstract】: Training and refreshing a web-scale Question Answering (QA) system for a multi-lingual commercial search engine often requires a huge amount of training examples. One principled idea is to mine implicit relevance feedback from user behavior recorded in search engine logs. All previous works on mining implicit relevance feedback target at relevance of web documents rather than passages. Due to several unique characteristics of QA tasks, the existing user behavior models for web documents cannot be applied to infer passage relevance. In this paper, we make the first study to explore the correlation between user behavior and passage relevance, and propose a novel approach for mining training data for Web QA. We conduct extensive experiments on four test datasets and the results show our approach significantly improves the accuracy of passage ranking without extra human labeled data. In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine, especially for languages with low resources. Our techniques have been deployed in multi-language services.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Learning from implicit feedback; Information systems; Information retrieval; Retrieval tasks and goals; Question answering; World Wide Web; Web searching and information discovery; Web search engines

292. Controllable Multi-Interest Framework for Recommendation.

Paper Link】 【Pages】:2942-2951

【Authors】: Yukuo Cen ; Jianwei Zhang ; Xu Zou ; Chang Zhou ; Hongxia Yang ; Jie Tang

【Abstract】: Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, aiming to predict the next items that the user might interact with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance recommendation accuracy and diversity. We conduct experiments for sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
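
The aggregation step can be illustrated with a simple greedy selection that trades off relevance against a diversity bonus scaled by a controllable factor; the category-based bonus and the exact objective below are assumptions for illustration, not ComiRec's precise formulation.

```python
def aggregate(candidates, relevance, category, k, factor=0.5):
    """Greedy aggregation trading off accuracy against diversity.

    candidates: item ids retrieved by the interest-specific retrievals.
    relevance:  dict item -> matching score against the user's interests.
    category:   dict item -> category id, used for a simple diversity bonus.
    factor:     controllable factor; 0 = pure accuracy, larger = more diverse.
    """
    chosen, pool = [], set(candidates)
    while pool and len(chosen) < k:
        def marginal(i):
            diversity = sum(category[i] != category[j] for j in chosen)
            return relevance[i] + factor * diversity
        best = max(pool, key=marginal)
        chosen.append(best)
        pool.remove(best)
    return chosen
```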

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

293. Managing Diversity in Airbnb Search.

Paper Link】 【Pages】:2952-2960

【Authors】: Mustafa Abdool ; Malay Haldar ; Prashant Ramanathan ; Tyler Sax ; Lanbo Zhang ; Aamir Manaswala ; Lynn Yang ; Bradley C. Turnbull ; Qing Zhang ; Thomas Legrand

【Abstract】: One of the long-standing questions in search systems is the role of diversity in results. From a product perspective, showing diverse results provides the user with more choice and should lead to an improved experience. However, this intuition is at odds with common machine learning approaches to ranking which directly optimize the relevance of each individual item without a holistic view of the result set. In this paper, we describe our journey in tackling the problem of diversity for Airbnb search, starting from heuristic based approaches and concluding with a novel deep learning solution that produces an embedding of the entire query context by leveraging Recurrent Neural Networks (RNNs). We hope our lessons learned will prove useful to others and motivate further research in this area.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks

294. Molecular Inverse-Design Platform for Material Industries.

Paper Link】 【Pages】:2961-2969

【Authors】: Seiji Takeda ; Toshiyuki Hama ; Hsiang-Han Hsu ; Victoria A. Piunova ; Dmitry Zubarev ; Daniel P. Sanders ; Jed W. Pitera ; Makoto Kogoh ; Takumi Hongo ; Yenwei Cheng ; Wolf Bocanett ; Hideaki Nakashika ; Akihiro Fujita ; Yuta Tsuchiya ; Katsuhiko Hino ; Kentaro Yano ; Shuichi Hirose ; Hiroki Toda ; Yasumitsu Orii ; Daiju Nakano

【Abstract】: The discovery of new materials has been the essential force driving discontinuous improvements in industrial products' performance. However, the extra-vast combinatorial design space of material structures exceeds human experts' capability to explore it all, thereby hampering material development. In this paper, we present a material industry-oriented web platform of an AI-driven molecular inverse-design system, which automatically designs brand new molecular structures rapidly and diversely. Different from existing inverse-design solutions, in this system the combination of substructure-based feature encoding and molecular graph generation algorithms gives a user a high-speed, interpretable, and customizable design process. Also, a hierarchical data structure and user-oriented UI provide a flexible and intuitive workflow. The system is deployed on IBM's and our client's cloud servers and has been used by 5 partner companies. To illustrate actual industrial use cases, we exhibit the inverse design of sugar and dye molecules, carried out by experimental chemists at those client companies. Compared to a general human chemist's standard performance, the molecular design speed was accelerated more than 10 times, and greatly increased variety was observed in the inverse-designed molecules without loss of chemical realism.

【Keywords】: Applied computing; Physical sciences and engineering; Chemistry; Engineering; Computer-aided design; Computing methodologies; Modeling and simulation; Simulation types and techniques; Molecular simulation; Mathematics of computing; Discrete mathematics; Graph theory; Graph enumeration

295. Learning to Score Economic Development from Satellite Imagery.

Paper Link】 【Pages】:2970-2979

【Authors】: Sungwon Han ; Donghyun Ahn ; Sungwon Park ; Jeasurk Yang ; Susang Lee ; Jihee Kim ; Hyunjoo Yang ; Sangyoon Park ; Meeyoung Cha

【Abstract】: Reliable and timely measurements of economic activities are fundamental for understanding economic development and designing government policies. However, many developing countries still lack reliable data. In this paper, we introduce a novel approach for measuring economic development from high-resolution satellite images in the absence of ground truth statistics. Our method consists of three steps. First, we run a clustering algorithm on satellite images that distinguishes artifacts from nature (siCluster). Second, we generate a partial order graph of the identified clusters based on the level of economic development, either by human guidance or by low-resolution statistics (siPog). Third, we use a CNN-based sorter that assigns differentiable scores to each satellite grid based on the relative ranks of clusters (siScore). The novelty of our method is that we break down a computationally hard problem into sub-tasks, which involves a human-in-the-loop solution. With the combination of unsupervised learning and the partial orders of dozens of urban vs. rural clusters, our method can estimate the economic development scores of over 10,000 satellite grids consistently with other baseline development proxies (Spearman correlation of 0.851). This efficient method is interpretable and robust; we demonstrate how to apply our method to both developed (e.g., South Korea) and developing economies (e.g., Vietnam and Malawi).

【Keywords】: Applied computing; Law, social and behavioral sciences; Economics; Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning

296. A Request-level Guaranteed Delivery Advertising Planning: Forecasting and Allocation.

Paper Link】 【Pages】:2980-2988

【Authors】: Hong Zhang ; Lan Zhang ; Lan Xu ; Xiaoyang Ma ; Zhengtao Wu ; Cong Tang ; Wei Xu ; Yiguo Yang

【Abstract】: The guaranteed delivery model is widely used in online advertising. The publisher sells impressions in advance by promising to serve each advertiser an agreed-upon number of target impressions that satisfy specific attribute requirements over a fixed time period. Previous efforts usually model the service as a crowd-level or user-level supply allocation problem and focus on searching for an optimal allocation for online serving, assuming that forecasts of supply are available and contracts are already signed. Existing techniques are not sufficient to meet the needs of today's industry trends: 1) advertisers pursue more precise targeting, which requires not only user-level attributes but also request-level attributes; 2) users prefer more friendly ad serving, which imposes more diverse serving constraints; 3) the bottleneck of the publisher's revenue growth lies not only in ad serving, but also in forecast accuracy and sales strategy. These issues are non-trivial to address, since the scale of the request-level model is orders of magnitude larger than that of the crowd-level or user-level models. Facing these challenges, we present a holistic design of a request-level guaranteed delivery advertising planning system with careful optimization of all three critical components: impression forecasting, selling and serving. Our system has been deployed in the Tencent online guaranteed delivery advertising system, serving users at the billion scale for nearly one year. Evaluations on large-scale real data and the performance of the deployed system both demonstrate that our design can significantly increase the request-level impression forecast accuracy and delivery rate.

【Keywords】: Information systems; Information systems applications; Computational advertising; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Models of learning

297. Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning.

Paper Link】 【Pages】:2989-2997

【Authors】: Yinghua Zhang ; Yangqiu Song ; Jian Liang ; Kun Bai ; Qiang Yang

【Abstract】: Transfer learning has become a common practice for training deep learning models with limited labeled data in a target domain. On the other hand, deep models are vulnerable to adversarial attacks. Though transfer learning has been widely applied, its effect on model robustness is unclear. To investigate this problem, we conduct extensive empirical evaluations and show that fine-tuning effectively enhances model robustness under white-box FGSM attacks. We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model. To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model. Empirical results show that the adversarial examples are more transferable when fine-tuning is used than they are when the two networks are trained independently.
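
For reference, a standard one-step FGSM sketch in PyTorch is shown below; the epsilon value and the [0, 1] clamp are illustrative choices rather than the paper's settings. In the black-box setting described above, the same perturbed inputs produced with the source model would simply be fed to the target model.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    """One-step FGSM: perturb x along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)          # keep a valid image range
    return x_adv.detach()
```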

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Security and privacy

298. Learning to Generate Personalized Query Auto-Completions via a Multi-View Multi-Task Attentive Approach.

Paper Link】 【Pages】:2998-3007

【Authors】: Di Yin ; Jiwei Tan ; Zhe Zhang ; Hongbo Deng ; Shujian Huang ; Jiajun Chen

【Abstract】: In this paper, we study the task of Query Auto-Completion (QAC), which is a very significant feature of modern search engines. In real industrial applications, there always exist two major problems for QAC: weak personalization and unseen queries. To address these problems, we propose M2A, a multi-view multi-task attentive framework to learn personalized query auto-completion models. We propose a new Transformer-based hierarchical encoder to model different kinds of sequential behaviors, which can be seen as multiple distinct views of the user's search history, and then a prefix-to-history attention mechanism is used to select the most relevant information to compose the final intention representation. To learn more informative representations, we propose to incorporate multi-task learning into the model training. Two different kinds of supervisory information provided by query logs are utilized at the same time by jointly training a CTR prediction model and a query generation model.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Natural language generation; Machine learning; Learning paradigms; Multi-task learning; Information systems; Information retrieval; Information retrieval query processing; Query log analysis

299. A Sleeping, Recovering Bandit Algorithm for Optimizing Recurring Notifications.

Paper Link】 【Pages】:3008-3016

【Authors】: Kevin P. Yancey ; Burr Settles

【Abstract】: Many online and mobile applications rely on daily emails and push notifications to increase and maintain user engagement. The multi-armed bandit approach provides a useful framework for optimizing the content of these notifications, but a number of complications (such as novelty effects and conditional eligibility) make conventional bandit algorithms unsuitable in practice. In this paper, we introduce the Recovering Difference Softmax Algorithm to address the particular challenges of this problem domain, and use it to successfully optimize millions of daily reminders for the online language-learning app Duolingo. This led to a 0.5% increase in total daily active users (DAUs) and a 2% increase in new user retention over a strong baseline. We provide technical details of its design and deployment, and demonstrate its efficacy through both offline and online evaluation experiments.
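
As a baseline picture of softmax-style arm selection with conditional eligibility, here is a generic Boltzmann-exploration sketch; the Recovering Difference Softmax Algorithm additionally models novelty and recovery effects, which are not captured here, and the temperature value is an arbitrary assumption.

```python
import numpy as np

def choose_message(value_estimates, eligible, temperature=0.1, rng=None):
    """Softmax (Boltzmann) exploration over the currently eligible arms.

    value_estimates: estimated reward (e.g., engagement lift) per message.
    eligible:        boolean mask handling conditional eligibility.
    """
    rng = rng or np.random.default_rng()
    v = np.asarray(value_estimates, dtype=float)
    mask = np.asarray(eligible, dtype=bool)
    logits = np.where(mask, v / temperature, -np.inf)   # ineligible arms get 0 prob
    logits -= logits[mask].max()                        # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(v), p=probs))
```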

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Sequential decision making; Mathematics of computing; Probability and statistics

300. Multi-objective Optimization for Guaranteed Delivery in Video Service Platform.

Paper Link】 【Pages】:3017-3025

【Authors】: Hang Lei ; Yin Zhao ; Longjun Cai

【Abstract】: Guaranteed-Delivery (GD) is one of the important display strategies for IP videos on video service platforms. Different from the traditional recommendation strategy, GD requires the delivery system to guarantee the exposure amount (also called impressions in some works) for the content, where the amount generally comes from the purchase contract or business considerations of the platform. In this paper, we study the problem of how to maximize certain gains, such as video views (VV) or fairness across contents (CTR variation between contents), under the GD constraints. We formulate this as a constrained nonlinear programming problem, in which the objectives are to maximize the total VVs of contents and the exposure fairness between contents. In order to capture the trend of VV versus the impression number (page views, PV) for each video content, we propose a parameterized ordinary differential equation (ODE) model, whose parameters are fitted from each video's historical PV and click data. To solve the constrained nonlinear program, we use a genetic algorithm (GA) with a coding scheme specifically designed for the ODE constraints. An empirical study based on real-world data and an online test on Youku.com verify the effectiveness and superiority of our approach compared with the state of the art in industry practice.
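
To illustrate the curve-fitting step, the sketch below fits a saturating VV-versus-PV curve obtained from a simple exponential-decay ODE; the functional form, initial guesses, and bounds are assumptions standing in for the paper's parameterized ODE, not its actual model.

```python
import numpy as np
from scipy.optimize import curve_fit

def vv_curve(pv, a, b):
    """Saturating VV-vs-PV curve: the solution of dV/dP = a * exp(-b * P)."""
    return (a / b) * (1.0 - np.exp(-b * pv))

def fit_content_curve(pv_hist, vv_hist):
    """Fit (a, b) for one content from its historical PV/VV observations."""
    (a, b), _ = curve_fit(
        vv_curve,
        np.asarray(pv_hist, dtype=float),
        np.asarray(vv_hist, dtype=float),
        p0=(1.0, 1e-4),
        bounds=(1e-9, np.inf),   # keep both parameters positive
    )
    return a, b
```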

【Keywords】: Mathematics of computing; Mathematical analysis; Differential equations; Ordinary differential equations; Mathematical optimization; Non-parametric optimization

301. Delivery Scope: A New Way of Restaurant Retrieval for On-demand Food Delivery Service.

Paper Link】 【Pages】:3026-3034

【Authors】: Xuetao Ding ; Runfeng Zhang ; Zhen Mao ; Ke Xing ; Fangxiao Du ; Xingyu Liu ; Guoxing Wei ; Feifan Yin ; Renqing He ; Zhizhao Sun

【Abstract】: Recently, on-demand food delivery service has become very popular in China. More than 30 million orders are placed by eaters on Meituan-Dianping every day. Delicacies are delivered to eaters in 30 minutes on average. To fully leverage the capacity of our couriers and restaurants, the delivery scope is proposed as an infrastructure product for the on-demand food delivery area. A delivery-scope-based retrieval system is designed and built on our platform. In order to draw suitable delivery scopes for millions of restaurant partners, we propose a pioneering delivery scope generation framework. In our framework, a single delivery scope generation algorithm is proposed using spatial computational techniques and data mining techniques. Moreover, a scope scoring algorithm and a decision algorithm are proposed, utilizing machine learning models and combinatorial optimization techniques. Specifically, we propose a novel delivery scope sample generation method and use scope-related features to estimate order numbers and average delivery time over a period of time for each delivery scope. Then we formalize the candidate scope selection process as a binary integer programming problem. Both a branch-and-bound algorithm and a heuristic search algorithm are integrated into our system. Results of online experiments show that scopes generated by our new algorithm significantly outperform manually generated ones. Our algorithm brings more orders without hurting the user experience. After deployment online, our system has saved thousands of hours for operations staff, and it is considered to be one of the most useful operational tools to balance the demand of eaters with the supply of restaurants and couriers.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by regression; Information systems; Information systems applications; Spatial-temporal systems; Location based services; Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization

302. Fraud Transactions Detection via Behavior Tree with Local Intention Calibration.

Paper Link】 【Pages】:3035-3043

【Authors】: Can Liu ; Qiwei Zhong ; Xiang Ao ; Li Sun ; Wangli Lin ; Jinghua Feng ; Qing He ; Jiayu Tang

【Abstract】: Fraud transactions obtain the rights and interests of e-commerce platforms through illegal means, and have become an emerging threat to the healthy development of these platforms. Recently, user behavioral data has been extensively exploited to detect fraud transactions, and it is usually processed as a sequence of individual actions. However, such sequence-like user behaviors have logical patterns associated with user intentions, which motivates a fine-grained management strategy that groups and splits these actions into intention-related segments. In this paper, we devise a tree-like structure named behavior tree to reorganize the user behavioral data, in which a group of successive sequential actions denoting a specific user intention is represented as a branch on the tree. We then propose a novel neural method coined LIC Tree-LSTM (Local Intention Calibrated Tree-LSTM) to utilize the behavior tree for fraud transaction detection. In our LIC Tree-LSTM, the global user intention is captured by an attentional method applied to different branches. Then, we calibrate the entire tree by attentions within tree branches to pinpoint the balance between global and local user intentions. We investigate the effectiveness of LIC Tree-LSTM on a real-world dataset from the Alibaba platform, and the experimental results show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes. Furthermore, our model provides good interpretability, which helps us better understand user behaviors.

【Keywords】: Information systems; World Wide Web; Web applications; Electronic commerce

303. Balanced Order Batching with Task-Oriented Graph Clustering.

Paper Link】 【Pages】:3044-3053

【Authors】: Lu Duan ; Haoyuan Hu ; Zili Wu ; Guozheng Li ; Xinhang Zhang ; Yu Gong ; Yinghui Xu

【Abstract】: The balanced order batching problem (BOBP) arises from the process of warehouse picking at Cainiao, the largest logistics platform in China. Batching orders together in the picking process to form a single picking route reduces travel distance. The reason for its importance is that order picking is a labor-intensive process and, by using good batching methods, substantial savings can be obtained. The BOBP is an NP-hard combinatorial optimization problem, and designing a good problem-specific heuristic under the quasi-real-time system response requirement is non-trivial. In this paper, rather than designing heuristics, we propose an end-to-end learning and optimization framework named Balanced Task-orientated Graph Clustering Network (BTOGCN) to solve the BOBP by reducing it to a balanced graph clustering optimization problem. In BTOGCN, a task-oriented estimator network is introduced to guide the type-aware heterogeneous graph clustering networks to find a better clustering result related to the BOBP objective. Through comprehensive experiments on single graphs and multi-graphs, we show that: 1) our balanced task-oriented graph clustering network can directly utilize the guidance of the target signal and outperforms the two-stage deep embedding and deep clustering method; 2) our method obtains average picking-distance reductions of 4.57m and 0.13m over the expert-designed algorithm on the single-graph and multi-graph sets, and has good generalization ability for application in practical scenarios.

【Keywords】: Applied computing; Operations research; Decision analysis; Multi-criterion optimization and decision-making; Information systems; Data management systems; Database design and models; Graph-based database models; Hierarchical data models; Information systems applications; Data mining; Clustering; Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization

304. Efficiently Solving the Practical Vehicle Routing Problem: A Novel Joint Learning Approach.

Paper Link】 【Pages】:3054-3063

【Authors】: Lu Duan ; Yang Zhan ; Haoyuan Hu ; Yu Gong ; Jiangwen Wei ; Xiaodong Zhang ; Yinghui Xu

【Abstract】: Our model is based on the graph convolutional network (GCN), with node features (coordinates and demand) and edge features (the real distance between nodes) as inputs, which are embedded. Separate decoders are proposed to decode the representations of these two features. The output of one decoder serves as the supervision for the other decoder. We propose a training strategy that combines reinforcement learning with supervised learning. Through comprehensive experiments on real-world data, we show that 1) it is important to explicitly consider the edge feature in the model; 2) the joint learning strategy can accelerate the convergence of training and improve the solution quality; 3) our model significantly outperforms several well-known algorithms in the literature, especially when the problem size is large; and 4) our method generalizes beyond the size of the problem instances it was trained on.

【Keywords】: Applied computing; Operations research; Transportation; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Mathematics of computing; Discrete mathematics; Combinatorics; Combinatorial optimization; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Reinforcement learning; Sequential decision making

305. Meta-Learning for Query Conceptualization at Web Scale.

Paper Link】 【Pages】:3064-3073

【Authors】: Fred X. Han ; Di Niu ; Haolan Chen ; Weidong Guo ; Shengli Yan ; Bowei Long

【Abstract】: Concepts naturally constitute an abstraction for fine-grained entities and knowledge in the open domain. They enable search engines and recommendation systems to enhance user experience by discovering high-level abstractions of a search query and the user intent behind it. In this paper, we study the problem of query conceptualization, which is to find the most appropriate matching concepts for any given search query from a large pool of pre-defined concepts. We propose a coarse-to-fine approach that first reduces the search space for each query through a shortlisting scheme and then identifies the matching concepts using pre-trained language models, which are meta-tuned to our query-concept matching task. Our shortlisting scheme involves using a GRU-based Relevant Words Generator (RWG) to first expand and complete the context of the given query and then shortlist the candidate concepts through a scoring mechanism based on word overlaps. To accurately identify the most appropriate matching concepts for a query, even when the concepts may have zero verbatim overlap with the query, we meta-fine-tune a BERT pairwise text-matching model under the Reptile meta-learning algorithm, which achieves zero-shot transfer learning on the conceptualization problem. Our two-stage framework can be trained with data completely derived from a search click graph, without requiring any human labelling effort. For evaluation, we have constructed a large click graph based on more than 7 million instances of the click history recorded in the Tencent QQ browser and performed the query conceptualization task on a large ontology with 159,148 unique concepts. Results from a range of evaluation methods, including an offline evaluation procedure on the click graph, human evaluation, online A/B testing and case studies, demonstrate the superiority of our approach over a number of competitive pre-trained language models and fine-tuned neural network baselines.
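
The meta-fine-tuning step follows the Reptile template: adapt a copy of the model on one task with a few SGD steps, then move the shared initialization toward the adapted weights. The sketch below shows that generic update; the learning rates, step count, and task loader are placeholders rather than the paper's settings.

```python
import copy
import torch

def reptile_step(model, task_loader, loss_fn, inner_lr=1e-3, meta_lr=0.1, inner_steps=5):
    """One generic Reptile meta-update (Nichol et al. style)."""
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for (x, y), _ in zip(task_loader, range(inner_steps)):   # k inner SGD steps
        opt.zero_grad()
        loss_fn(fast(x), y).backward()
        opt.step()
    with torch.no_grad():                                    # theta += eps * (phi - theta)
        for p, q in zip(model.parameters(), fast.parameters()):
            p.add_(meta_lr * (q - p))
```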

【Keywords】: Information systems; Information retrieval; Information retrieval query processing; Query log analysis; Query representation; Query suggestion

306. Hybrid Spatio-Temporal Graph Convolutional Network: Improving Traffic Prediction with Navigation Data.

Paper Link】 【Pages】:3074-3082

【Authors】: Rui Dai ; Shenkun Xu ; Qian Gu ; Chenguang Ji ; Kaikui Liu

【Abstract】: Traffic forecasting has recently attracted increasing interest due to the popularity of online navigation services, ridesharing and smart city projects. Owing to the non-stationary nature of road traffic, forecasting accuracy is fundamentally limited by the lack of contextual information. To address this issue, we propose the Hybrid Spatio-Temporal Graph Convolutional Network (H-STGCN), which is able to "deduce" future travel time by exploiting the data of upcoming traffic volume. Specifically, we propose an algorithm to acquire the upcoming traffic volume from an online navigation engine. Taking advantage of the piecewise-linear flow-density relationship, a novel transformer structure converts the upcoming volume into its equivalent in travel time. We combine this signal with the commonly-utilized travel-time signal, and then apply graph convolution to capture the spatial dependency. Particularly, we construct a compound adjacency matrix which reflects the innate traffic proximity. We conduct extensive experiments on real-world datasets. The results show that H-STGCN remarkably outperforms state-of-the-art methods in various metrics, especially for the prediction of non-recurring congestion.

【Keywords】: Information systems; Information systems applications; Data mining; Decision support systems; Data analytics; Spatial-temporal systems

307. Multitask Mixture of Sequential Experts for User Activity Streams.

Paper Link】 【Pages】:3083-3091

【Authors】: Zhen Qin ; Yicheng Cheng ; Zhe Zhao ; Zhe Chen ; Donald Metzler ; Jingzheng Qin

【Abstract】: It is often desirable to model multiple objectives in real-world web applications, such as user satisfaction and user engagement in recommender systems. Multi-task learning has become the standard approach for such applications recently.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning

308. Identifying Homeless Youth At-Risk of Substance Use Disorder: Data-Driven Insights for Policymakers.

Paper Link】 【Pages】:3092-3100

【Authors】: Maryam Tabar ; Heesoo Park ; Stephanie Winkler ; Dongwon Lee ; Anamika Barman-Adhikari ; Amulya Yadav

【Abstract】: Substance Use Disorder (SUD) is a devastating disease that leads to significant mental and behavioral impairments. Its negative effects damage the homeless youth population more severely (as compared to stably housed counterparts) because of their high-risk behaviors. To assist policymakers in devising effective and accurate long-term strategies to mitigate SUD, it is necessary to critically analyze environmental, psychological, and other factors associated with SUD among homeless youth. Unfortunately, there is no definitive data-driven study on analyzing factors associated with SUD among homeless youth. While there have been a few prior studies in the past, they (i) do not analyze variation in the associated factors for SUD with geographical heterogeneity in their studies; and (ii) only consider a few contributing factors to SUD in relatively small samples. This work aims to fill this gap by making the following three contributions: (i) we use a real-world dataset collected from ~1,400 homeless youth (across six American states) to build accurate Machine Learning (ML) models for predicting the susceptibility of homeless youth to SUD; (ii) we find a representative set of factors associated with SUD among this population by analyzing feature importance values associated with our ML models; and (iii) we investigate the effect of geographical heterogeneity on the factors associated with SUD. Our results show that our system using adaptively boosted decision trees achieves the best predictive accuracy out of several algorithms on the SUD prediction task, achieving an Area Under the ROC Curve of 0.85. Further, among other things, we also find that both Post-Traumatic Stress Disorder (PTSD) and depression are very strongly associated with SUD among homeless youth because of their propensity to self-medicate to alleviate stress. This work is done in collaboration with social work scientists, who are currently evaluating the results for potential future deployment.
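
For reference, a minimal version of the reported model family, adaptively boosted decision trees, can be set up with scikit-learn as below (the default AdaBoost base learner is a shallow decision tree); the hyperparameters, cross-validation protocol, and feature matrix are illustrative assumptions, not the study's configuration.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def fit_sud_model(X, y):
    """Boosted decision trees for SUD risk prediction, with CV ROC-AUC.

    X: feature matrix of survey-derived factors; y: binary SUD label.
    Returns the fitted model, its mean cross-validated AUC, and the
    feature importances used for factor analysis.
    """
    model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    model.fit(X, y)
    return model, auc, model.feature_importances_
```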

【Keywords】: Applied computing; Life and medical sciences; Health care information systems; Computing methodologies; Machine learning; Machine learning approaches; Classification and regression trees

309. Interleaved Sequence RNNs for Fraud Detection.

Paper Link】 【Pages】:3101-3109

【Authors】: Bernardo Branco ; Pedro Abreu ; Ana Sofia Gomes ; Mariana S. C. Almeida ; João Tiago Ascensão ; Pedro Bizarro

【Abstract】: Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies. In this work, we do not require those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment. We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models saving millions of dollars in fraud detection and using fewer computational resources.

【Keywords】: Computer systems organization; Real-time systems; Real-time system specification; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

310. Attention based Multi-Modal New Product Sales Time-series Forecasting.

Paper Link】 【Pages】:3110-3118

【Authors】: Vijay Ekambaram ; Kushagra Manglik ; Sumanta Mukherjee ; Surya Shravan Kumar Sajja ; Satyam Dwivedi ; Vikas Raykar

【Abstract】: Trend-driven retail industries, such as fashion, launch a substantial number of new products every season. In such a scenario, an accurate demand forecast for these newly launched products is vital for efficient downstream supply chain planning, such as assortment planning and stock allocation. While classical time-series forecasting algorithms can be used to forecast the sales of existing products, new products do not have any historical time-series data to base the forecast on. In this paper, we propose and empirically evaluate several novel attention-based multi-modal encoder-decoder models that forecast the sales of a new product purely based on product images, any available product attributes, and external factors like holidays, events, weather, and discounts. We experimentally validate our approaches on a large fashion dataset and report improvements in accuracy and enhanced model interpretability compared to existing k-nearest-neighbor-based baseline approaches.

【Keywords】: Applied computing; Operations research; Forecasting; Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Temporal reasoning; Machine learning; Machine learning approaches; Neural networks

311. Pest Management In Cotton Farms: An AI-System Case Study from the Global South.

Paper Link】 【Pages】:3119-3127

【Authors】: Aman Dalmia ; Jerome White ; Ankit Chaurasia ; Vishal Agarwal ; Rajesh Jain ; Dhruvin Vora ; Balasaheb Dhame ; Raghu Dharmaraju ; Rahul Panicker

【Abstract】: Nearly 100 million families across the world rely on cotton farming for their livelihood. Cotton is particularly vulnerable to pest attacks, leading to overuse of pesticides, lost income for farmers, and in some cases farmer suicides. We address this problem by presenting a new solution for pesticide management that uses deep learning, smartphone cameras, inexpensive pest traps, existing digital pipelines, and agricultural extension-worker programs. Although generic, the platform is specifically designed to assist smallholder farmers in the developing world. In addition to outlining the solution, we consider the set of unique constraints this context places on it: data diversity, annotation challenges, shortcomings with traditional evaluation metrics, computing on low-resource devices, and deployment through intermediaries. This paper summarizes key lessons learned while developing and deploying the proposed solution. Such lessons may be applicable to other teams interested in building AI solutions for global development.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Object detection; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Supervised learning by regression

312. TIES: Temporal Interaction Embeddings for Enhancing Social Media Integrity at Facebook.

Paper Link】 【Pages】:3128-3135

【Authors】: Nima Noorshams ; Saurabh Verma ; Aude Hofleitner

【Abstract】: Since its inception, Facebook has become an integral part of the online social community. People rely on Facebook to connect with others and build communities. As a result, it is paramount to protect the integrity of such a large network in a fast and scalable manner. In this paper, we present our efforts to protect various social media entities at Facebook from people who try to abuse our platform. We present a novel Temporal Interaction EmbeddingS (TIES) model that is designed to capture rogue social interactions and flag them for further suitable actions. TIES is a supervised, deep learning, production-ready model for Facebook-scale networks. Prior works on integrity problems are mostly focused on capturing either only static or certain dynamic features of social entities. In contrast, TIES can capture both static and dynamic behaviors in a unified model, owing to the recent strides made in the domains of graph embedding and deep sequential pattern learning. To show the real-world impact of TIES, we present a few applications, particularly preventing the spread of misinformation, detecting fake accounts, and reducing ads payment risks, that enhance the integrity of the Facebook platform.

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning

313. Price Investment using Prescriptive Analytics and Optimization in Retail.

Paper Link】 【Pages】:3136-3144

【Authors】: Prakhar Mehrotra ; Linsey Pang ; Karthick Gopalswamy ; Avinash Thangali ; Timothy Winters ; Ketki Gupte ; Dnyanesh Kulkarni ; Sunil Potnuru ; Supreeth Shastry ; Harshada Vuyyuri

【Abstract】: As the world's largest retailer, Walmart's core mission is to save people money so they can live better. We call the strategy we use to accomplish this goal our Every Day Low Price strategy. By keeping operational expenses as low as possible, we can continually apply a downward pressure on our prices, in turn increasing the amount of traffic, and ultimately, sales within our stores. In this paper, we apply Machine Learning (ML) algorithms and Operations Research techniques for forecasting and optimization to build a new price recommendation system, which improves our ability to generate price recommendations accurately and automatically. Comprised of a demand forecasting step, two optimizations, and causal inference analysis, our system was evaluated in the form of forecast backtests and live pricing experiments, both of which suggested that our approach was more effective than the current rule-based pricing system.
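As a toy illustration of the forecast-then-optimize pattern described above (not Walmart's system), the sketch below fits a log-linear demand curve to synthetic price and units history and then searches a feasible price grid for the revenue-maximizing recommendation; the elasticity, unit cost, and margin floor are made up.

```python
# Toy forecast-then-optimize sketch: fit a log-linear demand curve from
# historical (price, units) pairs, then pick the grid price that maximizes
# forecast revenue subject to a minimum-margin constraint. All numbers are
# illustrative and this is not the production system.
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(8, 15, 200)
units = np.exp(4.0 - 0.1 * prices + 0.05 * rng.normal(size=200))   # synthetic history

# Demand forecasting step: log(units) ~ a + b * price
b, a = np.polyfit(prices, np.log(units), 1)
forecast = lambda p: np.exp(a + b * p)

# Optimization step: maximize p * demand(p) on a grid, respecting a cost floor.
unit_cost, min_margin = 7.0, 0.05
grid = np.linspace(8, 15, 141)
feasible = grid[(grid - unit_cost) / grid >= min_margin]
revenue = feasible * forecast(feasible)
best_price = feasible[np.argmax(revenue)]
print(f"recommended price: {best_price:.2f}, forecast units: {forecast(best_price):.1f}")
```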

【Keywords】: Applied computing; Operations research; Consumer products; Decision analysis; Forecasting; Marketing

314. Climate Downscaling Using YNet: A Deep Convolutional Network with Skip Connections and Fusion.

Paper Link】 【Pages】:3145-3153

【Authors】: Yumin Liu ; Auroop R. Ganguly ; Jennifer G. Dy

【Abstract】: Climate change is one of the major challenges facing humanity in our time. It brings many unexpected disasters that cause drastic losses of lives and property. To better understand climate change, scientists have developed various Global Climate Models (GCMs) to simulate the global climate and make projections of future climate values. These global climate models have coarse grids (i.e., low resolution in both space and time) due to limitations of computing power and simulation time. Although they are helpful in predicting large-scale, long-term trends in climate, they are too coarse for impact analysis at smaller scales, such as the regional or local scale. However, regional and local climate conditions are very important for decisions related to infrastructure, transportation, and evacuation, as these decisions depend heavily on small-scale climate conditions. In this paper, we propose YNet, a novel deep convolutional neural network (CNN) with skip connections and fusion capabilities, to perform downscaling for climate variables on multiple GCMs directly rather than on reanalysis data. We analyze and compare our proposed method with four other methods on datasets of three climate variables: mean precipitation and extreme values (maximum temperature and minimum temperature). The results show the effectiveness of the proposed method.

【Keywords】: Applied computing; Physical sciences and engineering; Earth and atmospheric sciences; Environmental sciences; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

315. Cracking the Black Box: Distilling Deep Sports Analytics.

Paper Link】 【Pages】:3154-3162

【Authors】: Xiangyu Sun ; Jack Davis ; Oliver Schulte ; Guiliang Liu

【Abstract】: This paper addresses the trade-off between Accuracy and Transparency for deep learning applied to sports analytics. Neural nets achieve great predictive accuracy through deep learning, and are popular in sports analytics. But it is hard to interpret a neural net model and harder still to extract actionable insights from the knowledge implicit in it. Therefore, we built a simple and transparent model that mimics the output of the original deep learning model and represents the learned knowledge in an explicit interpretable way. Our mimic model is a linear model tree, which combines a collection of linear models with a regression-tree structure. The tree version of a neural network achieves high fidelity, explains itself, and produces insights for expert stakeholders such as athletes and coaches. We propose and compare several scalable model tree learning heuristics to address the computational challenge from datasets with millions of data points.
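A minimal mimic-learning sketch in the spirit of the paper, though not its algorithm: a black-box regressor is distilled into a shallow tree whose leaves each hold a linear model, so every prediction comes with a local linear explanation; the data, depths, and model choices are illustrative.

```python
# Distill a black-box regressor into a shallow tree with linear models at the
# leaves, then measure fidelity to the black box. Illustrative sketch only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=2000)

black_box = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0).fit(X, y)
soft_targets = black_box.predict(X)                   # mimic the model, not the raw labels

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, soft_targets)
leaf_of = tree.apply(X)                               # which leaf each point falls into
leaf_models = {leaf: LinearRegression().fit(X[leaf_of == leaf], soft_targets[leaf_of == leaf])
               for leaf in np.unique(leaf_of)}

def mimic_predict(X_new):
    leaves = tree.apply(X_new)
    return np.array([leaf_models[l].predict(x.reshape(1, -1))[0] for l, x in zip(leaves, X_new)])

fidelity = np.corrcoef(mimic_predict(X), soft_targets)[0, 1]
print(f"fidelity to the black box: {fidelity:.3f}")
```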

【Keywords】: Applied computing; Computers in other domains; Computing methodologies; Artificial intelligence; Machine learning; Machine learning approaches; Classification and regression trees

316. Taming Pretrained Transformers for Extreme Multi-label Text Classification.

Paper Link】 【Pages】:3163-3171

【Authors】: Wei-Cheng Chang ; Hsiang-Fu Yu ; Kai Zhong ; Yiming Yang ; Inderjit S. Dhillon

【Abstract】: We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved state-of-the-art performance on many NLP tasks including sentence classification, albeit with small label sets. However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue. In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with around 0.5 million labels, the Prec@1 of X-Transformer is 77.28%, a substantial improvement over state-of-the-art XMC approaches Parabel (linear) and AttentionXML (neural), which achieve 68.70% and 76.95% Prec@1, respectively. We further apply X-Transformer to a product2query dataset from Amazon and gain a 10.7% relative improvement in Prec@1 over Parabel.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Machine learning; Information systems; Information retrieval

317. Prediction of Hourly Earnings and Completion Time on a Crowdsourcing Platform.

Paper Link】 【Pages】:3172-3182

【Authors】: Anna Lioznova ; Alexey Drutsa ; Vladimir Kukushkin ; Anastasia A. Bezzubtseva

【Abstract】: We study the problem of predicting future hourly earnings and task completion time for a crowdsourcing platform user who sees the list of available tasks and wants to select one of them to execute. Namely, for each task shown in the list, one needs an estimate of the user's performance (i.e., hourly earnings and completion time) if she selects this task. We address this problem on real crowd tasks completed on one of the global crowdsourcing marketplaces by (1) conducting a survey and an A/B test on real users, whose results confirm the dominance of monetary incentives and the importance of knowledge about hourly earnings for users; (2) an in-depth analysis of user behavior that shows that the prediction problem is challenging: (a) users and projects are highly heterogeneous, and (b) there exists a so-called "learning effect" when a user selects a new task; and (3) a solution to the problem of predicting user performance that improves prediction quality by up to 25% for hourly earnings and up to 32% for completion time w.r.t. a naive baseline based solely on the historical performance of users on tasks. In our experimentation, we use data about 18 million real crowdsourcing tasks performed by 161 thousand users on the crowd platform; we publish this dataset. The hourly earnings prediction has been deployed in Yandex.Toloka.

【Keywords】: General and reference; Cross-computing tools and techniques; Reliability; Information systems; World Wide Web; Web applications; Crowdsourcing; Social and professional topics; Professional topics; Computing industry; Sustainability

318. SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter.

Paper Link】 【Pages】:3183-3193

【Authors】: Venu Satuluri ; Yao Wu ; Xun Zheng ; Yilei Qian ; Brian Wichers ; Qieyun Dai ; Gui Ming Tang ; Jerry Jiang ; Jimmy Lin

【Abstract】: Personalized recommendation products at Twitter target a multitude of heterogeneous items: Tweets, Events, Topics, Hashtags, and users. Each of these targets varies in its cardinality (which affects the scale of the problem) and its "shelf life" (which constrains the latency of generating the recommendations). Although Twitter has built a variety of recommendation systems over the past decade, solutions to the broader problem were mostly tackled piecemeal. In this paper, we present SimClusters, a general-purpose representation layer based on overlapping communities into which users as well as heterogeneous content can be captured as sparse, interpretable vectors to support a multitude of recommendation tasks. We propose a novel algorithm for community discovery based on Metropolis-Hastings sampling, which is both more accurate and significantly faster than off-the-shelf alternatives. SimClusters scales to networks with billions of users and has been effective across a variety of deployed applications at Twitter.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

319. Time-Aware User Embeddings as a Service.

Paper Link】 【Pages】:3194-3202

【Authors】: Martin Pavlovski ; Jelena Gligorijevic ; Ivan Stojkovic ; Shubham Agrawal ; Shabhareesh Komirishetty ; Djordje Gligorijevic ; Narayan Bhamidipati ; Zoran Obradovic

【Abstract】: Digital media companies typically collect rich data in the form of sequences of online user activities. Such data is used in various applications, involving tasks ranging from click or conversion prediction to recommendation or user segmentation. Nonetheless, each application depends upon specialized feature engineering that requires a lot of effort and typically disregards the time-varying nature of the online user behavior. Learning time-preserving vector representations of users (user embeddings), irrespective of a specific task, would save redundant effort and potentially lead to higher embedding quality. To that end, we address the limitations of the current state-of-the-art self-supervised methods for task-independent (unsupervised) sequence embedding, and propose a novel Time-Aware Sequential Autoencoder (TASA) that accounts for the temporal aspects of sequences of activities. The generated embeddings are intended to be readily accessible for many problem formulations and seamlessly applicable to desired tasks, thus sidestepping the burden of task-driven feature engineering. The proposed TASA shows improvements over alternative self-supervised models in terms of sequence reconstruction. Moreover, the embeddings generated by TASA yield increases in predictive performance on both proprietary and public data. It also achieves comparable results to supervised approaches that are trained on individual tasks separately and require substantially more computational effort. TASA has been incorporated within a pipeline designed to provide time-aware user embeddings as a service, and the use of its embeddings exhibited lifts in conversion prediction AUC on four audiences.
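The sketch below only illustrates one ingredient mentioned above, making a sequence autoencoder time-aware by feeding the inter-activity time gap alongside each activity embedding; it is not the TASA architecture, and all sizes are placeholders.

```python
# Simplified time-aware sequence autoencoder sketch (not TASA itself): each
# activity id embedding is concatenated with its time gap, an LSTM encodes the
# sequence into a user embedding, and a decoder reconstructs the activities.
import torch
import torch.nn as nn

class TimeAwareSeqAE(nn.Module):
    def __init__(self, n_events=1000, emb=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(n_events, emb)
        self.encoder = nn.LSTM(emb + 1, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_events)

    def forward(self, events, time_gaps):
        # events: (batch, seq) int ids; time_gaps: (batch, seq) floats
        x = torch.cat([self.emb(events), time_gaps.unsqueeze(-1)], dim=-1)
        _, (h, _) = self.encoder(x)
        user_emb = h[-1]                                       # task-independent embedding
        dec_in = user_emb.unsqueeze(1).expand(-1, events.size(1), -1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out), user_emb

model = TimeAwareSeqAE()
events = torch.randint(0, 1000, (8, 20))
gaps = torch.rand(8, 20)
logits, embedding = model(events, gaps)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), events.reshape(-1))
loss.backward()
print(embedding.shape, float(loss))
```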

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Dimensionality reduction and manifold learning; Machine learning approaches; Learning latent representations; Neural networks; Information systems; Information systems applications; Computational advertising; World Wide Web; Online advertising

320. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest.

Paper Link】 【Pages】:3203-3212

【Authors】: Raymond Shiau ; Hao-Yu Wu ; Eric Kim ; Yue Li Du ; Anqi Guo ; Zhiyuan Zhang ; Eileen Li ; Kunlong Gu ; Charles Rosenberg ; Andrew Zhai

【Abstract】: As online content becomes ever more visual, the demand for searching by visual queries grows correspondingly stronger. Shop The Look is an online shopping discovery service at Pinterest, leveraging visual search to enable users to find and buy products within an image. In this work, we provide a holistic view of how we built Shop The Look, a shopping oriented visual search system, along with lessons learned from addressing shopping needs. We discuss topics including core technology across object detection and visual embeddings, serving infrastructure for realtime inference, and data labeling methodology for training/evaluation data collection and human evaluation. The user-facing impacts of our system design choices are measured through offline evaluations, human relevance judgements, and online A/B experiments. The collective improvements amount to cumulative relative gains of over 160% in end-to-end human relevance judgements and over 80% in engagement. Shop The Look is deployed in production at Pinterest.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Object detection; Computer vision representations; Image representations; Machine learning; Learning paradigms; Multi-task learning; Information systems; Information retrieval; Specialized information retrieval; Multimedia and multimodal retrieval; Image search; World Wide Web; Web applications; Electronic commerce; Online shopping

321. Dynamic Heterogeneous Graph Neural Network for Real-time Event Prediction.

Paper Link】 【Pages】:3213-3223

【Authors】: Wenjuan Luo ; Han Zhang ; Xiaodi Yang ; Lin Bo ; Xiaoqing Yang ; Zang Li ; Xiaohu Qie ; Jieping Ye

【Abstract】: Customer response prediction is critical in many industrial applications such as online advertising and recommendations. In particular, the challenge is greater for ride-hailing platforms such as Uber and DiDi, because the response prediction models need to consider historical and real-time event information in the physical environment, such as surrounding traffic and supply and demand conditions. In this paper, we propose to use a dynamically constructed heterogeneous graph for each ongoing event to encode the attributes of the event and its surroundings. In addition, we propose a multi-layer graph neural network model to learn the impact of historical actions and the surrounding environment on the current events, and to generate an effective event representation that improves the accuracy of the response model. We apply this framework to two practical applications on the DiDi platform. Offline and online experiments show that the framework can significantly improve prediction performance. The framework has been deployed in the online production environment and serves tens of millions of event prediction requests every day.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Data mining; Spatial-temporal systems

322. Bandit based Optimization of Multiple Objectives on a Music Streaming Platform.

Paper Link】 【Pages】:3224-3233

【Authors】: Rishabh Mehrotra ; Niannan Xue ; Mounia Lalmas

【Abstract】: Recommender systems powering online multi-stakeholder platforms often face the challenge of jointly optimizing multiple objectives, in an attempt to efficiently match suppliers and consumers. Examples of such objectives include user behavioral metrics (e.g. clicks, streams, dwell time, etc.), supplier exposure objectives (e.g. diversity) and platform-centric objectives (e.g. promotions). Jointly optimizing multiple metrics in online recommender systems remains a challenging task. Recent work has demonstrated the prowess of contextual bandits in powering recommendation systems to serve recommendations of interest to users. This paper extends contextual bandits to the multi-objective setting so as to power recommendations in multi-stakeholder platforms.
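As a rough sketch of the setting (not the paper's method), the toy bandit below scalarizes three objectives into a single reward with fixed weights and updates per-arm linear models epsilon-greedily; the weights and the simulated objective values are invented.

```python
# Toy multi-objective contextual bandit: several observed metrics are combined
# into one scalar reward via fixed weights, and per-arm ridge-style linear
# models are updated online. Illustrative only, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(1)
n_arms, dim = 5, 8
weights = np.array([0.7, 0.2, 0.1])            # relative importance of the objectives
A = [np.eye(dim) for _ in range(n_arms)]       # per-arm ridge statistics
b = [np.zeros(dim) for _ in range(n_arms)]

def choose(context, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(n_arms))
    scores = [context @ np.linalg.solve(A[a], b[a]) for a in range(n_arms)]
    return int(np.argmax(scores))

for t in range(2000):
    x = rng.normal(size=dim)
    a = choose(x)
    objectives = rng.normal(size=3) + 0.1 * a   # stand-in for observed metrics
    reward = float(weights @ objectives)        # scalarized multi-objective reward
    A[a] += np.outer(x, x)
    b[a] += reward * x

print("learned arm preferences:", [round(float(b[a].sum()), 2) for a in range(n_arms)])
```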

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Cost-sensitive learning; Learning settings; Learning from implicit feedback; Information systems; World Wide Web; Web searching and information discovery; Content ranking

323. Multimodal Deep Learning Based Crop Classification Using Multispectral and Multitemporal Satellite Imagery.

Paper Link】 【Pages】:3234-3242

【Authors】: Krishna Karthik Gadiraju ; Bharathkumar Ramachandra ; Zexi Chen ; Ranga Raju Vatsavai

【Abstract】: The Food and Agriculture Organization (FAO) of the United Nations predicts that, in order to meet the needs of an expected population growth of 3 billion by 2050, food production has to increase by 60%. Therefore, monitoring and mapping crops accurately is essential for estimating food production during each crop growing season across the globe. Traditionally, multispectral remote sensing imagery has been widely used for mapping crops worldwide. However, single-date imagery does not capture the temporal characteristics (phenology) of growing crops, leading to imprecise crop maps and food estimates. On the other hand, purely temporal classification approaches also produce inaccurate crop maps as they do not account for spatial autocorrelations. In this paper, we present a multimodal deep learning solution that jointly exploits spatial-spectral and phenological properties to identify major crop types. Using a two-stream architecture, spatial characteristics are captured via a spatial stream of very high resolution images (single date, 1m, 3 spectral bands, USDA NAIP) with a CNN, and the phenological characteristics via a temporal stream of images (biweekly, 250m, MODIS NDVI) with an LSTM. Experimental results show that the proposed multimodal solution reduces prediction error by 60%.
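A much smaller two-stream sketch than the networks described above, shown only to make the late-fusion idea concrete: a CNN encodes the high-resolution patch, an LSTM encodes the NDVI series, and the concatenated features feed a classifier; layer sizes, the number of NDVI steps, and the class count are placeholders.

```python
# Minimal two-stream crop classifier sketch: spatial stream (CNN over a 3-band
# patch) fused with a phenological stream (LSTM over an NDVI time series).
import torch
import torch.nn as nn

class TwoStreamCropNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.spatial = nn.Sequential(                 # spatial stream: image patch
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.temporal = nn.LSTM(1, 32, batch_first=True)   # phenological stream: NDVI
        self.classifier = nn.Linear(32 + 32, n_classes)

    def forward(self, patch, ndvi):
        s = self.spatial(patch)                        # (batch, 32)
        _, (h, _) = self.temporal(ndvi.unsqueeze(-1))  # (1, batch, 32)
        fused = torch.cat([s, h[-1]], dim=1)           # late fusion by concatenation
        return self.classifier(fused)

model = TwoStreamCropNet()
logits = model(torch.randn(4, 3, 64, 64), torch.rand(4, 26))
print(logits.shape)   # (4, 5)
```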

【Keywords】: Applied computing; Computers in other domains; Agriculture; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Machine learning approaches; Neural networks; Information systems; Information systems applications; Spatial-temporal systems

324. BusTr: Predicting Bus Travel Times from Real-Time Traffic.

Paper Link】 【Pages】:3243-3251

【Authors】: Richard Barnes ; Senaka Buthpitiya ; James Cook ; Alex Fabrikant ; Andrew Tomkins ; Fangzhou Xu

【Abstract】: We present BusTr, a machine-learned model for translating road traffic forecasts into predictions of bus delays, used by Google Maps to serve the majority of the world's public transit systems where no official real-time bus tracking is provided. We demonstrate that our neural sequence model improves over DeepTTE, the state-of-the-art baseline, both in performance (-30% MAPE) and training stability. We also demonstrate significant generalization gains over simpler models, evaluated on longitudinal data to cope with a constantly evolving world.

【Keywords】: Applied computing; Operations research; Forecasting; Transportation; Information systems; Information systems applications; Spatial-temporal systems

325. Characterizing and Learning Representation on Customer Contact Journeys in Cellular Services.

Paper Link】 【Pages】:3252-3260

【Authors】: Shuai Zhao ; Wen-Ling Hsu ; George Ma ; Tan Xu ; Guy Jacobson ; Raif Rustamov

【Abstract】: Corporations spend billions of dollars annually caring for customers across multiple contact channels. A customer journey is the complete sequence of contacts that a given customer has with a company across multiple channels of communication. While each contact is important and contains rich information, studying customer journeys provides a better context to understand customers' behavior in order to improve customer satisfaction and loyalty, and to reduce care costs. However, journey sequences have a complex format due to the heterogeneity of user behavior: they are variable-length, multi-attribute, and exhibit a large cardinality in categories (e.g. contact reasons). The question of how to characterize and learn representations of customer journeys has not been studied in the literature. We propose to learn journey embeddings using a sequence-to-sequence framework that converts each customer journey into a fixed-length latent embedding. In order to improve the disentanglement and distributional properties of embeddings, the model is further modified by incorporating a Wasserstein autoencoder inspired regularization on the distribution of embeddings. Experiments conducted on an enterprise-scale dataset demonstrate the effectiveness of the proposed model and reveal significant improvements due to the regularization in both distinguishing journey pattern characteristics and predicting future customer engagement.

【Keywords】: Applied computing; Enterprise computing; Information systems; Information systems applications; Data mining; Social and professional topics; User characteristics

326. CrowdQuake: A Networked System of Low-Cost Sensors for Earthquake Detection via Deep Learning.

Paper Link】 【Pages】:3261-3271

【Authors】: Xin Huang ; Jangsoo Lee ; Young-Woo Kwon ; Chul-Ho Lee

【Abstract】: Recently, low-cost acceleration sensors have been widely used to detect earthquakes due to the significant development of MEMS technologies. It, however, still requires a high-density network to fully harness the low-cost sensors, especially for real-time earthquake detection. The design of a high-performance and scalable networked system thus becomes essential to be able to process a large amount of sensor data from hundreds to thousands of the sensors. An efficient and accurate earthquake-detection algorithm is also necessary to distinguish earthquake waveforms from various kinds of non-earthquake ones within the huge data in real time. In this paper, we present CrowdQuake, a networked system based on low-cost acceleration sensors, which monitors ground motions and detects earthquakes, by developing a convolutional-recurrent neural network model. This model ensures high detection performance while maintaining false alarms at a negligible level. We also provide detailed case studies on two of a few small earthquakes that have been detected by CrowdQuake during its last one-year operation.

【Keywords】: Computer systems organization; Embedded and cyber-physical systems; Sensor networks; Computing methodologies; Machine learning

327. An Empirical Analysis of Backward Compatibility in Machine Learning Systems.

Paper Link】 【Pages】:3272-3280

【Authors】: Megha Srivastava ; Besmira Nushi ; Ece Kamar ; Shital Shah ; Eric Horvitz

【Abstract】: In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance. However, current practices for updating models rely solely on isolated, aggregate performance analyses, overlooking important dependencies, expectations, and needs in real-world deployments. We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates in models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior in systems that make calls to the services. Prior work has shown the importance of "backward compatibility" for maintaining human trust. We study challenges with backward compatibility across different ML architectures and datasets, focusing on common settings including data shifts with structured noise and ML employed in inferential pipelines. Our results show that (i) compatibility issues arise even without data shift due to optimization stochasticity, (ii) training on large-scale noisy datasets often results in significant decreases in backward compatibility even when model accuracy increases, and (iii) distributions of incompatible points align with noise bias, motivating the need for compatibility aware de-noising and robustness methods.
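One way to make the notion of backward compatibility concrete is the small sketch below, which measures the share of test points the old model classified correctly that the updated model still classifies correctly; the models and data are synthetic stand-ins, not the paper's experimental setup.

```python
# Illustration of a backward-compatibility check between two model versions:
# how many previously-correct predictions remain correct after an update.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

old = LogisticRegression(max_iter=1000).fit(X_tr[:1000], y_tr[:1000])   # v1: less data
new = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)                 # v2: more data

old_ok = old.predict(X_te) == y_te
new_ok = new.predict(X_te) == y_te
compat = (old_ok & new_ok).sum() / old_ok.sum()   # fraction of old wins preserved
print(f"old acc {old_ok.mean():.3f}, new acc {new_ok.mean():.3f}, compatibility {compat:.3f}")
```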

【Keywords】: Computing methodologies; Machine learning; Software and its engineering; Software creation and management; Software development techniques

328. DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services.

Paper Link】 【Pages】:3281-3289

【Authors】: Phuong Pham ; Vivek Jain ; Lukas Dauterman ; Justin Ormont ; Navendu Jain

【Abstract】: As cloud services are growing and generating high revenues, the cost of downtime in these services is becoming significantly expensive. To reduce loss and service downtime, a critical primary step is to execute incident triage, the process of assigning a service incident to the correct responsible team, in a timely manner. An incorrect assignment risks additional incident reroutings and increases its time to mitigate by 10x. However, automated incident triage in large cloud services faces many challenges: (1) a highly imbalanced incident distribution from a large number of teams, (2) wide variety in formats of input data or data sources, (3) scaling to meet production-grade requirements, and (4) gaining engineers' trust in using machine learning recommendations. To address these challenges, we introduce DeepTriage, an intelligent incident transfer service combining multiple machine learning techniques - gradient boosted classifiers, clustering methods, and deep neural networks - in an ensemble to recommend the responsible team to triage an incident. Experimental results on real incidents in Microsoft Azure show that our service achieves 82.9% F1 score. For highly impacted incidents, DeepTriage achieves F1 score from 76.3% -- 91.3%. We have applied best practices and state-of-the-art frameworks to scale DeepTriage to handle incident routing for all cloud services. DeepTriage has been deployed in Azure since October 2017 and is used by thousands of teams daily.

【Keywords】: Applied computing; Enterprise computing; Business process management; Computing methodologies; Artificial intelligence; Natural language processing; Machine learning

329. An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images.

Paper Link】 【Pages】:3290-3298

【Authors】: Zekun Li ; Yao-Yi Chiang ; Sasan Tavakkol ; Basel Shbita ; Johannes H. Uhl ; Stefan Leyk ; Craig A. Knoblock

【Abstract】: Historical maps contain detailed geographic information that is difficult to find elsewhere, covering long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.

【Keywords】: Applied computing; Document management and text processing; Document capture; Document analysis; Graphics recognition and interpretation; Information systems; Information systems applications; Digital libraries and archives

330. Bootstrapping Complete The Look at Pinterest.

Paper Link】 【Pages】:3299-3307

【Authors】: Eileen Li ; Eric Kim ; Andrew Zhai ; Josh Beal ; Kunlong Gu

【Abstract】: Putting together an ideal outfit is a process that involves creativity and style intuition. This makes it a particularly difficult task to automate. Existing styling products generally involve human specialists and a highly curated set of fashion items. In this paper, we will describe how we bootstrapped the Complete The Look (CTL) system at Pinterest. This is a technology that aims to learn the subjective task of "style compatibility" in order to recommend complementary items that complete an outfit. In particular, we want to show recommendations from other categories that are compatible with an item of interest. For example, what are some heels that go well with this cocktail dress? We will introduce our outfit dataset of over 1 million outfits and 4 million objects, a subset of which we will make available to the research community, and describe the pipeline used to obtain and refresh this dataset. Furthermore, we will describe how we evaluate this subjective task and compare model performance across multiple training methods. Lastly, we will share our lessons going from experimentation to working prototype, and how to mitigate failure modes in the production environment. Our work represents one of the first examples of an industrial-scale solution for compatibility-based fashion recommendation.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems

331. Explainable Classification of Brain Networks via Contrast Subgraphs.

Paper Link】 【Pages】:3308-3318

【Authors】: Tommaso Lanciano ; Francesco Bonchi ; Aristides Gionis

【Abstract】: Mining human-brain networks to discover patterns that can be used to discriminate between healthy individuals and patients affected by some neurological disorder is a fundamental task in neuroscience. Learning simple and interpretable models is as important as mere classification accuracy. In this paper we introduce a novel approach for classifying brain networks based on extracting contrast subgraphs, i.e., a set of vertices whose induced subgraphs are dense in one class of graphs and sparse in the other. We formally define the problem and present an algorithmic solution for extracting contrast subgraphs. We then apply our method to a brain-network dataset consisting of children affected by Autism Spectrum Disorder and children Typically Developed. Our analysis confirms the interestingness of the discovered patterns, which match background knowledge in the neuroscience literature. Further analysis on other classification tasks confirms the simplicity, soundness, and high explainability of our proposal, which also exhibits superior classification accuracy compared to more complex state-of-the-art methods.
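The greedy toy sketch below is not the paper's algorithm; it only illustrates the notion of a contrast subgraph, a vertex set whose induced subgraph is dense in one class's summary graph and sparse in the other's, on synthetic adjacency matrices.

```python
# Greedy toy search for a contrast subgraph: repeatedly add the vertex that
# most increases the edge-density difference between the two class summaries.
import numpy as np

def contrast_subgraph(adj_a, adj_b, k):
    """adj_a, adj_b: class-averaged adjacency matrices; k: subgraph size."""
    diff = adj_a - adj_b                      # edge-wise density difference
    n = diff.shape[0]
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for v in range(n):
            if v in chosen:
                continue
            gain = diff[v, chosen].sum() if chosen else 0.0
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen

rng = np.random.default_rng(0)
adj_a = (rng.random((30, 30)) < 0.4).astype(float)
adj_a = np.triu(adj_a, 1) + np.triu(adj_a, 1).T
adj_b = (rng.random((30, 30)) < 0.1).astype(float)
adj_b = np.triu(adj_b, 1) + np.triu(adj_b, 1).T
print(contrast_subgraph(adj_a, adj_b, k=5))
```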

【Keywords】: Information systems; Information systems applications; Data mining

332. Jointly Learning to Recommend and Advertise.

Paper Link】 【Pages】:3319-3327

【Authors】: Xiangyu Zhao ; Xudong Zheng ; Xiwang Yang ; Xiaobing Liu ; Jiliang Tang

【Abstract】: Online recommendation and advertising are two major income channels for online recommendation platforms (e.g. e-commerce and news feed sites). However, most platforms optimize their recommending and advertising strategies separately, by different teams using different techniques, which may lead to suboptimal overall performance. To this end, in this paper, we propose a novel two-level reinforcement learning framework to jointly optimize the recommending and advertising strategies, where the first level generates a list of recommendations to optimize user experience in the long run; the second level then inserts ads into the recommendation list in a way that balances the immediate advertising revenue from advertisers against the negative influence of ads on long-term user experience. To be specific, the first level tackles the high-dimensional combinatorial action space problem of selecting a subset of items from the large item space, while the second level determines three internally related tasks, i.e., (i) whether to insert an ad, and if yes, (ii) the optimal ad and (iii) the optimal location at which to insert it. The experimental results based on real-world data demonstrate the effectiveness of the proposed framework. We have released the implementation code to ease reproducibility.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Recommender systems; World Wide Web; Online advertising; Web applications; Electronic commerce

333. Fitbit for Chickens?: Time Series Data Mining Can Increase the Productivity of Poultry Farms.

Paper Link】 【Pages】:3328-3336

【Authors】: Alireza Abdoli ; Sara Alaee ; Shima Imani ; Amy C. Murillo ; Alec C. Gerry ; Leslie Hickle ; Eamonn J. Keogh

【Abstract】: Chickens are the most important poultry species in the world. Globally, industrial-scale production systems account for most of the poultry meat and eggs produced. The welfare of these birds matters for both ethical and economic reasons. From an ethical perspective, poultry have a sufficient degree of awareness to suffer pain if their health is poor, or deprivation if poorly housed. From an economic viewpoint, consumers increasingly value poultry welfare, so better market access can be obtained by producers who demonstrate concern for their flocks. Recent advances in sensor technology have made it possible to record behavioral patterns in chickens, and several research groups have shown that such data can be exploited to enhance chicken welfare. However, classifying chicken behaviors poses several unique challenges which are not observed in the UCR archive or other classic benchmark collections. In particular, some behaviors are manifested in the shape of the subsequences, whereas others only in more abstract features. Most algorithms only work well for one such modality. In addition, our data of interest has classes that differ greatly in duration and are only weakly labeled, again defying the assumptions of the classic benchmark datasets. In this work, we propose a general-purpose framework to robustly learn and classify from datasets exhibiting these issues. While our experience is with fowl, the lessons we have learned may be more generally applicable to real-world datasets in other domains including manufacturing and human health.

【Keywords】: Information systems; Information systems applications; Data mining

334. CompactETA: A Fast Inference System for Travel Time Prediction.

Paper Link】 【Pages】:3337-3345

【Authors】: Kun Fu ; Fanlin Meng ; Jieping Ye ; Zheng Wang

【Abstract】: Computing estimated time of arrival (ETA) is one of the most important services for online ride-hailing platforms like DiDi and Uber. With billions of service queries per day on such platforms, a fast-inference ETA module keeps the overall decision system efficient, guaranteeing a satisfying user experience as well as saving significant operating cost. In this paper, we develop a novel ETA learning system named CompactETA, which provides accurate online travel time inference within 100 microseconds. In the proposed method, we encode high-order spatial and temporal dependencies into sophisticated representations by applying a graph attention network on a spatiotemporal weighted road network graph. We further encode the sequential information of the travel route by positional encoding to avoid a recurrent network structure. The properly learnt representations enable us to apply a very simple multi-layer perceptron model for online real-time inference. Evaluation on both offline experiments and online A/B testing verifies that CompactETA reduces inference latency by more than 100 times compared to a state-of-the-art system, while maintaining competitive prediction accuracy.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by regression; Machine learning approaches; Neural networks; Information systems; Information systems applications; Spatial-temporal systems; Geographic information systems

335. Intelligent Exploration for User Interface Modules of Mobile App with Collective Learning.

Paper Link】 【Pages】:3346-3355

【Authors】: Jingbo Zhou ; Zhenwei Tang ; Min Zhao ; Xiang Ge ; Fuzhen Zhuang ; Meng Zhou ; Liming Zou ; Chenglei Yang ; Hui Xiong

【Abstract】: A mobile app interface usually consists of a set of user interface modules. How to properly design these user interface modules is vital to achieving user satisfaction for a mobile app. However, there are few methods for determining the design variables of user interface modules other than relying on the judgment of designers. Usually, a laborious post-processing step is necessary to verify the key change of each design variable. Therefore, only a very limited number of design solutions can be tested. It is time-consuming and almost impossible to figure out the best design solutions, as there are many modules. To this end, we introduce FEELER, a framework to quickly and intelligently explore design solutions for user interface modules with a collective machine learning approach. FEELER can help designers quantitatively measure the preference score of different design solutions, making it convenient and quick for designers to adjust user interface modules. We conducted extensive experimental evaluations on two real-life datasets to demonstrate its applicability to real-life cases of user interface module design in the Baidu App, which is one of the most popular mobile apps in China.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Active learning settings; Human-centered computing; Interaction design; Interaction design process and methods; User interface design

336. Gemini: A Novel and Universal Heterogeneous Graph Information Fusing Framework for Online Recommendations.

Paper Link】 【Pages】:3356-3365

【Authors】: Jixing Xu ; Zhenlong Zhu ; Jianxin Zhao ; Xuanye Liu ; Minghui Shan ; Jiecheng Guo

【Abstract】: Recently, network embedding has been successfully used in recommendation systems. Researchers have made efforts to utilize additional auxiliary information (e.g., social relations of users) to improve performance. However, such auxiliary information is not compatible with all recommendation scenarios, so it is difficult to apply in some industrial scenarios where generality is required. Moreover, the heterogeneous nature of users and items aggravates the difficulty of network information fusion. Many works have tried to transform the user-item heterogeneous network into two homogeneous graphs (i.e., user-user and item-item) and then fuse information separately. This may limit the representation power of the learned embeddings because it ignores the adjacency relationships in the original graph. In addition, the sparsity of user-item interactions is an urgent problem that needs to be solved. To solve the above problems, we propose a universal and effective framework named Gemini, which relies only on the common interaction logs, avoiding the dependence on auxiliary information and ensuring better generality. To preserve the original adjacency relationships, Gemini transforms the original user-item heterogeneous graph into two semi-homogeneous graphs, from the perspectives of users and items respectively. The transformed graphs consist of two types of nodes: network nodes derived from the homogeneous nodes and attribute nodes derived from the heterogeneous nodes. Node representations are then learned in a homogeneous way, while edge embeddings are considered at the same time. Simultaneously, the interaction sparsity problem is alleviated to some extent, as the transformed graphs contain the original second-order neighbors. For training efficiency, we also propose an iterative training algorithm to reduce computational complexity. Experimental results on five datasets and online A/B tests in recommendations at DiDiChuXing show that Gemini outperforms state-of-the-art algorithms.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Learning latent representations; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms

337. Hypergraph Convolutional Recurrent Neural Network.

Paper Link】 【Pages】:3366-3376

【Authors】: Jaehyuk Yi ; Jinkyoo Park

【Abstract】: In this study, we present a hypergraph convolutional recurrent neural network (HGC-RNN), which is a prediction model for structured time-series sensor network data. Representing sensor networks in a graph structure is useful for expressing structural relationships among sensors. Conventional graph structures, however, are limited in representing complex structures that arise in real-world applications, such as connections shared among multiple nodes. We use a hypergraph, which is capable of modeling such complicated structures, for structural representation. HGC-RNN performs a hypergraph convolution operation on the input data represented as a hypergraph to extract hidden representations of the input, while considering the structural dependency of the data. HGC-RNN employs a recurrent neural network structure to learn temporal dependency from the data sequence. We conduct experiments to forecast taxi demand in NYC, traffic flow in an overhead hoist transfer system, and gas pressure in a gas regulator. We compare the performance of our method with those of other existing methods, and the results show that HGC-RNN has strengths over the baseline models.
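A much-reduced sketch of the idea, not the paper's model: node signals are pooled into hyperedges, redistributed back to nodes, and the result drives a per-node recurrent update over time; the normalization, sizes, and random incidence matrix are illustrative.

```python
# One hypergraph-convolution step followed by a GRU update per node, in the
# spirit of a hypergraph convolutional recurrent network. Illustrative only.
import torch
import torch.nn as nn

class HGCRNNCell(nn.Module):
    def __init__(self, n_features, hidden):
        super().__init__()
        self.theta = nn.Linear(n_features, hidden)
        self.gru = nn.GRUCell(hidden, hidden)

    def forward(self, x_t, H, h_prev):
        """x_t: (nodes, features); H: (nodes, hyperedges) incidence matrix."""
        d_v = H.sum(1, keepdim=True).clamp(min=1)     # node degrees
        d_e = H.sum(0, keepdim=True).clamp(min=1)     # hyperedge degrees
        edge_feat = (H / d_e).t() @ x_t               # node -> hyperedge aggregation
        node_feat = (H / d_v) @ edge_feat             # hyperedge -> node aggregation
        z = torch.relu(self.theta(node_feat))
        return self.gru(z, h_prev)                    # temporal update per node

n_nodes, n_edges, feat, hid = 6, 3, 4, 16
H = (torch.rand(n_nodes, n_edges) > 0.5).float()
cell = HGCRNNCell(feat, hid)
h = torch.zeros(n_nodes, hid)
for x_t in torch.randn(10, n_nodes, feat):            # 10 time steps
    h = cell(x_t, H, h)
print(h.shape)
```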

【Keywords】: Computer systems organization; Embedded and cyber-physical systems; Sensor networks; Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Spatial and physical reasoning; Temporal reasoning

338. Towards Building an Intelligent Chatbot for Customer Service: Learning to Respond at the Appropriate Time.

Paper Link】 【Pages】:3377-3385

【Authors】: Che Liu ; Junfeng Jiang ; Chao Xiong ; Yi Yang ; Jieping Ye

【Abstract】: In recent years, intelligent chatbots have been widely used in the field of customer service. One of the key challenges for chatbots to maintain fluent dialogues with customers is how to respond at the appropriate time. However, most state-of-the-art chatbots follow a turn-by-turn interaction scheme. Such chatbots respond each time a customer sends an utterance, which in some cases leads to inappropriate responses and misleads the course of the dialogue. In this paper, we propose a multi-turn response triggering model (MRTM) to address this problem. MRTM is learned from large-scale human-human dialogues between customers and agents with a self-supervised learning scheme. It leverages the semantic matching relationships between the context and the response to train a semantic matching model, and obtains the weights of the co-occurring utterances in the context through an asymmetrical self-attention mechanism. The weights are then used to determine whether the given context should be responded to. We conduct extensive experiments on two dialogue datasets collected from real-world online customer service systems. Results show that MRTM outperforms the baselines by a large margin. Furthermore, we incorporate MRTM into DiDi's customer service chatbot. With the ability to identify the appropriate time to respond, the chatbot can incrementally aggregate information across multiple utterances and make more intelligent responses at the appropriate time.

【Keywords】: Applied computing; Enterprise computing; Enterprise information systems; Enterprise applications; Computing methodologies; Artificial intelligence; Natural language processing; Discourse, dialogue and pragmatics; Machine learning; Machine learning approaches

339. Ads Allocation in Feed via Constrained Optimization.

Paper Link】 【Pages】:3386-3394

【Authors】: Jinyun Yan ; Zhiyuan Xu ; Birjodh Tiwana ; Shaunak Chatterjee

【Abstract】: Social networks and content publishing platforms have newsfeed applications, which show both organic content to drive engagement, and ads to drive revenue. This paper focuses on the problem of ads allocation in a newsfeed to achieve an optimal balance of revenue and engagement. To the best of our knowledge, we are the first to report practical solutions to this business-critical and popular problem in industry.

【Keywords】: Information systems; Information retrieval; Retrieval models and ranking; Rank aggregation; Information systems applications; Computational advertising; World Wide Web; Online advertising; Social advertising

340. USAD: UnSupervised Anomaly Detection on Multivariate Time Series.

Paper Link】 【Pages】:3395-3404

【Authors】: Julien Audibert ; Pietro Michiardi ; Frédéric Guyard ; Sébastien Marti ; Maria A. Zuluaga

【Abstract】: The automatic supervision of IT systems is a current challenge at Orange. Given the size and complexity reached by its IT operations, the number of sensors needed to obtain measurements over time, used to infer normal and abnormal behaviors, has increased dramatically, making traditional expert-based supervision methods slow or prone to errors. In this paper, we propose a fast and stable method called UnSupervised Anomaly Detection for multivariate time series (USAD) based on adversarially trained autoencoders. Its autoencoder architecture makes it capable of learning in an unsupervised way. The use of adversarial training and its architecture allow it to isolate anomalies while providing fast training. We study the properties of our method through experiments on five public datasets, demonstrating its robustness, training speed and high anomaly detection performance. Through a feasibility study using Orange's proprietary data, we have been able to validate Orange's requirements on scalability, stability, robustness, training speed and high performance.
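The sketch below loosely follows the two-autoencoder adversarial scheme the abstract describes, where the second autoencoder learns to amplify the reconstruction error of windows the first one fails to reproduce; the layer sizes, training schedule, and sine-wave data are invented, and this is not the authors' code.

```python
# Heavily simplified adversarial two-autoencoder sketch for window-based
# anomaly scoring on a univariate series. Illustrative only.
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.ReLU()]
    return nn.Sequential(*layers[:-1])                 # drop the trailing ReLU

win = 20
enc = mlp([win, 16, 8])
dec1, dec2 = mlp([8, 16, win]), mlp([8, 16, win])
opt1 = torch.optim.Adam(list(enc.parameters()) + list(dec1.parameters()), lr=1e-3)
opt2 = torch.optim.Adam(list(enc.parameters()) + list(dec2.parameters()), lr=1e-3)

data = torch.sin(torch.linspace(0, 50, 2000)).unfold(0, win, 1)   # sliding windows

for epoch in range(1, 11):
    w = data[torch.randperm(len(data))[:256]]
    alpha = 1.0 / epoch              # shift weight from reconstruction to adversarial term
    # AE1: reconstruct w and try to fool AE2
    w1 = dec1(enc(w))
    loss1 = alpha * ((w - w1) ** 2).mean() + (1 - alpha) * ((w - dec2(enc(w1))) ** 2).mean()
    opt1.zero_grad(); loss1.backward(); opt1.step()
    # AE2: reconstruct w but amplify the error on AE1's reconstructions
    w1 = dec1(enc(w)).detach()
    loss2 = alpha * ((w - dec2(enc(w))) ** 2).mean() - (1 - alpha) * ((w - dec2(enc(w1))) ** 2).mean()
    opt2.zero_grad(); loss2.backward(); opt2.step()

# anomaly score per window: weighted sum of the two reconstruction errors
with torch.no_grad():
    score = 0.5 * ((data - dec1(enc(data))) ** 2).mean(1) \
          + 0.5 * ((data - dec2(enc(dec1(enc(data))))) ** 2).mean(1)
print(score.shape)
```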

【Keywords】: Applied computing; Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Anomaly detection; Machine learning approaches; Neural networks
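
341. Dual Heterogeneous Graph Attention Network to Improve Long-Tail Performance for Shop Search in E-Commerce.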

Paper Link】 【Pages】:3405-3415

【Authors】: Xichuan Niu ; Bofang Li ; Chenliang Li ; Rong Xiao ; Haochuan Sun ; Hongbo Deng ; Zhenzhong Chen

【Abstract】: Shop search has become an increasingly important service provided by Taobao, China's largest e-commerce platform. By using shop search, a user can easily identify the desired shop that provides a full range of relevant items matching their information need. With the tremendous growth of users and shops, shop search faces several uniquely challenging problems: 1) many shop names do not fully express what the shops sell, i.e., there is a semantic gap between the user query and the shop name; 2) due to the lack of user interactions, it is difficult to deliver good search results for long-tail queries and to retrieve long-tail shops that are highly relevant to a query.

【Keywords】: Information systems; Information retrieval; Retrieval models and ranking; Retrieval tasks and goals; Document filtering

342. Learning with Limited Labels via Momentum Damped & Differentially Weighted Optimization.

Paper Link】 【Pages】:3416-3425

【Authors】: Rishabh Mehrotra ; Ashish Gupta

【Abstract】: As deep learning-based models are deployed more widely in search & recommender systems, system designers often face the issue of gathering large amounts of well-annotated data to train such neural models. While most user-centric systems rely on interaction signals as implicit feedback to train models, such signals are often weak proxies of user satisfaction, as compared to (say) explicit judgments from users, which are prohibitively expensive to collect. In this paper, we consider the task of learning from limited labeled data, wherein we aim at jointly leveraging strong supervision data (e.g. explicit judgments) along with weak supervision data (e.g. implicit feedback or labels from the related task) to train neural models.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Mathematics of computing; Mathematical analysis; Mathematical optimization; Continuous optimization; Stochastic control and optimization

Health Day Papers 8

343. Learning to Simulate Human Mobility.

Paper Link】 【Pages】:3426-3433

【Authors】: Jie Feng ; Zeyu Yang ; Fengli Xu ; Haisu Yu ; Mudan Wang ; Yong Li

【Abstract】: Realistic simulation of a massive amount of human mobility data is of great use in epidemic spreading modeling and related health policy-making. Existing solutions for mobility simulation can be classified into two categories: model-based methods and model-free methods, which are both limited in generating high-quality mobility data due to the complicated transitions and complex regularities in human mobility. To solve this problem, we propose a model-free generative adversarial framework, which effectively integrates the domain knowledge of human mobility regularity utilized in the model-based methods. In the proposed framework, we design a novel self-attention based sequential modeling network as the generator to capture the complicated temporal transitions in human mobility. To augment the learning power of the generator with the advantages of model-based methods, we design an attention-based region network to introduce the prior knowledge of urban structure to generate a meaningful trajectory. As for the discriminator, we design a mobility regularity-aware loss to distinguish the generated trajectory. Finally, we utilize the mobility regularities of spatial continuity and temporal periodicity to pre-train the generator and discriminator to further accelerate the learning procedure. Extensive experiments on two real-life mobility datasets demonstrate that our framework outperforms seven state-of-the-art baselines significantly in terms of improving the quality of simulated mobility data by 35%. Furthermore, in the simulated spreading of COVID-19, synthetic data from our framework reduces MAPE from 5% ~ 10% (baseline performance) to 2%.

【Keywords】: Computing methodologies; Machine learning; Learning settings; Learning from demonstrations; Machine learning approaches; Neural networks; Modeling and simulation; Simulation types and techniques; Agent / discrete models; Information systems; Information systems applications; Data mining; Spatial-temporal systems

344. Data-driven Simulation and Optimization for Covid-19 Exit Strategies.

Paper Link】 【Pages】:3434-3442

【Authors】: Salah Ghamizi ; Renaud Rwemalika ; Maxime Cordy ; Lisa Veiber ; Tegawendé F. Bissyandé ; Mike Papadakis ; Jacques Klein ; Yves Le Traon

【Abstract】: The rapid spread of the Coronavirus SARS-2 is a major challenge that led almost all governments worldwide to take drastic measures to respond to the tragedy. Chief among those measures is the massive lockdown of entire countries and cities, which beyond its global economic impact has created some deep social and psychological tensions within populations. While the adopted mitigation measures (including the lockdown) have generally proven useful, policymakers are now facing a critical question: how and when to lift the mitigation measures? A carefully-planned exit strategy is indeed necessary to recover from the pandemic without risking a new outbreak. Classically, exit strategies rely on mathematical modeling to predict the effect of public health interventions. Such models are unfortunately known to be sensitive to some key parameters, which are usually set based on rules-of-thumb.

【Keywords】: Computing methodologies; Machine learning; Modeling and simulation; Simulation types and techniques
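
345. Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data.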

Paper Link】 【Pages】:3443-3450

【Authors】: Jizhou Huang ; Haifeng Wang ; Miao Fan ; An Zhuo ; Yibo Sun ; Ying Li

【Abstract】: The constrained outbreak of COVID-19 in Mainland China has recently been regarded as a successful example of fighting this highly contagious virus. Both the short transmission period (about three months) and the sub-exponential increase of confirmed cases in Mainland China have proved that the Chinese authorities took effective epidemic prevention measures, such as case isolation, travel restrictions, closing recreational venues, and banning public gatherings. These measures can, of course, effectively control the spread of the COVID-19 pandemic. Meanwhile, they may dramatically change human mobility patterns, such as the daily transportation-related behaviors of the public. To better understand the impact of COVID-19 on transportation-related behaviors and to provide more targeted anti-epidemic measures, we use the huge amount of human mobility data collected from Baidu Maps, a widely-used Web mapping service in China, to look into the detailed reactions of people there during the pandemic. To be specific, we conduct data-driven analysis of transportation-related behaviors during the pandemic from the perspectives of 1) means of transportation, 2) type of visited venues, 3) check-in time of venues, 4) preference on "origin-destination" distance, and 5) "origin-transportation-destination" patterns. For each topic, we also give our specific insights and policy-making suggestions. Given that the COVID-19 pandemic is still spreading in more than 200 overseas countries, infecting millions of people worldwide, the insights and suggestions provided here may help fight COVID-19.

【Keywords】: Applied computing; Law, social and behavioral sciences; Sociology; Operations research; Transportation; Human-centered computing; Ubiquitous and mobile computing; Empirical studies in ubiquitous and mobile computing

346. Simulating the Impact of Hospital Capacity and Social Isolation to Minimize the Propagation of Infectious Diseases.

Paper Link】 【Pages】:3451-3457

【Authors】: Shaon Bhatta Shuvo ; Bonaventure C. Molokwu ; Ziad Kobti

【Abstract】: Infectious diseases can spread from an infected person to a susceptible person through direct or indirect physical contact; consequently, such spread is difficult to control. However, a proper decision at the initial stage can help control the disease's propagation before it turns into a pandemic. Social distancing and hospital capacity are considered among the most critical parameters to manage these types of conditions. In this paper, we used artificial agent-based simulation modeling to identify the importance of social distancing and hospitals' capacity, in terms of the number of beds, to shorten the length of an outbreak and reduce the total number of infections and deaths during an epidemic. After simulating the model based on different scenarios in a small artificial society, we learned that a shorter social isolation activation delay has a greater impact on reducing the catastrophe. Increasing the hospitals' treatment capacity, i.e., the number of isolation beds, can come in handy when social isolation cannot be activated promptly. The model can be considered a prototype for taking proper steps towards the control of an epidemic, based on simulations under different parameter settings.
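
A toy illustration of this kind of agent-based simulation (not the authors' model): the isolation delay, bed count, contact rates, and infection probability below are all assumptions chosen purely for demonstration.

# Toy agent-based epidemic sketch: isolation cuts contacts, beds speed recovery.
import random

def simulate(n_agents=1000, beds=20, isolation_delay=15, days=120, seed=1):
    random.seed(seed)
    state = ["S"] * n_agents            # S = susceptible, I = infected, R = recovered
    state[0] = "I"
    timer = [0] * n_agents              # days each agent has spent infected
    for day in range(days):
        contact_rate = 8 if day < isolation_delay else 2   # isolation reduces contacts
        infected = [i for i, s in enumerate(state) if s == "I"]
        hospitalized = set(infected[:beds])                 # simple bed allocation
        for i in infected:
            for _ in range(contact_rate):
                j = random.randrange(n_agents)
                if state[j] == "S" and random.random() < 0.05:
                    state[j] = "I"
            timer[i] += 1
            recovery_time = 7 if i in hospitalized else 14  # beds shorten recovery
            if timer[i] >= recovery_time:
                state[i] = "R"
    return state.count("R") + state.count("I")              # total ever infected

print("early isolation:", simulate(isolation_delay=5), "late isolation:", simulate(isolation_delay=30))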

【Keywords】: Computing methodologies; Artificial intelligence; Distributed artificial intelligence; Multi-agent systems; Planning and scheduling; Multi-agent planning; Modeling and simulation; Simulation types and techniques; Artificial life; Uncertainty quantification

347. Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs.

Paper Link】 【Pages】:3458-3465

【Authors】: Clara H. McCreery ; Namit Katariya ; Anitha Kannan ; Manish Chablani ; Xavier Amatriain

【Abstract】: People increasingly search online for answers to their medical questions but the rate at which medical questions are asked online significantly exceeds the capacity of qualified people to answer them. This leaves many questions unanswered or inadequately answered. Many of these questions are not unique, and reliable identification of similar questions would enable more efficient and effective question answering schema. COVID-19 has only exacerbated this problem. Almost every government agency and healthcare organization has tried to meet the informational need of users by building online FAQs, but there is no way for people to ask their question and know if it is answered on one of these pages. While many research efforts have focused on the problem of general question similarity, these approaches do not generalize well to domains that require expert knowledge to determine semantic similarity, such as the medical domain. In this paper, we show how a double fine-tuning approach of pretraining a neural network on medical question-answer pairs followed by fine-tuning on medical question-question pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pretraining tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, an accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5% when the full corpus of medical question-answer data is used. We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
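
As a hedged sketch of how such a fine-tuned question-similarity model could be applied at inference time with the Hugging Face transformers API; the checkpoint name is hypothetical and stands in for the double fine-tuned model described above.

# Score a user question against a FAQ question with a (hypothetical) fine-tuned model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "my-org/medical-question-similarity"   # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

q_user = "Can I take ibuprofen if I think I have COVID-19?"
q_faq = "Is it safe to use ibuprofen for coronavirus symptoms?"
inputs = tokenizer(q_user, q_faq, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print("similarity probability:", probs[0, 1].item())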

【Keywords】: Applied computing; Life and medical sciences; Consumer health; Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning approaches; Neural networks

348. Hi-COVIDNet: Deep Learning Approach to Predict Inbound COVID-19 Patients and Case Study in South Korea.

Paper Link】 【Pages】:3466-3473

【Authors】: Minseok Kim ; Junhyeok Kang ; Doyoung Kim ; Hwanjun Song ; Hyangsuk Min ; Youngeun Nam ; Dongmin Park ; Jae-Gil Lee

【Abstract】: The escalating crisis of COVID-19 has put people all over the world in danger. Owing to the high contagion rate of the virus, COVID-19 cases continue to increase globally. To further suppress the threat of the COVID-19 pandemic and minimize its damage, it is imperative that each country monitors inbound travelers. Moreover, given that resources for quarantine are often limited, they must be carefully allocated. In this paper, to aid in such allocation by predicting the number of inbound COVID-19 cases, we propose Hi-COVIDNet, which takes advantage of the geographic hierarchy. Hi-COVIDNet is based on a neural network with two-level components, namely, country-level and continent-level encoders, which understand the complex relationships among foreign countries and derive their respective contagion risk to the destination country. An in-depth case study in South Korea with real-world COVID-19 datasets confirmed the effectiveness and practicality of Hi-COVIDNet.

【Keywords】: Applied computing; Life and medical sciences; Computing methodologies; Machine learning; Machine learning approaches; Neural networks

349. Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data.

Paper Link】 【Pages】:3474-3484

【Authors】: Chloë Brown ; Jagmohan Chauhan ; Andreas Grammenos ; Jing Han ; Apinan Hasthanasombat ; Dimitris Spathis ; Tong Xia ; Pietro Cicuta ; Cecilia Mascolo

【Abstract】: Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology to gather bodily sounds (e.g., from digital stethoscopes) for cardiovascular or respiratory examination, which could then be used for automatic analysis. Some initial work shows promise in detecting diagnostic signals of COVID-19 from voice and coughs. In this paper we describe our data analysis over a large-scale crowdsourced dataset of respiratory sounds collected to aid diagnosis of COVID-19. We use coughs and breathing to understand how discernible COVID-19 sounds are from those in asthma or healthy controls. Our results show that even a simple binary machine learning classifier is able to classify correctly healthy and COVID-19 sounds. We also show how we distinguish a user who tested positive for COVID-19 and has a cough from a healthy user with a cough, and users who tested positive for COVID-19 and have a cough from users with asthma and a cough. Our models achieve an AUC of above 80% across all tasks. These results are preliminary and only scratch the surface of the potential of this type of data and audio-based machine learning. This work opens the door to further investigation of how automatically analysed respiratory patterns could be used as pre-screening signals to aid COVID-19 diagnosis.
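
A minimal, assumed pipeline in the same spirit: summarize each recording with MFCC statistics (via librosa) and fit a simple binary classifier. The file paths and labels are placeholders, and the paper's actual features and models are considerably richer.

# Toy cough classifier: mean/std MFCC features + logistic regression.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

files = ["cough_001.wav", "cough_002.wav"]   # placeholder recordings
labels = np.array([1, 0])                    # 1 = COVID-positive, 0 = healthy (placeholder)
X = np.vstack([mfcc_features(f) for f in files])

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("P(COVID) for first clip:", clf.predict_proba(X[:1])[0, 1])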

【Keywords】: Computing methodologies; Machine learning; Human-centered computing; Human computer interaction (HCI); HCI design and evaluation methods; User studies; Ubiquitous and mobile computing; Information systems; Information systems applications; Data mining

350. Understanding the Urban Pandemic Spreading of COVID-19 with Real World Mobility Data.

Paper Link】 【Pages】:3485-3492

【Authors】: Qianyue Hao ; Lin Chen ; Fengli Xu ; Yong Li

【Abstract】: Facing the worldwide rapid spreading of COVID-19 pandemic, we need to understand its diffusion in the urban environments with heterogeneous population distribution and mobility. However, challenges exist in the choice of proper spatial resolution, integration of mobility data into epidemic modelling, as well as incorporation of unique characteristics of COVID-19.

【Keywords】: Applied computing; Life and medical sciences; Information systems; Data management systems; Information integration; Wrappers (data mining)

Panel 1

351. Fighting a Pandemic: Convergence of Expertise, Data Science and Policy.

Paper Link】 【Pages】:3493-3494

【Authors】: Tina Eliassi-Rad ; Nitesh V. Chawla ; Vittoria Colizza ; Lauren Gardner ; Marcel Salathé ; Samuel V. Scarpino ; Joseph T. Wu

【Abstract】: This panel will address the challenges and opportunities of using data science to fight a pandemic. Of particular interest are real-world cases where using data science helped the fight against the pandemic and cautionary tales of when it hindered that fight.

【Keywords】: Information systems; Information systems applications; Data mining; Security and privacy; Human and societal aspects of security and privacy

Tutorial Abstracts 44

352. From Zero to AI Hero with Automated Machine Learning.

Paper Link】 【Pages】:3495

【Authors】: Aniththa Umamahesan ; Deepak Mukunthu Iyappan Babu

【Abstract】: Automated ML is an emerging field in Machine Learning that helps developers and new data scientists with little data science knowledge build Machine Learning models and solutions without having to master the complexity of learning algorithm selection and hyperparameter tuning. With Azure Machine Learning's automated machine learning capability, given a dataset and a few configuration parameters, you will get a trained, high-quality machine learning model for the dataset that you can use for predictions. In this session, you will learn how to use Automated ML for productivity gains, empowering domain experts to build ML-based solutions and scale to build several models with Azure Machine Learning's Automated ML.
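
A hedged sketch of this workflow using the Azure Machine Learning Python SDK (v1-style API); the dataset name, label column, and configuration values are hypothetical placeholders.

# Submit an automated ML classification experiment to Azure ML (illustrative values).
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()                               # reads workspace details from config.json
train_dataset = Dataset.get_by_name(ws, name="customer-churn")   # hypothetical registered dataset

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_dataset,
    label_column_name="churn",                             # hypothetical label column
    primary_metric="AUC_weighted",
    n_cross_validations=5,
)
run = Experiment(ws, "automl-demo").submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()                  # best model found within the budget
print(fitted_model)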

【Keywords】: Computing methodologies; Machine learning

353. Put Deep Learning to Work: Accelerate Deep Learning through Amazon SageMaker and ML Services.

Paper Link】 【Pages】:3496

【Authors】: Wenming Ye ; Rachel Hu ; Miro Enev

【Abstract】: Deploying deep learning (DL) projects is becoming increasingly pervasive at enterprises and startups alike. At Amazon, Machine Learning University (MLU)-trained engineers are taking DL to every aspect of Amazon's businesses, beyond just Amazon Go, Alexa, and Robotics.

【Keywords】:

354. Building Forecasting Solutions Using Open-Source and Azure Machine Learning.

Paper Link】 【Pages】:3497-3498

【Authors】: Chenhui Hu ; Vanja Paunic

【Abstract】: Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively. Examples of time series forecasting use cases are financial forecasting, demand forecasting in logistics for operational planning of assets, demand forecasting for Azure resources, and energy demand forecasting for campus buildings and data centers. The goal of this tutorial is to demonstrate state-of-the-art forecasting approaches to problems in retail and introduce a new repository focusing on best practices in the forecasting domain, along with a library of forecasting utilities [1].

【Keywords】: Applied computing; Operations research; Forecasting

355. How to Calibrate your Neural Network Classifier: Getting True Probabilities from a Classification Model.

Paper Link】 【Pages】:3499-3500

【Authors】: Natalia Culakova ; Dan Murphy ; Joao Gante ; Carlos Ledezma ; Vahan Hovhannisyan ; Alan Mosca

【Abstract】: Research in Machine Learning (ML) for classification tasks has been primarily guided by metrics that derive from a confusion matrix (e.g. accuracy, precision and recall). Several works have highlighted that this has led to training practices that produce over-confident models and void the assumption that the model learns a probability distribution over the classification targets; this is referred to as miscalibration. Consequently, modern ML architectures struggle to perform in applications where a probabilistic forecaster is needed. Research efforts on calibration techniques have explored the possibility of recovering probability distributions from traditional architectures. This tutorial covers the key concepts required to understand the motivations behind calibration and aims to provide participants with the tools they require to assess the calibration of ML models and calibrate them when required.
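
A hedged sketch of the basic workflow with scikit-learn: measure miscalibration with a reliability curve and recover calibrated probabilities with isotonic regression. The tutorial itself also covers neural-network-specific methods (e.g. temperature scaling); the data and model here are toy stand-ins.

# Compare reliability of raw vs. isotonically calibrated probabilities.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

X, y = make_classification(n_samples=4000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = SVC(probability=True).fit(X_tr, y_tr)                       # often over/under-confident
calibrated = CalibratedClassifierCV(SVC(), method="isotonic", cv=3).fit(X_tr, y_tr)

for name, model in [("raw", raw), ("calibrated", calibrated)]:
    prob_true, prob_pred = calibration_curve(y_te, model.predict_proba(X_te)[:, 1], n_bins=10)
    print(name, "mean |reliability gap|:", abs(prob_true - prob_pred).mean())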

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Machine learning approaches; Neural networks

356. Neural Structured Learning: Training Neural Networks with Structured Signals.

Paper Link】 【Pages】:3501-3502

【Authors】: Arjun Gopalan ; Da-Cheng Juan ; Cesar Ilharco Magalhaes ; Chun-Sung Ferng ; Allan Heydon ; Chun-Ta Lu ; Philip Pham ; George Yu

【Abstract】: We present Neural Structured Learning (NSL) in TensorFlow [2], a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow [3] ecosystem and is widely used in Google across many products and services. In this tutorial, we provide an overview of the NSL framework including various libraries, tools, and APIs as well as demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning, which includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
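
A hedged sketch of NSL's adversarial-regularization wrapper around a Keras model (structure induced implicitly by adversarial perturbation), following the pattern in the NSL documentation; the data shapes and hyperparameters are illustrative.

# Wrap a Keras model with NSL adversarial regularization and train on random data.
import numpy as np
import tensorflow as tf
import neural_structured_learning as nsl

base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28), name="feature"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(base_model, label_keys=["label"], adv_config=adv_config)
adv_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

x = np.random.rand(256, 28, 28).astype("float32")       # stand-in for real images
y = np.random.randint(0, 10, size=(256,))
adv_model.fit({"feature": x, "label": y}, batch_size=32, epochs=1)   # dict input is required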

【Keywords】: Computing methodologies; Machine learning; Learning settings; Semi-supervised learning settings; Machine learning algorithms; Regularization; Machine learning approaches; Neural networks

357. Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS.

Paper Link】 【Pages】:3503-3504

【Authors】: Bartley Richardson ; Bradley Rees ; Tom Drabas ; Even Oldridge ; David A. Bader ; Rachel Allen

【Abstract】: The lines between data science (DS), machine learning (ML), deep learning (DL), and data mining continue to be blurred and removed. This is great as it ushers in vast amounts of capabilities, but it brings increased complexity and a vast number of tools/techniques. It's not uncommon for DL engineers to use one set of tools for data extraction/cleaning and then pivot to another library for training their models. After training and inference, it's common to then move data yet again by another set of tools for post-processing. The RAPIDS suite of open source libraries not only provides a method to execute and accelerate these tasks using GPUs with familiar APIs, but it also provides interoperability with the broader open source community and DL tools while removing unnecessary serializations that slow down workflows. GPUs provide massive parallelization that DL has leveraged for some time, and RAPIDS provides the missing pieces that extend this computing power to more traditional yet important DS and ML tasks (e.g., ETL, modeling). Complete pipelines can be built that encompass everything, including ETL, feature engineering, ML/DL modeling, inference, and visualization, all while removing typical serialization costs and affording seamless interoperability between libraries. All experiments using RAPIDS can effortlessly be scheduled, logged and reviewed using existing public cloud options. Join our engineers and data scientists as they walk through a collection of DS and ML/DL engineering problems that show how RAPIDS running on Azure ML can be used for end-to-end, entirely GPU pipelines. This tutorial includes specifics on how to use RAPIDS for feature engineering, interoperability with common ML/DL packages, and creating GPU native visualizations using cuxfilter. The use cases presented here give attendees a hands-on approach to using RAPIDS components as part of a larger workflow, seamlessly integrating with other libraries (e.g., TensorFlow) and visualization packages.
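
A hedged sketch of a GPU-resident ETL-to-model step with RAPIDS: cuDF mirrors the pandas API and cuML mirrors scikit-learn, so the data never leaves the GPU. The CSV file and its columns are placeholders, and a CUDA-capable GPU is assumed.

# Load, clean, and model data entirely on the GPU with cuDF and cuML.
import cudf
from cuml.ensemble import RandomForestClassifier

df = cudf.read_csv("transactions.csv")          # placeholder file, loaded into GPU memory
df = df.dropna()
X = df[["amount", "age"]].astype("float32")     # placeholder feature columns
y = df["is_fraud"].astype("int32")              # placeholder label column

model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
print(model.predict(X.head(5)))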

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Machine learning; Machine learning algorithms; General and reference; Cross-computing tools and techniques; Performance; Document types; Surveys and overviews

358. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.

Paper Link】 【Pages】:3505-3506

【Authors】: Jeff Rasley ; Samyam Rajbhandari ; Olatunji Ruwase ; Yuxiong He

【Abstract】: Explore new techniques in Microsoft's open source library called DeepSpeed, which advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch. One piece of our library, called ZeRO, is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), which at the time of its release was the largest publicly known language model at 17 billion parameters. In addition, we will go over our latest transformer kernel advancements that led the DeepSpeed team to achieve the world's fastest BERT pretraining record.
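
A hedged sketch of wrapping a PyTorch model with DeepSpeed; the JSON config file and model are placeholders (ZeRO stages, fp16, and the optimizer are configured in that file), and scripts like this are normally launched with the deepspeed CLI launcher.

# Minimal DeepSpeed training step (illustrative; ds_config.json is assumed to exist).
import torch
import deepspeed

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.ReLU(),
                            torch.nn.Linear(4096, 1024))

# ds_config.json would contain e.g.:
# {"train_batch_size": 8, "optimizer": {"type": "Adam"}, "zero_optimization": {"stage": 2}}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

x = torch.randn(8, 1024).to(model_engine.device)
loss = model_engine(x).pow(2).mean()
model_engine.backward(loss)      # DeepSpeed handles scaling/partitioning of gradients
model_engine.step()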

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches

359. Robust Deep Learning Methods for Anomaly Detection.

Paper Link】 【Pages】:3507-3508

【Authors】: Raghavendra Chalapathy ; Nguyen Lu Dang Khoa ; Sanjay Chawla

【Abstract】: Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. A robust anomaly detection system identifies rare events and patterns in the absence of labelled data. The identified patterns provide crucial insights about both the fidelity of the data and deviations in the underlying data-generating process. For example a surveillance system designed to monitor the emergence of new epidemics will use a robust anomaly detection methods to separate spurious associations from genuine indicators of an epidemic with minimal lag time.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms

360. Faster, Simpler, More Accurate: Practical Automated Machine Learning with Tabular, Text, and Image Data.

Paper Link】 【Pages】:3509-3510

【Authors】: Jonas Mueller ; Xingjian Shi ; Alexander J. Smola

【Abstract】: Automated machine learning (AutoML) offers the promise of translating raw data into accurate predictions with just a few lines of code. Rather than relying on human time/effort and manual experimentation, models can be improved by simply letting the AutoML system run for more time. In this hands-on tutorial, we demonstrate fundamental techniques that enable powerful AutoML. We consider standard supervised learning tasks on various types of data including tables, text, images, as well as multi-modal data comprised of multiple types. Rather than technical descriptions of how individual ML models work, we emphasize how to best use models within an overall ML pipeline that takes in raw training data and outputs predictions for test data. A major focus of our tutorial is on automating deep learning, a class of powerful techniques that are cumbersome to manage manually. Despite this, hardly any educational material describes their successful automation. Each topic covered in the tutorial is accompanied by a hands-on Jupyter notebook that implements best practices (which will be available on Github before and after the tutorial). Most of this code is adopted from AutoGluon (autogluon.mxnet.io), a recent AutoML toolkit for automated deep learning that is both state-of-the-art and easy-to-use.
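
A hedged sketch of "a few lines of code" with AutoGluon's tabular API (shown with the current TabularPredictor interface rather than the tutorial-era import path); the CSV files and the "class" label column are placeholders.

# Train and evaluate an AutoGluon tabular predictor (placeholder data).
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")
test_data = TabularDataset("test.csv")

predictor = TabularPredictor(label="class", eval_metric="accuracy").fit(
    train_data, time_limit=600, presets="best_quality")   # more time generally means better models
print(predictor.leaderboard(test_data).head())
print(predictor.predict(test_data.drop(columns=["class"])).head())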

【Keywords】: Computing methodologies; Machine learning; Cross-validation; Learning paradigms; Supervised learning; Supervised learning by classification; Supervised learning by regression; Machine learning algorithms; Ensemble methods; Machine learning approaches; Classification and regression trees; Neural networks; Modeling and simulation; Model development and analysis; Social and professional topics; Professional topics; Computing and business; Automation

361. Intelligible and Explainable Machine Learning: Best Practices and Practical Challenges.

Paper Link】 【Pages】:3511-3512

【Authors】: Rich Caruana ; Scott Lundberg ; Marco Túlio Ribeiro ; Harsha Nori ; Samuel Jenkins

【Abstract】: Learning methods such as boosting and deep learning have made ML models harder to understand and interpret. This puts data scientists and ML developers in the position of often having to make a tradeoff between accuracy and intelligibility. Research in IML (Interpretable Machine Learning) and XAI (Explainable AI) focuses on minimizing this trade-off by developing more accurate interpretable models and by developing new techniques to explain black-box models. Such models and techniques make it easier for data scientists, engineers and model users to debug models and achieve important objectives such as ensuring the fairness of ML decisions and the reliability and safety of AI systems. In this tutorial, we present an overview of various interpretability methods and provide a framework for thinking about how to choose the right explanation method for different real-world scenarios. We will focus on the application of XAI in practice through a variety of case studies from domains such as healthcare, finance, and bias and fairness. Finally, we will present open problems and research directions for the data mining and machine learning community. What the audience will learn: when and how to use a variety of machine learning interpretability methods through case studies of real-world situations; the difference between glass-box and black-box explanation methods and when to use them; and how to use open source interpretability toolkits that are now available.

【Keywords】: Hardware; Electronic design automation; Methodologies for EDA; Best practices for EDA

362. Dealing with Bias and Fairness in Data Science Systems: A Practical Hands-on Tutorial.

Paper Link】 【Pages】:3513-3514

【Authors】: Pedro Saleiro ; Kit T. Rodolfa ; Rayid Ghani

【Abstract】: Tackling issues of bias and fairness when building and deploying data science systems has received increased attention from the research community in recent years, yet a lot of the research has focused on theoretical aspects and a very limited set of application areas and data sets. There is a lack of 1) practical training materials, 2) methodologies, and 3) tools for researchers and developers working on real-world algorithmic decision making systems to deal with issues of bias and fairness. Today, treating bias and fairness as primary metrics of interest, and building, selecting, and validating models using those metrics is not standard practice for data scientists. In this hands-on tutorial we will try to bridge the gap between research and practice, by deep diving into algorithmic fairness, from metrics and definitions to practical case studies, including bias audits using the Aequitas toolkit (http://github.com/dssg/aequitas). By the end of this hands-on tutorial, the audience will be familiar with bias mitigation frameworks and tools to help them make decisions during a project based on the intervention and deployment contexts in which their system will be used.
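
A hedged sketch of a bias audit with the Aequitas toolkit: per its conventions, the input dataframe carries binary "score" and "label_value" columns plus protected-attribute columns (the tiny "race" column here is made up for illustration, and the exact disparity columns returned may vary by version).

# Compute group metrics and disparities with Aequitas on a toy dataframe.
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias

df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1, 0, 0],
    "label_value": [1, 0, 0, 1, 0, 1, 1, 0],
    "race":        ["white", "white", "black", "black", "white", "black", "black", "white"],
})
xtab, _ = Group().get_crosstabs(df)
bias_df = Bias().get_disparity_predefined_groups(
    xtab, original_df=df, ref_groups_dict={"race": "white"}, alpha=0.05)
disparity_cols = [c for c in bias_df.columns if c.endswith("_disparity")]
print(bias_df[["attribute_name", "attribute_value"] + disparity_cols])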

【Keywords】: Computing methodologies; Machine learning; General and reference; Cross-computing tools and techniques; Evaluation

363. Deep Learning for Search and Recommender Systems in Practice.

Paper Link】 【Pages】:3515-3516

【Authors】: Zhoutong Fu ; Huiji Gao ; Weiwei Guo ; Sandeep Kumar Jha ; Jun Jia ; Xiaowei Liu ; Bo Long ; Jun Shi ; Sida Wang ; Mingzhou Zhou

【Abstract】: In this talk, we will go over the components of personalized search and recommender systems and demonstrate the applications of various deep learning techniques along the way.

【Keywords】: Computing methodologies; Artificial intelligence

364. Computer Vision: Deep Dive into Object Segmentation Approaches.

Paper Link】 【Pages】:3517-3518

【Authors】: Yuanbo Wang ; Osama Sakhi ; Ala Eddine Ayadi ; Matthew S. Hagen ; Estelle Afshar

【Abstract】: Image segmentation is the task of associating pixels in an image with their respective object class labels. It has a wide range of applications in many industries including healthcare, transportation, robotics, fashion, home improvement, and tourism. Many deep learning-based approaches have been developed for image-level object recognition and pixel-level scene understanding - with the latter requiring a much denser annotation of scenes with a large set of objects. This tutorial provides an end-to-end pipeline for performing image segmentation using the state-of-art deep learning approaches and public datasets. The hands-on session will provide instructions for dataset customization, transformation, and training, validating, and testing segmentation models. The goal of this tutorial is to provide participants with a strong understanding of building image segmentation models for downstream applications.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Image segmentation

Paper Link】 【Pages】:3519-3520

【Authors】: Iris Shen ; Le Zhang ; Jianxun Lian ; Chieh-Han Wu ; Miguel González-Fierro ; Andreas Argyriou ; Tao Wu

【Abstract】: The whole world has geared up to cope with the COVID-19 situation. This hands-on tutorial aims to provide a comprehensive and pragmatic end-to-end walk-through of building an academic research paper recommender for COVID-19-related study, with the help of knowledge graph technology. The code examples that demonstrate the theory are reproducible and can hopefully help researchers build tools that support the search for a cure to COVID-19.

【Keywords】: Information systems; Data management systems; Database design and models; Graph-based database models; Information retrieval; Retrieval tasks and goals; Recommender systems; World Wide Web; Web searching and information discovery; Personalization

366. Scalable Graph Neural Networks with Deep Graph Library.

Paper Link】 【Pages】:3521-3522

【Authors】: Da Zheng ; Minjie Wang ; Quan Gan ; Zheng Zhang ; George Karypis

【Abstract】: Learning from graph and relational data plays a major role in many applications including social network analysis, marketing, e-commerce, information retrieval, knowledge modeling, medical and biological sciences, engineering, and others. In the last few years, Graph Neural Networks (GNNs) have emerged as a promising new supervised learning framework capable of bringing the power of deep representation learning to graph and relational data. This ever-growing body of research has shown that GNNs achieve state-of-the-art performance for problems such as link prediction, fraud detection, target-ligand binding activity prediction, knowledge-graph completion, and product recommendations. In practice, many real-world graphs are very large. It is urgent to have scalable solutions to train GNNs on large graphs efficiently.
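
A hedged sketch of a two-layer GCN with the Deep Graph Library (PyTorch backend) on a toy graph; full-graph training is shown here, whereas the tutorial's focus is scaling this up (e.g. with neighbor sampling). The graph, features, and labels are synthetic.

# Train a small GCN with DGL on a 4-node toy graph.
import torch
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv

class GCN(torch.nn.Module):
    def __init__(self, in_feats, hidden, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden)
        self.conv2 = GraphConv(hidden, n_classes)

    def forward(self, g, x):
        h = F.relu(self.conv1(g, x))
        return self.conv2(g, h)

g = dgl.graph((torch.tensor([0, 1, 2, 3]), torch.tensor([1, 2, 3, 0])))   # toy edges
g = dgl.add_self_loop(g)
feats = torch.randn(4, 8)
labels = torch.tensor([0, 1, 0, 1])

model = GCN(8, 16, 2)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(20):
    loss = F.cross_entropy(model(g, feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
print(model(g, feats).argmax(1))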

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Information systems; Information systems applications; Computing platforms

367. Introduction to Computer Vision and Real Time Deep Learning-based Object Detection.

Paper Link】 【Pages】:3523-3524

【Authors】: James G. Shanahan ; Liang Dai

【Abstract】: Computer vision (CV) is a field of artificial intelligence that trains computers to interpret and understand the visual world for a variety of exciting downstream tasks such as self-driving cars, checkout-less shopping, smart cities, cancer detection, and more. The field of CV has been revolutionized by deep learning over the last decade. This tutorial looks under the hood of modern day CV systems, and builds out some of these tech pipelines in a Jupyter Notebook using Python, OpenCV, Keras and Tensorflow. While the primary focus is on digital images from cameras and videos, this tutorial will also introduce 3D point clouds, and classification and segmentation algorithms for processing them.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Object detection; Computer vision tasks; Scene understanding

368. Building Recommender Systems with PyTorch.

Paper Link】 【Pages】:3525-3526

【Authors】: Dheevatsa Mudigere ; Maxim Naumov ; Joe Spisak ; Geeta Chauhan ; Narine Kokhlikyan ; Amanpreet Singh ; Vedanuj Goswami

【Abstract】: In this tutorial we show how to build deep learning recommendation systems and resolve the associated interpretability, integrity and privacy challenges. We start with an overview of the PyTorch framework, features that it offers and a brief review of the evolution of recommendation models. We delineate their typical components and build a proxy deep learning recommendation model (DLRM) in PyTorch. Then, we discuss how to interpret recommendation system results as well as how to address the corresponding integrity and quality challenges.
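
A minimal sketch (assumed, not the DLRM reference code) of the typical component structure the abstract describes: sparse categorical features go through embeddings, dense features through a bottom MLP, and the concatenated representations feed a top MLP that predicts click probability.

# Tiny DLRM-style recommendation model in plain PyTorch.
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, cardinalities, emb_dim=16, dense_dim=4):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        self.bottom = nn.Sequential(nn.Linear(dense_dim, emb_dim), nn.ReLU())
        top_in = emb_dim * (len(cardinalities) + 1)
        self.top = nn.Sequential(nn.Linear(top_in, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, dense, sparse):
        parts = [emb(sparse[:, i]) for i, emb in enumerate(self.embs)]
        parts.append(self.bottom(dense))
        return torch.sigmoid(self.top(torch.cat(parts, dim=1))).squeeze(1)

model = TinyDLRM(cardinalities=[1000, 500], emb_dim=16, dense_dim=4)
dense = torch.randn(8, 4)
sparse = torch.stack([torch.randint(0, 1000, (8,)), torch.randint(0, 500, (8,))], dim=1)
print(model(dense, sparse))     # predicted click-through probabilities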

【Keywords】: Computing methodologies; Artificial intelligence; Machine learning; Parallel computing methodologies

369. Causal Inference Meets Machine Learning.

Paper Link】 【Pages】:3527-3528

【Authors】: Peng Cui ; Zheyan Shen ; Sheng Li ; Liuyi Yao ; Yaliang Li ; Zhixuan Chu ; Jing Gao

【Abstract】: Causal inference has numerous real-world applications in many domains such as health care, marketing, political science and online advertising. Treatment effect estimation, a fundamental problem in causal inference, has been extensively studied in statistics for decades. However, traditional treatment effect estimation methods may not well handle large-scale and high-dimensional heterogeneous data. In recent years, an emerging research direction has attracted increasing attention in the broad artificial intelligence field, which combines the advantages of traditional treatment effect estimation approaches (e.g., matching estimators) and advanced representation learning approaches (e.g., deep neural networks). In this tutorial, we will introduce both traditional and state-of-the-art representation learning algorithms for treatment effect estimation. Background about causal inference, counterfactuals and matching estimators will be covered as well. We will also showcase promising applications of these methods in different application domains.
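
A hedged, from-scratch sketch of one classical estimator covered in such background material: inverse propensity weighting (IPW) for the average treatment effect, on synthetic data whose true effect is 2.0 by construction.

# IPW vs. naive difference-in-means on confounded synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                           # confounders
propensity = 1 / (1 + np.exp(-x[:, 0]))               # treatment depends on x
t = rng.binomial(1, propensity)
y = 2.0 * t + x[:, 0] + rng.normal(size=n)            # true ATE = 2.0

e_hat = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]   # estimated propensity
ate_naive = y[t == 1].mean() - y[t == 0].mean()                  # biased by confounding
ate_ipw = np.mean(t * y / e_hat) - np.mean((1 - t) * y / (1 - e_hat))
print(f"naive: {ate_naive:.2f}  IPW: {ate_ipw:.2f}  (true effect 2.0)")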

【Keywords】: Information systems; Information systems applications; Data mining

370. Fairness in Machine Learning for Healthcare.

Paper Link】 【Pages】:3529-3530

【Authors】: Muhammad Aurangzeb Ahmad ; Arpit Patel ; Carly Eckert ; Vikas Kumar ; Ankur Teredesai

【Abstract】: The issue of bias and fairness in healthcare has been around for centuries. With the integration of AI in healthcare, the potential to discriminate and perpetuate unfair and biased practices increases manyfold. The tutorial focuses on the challenges, requirements and opportunities in the area of fairness in healthcare AI and the various nuances associated with it. Healthcare is framed as a multi-faceted, systems-level problem that necessitates carefully mapping different notions of fairness in healthcare to corresponding concepts in machine learning, which is elucidated via different real-world examples.

【Keywords】: Applied computing; Life and medical sciences; Health care information systems; Health informatics; Computing methodologies; Artificial intelligence; Machine learning; Machine learning algorithms

371. Learning from All Types of Experiences: A Unifying Machine Learning Perspective.

Paper Link】 【Pages】:3531-3532

【Authors】: Zhiting Hu ; Eric P. Xing

【Abstract】: Contemporary Machine Learning and AI research has resulted in thousands of models (e.g., numerous deep networks, graphical models), learning paradigms (e.g., supervised, unsupervised, active, reinforcement, adversarial learning), optimization techniques (e.g., all kinds of optimization or stochastic sampling algorithms), not to mention countless approximation heuristics, tuning tricks, and black-box oracles, plus combinations of all the above. While pushing the field forward rapidly, these results also contributed to making ML/AI more like an alchemist's crafting workshop than a modern chemist's periodic table. It not only makes mastering existing ML techniques extremely difficult, but also makes standardized, reusable, repeatable, reliable, and explainable practice and further development of ML/AI products extremely costly, if possible at all.

【Keywords】: Computing methodologies; Machine learning

372. Advances in Recommender Systems: From Multi-stakeholder Marketplaces to Automated RecSys.

Paper Link】 【Pages】:3533-3534

【Authors】: Rishabh Mehrotra ; Ben Carterette ; Yong Li ; Quanming Yao ; Chen Gao ; James T. Kwok ; Qiang Yang ; Isabelle Guyon

【Abstract】: The tutorial focuses on two major themes of recent advances in recommender systems: Part A: Recommendations in a Marketplace: Multi-sided marketplaces are steadily emerging as valuable ecosystems in many applications (e.g. Amazon, AirBnb, Uber), wherein the platforms have customers not only on the demand side (e.g. users), but also on the supply side (e.g. retailer). This tutorial focuses on designing search & recommendation frameworks that power such multi-stakeholder platforms. We discuss multi-objective ranking/recommendation techniques, discuss different ways in which stakeholders specify their objectives, highlight user specific characteristics (e.g. user receptivity) which could be leveraged when developing joint optimization modules and finally present a number of real world case-studies of such multi-stakeholder platforms.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Sequential decision making; Information systems; World Wide Web; Web searching and information discovery

373. Physics Inspired Models in Artificial Intelligence.

Paper Link】 【Pages】:3535-3536

【Authors】: Muhammad Aurangzeb Ahmad ; Sener Özönder

【Abstract】: Ideas originating in physics have informed progress in artificial intelligence and machine learning for many decades. However, the pedigree of many such ideas is oft neglected in the Computer Science community. The tutorial focuses on current and past ideas from physics that have helped in furthering AI and machine learning. Recent advances in physics-inspired ideas in AI are also explored, especially how insights from physics may hold the promise of opening the black box of deep learning. Lastly, current and future trends in this area are discussed, along with an outline of a research agenda on how physics-inspired models can benefit AI and machine learning.

【Keywords】: Applied computing; Physical sciences and engineering; Physics; Computing methodologies; Artificial intelligence; Philosophical/theoretical foundations of artificial intelligence; Machine learning; Machine learning approaches

374. Scientific Text Mining and Knowledge Graphs.

Paper Link】 【Pages】:3537-3538

【Authors】: Meng Jiang ; Jingbo Shang

【Abstract】: Unstructured scientific text, in various forms of textual artifacts, including manuscripts, publications, patents, and proposals, is used to store the tremendous wealth of knowledge discovered after weeks, months, and years of developing hypotheses, working in the lab or clinic, and analyzing results. A grand challenge in data mining research is to develop effective methods for transforming the scientific text into well-structured forms (e.g., ontology, taxonomy, knowledge graphs), so that machine intelligent systems can build on them for hypothesis generation and validation. In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of text mining methods that extract phrases, entities, scientific concepts, relations, claims, and experimental evidence. Then we discuss methods that construct and learn from scientific knowledge graphs for accurate search, document classification, and exploratory analysis. Specifically, we focus on scalable, effective, weakly supervised methods that work on text in sciences (e.g., chemistry, biology).

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Information extraction; Information systems applications; Data mining; World Wide Web

375. Learning with Small Data.

Paper Link】 【Pages】:3539-3540

【Authors】: Huaxiu Yao ; Xiaowei Jia ; Vipin Kumar ; Zhenhui Li

【Abstract】: In the era of big data, data-driven methods have become increasingly popular in various applications, such as image recognition, traffic signal control, and fake news detection. The superior performance of these data-driven approaches relies on large-scale labeled training data, which are probably inaccessible in real-world applications, i.e., the "small (labeled) data" challenge. Examples include predicting emergent events in a city, detecting emerging fake news, and forecasting the progression of conditions for rare diseases. In most scenarios, people care about these small data cases most, and thus improving the learning effectiveness of machine learning algorithms with small labeled data has been a popular research topic.

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Multi-task learning; Transfer learning; Information systems; Information systems applications; Data mining

376. Adversarial Attacks and Defenses: Frontiers, Advances and Practice.

Paper Link】 【Pages】:3541-3542

【Authors】: Han Xu ; Yaxin Li ; Wei Jin ; Jiliang Tang

【Abstract】: Deep neural networks (DNN) have achieved unprecedented success in numerous machine learning tasks in various domains. However, the existence of adversarial examples leaves us with great hesitation when applying DNN models to safety-critical tasks such as autonomous vehicles and malware detection. These adversarial examples are intentionally crafted instances, appearing in either the training or test phase, which can fool the DNN models into making severe mistakes. Therefore, people are dedicated to devising more robust models to resist adversarial examples, but usually they are broken by new, stronger attacks. This arms race between adversarial attacks and defenses has drawn increasing attention in recent years. In this tutorial, we provide a comprehensive overview on the frontiers and advances of adversarial attacks and their countermeasures. In particular, we give a detailed introduction of different types of attacks under different scenarios, including evasion and poisoning attacks, white-box and black-box attacks. We will also discuss how the defending strategies develop to compete against these attacks, and how new attacks come out to break these defenses. Moreover, we will discuss the story of adversarial attacks and defenses in other data domains, especially in graph structured data. Then, we introduce DeepRobust, a PyTorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. Finally, we summarize the tutorial with discussions on open issues and challenges about adversarial attacks and defenses. Via our tutorial, the audience can grasp the main ideas and key approaches of the game between adversarial attacks and defenses.
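
A hedged sketch of the classic FGSM evasion attack in plain PyTorch (the tutorial's DeepRobust library wraps this and many stronger attacks); the model and inputs below are toy placeholders.

# Fast Gradient Sign Method: one-step L-infinity perturbation of the input.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    """Return adversarial examples within an L-infinity ball of radius epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))  # toy classifier
x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
print("prediction changed:", (model(x).argmax(1) != model(x_adv).argmax(1)).tolist())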

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks; Networks; Network properties; Network reliability; Software and its engineering; Software organization and properties; Extra-functional properties; Software safety

377. Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web.

Paper Link】 【Pages】:3543-3544

【Authors】: Xin Luna Dong ; Hannaneh Hajishirzi ; Colin Lockard ; Prashant Shiralkar

【Abstract】: How do we surface the large amount of information present in HTML documents on the Web, from news articles to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for information extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Information extraction; Information systems; World Wide Web; Web mining; Data extraction and integration

378. Recent Advances on Graph Analytics and Its Applications in Healthcare.

Paper Link】 【Pages】:3545-3546

【Authors】: Fei Wang ; Peng Cui ; Jian Pei ; Yangqiu Song ; Chengxi Zang

【Abstract】: A graph is a natural representation encoding both the features of the data samples and relationships among them. Analysis with graphs is a classic topic in data mining and many techniques have been proposed in the past. In recent years, because of the rapid development of data mining and knowledge discovery, many novel graph analytics algorithms have been proposed and successfully applied in a variety of areas. The goal of this tutorial is to summarize the graph analytics algorithms developed recently and how they have been applied in healthcare. In particular, our tutorial will cover both the technical advances and the application in healthcare. On the technical side, we will introduce deep network embedding techniques, graph neural networks, knowledge graph construction and inference, graph generative models and graph neural ordinary differential equation models. On the healthcare side, we will introduce how these methods can be applied in predictive modeling of clinical risks (e.g., chronic disease onset, in-hospital mortality, condition exacerbation, etc.) and disease subtyping with multi-modal patient data (e.g., electronic health records, medical image and multi-omics), knowledge discovery from biomedical literature and integration with data-driven models, as well as pharmaceutical research and development (e.g., de-novo chemical compound design and optimization, patient similarity for clinical trial recruitment and pharmacovigilance). We will conclude the whole tutorial with a set of potential issues and challenges such as interpretability, fairness and security. In particular, considering the global pandemic of COVID-19, we will also summarize the existing research that has already leveraged graph analytics to help with understanding the mechanism, transmission, treatment and prevention of COVID-19, as well as point out the available resources and potential opportunities for future research.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Mathematics of computing; Discrete mathematics; Graph theory; Graph algorithms; Theory of computation; Design and analysis of algorithms; Graph algorithms analysis

379. Tutorial on Human-Centered Explainability for Healthcare.

Paper Link】 【Pages】:3547-3548

【Authors】: Prithwish Chakraborty ; Bum Chul Kwon ; Sanjoy Dey ; Amit Dhurandhar ; Daniel Gruen ; Kenney Ng ; Daby Sow ; Kush R. Varshney

【Abstract】: In recent years, the rapid advances in Artificial Intelligence (AI) techniques along with an ever-increasing availability of healthcare data have made many novel analyses possible. Significant successes have been observed in a wide range of tasks such as next diagnosis prediction, AKI prediction, adverse event predictions including mortality and unexpected hospital re-admissions. However, there has been limited adoption and use in the clinical practice of these methods due to their black-box nature. A significant amount of research is currently focused on making such methods more interpretable or to make post-hoc explanations more accessible. However, most of this work is done at a very low level and as a result, may not have a direct impact at the point-of-care. This tutorial will provide an overview of the landscape of different approaches that have been developed for explainability in healthcare. Specifically, we will present the problem of explainability as it pertains to various personas involved in healthcare viz. data scientists, clinical researchers, and clinicians. We will chart out the requirements for such personas and present an overview of the different approaches that can address such needs. We will also walk-through several use-cases for such approaches. In this process, we will provide a brief introduction to explainability, charting its different dimensions as well as covering some relevant interpretability methods spanning such dimensions. We will touch upon some practical guides for explainability and provide a brief survey of open source tools such as the IBM AI Explainability 360 Open Source Toolkit.

【Keywords】: Applied computing; Life and medical sciences; Health informatics; Computing methodologies; Artificial intelligence

380. Recent Advances in Multimodal Educational Data Mining in K-12 Education.

Paper Link】 【Pages】:3549-3550

【Authors】: Zitao Liu ; Songfan Yang ; Jiliang Tang ; Neil Heffernan ; Rose Luckin

【Abstract】: Recently we have seen a rapid rise in the amount of education data available through the digitization of education. This huge amount of education data usually comes in a mixed form of images, videos, speech, texts, etc. It is crucial to consider data from different modalities to build successful applications in AI in education (AIED). This tutorial targets AI researchers and practitioners who are interested in applying state-of-the-art multimodal machine learning techniques to tackle some of the hard-core AIED tasks. These include tasks such as automatic short answer grading, student assessment, class quality assurance, knowledge tracing, etc.

【Keywords】: Applied computing; Education; Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Natural language processing

381. Tutorial on Online User Engagement: Metrics and Optimization.

Paper Link】 【Pages】:3551-3552

【Authors】: Liangjie Hong ; Mounia Lalmas

【Abstract】: User engagement plays a central role in companies operating online services, such as search engines, news portals, e-commerce sites, entertainment services, and social networks. A main challenge is to leverage collected knowledge about the daily online behavior of millions of users to understand what engages them short-term and more importantly long-term. Two critical steps of improving user engagement are metrics and their optimization. The most common way that engagement is measured is through various online metrics, acting as proxy measures of user engagement. This tutorial will review these metrics, their advantages and drawbacks, and their appropriateness to various types of online services. Once metrics are defined, how to optimize them will become the key issue. We will survey methodologies including machine learning models and experimental designs that are utilized to optimize these metrics via direct or indirect ways. As case studies, we will focus on four types of services, news, search, entertainment, and e-commerce.

【Keywords】: Information systems; Information retrieval; Evaluation of retrieval results; Retrieval tasks and goals; Recommender systems; World Wide Web; Web applications; Electronic commerce

382. Data Pricing - From Economics to Data Science.

Paper Link】 【Pages】:3553-3554

【Authors】: Jian Pei

【Abstract】: Data are invaluable. How can we assess the value of data objectively and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics, data management, data mining, electronic commerce, and marketing. In this tutorial, we present a unified and comprehensive overview of this important direction. We examine various motivations behind data pricing, understand the economics of data pricing, review the development and evolution of pricing models, and compare the proposals of marketplaces of data. We cover both digital products, such as ebooks and MP3 music, and data products, such as data sets, data queries and machine learning models. We also connect data pricing with the highly related areas, such as cloud service pricing, privacy pricing, and decentralized privacy preserving infrastructure like blockchains.

【Keywords】: General and reference; Document types; Surveys and overviews; Information systems; Information systems applications; Data mining; World Wide Web; Online advertising; Web applications; Electronic commerce; Electronic data interchange; Online auctions; Online shopping; Social and professional topics; Computing / technology policy; Intellectual property; Database protection laws; Digital rights management; Soft intellectual property; Privacy policies

383. Deep Graph Learning: Foundations, Advances and Applications.

Paper Link】 【Pages】:3555-3556

【Authors】: Yu Rong ; Tingyang Xu ; Junzhou Huang ; Wenbing Huang ; Hong Cheng ; Yao Ma ; Yiqi Wang ; Tyler Derr ; Lingfei Wu ; Tengfei Ma

【Abstract】: Many real data come in the form of non-grid objects, i.e. graphs, from social networks to molecules. Adaptation of deep learning from grid-alike data (e.g. images) to graphs has recently received unprecedented attention from both machine learning and data mining communities, leading to a new cross-domain field---Deep Graph Learning (DGL). Instead of painstaking feature engineering, DGL aims to learn informative representations of graphs in an end-to-end manner. It has exhibited remarkable success in various tasks, such as node/graph classification, link prediction, etc.

【Keywords】: Computing methodologies; Machine learning; Machine learning approaches; Neural networks

384. Multi-modal Network Representation Learning.

Paper Link】 【Pages】:3557-3558

【Authors】: Chuxu Zhang ; Meng Jiang ; Xiangliang Zhang ; Yanfang Ye ; Nitesh V. Chawla

【Abstract】: In today's information and computational society, complex systems are often modeled as multi-modal networks associated with heterogeneous structural relation, unstructured attribute/content, temporal context, or their combinations. The abundant information in multi-modal network requires both a domain understanding and large exploratory search space when doing feature engineering for building customized intelligent solutions in response to different purposes. Therefore, automating the feature discovery through representation learning in multi-modal networks has become essential for many applications. In this tutorial, we systematically review the area of multi-modal network representation learning, including a series of recent methods and applications. These methods will be categorized and introduced in the perspectives of unsupervised, semi-supervised and supervised learning, with corresponding real applications respectively. In the end, we conclude the tutorial and raise open discussions. The authors of this tutorial are active and productive researchers in this area.

【Keywords】: Computing methodologies; Machine learning; Information systems; Data management systems; Database design and models; Graph-based database models; Network data models; Information systems applications; Data mining

385. Data Science for the Real Estate Industry.

Paper Link】 【Pages】:3559-3560

【Authors】: Ron Bekkerman ; Vanja Josifovski ; Foster J. Provost

【Abstract】: The world's major industries, such as Financial Services, Telecom, Advertising, Healthcare, Education, etc., have attracted the attention of the KDD community for decades. Hundreds of KDD papers have been published on topics related to these industries and dozens of workshops organized---some of which have become an integral part of the conference agenda (e.g. the Health Day). Somewhat unexpectedly, the KDD conference has barely addressed the real estate industry, despite its enormous size and prominence. The reason for that apparent mismatch is two-fold: (a) until recently, the real estate industry did not appreciate the value data science methods could add (with some exceptions, such as econometrics methods for creating real-estate price indices); (b) the Data Science community has not been aware of challenging real estate problems that are perfectly suited to its methods. This tutorial provides a step towards resolving this issue. We provide an introduction to real estate for data scientists, and outline a spectrum of data science problems, many of which are being tackled by new "prop-tech" companies, while some are yet to be approached. We present concrete examples from three of these companies (where the authors work): Airbnb -- the most popular short-term rental marketplace, Cherre -- a real estate data integration platform, and Compass -- the largest independent real estate brokerage in the U.S.

【Keywords】: Computing methodologies; Machine learning; Machine learning algorithms; Information systems; Information systems applications; Data mining; Decision support systems; Social and professional topics; Professional topics; Computing and business; Software and its engineering; Software organization and properties; Software system structures; Ultra-large-scale systems

386. Overview and Importance of Data Quality for Machine Learning Tasks.

Paper Link】 【Pages】:3561-3562

【Authors】: Abhinav Jain ; Hima Patel ; Lokesh Nagalapatti ; Nitin Gupta ; Sameep Mehta ; Shanmukha C. Guttula ; Shashank Mujumdar ; Shazia Afzal ; Ruhi Sharma Mittal ; Vitobha Munigala

【Abstract】: It is well understood from literature that the performance of a machine learning (ML) model is upper bounded by the quality of the data. While researchers and practitioners have focused on improving the quality of models (such as neural architecture search and automated feature selection), there are limited efforts towards improving the data quality. One of the crucial requirements before consuming datasets for any application is to understand the dataset at hand and failure to do so can result in inaccurate analytics and unreliable decisions. Assessing the quality of the data across intelligently designed metrics and developing corresponding transformation operations to address the quality gaps helps to reduce the effort of a data scientist for iterative debugging of the ML pipeline to improve model performance. This tutorial highlights the importance of analysing data quality in terms of its value for machine learning applications. This tutorial surveys all the important data quality related approaches discussed in literature, focusing on the intuition behind them, highlighting their strengths and similarities, and illustrates their applicability to real-world problems. Finally we will discuss the interesting work IBM Research is doing in this space.

【Keywords】: Computing methodologies; Machine learning

387. Interpreting and Explaining Deep Neural Networks: A Perspective on Time Series Data.

Paper Link】 【Pages】:3563-3564

【Authors】: Jaesik Choi

【Abstract】: Explainable and interpretable machine learning models and algorithms are important topics which have received growing attention from research, application and administration. Many complex Deep Neural Networks (DNNs) are often perceived as black-boxes. Researchers would like to be able to interpret what the DNN has learned in order to identify biases and failure modes and improve models. In this tutorial, we will provide a comprehensive overview of methods to analyze deep neural networks and insight into how those interpretable and explainable methods help us understand time series data.
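
A hedged sketch of one simple explanation technique in this space: gradient saliency for a time-series classifier in PyTorch, where the magnitude of the input gradient indicates which time steps most influenced the prediction. The model and series are toy placeholders.

# Gradient saliency over a univariate time series.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2))
series = torch.randn(1, 1, 100, requires_grad=True)     # one series of 100 time steps

logits = model(series)
logits[0, logits.argmax()].backward()                    # gradient of the predicted class score
saliency = series.grad.abs().squeeze()                   # per-time-step importance
print("most influential time step:", int(saliency.argmax()))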

【Keywords】: Computing methodologies; Artificial intelligence; Knowledge representation and reasoning; Temporal reasoning

388. Edge AI: Systems Design and ML for IoT Data Analytics.

Paper Link】 【Pages】:3565-3566

【Authors】: Radu Marculescu ; Diana Marculescu ; Ümit Y. Ogras

【Abstract】: With the explosion in Big Data, it is often forgotten that much of the data nowadays is generated at the edge. Specifically, a major source of data is users' endpoint devices like phones, smart watches, etc., that are connected to the internet, also known as the Internet-of-Things (IoT). This "edge of data" faces several new challenges related to hardware-constraints, privacy-aware learning, and distributed learning (both training as well as inference). So what systems and machine learning algorithms can we use to generate or exploit data at the edge? Can network science help us solve machine learning (ML) problems? Can IoT-devices help people who live with some form of disability and many others benefit from health monitoring?

【Keywords】:

389. Data Sketching for Real Time Analytics: Theory and Practice.

Paper Link】 【Pages】:3567-3568

【Authors】: Daniel Ting ; Jonathan Malkin ; Lee Rhodes

【Abstract】: Speed, cost, and scale: these are three of the biggest challenges in analyzing big data. While modern data systems continue to push the boundaries of scale, the problems of speed and cost are fundamentally tied to the size of the data being scanned or processed. Processing thousands of queries that each access terabytes of data with sub-second latency remains infeasible. Data sketching techniques provide a means to drastically reduce this size, allowing for real-time or interactive data analysis at reduced cost, in exchange for approximate answers.
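
As a concrete, hedged illustration of the size-reduction idea (not code from the tutorial or from any particular sketching library), the following minimal Count-Min sketch estimates item frequencies in fixed memory at the cost of possible overestimation. The `width`/`depth` parameters and the hash choice are illustrative assumptions.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts in sub-linear space (illustrative only)."""

    def __init__(self, width=2048, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hashed column per row; salting with the row index gives independent hashes.
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Take the minimum over rows to limit the effect of hash collisions.
        return min(self.table[row][col] for row, col in self._buckets(item))

sketch = CountMinSketch()
for q in ["login", "search", "login", "checkout", "login"]:
    sketch.add(q)
print(sketch.estimate("login"))  # ~3: may overestimate, never underestimates
```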

【Keywords】:

390. Deep Learning for Anomaly Detection.

Paper Link】 【Pages】:3569-3570

【Authors】: Ruoying Wang ; Kexin Nie ; Yen-Jung Chang ; Xinwei Gong ; Tie Wang ; Yang Yang ; Bo Long

【Abstract】: Anomaly detection has been widely studied and used in diverse applications. Building an effective anomaly detection system requires researchers and developers to learn complex structure from noisy data, identify dynamic anomaly patterns, and detect anomalies with limited labels. Recent advancements in deep learning techniques have greatly improved anomaly detection performance in comparison with classical approaches, and have extended anomaly detection to a wide variety of applications. This tutorial will help the audience gain a comprehensive understanding of deep learning based anomaly detection techniques in various application domains. First, we give an overview of the anomaly detection problem, introducing the approaches taken before the deep model era and listing the challenges they faced. Then we survey the state-of-the-art deep learning models, ranging from building-block neural network structures such as MLP, CNN, and LSTM, to more complex structures such as autoencoders, generative models (VAE, GAN, flow-based models), and deep one-class detection models. In addition, we illustrate how techniques such as transfer learning and reinforcement learning can help mitigate the label sparsity issue in anomaly detection problems, and how to collect and make the best use of user labels in practice. Next, we discuss real-world use cases from within and outside LinkedIn. The tutorial concludes with a discussion of future trends.
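
As a hedged illustration of one building block this kind of tutorial surveys, the sketch below scores anomalies by autoencoder-style reconstruction error: train a model to reconstruct normal data through a narrow bottleneck, then flag points whose reconstruction error is unusually large. Using an sklearn MLPRegressor as the autoencoder, the synthetic data, and the 99th-percentile threshold are assumptions made for brevity, not the authors' setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 8))            # inlier training data
test = np.vstack([rng.normal(0, 1, size=(20, 8)),   # inliers
                  rng.normal(6, 1, size=(5, 8))])   # injected anomalies

# Autoencoder-style model: reconstruct the inputs through a 4-unit bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(normal, normal)

# Score by per-sample reconstruction error and threshold on the error seen on normal data.
train_err = np.mean((ae.predict(normal) - normal) ** 2, axis=1)
test_err = np.mean((ae.predict(test) - test) ** 2, axis=1)
threshold = np.percentile(train_err, 99)
print(np.where(test_err > threshold)[0])  # indices of likely anomalies (the last 5 rows)
```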

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Unsupervised learning; Anomaly detection; Machine learning approaches; Neural networks; Theory of computation; Theory and algorithms for application domains; Machine learning theory; Semi-supervised learning

391. Deep Learning for Industrial AI: Challenges, New Methods and Best Practices.

Paper Link】 【Pages】:3571-3572

【Authors】: Chetan Gupta ; Ahmed K. Farahat

【Abstract】: Industrial AI is concerned with the application of Artificial Intelligence (AI), Machine Learning (ML) and related technologies towards addressing real-world use cases in industrial and societal domains. These use cases can be broadly categorized into the horizontal areas of maintenance and repair, operations and supply chain, quality, safety, design, and end-to-end optimization, with applications in a variety of verticals. In the last few years, we have witnessed a growing interest in applying Deep Learning (DL) techniques to Industrial AI problems, ranging from using sequence models such as Long Short-Term Memory (LSTM) for predicting failures in equipment, to using Deep Reinforcement Learning (Deep RL) for scheduling and dispatching. Applying deep learning techniques to industrial applications imposes a set of unique challenges, which include, but are not limited to, (1) limited data, highly skewed class distributions, and the occurrence of rare classes such as failures, (2) multi-modal data (sensors, events, images, text, etc.) indexed over space and time, (3) the need for explainable decisions, (4) a need to attain consistency between different but "related" models and between multiple generations of the same model, and (5) decision making to optimize business outcomes where the cost of a mistake could be very high. This tutorial presents an overview of these challenges, along with new methods and best practices to address them. Examples of these methods include using sequence DL models and Functional Neural Networks (FNNs) for modeling sensor and spatiotemporal measurements; using multi-task learning, graph models and ensemble learning for improving consistency of DL models; using deep RL for health indicator learning and dynamic dispatching; cost-based decision making for prognostics; and using GANs for generating sensor data for prognostics. Finally, we will present some open problems in Industrial AI and discuss how the research community can shape the future of the next industrial and societal revolution.

【Keywords】: Applied computing; Computing methodologies; Artificial intelligence; Machine learning

392. Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis.

Paper Link】 【Pages】:3573-3574

【Authors】: Yu Meng ; Jiaxin Huang ; Jiawei Han

【Abstract】: People nowadays are immersed in a wealth of text data, ranging from news articles, to social media, academic publications, advertisements, and economic reports. A grand challenge of data mining is to develop effective, scalable and weakly-supervised methods for extracting actionable structures and knowledge from massive text data. Without requiring extensive and corpus-specific human annotations, these methods will satisfy people's diverse applications and needs for comprehending and making good use of large-scale corpora.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Machine learning; Information systems; Information retrieval; Information systems applications; Data mining

393. Learning by Exploration: New Challenges in Real-World Environments.

Paper Link】 【Pages】:3575-3576

【Authors】: Qingyun Wu ; Huazheng Wang ; Hongning Wang

【Abstract】: Learning is a predominant theme for any intelligent system, whether human or machine. Moving beyond the classical paradigm of learning from past experience, e.g., offline supervised learning from given labels, a learner needs to actively collect exploratory feedback to learn from the unknowns, i.e., learning through exploration. This tutorial will introduce the learning-by-exploration paradigm, which is the key ingredient in many interactive online learning problems, including multi-armed bandit and, more generally, reinforcement learning problems.
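
To make the exploration-exploitation trade-off concrete, here is a minimal epsilon-greedy learner for a stochastic multi-armed bandit. This is a generic textbook baseline rather than anything from the tutorial itself; the arm means, reward noise, and epsilon value are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Simple epsilon-greedy learner for a stochastic multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms          # running mean reward estimate per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:   # explore: pull a random arm
            arm = rng.randrange(n_arms)
        else:                        # exploit: pull the best arm found so far
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return values, total_reward

values, total = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print([round(v, 2) for v in values])  # estimates should approach the true arm means
```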

【Keywords】: Computing methodologies; Machine learning; Learning paradigms; Reinforcement learning; Sequential decision making; Learning settings; Online learning settings

394. Image and Video Understanding for Recommendation and Spam Detection Systems.

Paper Link】 【Pages】:3577-3578

【Authors】: Aman Gupta ; Sirjan Kafle ; Di Wen ; Dylan Wang ; Sumit Srivastava ; Suhit Sinha ; Nikita Gupta ; Bharat Jain ; Ananth Sankar ; Liang Zhang

【Abstract】: Image- and video-based content has become ever-present in a variety of domains like news, entertainment, and education. Users typically discover and engage with content via search and recommendation systems. It is also important to serve high-quality data to users by filtering out irrelevant or harmful content. Thus, there is an increasing need to leverage the rich information in image and video content in order to power systems for search and recommendation. At the same time, the effectiveness and efficiency of these systems have been accelerated by the availability of large-scale labeled datasets and sophisticated deep learning-based models.

【Keywords】: Computing methodologies; Artificial intelligence; Computer vision; Computer vision problems; Computer vision representations; Image representations; Computer vision tasks; Visual content-based indexing and retrieval; Machine learning

395. Data-Driven Never-Ending Learning Question Answering Systems.

Paper Link】 【Pages】:3579-3580

【Authors】: Estevam R. Hruschka Jr.

【Abstract】: This tutorial focuses on how to build Question Answering (QA) systems based on the Never-Ending Learning (NEL) approach. NEL systems can be roughly described as computer systems that learn over time to become better at solving a specific task. Different NEL approaches have been proposed and applied in different tasks and domains. Recent advances encourage us to keep addressing the problem of how to build computer systems that can take advantage of NEL principles. Considering that it is not always straightforward to apply NEL principles to ML models, this tutorial guides the audience (with hands-on examples and supporting theory, algorithms and models) on how to model a system in a NEL fashion and intends to help the KDD community become familiar with such approaches. Question Answering is chosen as the application domain mainly because of the relevance of the topic (QA) to the KDD and AI communities in general.

【Keywords】: Computing methodologies; Artificial intelligence; Natural language processing; Machine learning; Learning settings; Semi-supervised learning settings; Information systems; Information retrieval; Retrieval tasks and goals; Question answering

Diversity and Inclusion Abstracts 14

396. How Can Computer Science Education Address Inequities.

Paper Link】 【Pages】:3581

【Authors】: Manuel A. Pérez-Quiñones

【Abstract】: The 2019 global pandemic and the social protests in support of Black Lives Matter have made it clear that society has yet to eradicate systemic racism. Can computing education help? Academics, particularly in STEM fields, are often shielded from these conversations as we think they belong in a social science classroom. The effect of the pandemic and the protests has made it abundantly clear that we can no longer be apathetic about systemic issues that impact equity in society. Based on personal experience and on years of participation in Broadening Participation in Computing (BPC) efforts, the author suggests actions that CS departments can take to fight inequity. These can be classified into several broad categories: (a) provide more support and opportunities to students from underrepresented groups, (b) encourage faculty to become active participants in addressing inequity, (c) update the computing curriculum to be more inclusive and culturally responsive, and (d) evolve the departmental infrastructure to manage diversity, equity and inclusion.

【Keywords】: Computing methodologies; Artificial intelligence; Social and professional topics; Professional topics; Computing education; Computing education programs; Computing profession; User characteristics; Race and ethnicity; Theory of computation; Theory and algorithms for application domains; Database theory; Theory of database privacy and security

397. Diversity and Inclusion, a Perspective from a Four Years MSI Faculty Member.

Paper Link】 【Pages】:3582

【Authors】: Eliana Valenzuela Andrade

【Abstract】: This presentation describes the path taken by a faculty member at a Minority Serving Institution (MSI). She presents the difficulties and opportunities she has encountered and how, as a role model, she has managed to improve the quality of life of her students.

【Keywords】: General and reference

398. CoRE Lab - An Effort to Engage College Hispanic Students in STEM.

Paper Link】 【Pages】:3583

【Authors】: Wilson E. Lozano-Rolon

【Abstract】: According to the 2012 PCAST report [1], the number of STEM graduates must increase by 1 million between 2012 and 2021 in order to meet the nation's workforce needs. As of 2018, the Hispanic community comprised only six percent of the US STEM workforce [2]. This scenario presents a continuing challenge for Hispanic Serving Institutions (HSIs).

【Keywords】: Social and professional topics; Professional topics; Computing education; Computing education programs; Computer engineering education

399. Support for Diverse Students.

Paper Link】 【Pages】:3584

【Authors】: Daniel A. Jiménez

【Abstract】: This talk is intended to motivate diverse students in their study of computer science and engineering. Prof. Jiménez discusses the path he took to become a computer science professor. He describes his efforts promoting women and under-represented minorities in computer science and engineering. He ends with some advice to diverse students.

【Keywords】: General and reference; Document types; General literature

400. Broadening Participation in Technology Policy.

Paper Link】 【Pages】:3585

【Authors】: Brianna B. Posadas

【Abstract】: Those who work in politics and policy are often unprepared to address the issues that current technology developments have created. A glaring example is the congressional hearings with Facebook, where members of Congress asked embarrassingly fundamental questions and were not able to get to the heart of the issue. Many members of Congress do not have a technology advisor on staff to assist in addressing technical issues. Not only does Congress lack technology experts on staff, but the few who are involved are not from underrepresented minority communities. This makes it even more difficult to adequately address issues that more directly impact communities of color, including predictive policing, voter suppression, and facial recognition.

【Keywords】: Social and professional topics; Computing / technology policy; User characteristics

401. The Dark Side of Machine Learning Algorithms: How and Why They Can Leverage Bias, and What Can Be Done to Pursue Algorithmic Fairness.

Paper Link】 【Pages】:3586-3587

【Authors】: Mariya I. Vasileva

【Abstract】: Machine learning and access to big data are revolutionizing the way many industries operate, providing analytics and automation for many aspects of real-world practical tasks that were previously thought to be necessarily manual. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their epidemic spread in a variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant recommits a crime; in the medical domain to assist with diagnosis or to predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the machine learning algorithms deployed are not designed with considerations of fairness.

【Keywords】: General and reference; Document types; Surveys and overviews

402. Accessible Online Meetings and Presentations.

Paper Link】 【Pages】:3588

【Authors】: Brianna Blaser

【Abstract】: In our current situation, conferences, classes, and meetings are moving online. What steps can you take to ensure that your activities are welcoming to a diverse audience, including people with disabilities? This presentation will look at proactive strategies you can take to ensure that your meetings and presentations are accessible to a wide audience. We'll talk about communication with participants, preparation, presentation materials, technology, and accommodations.

【Keywords】: Social and professional topics; Professional topics; Computing education; User characteristics; People with disabilities

403. Perspectives on Broadening Participation in STEM Careers across Academia, Government, and Industry.

Paper Link】 【Pages】:3589-3590

【Authors】: Hasan Jackson

【Abstract】: Bias, defined as prejudice for or against something/someone, is a central component of our understanding of everyday life. In particular, the technology that you interact with daily and the teams that you work with help to inform your biases. For minority populations in technology, often being the lone representative for diverging perspectives, the inclusion of potentially harmful bias in technology projects is easily discernable, even if it has proven near impossible to resist.

【Keywords】: Computing methodologies; Artificial intelligence; Social and professional topics; Professional topics; Computing profession; Codes of ethics

404. The Illusion of Inclusion: Large Scale Genomic Data Sovereignty and Indigenous Populations.

Paper Link】 【Pages】:3591

【Authors】: Keolu Fox

【Abstract】: Raw genomic data has emerged as a top global commodity in the past several years. This shift is so new that data science experts are still evaluating what such information is worth in a global market. In 2018, the direct-to-consumer genetic-testing company 23andMe sold access to its database containing digital sequence information from approximately 5 million people to GlaxoSmithKline for 300 million dollars. Clearly there is a growing market for these products and, like a gold rush, organizations are seeking rare samples to package up for analysis and reward. Indigenous peoples are legitimately concerned about the potential commodification of drugs derived from research on their genomes, and as a consequence, they are sometimes reluctant to participate in genomics research. Investigators in All of Us, a federally funded program, are interested in recruiting participants from Native groups, but given the fraught history of genetic studies involving Indigenous peoples - including examples such as Havasupai v. Arizona State University, in which the tribe successfully sued the university for improperly using its members' blood samples - tribal communities continue to be wary about participating in the federal government's newest endeavor. Here I contextualize data sovereignty issues and identify strategies to create equity in federal government genomic data collection efforts.

【Keywords】: Social and professional topics; Computing / technology policy; Intellectual property; Database protection laws; Medical information policy; Genetic information; User characteristics; Cultural characteristics; Geographic characteristics; Race and ethnicity

405. Models of Data Governance and Advancing Indigenous Genomic Data Sovereignty.

Paper Link】 【Pages】:3592

【Authors】: Krystal S. Tsosie

【Abstract】: While there has been considerable deliberation about ownership and stewardship of genomic data, there does not yet exist a singular framework that encapsulates the current and future trajectory of how these data governance models can exist for Indigenous communities. We succinctly describe two case studies in the Akimel O'odham (Pima) communities that demonstrate the spectrum of data governance structures, in which tribal members' involvement ranges from no input to complete control of data collection and usage. We describe (1) tribal-trust relationships, (2) non-tribal partnerships, and (3) tribally-driven models in the context of an Indigenous people whose genomic and health data have been widely misused and exploited by outside researchers, and the new narrative in which the O'odham have begun re-asserting their sovereignty in data domains.

【Keywords】: Security and privacy; Human and societal aspects of security and privacy; Social aspects of security and privacy; Social and professional topics; Computing / technology policy; Medical information policy; Genetic information; User characteristics; Cultural characteristics

406. No Computation without Representation: Avoiding Data and Algorithm Biases through Diversity.

Paper Link】 【Pages】:3593

【Authors】: Caitlin Kuhlman ; Latifa Jackson ; Rumi Chunara

【Abstract】: The emergence and growth of research on issues of ethics in Artificial Intelligence, and in particular algorithmic fairness, has roots in an essential observation that structural inequalities in our society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and fair socio-technical systems are key desired outcomes which depend on practitioners from the fields of data science and computing. However, these computing fields broadly also suffer from the same under-representation issues that are found in the datasets we analyze. This disconnect affects the design of both the desired outcomes and metrics by which we measure success. If the ethical AI research community accepts this, we tacitly endorse the status quo and contradict the goals of non-discrimination and equity which work on algorithmic fairness, accountability, and transparency seeks to address.

【Keywords】: Social and professional topics; Professional topics; Computing education

407. Mutually Beneficial Collaborations to Broaden Participation of Hispanics in Data Science.

Paper Link】 【Pages】:3594-3595

【Authors】: Patricia Ordóñez Franco

【Abstract】: Representation of Hispanics, especially Hispanic women, is notoriously low in data science programs in higher education and in the tech industry. The engagement of undergraduate students in research, often and early in their path towards degree completion, has been championed as one of the principal reforms necessary to increase the number of capable professionals in STEM. The benefits attributed to undergraduate research experiences have been reported to disproportionately benefit individuals from groups that have been historically underrepresented in STEM. The IDI-BD2K (Increasing Diversity in Interdisciplinary Big Data to Knowledge) Program funded by the NIH at the University of Puerto Rico Río Piedras (UPRRP) was designed to bridge the increasing digital and data divide at the university. The college's population is 98 percent Hispanic and yet there is no formal data science program. There also exists a gender imbalance in computing at the College of Natural Sciences at the UPRRP. Over 60 percent of the undergraduate students in Biology are women. However, the percentage of women in Computer Science hovers around 15 percent. The IDI-BD2K was created to address both these concerns and increase the participation of Hispanics in interdisciplinary computational and quantitative research. In this talk, I will highlight the need for mutually beneficial university collaborations to reduce the digital and data divide, create greater awareness of the growing disparities and increase the number of future faculty with experience teaching diverse students.

【Keywords】: Social and professional topics; Professional topics; Computing education; Computing education programs; Computational science and engineering education; Informal education; User characteristics; Gender; Women

408. Bringing Inclusive Diversity to Data Science: Opportunities and Challenges.

Paper Link】 【Pages】:3596

【Authors】: Heriberto Acosta Maestre

【Abstract】: As data science research continues to expand into a variety of applied fields, the need for talented and diverse individuals has been widely acknowledged. Despite this acknowledgement, data science lags behind other STEM disciplines in achieving a diverse workforce. Through work we have undertaken in the past as part of the Broadening Participation in Data Mining workshop (BPDM) and our work with ACM SIGKDD, we seek to build a better workforce that is positioned to address the data science problems of the next hundred years. A significant barrier to long-term career success is the limited opportunity underrepresented trainees have to demonstrate their analytical abilities and sophisticated inferential talents on key data issues in our community. In this talk we will present an overview of the goals of the Diversity and Inclusion track and share our vision for bridging the diversity divide facing our society and our data science workforce right now. We are interested in how diversity is encountered across ethnic, gender, and ability identities. To this end, we have prepared an exciting new program of activities to facilitate broader conversations in the data science field, covering not only technical ideas but also innovative thinking about what the future of data science can look like if we diversify and enlarge the group of contributors.

【Keywords】: Social and professional topics; User characteristics; Cultural characteristics; People with disabilities; Race and ethnicity; Sexual orientation

409. The Data Science Mentoring Fire Next Time: Innovative Strategies for Mentoring in Data Science.

Paper Link】 【Pages】:3597-3600

【Authors】: Latifa Jackson ; Heriberto Acosta Maestre

【Abstract】: As data mining research and applications continue to expand into a variety of fields such as medicine, finance, and security, the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago with the goal of fostering mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community, while also enriching technical aptitude and exposure for a group of talented students. To date it has impacted the lives of more than 330 underrepresented trainees in data science. We provide a venue to connect talented students with innovative researchers in industry, academia, professional societies, and government. Our mission is to facilitate meaningful, lasting relationships between BPDM participants to ultimately increase diversity in data mining. The most recent workshop took place at Howard University in Washington, DC in February 2019. Here we report on the mentoring strategies that we undertook at the 2019 BPDM and how they were received.

【Keywords】: Social and professional topics; User characteristics; Cultural characteristics; Gender; Women; People with disabilities; Race and ethnicity

Applied Data Science Invited Talks Abstracts 12

410. Preserving Integrity in Online Social Media.

Paper Link】 【Pages】:3601

【Authors】: Alon Y. Halevy

【Abstract】: Online social networks provide a platform for sharing information and free expression. However, these networks are also used for malicious purposes, such as distributing misinformation and hate speech, selling illegal drugs, and coordinating sex trafficking or child exploitation. Keeping users on these platforms safe from such harm, known as the problem of Integrity, is a major focus for social media companies. This talk, coming from the perspective of addressing many of these challenges at Facebook, highlights some of the recent progress made in the area of integrity and some of the challenges that lie ahead.

【Keywords】: Social and professional topics; Computing / technology policy

411. Straddling the Boundary between Contribution and Solution Driven Science.

Paper Link】 【Pages】:3602

【Authors】: Daniel Marcu

【Abstract】: Advancing the state of the art in the context of products and services used by hundreds of millions of customers poses challenges that go beyond those associated with advancing the state of the art in customer-free settings. In this talk, I will highlight some of these challenges and discuss approaches to overcoming them in the context of two Amazon services: Amazon Translate and Alexa.

【Keywords】: Computing methodologies; Artificial intelligence

412. Artificial Intelligence for Healthcare.

Paper Link】 【Pages】:3603

【Authors】: Dorin Comaniciu

【Abstract】: We discuss the current and future impact of artificial intelligence (AI) technologies on healthcare. We consider four hierarchical levels of healthcare data generation and processing of increasing complexity and wider implications. At the imaging scanner and instrument level, AI aims at improving, simplifying, and standardizing data acquisition and preparation. We present examples of systems for AI-driven automatic patient iso-centering before a computed tomography scan, deep learning-based image reconstruction, and creation of optimized and standardized visualizations, for example, automatic rib-unfolding. At the reading and reporting levels, AI focuses on the detection and characterization of abnormalities and on automatic measurements in images. We introduce multiple AI systems for the brain, heart, lung, prostate, and musculoskeletal disease. The third level is exemplified by the integrated nature of the clinical data in a patient-specific manner. The AI algorithms at this level focus on risk prediction and stratification, as opposed to merely detecting, measuring, and quantifying images. An AI-based approach for individualizing radiation dose in lung stereotactic body radiotherapy is discussed. The digital twin is presented as a concept of individualized computational modeling of human physiology. Finally, at the cohort and population analysis levels, the focus of AI shifts from clinical decision-making to operational decisions and process optimization.

【Keywords】: Applied computing; Life and medical sciences; Health care information systems

413. Using Machine Learning to Detect Cancer Early.

Paper Link】 【Pages】:3604

【Authors】: Jan Schellenberger

【Abstract】: GRAIL's mission is to detect cancer early, when it can be cured. Building a classifier that can detect cancer early in a clinical setting is a complicated endeavor with unique challenges: data acquisition and stabilization can take years; cancer status (labels) can be ambiguous, noisy, and changing; sequencing data can be enormous and presents scaling issues. Because the cancer early detection machine learning classifier is being built in the context of a clinical trial environment, an extra level of rigor and planning is required.

【Keywords】: Applied computing; Life and medical sciences; Bioinformatics; Computational biology; Molecular sequence analysis; Sequencing and genotyping technologies

414. Build the State-of-the-Art Machine Learning Technology for the Crypto Economy.

Paper Link】 【Pages】:3605

【Authors】: Michael Li ; Catalin Tiseanu ; Burkay Gur

【Abstract】: Coinbase's mission is "to create an open financial system for the world". This presentation serves as an overview of our efforts in building state-of-the-art machine learning technology for the fast-evolving crypto economy, which follows a prototype, productization, and experimentation development cycle. On the machine learning side, it covers topics around proper train/validation setup, maintaining a fast iteration cycle using a custom-built AutoML framework (called "EasyML"), a deep learning Transformer-based sequence model and how to incorporate timing information into it, how to combine gradient boosting trees with deep learning using linear blending, as well as model interpretability, evaluation, and experimentation. On the machine learning platform side, we will dive into the internals of Nostradamus, our in-house framework that manages the model life-cycle, and Feature Store, our self-serve feature management, computation and serving framework.
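
Coinbase's EasyML and Nostradamus internals are not described in this abstract, so the following is only a generic, hedged sketch of the linear-blending step it mentions: train a gradient-boosted model and a neural network, then search for a held-out mixing weight between their predicted probabilities. The synthetic dataset, model choices, and weight grid are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification task standing in for a real production dataset.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X_tr, y_tr)

p_gbt = gbt.predict_proba(X_hold)[:, 1]
p_net = net.predict_proba(X_hold)[:, 1]

# Linear blending: grid-search the mixing weight that minimizes held-out log loss.
weights = np.linspace(0, 1, 21)
losses = [log_loss(y_hold, w * p_gbt + (1 - w) * p_net) for w in weights]
best_w = weights[int(np.argmin(losses))]
print(f"best blend weight on GBT: {best_w:.2f}, held-out log loss: {min(losses):.4f}")
```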

【Keywords】: Computing methodologies; Machine learning

415. How AI Can Help Build Resiliency for Small Businesses in a Global Economic Crisis.

Paper Link】 【Pages】:3606

【Authors】: Nhung Ho

【Abstract】: In the midst of COVID-19, a global economic crisis is threatening the livelihoods of small business owners everywhere. In ordinary times, 50 percent of small businesses go out of business in the first 5 years. In today's extraordinary times, nearly 7.5 million (~25 percent) of small businesses in the U.S. alone have been at risk of closing permanently in a matter of months (Source: Main Street America's Small Business Survey 2020).

【Keywords】: Computer systems organization; Architectures; Distributed architectures; Cloud computing; Computing methodologies; Machine learning; Learning paradigms; Supervised learning; Supervised learning by classification; Supervised learning by regression; Machine learning approaches; Classification and regression trees; Mathematics of computing; Probability and statistics; Statistical paradigms; Time series analysis

416. Toward Responsible AI by Planning to Fail.

Paper Link】 【Pages】:3607

【Authors】: Saleema Amershi

【Abstract】: The potential for AI technologies to enhance human capabilities and improve our lives is of little debate; yet, neither is their potential to cause harm and social disruption. While preventing or minimizing AI biases and harms is justifiably the subject of intense study in academic, industrial and even legal communities, an approach centered on acknowledging and planning for AI-based failures has the potential to shed new light on how to develop and deploy responsible AI-based systems.

【Keywords】: Computing methodologies; Artificial intelligence; Human-centered computing; Human computer interaction (HCI)

417. Multimodal Machine Learning for Video and Image Analysis.

Paper Link】 【Pages】:3608

【Authors】: Shalini Ghosh

【Abstract】: In this talk, we will first discuss multimodal ML for video content analysis. Videos typically have data in multiple modalities like audio, video, and text (captions). Understanding and modeling the interaction between different modalities is key for video analysis tasks like categorization, object detection, activity recognition, etc. However, data modalities are not always correlated -- so, learning when modalities are correlated and using that to guide the influence of one modality on the other is crucial. Another salient feature of videos is the coherence between successive frames due to the continuity of video and audio, a property that we refer to as temporal coherence. We show how using non-linear guided cross-modal signals and temporal coherence can improve the performance of multimodal ML models for video analysis tasks like categorization. We also created a hierarchical taxonomy of categories internally. Our experiments on the large-scale YouTube-8M dataset show how our approach significantly outperforms state-of-the-art multimodal ML models for video categorization using our taxonomy, and also generalizes well to an internal dataset of video segments from actual TV programs. The next part of the talk will briefly discuss our work on the explainability of multimodal ML models. We will conclude the talk by outlining other multimodal ML applications like incremental object detection and visual dialog, and discuss potential applications of multimodal ML to various domains.

【Keywords】: Computer systems organization; Architectures; Other architectures; Neural networks; Computing methodologies; Artificial intelligence; Computer vision; Machine learning; Machine learning approaches; Theory of computation; Theory and algorithms for application domains; Machine learning theory

418. Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning.

Paper Link】 【Pages】:3609

【Authors】: Timnit Gebru

【Abstract】: A growing body of work shows that many problems in fairness, accountability, transparency, and ethics in machine learning systems are rooted in decisions surrounding the data collection and annotation process. We argue that a new specialization should be formed within machine learning that is focused on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures. Specifically for sociocultural data, parallels can be drawn from archives and libraries. Archives are the longest-standing communal effort to gather human information, and archive scholars have already developed the language and procedures to address and discuss many challenges pertaining to data collection, such as consent, power, inclusivity, transparency, and ethics and privacy. We discuss these five key approaches in document collection practices in archives that can inform data collection in sociocultural machine learning.

【Keywords】: Information systems; Information systems applications; Digital libraries and archives; Theory of computation; Theory and algorithms for application domains; Machine learning theory

419. Unleashing the Power of Subjective Data: Managing Experiences as First-Class Citizens.

Paper Link】 【Pages】:3610

【Authors】: Wang-Chiew Tan

【Abstract】: Subjective data refers to data that contains opinions and experiences. Such data is ubiquitous in product reviews, tweets, and discussion forums in social media. Consumers today spend considerable time sifting through subjective data to make informed decisions about purchases. At Megagon Labs, we are building technologies to synthesize knowledge from subjective data and to facilitate searching over them.

【Keywords】: Information systems; Information retrieval; Retrieval tasks and goals; Expert search; Information extraction; Sentiment analysis; Summarization

420. Innovating with Language AI.

Paper Link】 【Pages】:3611

【Authors】: Ashwin Ram

【Abstract】: Understanding human language in real world scenarios involves not just natural language processing but also speech, vision, knowledge graphs, user modeling, and other AI techniques. Doing this at Google scale involves planetary-scale cloud computing as well as tiny-scale edge computing. I'll share a behind-the-scenes look at how Google uses Language AI to power its billion-user products, and discuss some of the newer approaches we are developing to make sense of human language. I'll end with our vision to democratize AI and how you can use Google AI in your own work.

【Keywords】: Computer systems organization; Architectures; Distributed architectures; Cloud computing; Computing methodologies; Artificial intelligence; Natural language processing; Machine learning; Human-centered computing

421. Data Paucity and Low Resource Scenarios: Challenges and Opportunities.

Paper Link】 【Pages】:3612

【Authors】: Mona Diab

【Abstract】: In an era of unstructured data abundance, you would think that we have solved our data requirements for building robust systems for language processing. However, this is not the case if we think on a global scale: of the more than 7,000 languages in the world, only a handful have digital resources. Systems at scale with good performance typically require annotated resources that cover genre and domain divides. Moreover, the existence of only a handful of resources in some languages is a reflection of the digital disparity in various societies, leading to inadvertent biases in systems. In this talk I will present some solutions for low-resource scenarios, across domains and genres as well as across languages.

【Keywords】: Information systems