34th AAAI 2020: New York, NY, USA

The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press

Paper Num: 1865 || Session Num: 33

AAAI Technical Track: AI and the Web 41

1. Balancing Spreads of Influence in a Social Network.

【Pages】:3-10

【Authors】: Ruben Becker ; Federico Corò ; Gianlorenzo D'Angelo ; Hugo Gilbert

【Abstract】: The personalization of our news consumption on social media has a tendency to reinforce our pre-existing beliefs instead of balancing our opinions. To tackle this issue, Garimella et al. (NIPS'17) modeled the spread of these viewpoints, also called campaigns, using the independent cascade model introduced by Kempe, Kleinberg and Tardos (KDD'03) and studied an optimization problem that aims to balance information exposure when two opposing campaigns propagate in a network. This paper investigates a natural generalization of this optimization problem in which μ different campaigns propagate in the network and we aim to maximize the expected number of nodes that are reached by at least ν or none of the campaigns, where μ ≥ ν ≥ 2. Following Garimella et al., in addition to this general setting, we also investigate a simplified one, in which campaigns propagate in a correlated manner. For the simplified setting, we show that the problem can be approximated within a constant factor for any constant μ and ν; for the general setting, we give reductions leading to several approximation hardness results when ν ≥ 3. For instance, assuming the gap exponential time hypothesis to hold, we obtain that the problem cannot be approximated within a factor of n^(−g(n)) for any g(n) = o(1), where n is the number of nodes in the network. We complement our hardness results with an Ω(n^(−1/2))-approximation algorithm for the general setting when ν = 3 and μ is arbitrary.

【Keywords】:
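
As a rough illustration of the objective described in this abstract, the Python sketch below simulates independent cascades for several campaigns and counts the nodes reached by at least ν campaigns or by none. It is not the authors' algorithm. The graph encoding, the function names (`independent_cascade`, `balanced_count`, `estimate_balance`), and the choice to let every campaign use the same edge probabilities with independent random draws are all assumptions made for this example.

```python
import random


def independent_cascade(graph, seeds, rng):
    """Run one independent cascade and return the set of reached nodes.

    graph maps a node to a list of (neighbor, activation_probability) pairs
    (a hypothetical encoding chosen for this sketch).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets a single activation attempt per edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def balanced_count(nodes, reached_sets, nu):
    """Count nodes reached by at least nu campaigns or by no campaign."""
    count = 0
    for v in nodes:
        hits = sum(v in reached for reached in reached_sets)
        if hits == 0 or hits >= nu:
            count += 1
    return count


def estimate_balance(graph, nodes, campaign_seeds, nu, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of balanced nodes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        reached_sets = [independent_cascade(graph, s, rng) for s in campaign_seeds]
        total += balanced_count(nodes, reached_sets, nu)
    return total / runs
```

Calling `estimate_balance(graph, nodes, [seeds_1, seeds_2, seeds_3], nu=3)` would then estimate the objective for three fixed seed sets. The optimization problem studied in the paper is choosing additional seeds to maximize this quantity, which the sketch does not attempt.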

2. MultiSumm: Towards a Unified Model for Multi-Lingual Abstractive Summarization.

【Pages】:11-18

【Authors】: Yue Cao ; Xiaojun Wan ; Jin-ge Yao ; Dian Yu

【Abstract】: Automatic text summarization aims at producing a shorter version of the input text that conveys the most important information. However, multi-lingual text summarization, where the goal is to process texts in multiple languages and output summaries in the corresponding languages with a single model, has been rarely studied. In this paper, we present MultiSumm, a novel multi-lingual model for abstractive summarization. The MultiSumm model uses the following training regime: (I) multi-lingual learning that contains language model training, auto-encoder training, translation and back-translation training, and (II) joint summary generation training. We conduct experiments on summarization datasets for five rich-resource languages: English, Chinese, French, Spanish, and German, as well as two low-resource languages: Bosnian and Croatian. Experimental results show that our proposed model significantly outperforms a multi-lingual baseline model. Specifically, our model achieves comparable or even better performance than models trained separately on each language. As an additional contribution, we construct the first summarization dataset for Bosnian and Croatian, containing 177,406 and 204,748 samples, respectively.

【Keywords】:

3. Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation.

【Pages】:19-26

【Authors】: Chong Chen ; Min Zhang ; Yongfeng Zhang ; Weizhi Ma ; Yiqun Liu ; Shaoping Ma

【Abstract】: Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite its effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.

【Keywords】:

4. Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach.

【Pages】:27-34

【Authors】: Lei Chen ; Le Wu ; Richang Hong ; Kun Zhang ; Meng Wang

【Abstract】: Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models that iteratively stack multiple layers of convolution aggregation operations and non-linear activation operations. Recently, in Collaborative Filtering (CF) based Recommender Systems (RS), by treating the user-item interaction behavior as a bipartite graph, some researchers model higher-layer collaborative signals with GCNs. These GCN based recommender models show superior performance compared to traditional works. However, these models suffer from training difficulty with non-linear activations for large user-item graphs. Besides, most GCN based models cannot model deeper layers due to the over-smoothing effect of the graph convolution operation. In this paper, we revisit GCN based CF models from two aspects. First, we empirically show that removing non-linearities enhances recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we propose a residual network structure that is specifically designed for CF with user-item interaction modeling, which alleviates the over-smoothing problem in the graph convolution aggregation operation with sparse user-item interaction data. The proposed model is a linear model that is easy to train, scales to large datasets, and yields better efficiency and effectiveness on two real datasets. We publish the source code at https://github.com/newlei/LR-GCCF.

【Keywords】:

5. Question-Driven Purchasing Propensity Analysis for Recommendation.

【Pages】:35-42

【Authors】: Long Chen ; Ziyu Guan ; Qibin Xu ; Qiong Zhang ; Huan Sun ; Guangyue Lu ; Deng Cai

【Abstract】: Merchants on e-commerce websites expect recommender systems to entice more consumption, which is highly correlated with the customers' purchasing propensity. However, most existing recommender systems focus on customers' general preference rather than purchasing propensity, which is often governed by instant demands that we deem to be well conveyed by the questions asked by customers. A typical recommendation scenario is: Bob wants to buy a cell phone which can play the game PUBG. He is interested in HUAWEI P20 and asks “can PUBG run smoothly on this phone?” under it. Then our system will be triggered to recommend the most eligible cell phones to him. Intuitively, diverse user questions could probably be addressed in reviews written by other users who have similar concerns. To address this recommendation problem, we propose a novel Question-Driven Attentive Neural Network (QDANN) to assess the instant demands of questioners and the eligibility of products based on user generated reviews, and make recommendations accordingly. Without supervision, QDANN can well exploit reviews to achieve this goal. The attention mechanisms can be used to provide explanations for recommendations. We evaluate QDANN in three domains of Taobao. The results show the efficacy of our method and its superiority over baseline methods.

【Keywords】:

6. Gradient Method for Continuous Influence Maximization with Budget-Saving Considerations.

【Pages】:43-50

【Authors】: Wei Chen ; Weizhong Zhang ; Haoyu Zhao

【Abstract】: Continuous influence maximization (CIM) generalizes the original influence maximization by incorporating general marketing strategies: a marketing strategy mix is a vector x = (x_1, …, x_d) such that for each node v in a social network, v could be activated as a seed of diffusion with probability h_v(x), where h_v is a strategy activation function satisfying DR-submodularity. CIM is the task of selecting a strategy mix x with constraint ∑_i x_i ≤ k where k is a budget constraint, such that the total number of activated nodes after the diffusion process, called influence spread and denoted as g(x), is maximized. In this paper, we extend CIM to consider budget saving, that is, each strategy mix x has a cost c(x) where c is a convex cost function, and we want to maximize the balanced sum g(x) + λ(k − c(x)) where λ is a balance parameter, subject to the constraint of c(x) ≤ k. We denote this problem as CIM-BS. The objective function of CIM-BS is neither monotone, nor DR-submodular or concave, and thus neither the greedy algorithm nor the standard result on gradient method could be directly applied. Our key innovation is the combination of the gradient method with reverse influence sampling to design algorithms that solve CIM-BS: For the general case, we give an algorithm that achieves (½ − ε)-approximation, and for the case of independent strategy activations, we present an algorithm that achieves (1 − 1/e − ε)-approximation.

【Keywords】:
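
To make the balanced sum g(x) + λ(k − c(x)) from this abstract concrete, the Python sketch below evaluates it on a toy instance and shows why the objective need not be monotone. Everything except the formula itself is an assumption: `toy_spread` is a hypothetical stand-in for the influence spread g, and `linear_cost` is one possible convex cost c. Neither comes from the paper, and the toy instance happens to be concave, so it illustrates only the non-monotonicity.

```python
import math


def toy_spread(x):
    """Hypothetical stand-in for the influence spread g(x): monotone and concave."""
    return sum(1.0 - math.exp(-xi) for xi in x)


def linear_cost(x):
    """Hypothetical convex cost c(x): the total amount spent across strategies."""
    return sum(x)


def cim_bs_objective(x, k, lam, spread=toy_spread, cost=linear_cost):
    """Return g(x) + lam * (k - c(x)), or None when c(x) > k (infeasible)."""
    if cost(x) > k:
        return None
    return spread(x) + lam * (k - cost(x))
```

With budget `k = 2.0` and `lam = 0.5`, the one-dimensional mix `[1.0]` scores higher than both `[0.0]` and `[2.0]`. Spending more first helps and then hurts, which is the non-monotone behavior that rules out a plain greedy argument.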

7. Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.

【Pages】:51-58

【Authors】: Xinyan Dai ; Xiao Yan ; Kelvin Kai Wing Ng ; Jiu Liu ; James Cheng

【Abstract】: Vector quantization (VQ) techniques are widely used in similarity search for data compression, computation acceleration, and so on. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly minimize the quantization error. In this paper, we present a new angle to analyze the quantization error, which decomposes the quantization error into norm error and direction error. We show that quantization errors in norm have much higher influence on inner products than quantization errors in direction, and small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS). Based on this observation, we propose norm-explicit quantization (NEQ), a general paradigm that improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the direction vectors, NEQ can simply reuse an existing VQ technique to quantize them without modification. We conducted extensive experiments on a variety of datasets and parameter configurations. The experimental results show that NEQ improves the performance of various VQ techniques for MIPS, including PQ, OPQ, RQ and AQ.

【Keywords】:
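
The norm and direction split described in this abstract can be sketched in a few lines of Python. This is only an illustration of the decomposition, not the NEQ implementation. The scalar norm codebook, the `quantize_direction` callback (standing in for any existing VQ technique), and all function names are assumptions, and items are assumed to be nonzero vectors.

```python
import math


def split_norm_direction(x):
    """Split a nonzero vector into its Euclidean norm and unit direction."""
    norm = math.sqrt(sum(v * v for v in x))
    direction = [v / norm for v in x]
    return norm, direction


def nearest_codeword(value, codebook):
    """Scalar quantization: pick the codeword closest to value."""
    return min(codebook, key=lambda c: abs(c - value))


def approx_inner_product(query, item, norm_codebook, quantize_direction):
    """Approximate <query, item> as quantized_norm * <query, quantized_direction>."""
    norm, direction = split_norm_direction(item)
    q_norm = nearest_codeword(norm, norm_codebook)
    # quantize_direction is a placeholder for an existing VQ method (e.g., PQ).
    q_dir = quantize_direction(direction)
    return q_norm * sum(a * b for a, b in zip(query, q_dir))
```

Because the norm enters the inner product as a multiplicative factor, a relative error in the quantized norm carries over directly to the estimated inner product. That is the intuition the abstract gives for quantizing norms explicitly.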

8. Modeling Fluency and Faithfulness for Diverse Neural Machine Translation.

【Pages】:59-66

【Authors】: Yang Feng ; Wanying Xie ; Shuhao Gu ; Chenze Shao ; Wen Zhang ; Zhengxin Yang ; Dong Yu

【Abstract】: Neural machine translation models usually adopt the teacher forcing strategy for training, which requires the predicted sequence to match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution. However, the strategy assigns the entire probability mass to the ground truth word and ignores other words in the target vocabulary even when the ground truth word cannot dominate the distribution. To address this problem of teacher forcing, we propose a method that introduces an evaluation module to guide the distribution of the prediction. The evaluation module assesses each prediction from the perspectives of fluency and faithfulness to encourage the model to generate a word that connects fluently with its past and future translation and, at the same time, tends to form a translation equivalent in meaning to the source. The experiments on multiple translation tasks show that our method can achieve significant improvements over strong baselines.

【Keywords】:

9. Leveraging Title-Abstract Attentive Semantics for Paper Recommendation.

【Pages】:67-74

【Authors】: Guibing Guo ; Bowei Chen ; Xiaoyan Zhang ; Zhirong Liu ; Zhenhua Dong ; Xiuqiang He

【Abstract】: Paper recommendation is a research topic that aims to provide users with personalized papers of interest. However, most existing approaches treat the title and abstract equally as the input to learn the representation of a paper, ignoring their semantic relationship. In this paper, we regard the abstract as a sequence of sentences, and propose a two-level attentive neural network to capture: (1) the extent to which each word within a sentence is semantically close to the words within the title; and (2) the relevance of each sentence in the abstract to the title, which is often a good summarization of the abstract document. Specifically, we propose a Long Short-Term Memory (LSTM) network with attention to learn the representation of sentences, and integrate a Gated Recurrent Unit (GRU) network with a memory network to learn the long-term sequential sentence patterns of interacted papers for both user and item (paper) modeling. We conduct extensive experiments on two real datasets, and show that our approach outperforms other state-of-the-art approaches in terms of accuracy.

【Keywords】:

10. Preserving Ordinal Consensus: Towards Feature Selection for Unlabeled Data.

【Pages】:75-82

【Authors】: Jun Guo ; Heng Chang ; Wenwu Zhu

【Abstract】: To better pre-process unlabeled data, most existing feature selection methods remove redundant and noisy information by exploring some intrinsic structures embedded in samples. However, these unsupervised studies focus too much on the relations among samples, totally neglecting the feature-level geometric information. This paper proposes an unsupervised triplet-induced graph to explore a new type of potential structure at feature level, and incorporates it into simultaneous feature selection and clustering. In the feature selection part, we design an ordinal consensus preserving term based on a triplet-induced graph. This term enforces the projection vectors to preserve the relative proximity of original features, which contributes to selecting more relevant features. In the clustering part, Self-Paced Learning (SPL) is introduced to gradually learn from ‘easy’ to ‘complex’ samples. SPL alleviates the dilemma of falling into the bad local minima incurred by noise and outliers. Specifically, we propose a compelling regularizer for SPL to obtain a robust loss. Finally, an alternating minimization algorithm is developed to efficiently optimize the proposed model. Extensive experiments on different benchmark datasets consistently demonstrate the superiority of our proposed method.

【Keywords】:

11. An Attentional Recurrent Neural Network for Personalized Next Location Recommendation.

【Pages】:83-90

【Authors】: Qing Guo ; Zhu Sun ; Jie Zhang ; Yin-Leng Theng

【Abstract】: Most existing studies on next location recommendation propose to model the sequential regularity of check-in sequences, but suffer from the severe data sparsity issue where most locations have fewer than five following locations. To this end, we propose an Attentional Recurrent Neural Network (ARNN) to jointly model both the sequential regularity and transition regularities of similar locations (neighbors). In particular, we first design a meta-path based random walk over a novel knowledge graph to discover location neighbors based on heterogeneous factors. A recurrent neural network is then adopted to model the sequential regularity by capturing various contexts that govern user mobility. Meanwhile, the transition regularities of the discovered neighbors are integrated via the attention mechanism, which seamlessly cooperates with the sequential regularity as a unified recurrent framework. Experimental results on multiple real-world datasets demonstrate that ARNN outperforms state-of-the-art methods.

【Keywords】:

12. Re-Attention for Visual Question Answering.

【Pages】:91-98

【Authors】: Wenya Guo ; Ying Zhang ; Xiaoping Wu ; Jufeng Yang ; Xiangrui Cai ; Xiaojie Yuan

【Abstract】: Visual Question Answering (VQA) requires a simultaneous understanding of images and questions. Existing methods achieve good performance by focusing on both key objects in images and key words in questions. However, the answer also contains rich information which can help to better describe the image and generate more accurate attention maps. In this paper, to utilize the information in the answer, we propose a re-attention framework for the VQA task. We first associate image and question by calculating the similarity of each object-word pair in the feature space. Then, based on the answer, the learned model re-attends to the corresponding visual objects in images and reconstructs the initial attention map to produce consistent results. Benefiting from the re-attention procedure, the question can be better understood, and a satisfactory answer is generated. Extensive experiments on the benchmark dataset demonstrate that the proposed method performs favorably against the state-of-the-art approaches.

【Keywords】:

13. Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition.

【Pages】:99-106

【Authors】: Peng Hu ; Hongyuan Zhu ; Xi Peng ; Jie Lin

【Abstract】: Cross-modal retrieval aims to retrieve the relevant samples across different modalities, of which the key problem is how to model the correlations among different modalities while narrowing the large heterogeneous gap. In this paper, we propose a Semi-supervised Multimodal Learning Network method (SMLN) which correlates different modalities by capturing the intrinsic structure and discriminative correlation of the multimedia data. To be specific, the labeled and unlabeled data are used to construct a similarity matrix which integrates the cross-modal correlation, discrimination, and intra-modal graph information existing in the multimedia data. More importantly, we propose a novel optimization approach to optimize our loss within a neural network, which involves a spectral decomposition problem derived from a ratio trace criterion. Our optimization enjoys two advantages. On the one hand, the proposed approach is not limited to our loss and can be applied to any neural network with a ratio trace criterion. On the other hand, the proposed optimization differs from existing ones, which alternately maximize the minor eigenvalues, thus overemphasizing the minor eigenvalues and ignoring the dominant ones. In contrast, our method balances all eigenvalues, making it more competitive than existing methods. Thanks to our loss and optimization strategy, our method can well preserve the discriminative and intrinsic information in the common space and scale to large-scale multimedia data. To verify the effectiveness of the proposed method, extensive experiments are carried out on three widely-used multimodal datasets in comparison with 13 state-of-the-art approaches.

【Keywords】:

14. MuMod: A Micro-Unit Connection Approach for Hybrid-Order Community Detection.

【Pages】:107-114

【Authors】: Ling Huang ; Hong-Yang Chao ; Guangqiang Xie

【Abstract】: In the past few years, higher-order community detection has drawn an increasing amount of attention. Compared with the lower-order approaches that rely on the connectivity pattern of individual nodes and edges, the higher-order approaches discover communities by leveraging the higher-order connectivity pattern via constructing a motif-based hypergraph. Despite their success in capturing the building blocks of complex networks, a recent study has shown that the higher-order approaches unavoidably suffer from the hypergraph fragmentation issue. Although an edge enhancement strategy has been designed previously to address this issue, adding additional edges may corrupt the original lower-order connectivity pattern. To this end, this paper defines a new problem of community detection, namely hybrid-order community detection, which aims to discover communities by simultaneously leveraging the lower-order connectivity pattern and the higher-order connectivity pattern. To address this new problem, a new Micro-unit Modularity (MuMod) approach is designed. The basic idea lies in constructing a micro-unit connection network, where both the lower-order connectivity pattern and the higher-order connectivity pattern are utilized. A new micro-unit modularity model is then proposed for generating the micro-unit groups, from which the overlapping community structure of the original network can be derived. Extensive experiments are conducted on five real-world networks. Comparison results with twelve existing approaches confirm the effectiveness of the proposed method.

【Keywords】:

15. Cross-Lingual Pre-Training Based Transfer for Zero-Shot Neural Machine Translation.

【Pages】:115-122

【Authors】: Baijun Ji ; Zhirui Zhang ; Xiangyu Duan ; Min Zhang ; Boxing Chen ; Weihua Luo

【Abstract】: Transfer learning between different language pairs has shown its effectiveness for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language space mismatch problem between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition for zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on this encoder is trained with large-scale annotated data and then directly applied in the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.

【Keywords】:

16. Functionality Discovery and Prediction of Physical Objects.

【Pages】:123-130

【Authors】: Lei Ji ; Botian Shi ; Xianglin Guo ; Xilin Chen

【Abstract】: Functionality is a fundamental attribute of an object which indicates its capability to be used to perform specific actions. It is critical to equip robots with functionality knowledge for discovering appropriate objects for a task, e.g., cutting a cake with a knife. Existing research works have focused on understanding object functionality through human-object interaction from extensively annotated image or video data and are hard to scale up. In this paper, we (1) mine object-functionality knowledge through pattern-based and model-based methods from text, (2) introduce a novel task on physical object-functionality prediction, which consumes an image and an action query to predict whether the object in the image can perform the action, and (3) propose a method to leverage the mined functionality knowledge for the new task. Our experimental results show the effectiveness of our methods.

【Keywords】:

17. True Nonlinear Dynamics from Incomplete Networks.

【Pages】:131-138

【Authors】: Chunheng Jiang ; Jianxi Gao ; Malik Magdon-Ismail

【Abstract】: We study nonlinear dynamics on complex networks. Each vertex i has a state x_i which evolves according to a networked dynamics to a steady state x_i*. We develop fundamental tools to learn the true steady state of a small part of the network, without knowing the full network. A naive approach and the current state-of-the-art is to follow the dynamics of the observed partial network to local equilibrium. This dramatically fails to extract the true steady state. We use a mean-field approach to map the dynamics of the unseen part of the network to a single node, which allows us to recover accurate estimates of the steady state on as few as 5 observed vertices in domains ranging from ecology to social networks to gene regulation. Incomplete networks are the norm in practice, and we offer new ways to think about nonlinear dynamics when only sparse information is available.

【Keywords】:

18. Understanding and Improving Proximity Graph Based Maximum Inner Product Search.

【Pages】:139-146

【Authors】: Jie Liu ; Xiao Yan ; Xinyan Dai ; Zhirong Li ; James Cheng ; Ming-Chang Yang

【Abstract】: The inner-product navigable small world graph (ip-NSW) represents the state-of-the-art method for approximate maximum inner product search (MIPS) and it can achieve an order of magnitude speedup over the fastest baseline. However, to date it is still unclear where its exceptional performance comes from. In this paper, we show that there is a strong norm bias in the MIPS problem, which means that the large norm items are very likely to become the result of MIPS. Then we explain the good performance of ip-NSW as matching the norm bias of the MIPS problem: large norm items have big in-degrees in the ip-NSW proximity graph and a walk on the graph spends the majority of computation on these items, thus effectively avoiding unnecessary computation on small norm items. Furthermore, we propose the ip-NSW+ algorithm, which improves ip-NSW by introducing an additional angular proximity graph. Search is first conducted on the angular graph to find the angular neighbors of a query and then the MIPS neighbors of these angular neighbors are used to initialize the candidate pool for search on the inner-product proximity graph. Experimental results show that ip-NSW+ consistently and significantly outperforms ip-NSW and provides more robust performance under different data distributions.

【Keywords】:
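
The norm bias mentioned in this abstract can be checked empirically with a brute-force scan. The Python sketch below measures how often the exact MIPS result falls among the largest-norm items. It does not implement ip-NSW or ip-NSW+, and the function names and the `top_count` parameter are assumptions introduced for the example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mips_top1(query, items):
    """Return the index of the item with the largest inner product (brute force)."""
    return max(range(len(items)), key=lambda i: dot(query, items[i]))


def large_norm_hit_rate(queries, items, top_count):
    """Fraction of queries whose MIPS result is among the top_count largest-norm items."""
    # Rank item indices by squared norm, largest first.
    by_norm = sorted(range(len(items)), key=lambda i: dot(items[i], items[i]), reverse=True)
    large = set(by_norm[:top_count])
    hits = sum(mips_top1(q, items) in large for q in queries)
    return hits / len(queries)
```

On a dataset with a skewed norm distribution, a hit rate well above `top_count / len(items)` would be consistent with the norm bias the abstract describes.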

19. Type-Aware Anchor Link Prediction across Heterogeneous Networks Based on Graph Attention Network.

【Pages】:147-155

【Authors】: Xiaoxue Li ; Yanmin Shang ; Yanan Cao ; Yangxi Li ; Jianlong Tan ; Yanbing Liu

【Abstract】: Anchor Link Prediction (ALP) across heterogeneous networks plays a pivotal role in inter-network applications. The difficulty of anchor link prediction in heterogeneous networks lies in how to comprehensively consider the factors affecting node alignment. In recent years, predicting anchor links based on network embedding has become the main trend. For heterogeneous networks, previous anchor link prediction methods first integrate various types of nodes associated with a user node to obtain a fusion embedding vector from a global perspective, and then predict anchor links based on the similarity between the fusion vectors corresponding to different user nodes. However, the fusion vector ignores the effects of local type information on user node alignment. To address this challenge, we propose a novel type-aware anchor link prediction method across heterogeneous networks (TALP), which models the effect of type information and fusion information on user node alignment from local and global perspectives simultaneously. TALP can solve the network embedding and type-aware alignment under a unified optimization framework based on a two-layer graph attention architecture. Through extensive experiments on real heterogeneous network datasets, we demonstrate that TALP significantly outperforms the state-of-the-art methods.

【Keywords】:

20. Deep Match to Rank Model for Personalized Click-Through Rate Prediction.

【Pages】:156-163

【Authors】: Zequn Lyu ; Yu Dong ; Chengfu Huo ; Weijun Ren

【Abstract】: Click-through rate (CTR) prediction is a core task in the field of recommender systems and many other applications. For a CTR prediction model, personalization is the key to improving the performance and enhancing the user experience. Recently, several models have been proposed to extract user interest from user behavior data, which implicitly reflects the user's personalized preference. However, existing works in the field of CTR prediction mainly focus on user representation and pay less attention to representing the relevance between user and item, which directly measures the intensity of the user's preference for the target item. Motivated by this, we propose a novel model named Deep Match to Rank (DMR) which combines the idea of collaborative filtering in matching methods with the ranking task in CTR prediction. In DMR, we design a User-to-Item Network and an Item-to-Item Network to represent the relevance in two forms. In the User-to-Item Network, we represent the relevance between user and item by the inner product of the corresponding representations in the embedding space. Meanwhile, an auxiliary match network is presented to supervise the training and push a larger inner product to represent higher relevance. In the Item-to-Item Network, we first calculate the item-to-item similarities between user interacted items and the target item by an attention mechanism, and then sum up the similarities to obtain another form of user-to-item relevance. We conduct extensive experiments on both public and industrial datasets to validate the effectiveness of our model, which outperforms the state-of-the-art models significantly.

【Keywords】:
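
The two relevance forms named in this abstract can be written down as a minimal Python sketch: an inner product for user-to-item relevance, and a sum of item-to-item similarities for the second form. The abstract says only that the similarities come from an attention mechanism, so scoring them with a plain dot product is an assumption of this sketch, as are the function names. The learned networks of DMR are not reproduced.

```python
def dot(a, b):
    """Inner product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def user_to_item_relevance(user_vec, item_vec):
    """Relevance as the inner product of user and target-item embeddings."""
    return dot(user_vec, item_vec)


def item_to_item_relevance(history_vecs, target_vec):
    """Sum of similarities between each interacted item and the target item.

    A dot product stands in for the attention score here (an assumption).
    """
    scores = [dot(h, target_vec) for h in history_vecs]
    return sum(scores)
```

For example, `item_to_item_relevance([[1, 0], [0, 1]], [2, 3])` adds the two per-item scores 2 and 3. In a full model both relevance values would be fed to the downstream ranking network as features.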

21. Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion.

【Pages】:164-172

【Authors】: Sijie Mai ; Haifeng Hu ; Songlong Xing

【Abstract】: Learning joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of target modality via their respective encoders using adversarial training. Furthermore, we exert additional constraints on embedding space by introducing reconstruction loss and classification loss. Then we fuse the encoded representations using hierarchical graph neural network which explicitly explores unimodal, bimodal and trimodal interactions in multi-stage. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.

【Keywords】:

22. A Variational Point Process Model for Social Event Sequences.

【Pages】:173-180

【Authors】: Zhen Pan ; Zhenya Huang ; Defu Lian ; Enhong Chen

【Abstract】: Many events occur in real-world and social networks. Events are related to the past and there are patterns in the evolution of event sequences. Understanding the patterns can help us better predict the type and arrival time of the next event. In the literature, both feature-based approaches and generative approaches are utilized to model the event sequence. Feature-based approaches extract a variety of features, and train a regression or classification model to make a prediction. Yet, their performance is dependent on experience-based feature extraction. Generative approaches usually assume the evolution of events follows a stochastic point process (e.g., a Poisson process or its more complex variants). However, the true distribution of events is never known and the performance depends on the design of the stochastic process in practice. To solve the above challenges, in this paper, we present a novel probabilistic generative model for event sequences. The model is termed Variational Event Point Process (VEPP). Our model introduces a variational auto-encoder to event sequence modeling that can better use the latent information and capture the distribution over inter-arrival times and types of event sequences. Experiments on real-world datasets demonstrate the effectiveness of our proposed model.

【Keywords】:

23. Fair Updates in Two-Sided Market Platforms: On Incrementally Updating Recommendations.

Paper Link】 【Pages】:181-188

【Authors】: Gourab K. Patro ; Abhijnan Chakraborty ; Niloy Ganguly ; Krishna P. Gummadi

【Abstract】: Major online platforms today can be thought of as two-sided markets with producers and customers of goods and services. There have been concerns that over-emphasis on customer satisfaction by the platforms may affect the well-being of the producers. To counter such issues, a few recent works have attempted to incorporate fairness for the producers. However, these studies have overlooked an important issue in such platforms -- to supposedly improve customer utility, the underlying algorithms are frequently updated, causing abrupt changes in the exposure of producers. In this work, we focus on the fairness issues arising out of such frequent updates, and argue for incremental updates of the platform algorithms so that the producers have enough time to adjust (both logistically and mentally) to the change. However, naive incremental updates may become unfair to the customers. Thus, focusing on recommendations deployed on two-sided platforms, we formulate an ILP based online optimization to deploy changes incrementally in η steps, where we can ensure smooth transition of the exposure of items while guaranteeing a minimum utility for every customer. Evaluations over multiple real world datasets show that our proposed mechanism for platform updates can be efficient and fair to both the producers and the customers in two-sided platforms.

【Keywords】:

24. Towards Comprehensive Recommender Systems: Time-Aware Unified Recommendations Based on Listwise Ranking of Implicit Cross-Network Data.

Paper Link】 【Pages】:189-197

【Authors】: Dilruk Perera ; Roger Zimmermann

【Abstract】: The abundance of information in web applications makes recommendation essential for users as well as applications. Despite the effectiveness of existing recommender systems, we find two major limitations that reduce their overall performance: (1) inability to provide timely recommendations for both new and existing users by considering the dynamic nature of user preferences, and (2) failure to fully optimize for the ranking task when using implicit feedback. Therefore, we propose a novel deep learning based unified cross-network solution to mitigate cold-start and data sparsity issues and provide timely recommendations for new and existing users. Furthermore, we consider the ranking problem under implicit feedback as a classification task, and propose a generic personalized listwise optimization criterion for implicit data to effectively rank a list of items. We illustrate our cross-network model using Twitter auxiliary information for recommendations on YouTube target network. Extensive comparisons against multiple time aware and cross-network baselines show that the proposed solution is superior in terms of accuracy, novelty and diversity. Furthermore, experiments conducted on the popular MovieLens dataset suggest that the proposed listwise ranking method outperforms existing state-of-the-art ranking techniques.

【Keywords】:

25. Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation.

Paper Link】 【Pages】:198-205

【Authors】: Chenze Shao ; Jinchao Zhang ; Yang Feng ; Fandong Meng ; Jie Zhou

【Abstract】: Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, which encourages NAT to capture the target-side sequential dependency and correlates well with the translation quality. We validate our approach on three translation tasks and show that our approach largely outperforms the NAT baseline by about 5.0 BLEU scores on WMT14 En↔De and about 2.5 BLEU scores on WMT16 En↔Ro.

【Keywords】:
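
To make the bag-of-ngrams idea in entry 25 concrete, the following is a minimal sketch only, not the authors' training objective. The abstract describes a differentiable loss; this sketch assumes an L1 distance and works on already-decoded token lists, so it is the discrete, non-differentiable analogue. The function names `bag_of_ngrams` and `bon_l1_distance` are hypothetical.

```python
from collections import Counter


def bag_of_ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bon_l1_distance(hypothesis, reference, n=2):
    """L1 distance between the n-gram count vectors of two token lists.

    Assumption: L1 is chosen here for illustration; the abstract does not
    state which distance the paper uses.
    """
    hyp = bag_of_ngrams(hypothesis, n)
    ref = bag_of_ngrams(reference, n)
    # Counter returns 0 for n-grams missing from one side.
    return sum(abs(hyp[gram] - ref[gram]) for gram in set(hyp) | set(ref))


reference = "the cat sat on the mat".split()
hypothesis = "the cat cat on the mat".split()  # repeats "cat", drops "sat"
distance = bon_l1_distance(hypothesis, reference, n=2)
```

A repeated word and a dropped word both change the n-gram counts, so the distance grows with the over-translation and under-translation errors the abstract mentions, while an exact match scores zero.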

26. PEIA: Personality and Emotion Integrated Attentive Model for Music Recommendation on Social Media Platforms.

Paper Link】 【Pages】:206-213

【Authors】: Tiancheng Shen ; Jia Jia ; Yan Li ; Yihui Ma ; Yaohua Bu ; Hanjie Wang ; Bo Chen ; Tat-Seng Chua ; Wendy Hall

【Abstract】: With the rapid expansion of digital music formats, it is indispensable to recommend users their favorite music. For music recommendation, users' personality and emotion greatly affect their music preference, respectively in a long-term and short-term manner, while rich social media data provides effective feedback on this information. In this paper, aiming at music recommendation on social media platforms, we propose a Personality and Emotion Integrated Attentive model (PEIA), which fully utilizes social media data to comprehensively model users' long-term taste (personality) and short-term preference (emotion). Specifically, it takes full advantage of personality-oriented user features, emotion-oriented user features and music features of multi-faceted attributes. Hierarchical attention is employed to distinguish the important factors when incorporating the latent representations of users' personality and emotion. Extensive experiments on a large real-world dataset of 171,254 users demonstrate the effectiveness of our PEIA model, which achieves an NDCG of 0.5369, outperforming the state-of-the-art methods. We also perform detailed parameter analysis and feature contribution analysis, which further verify our scheme and demonstrate the significance of co-modeling of user personality and emotion in music recommendation.

【Keywords】:

27. Where to Go Next: Modeling Long- and Short-Term User Preferences for Point-of-Interest Recommendation.

Paper Link】 【Pages】:214-221

【Authors】: Ke Sun ; Tieyun Qian ; Tong Chen ; Yile Liang ; Quoc Viet Hung Nguyen ; Hongzhi Yin

【Abstract】: Point-of-Interest (POI) recommendation has been a trending research topic as it generates personalized suggestions on facilities for users from a large number of candidate venues. Since users' check-in records can be viewed as a long sequence, methods based on recurrent neural networks (RNNs) have recently shown promising applicability for this task. However, existing RNN-based methods either neglect users' long-term preferences or overlook the geographical relations among recently visited POIs when modeling users' short-term preferences, thus making the recommendation results unreliable. To address the above limitations, we propose a novel method named Long- and Short-Term Preference Modeling (LSTPM) for next-POI recommendation. In particular, the proposed model consists of a nonlocal network for long-term preference modeling and a geo-dilated RNN for short-term preference learning. Extensive experiments on two real-world datasets demonstrate that our model yields significant improvements over the state-of-the-art methods.

【Keywords】:

28. Knowledge Graph Alignment Network with Gated Multi-Hop Neighborhood Aggregation.

Paper Link】 【Pages】:222-229

【Authors】: Zequn Sun ; Chengming Wang ; Wei Hu ; Muhao Chen ; Jian Dai ; Wei Zhang ; Yuzhong Qu

【Abstract】: Graph neural networks (GNNs) have emerged as a powerful paradigm for embedding-based entity alignment due to their capability of identifying isomorphic subgraphs. However, in real knowledge graphs (KGs), the counterpart entities usually have non-isomorphic neighborhood structures, which easily causes GNNs to yield different representations for them. To tackle this problem, we propose a new KG alignment network, namely AliNet, aiming at mitigating the non-isomorphism of neighborhood structures in an end-to-end manner. As the direct neighbors of counterpart entities are usually dissimilar due to the schema heterogeneity, AliNet introduces distant neighbors to expand the overlap between their neighborhood structures. It employs an attention mechanism to highlight helpful distant neighbors and reduce noises. Then, it controls the aggregation of both direct and distant neighborhood information using a gating mechanism. We further propose a relation loss to refine entity representations. We perform thorough experiments with detailed ablation studies and analyses on five entity alignment datasets, demonstrating the effectiveness of AliNet.

【Keywords】:
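
The gating step described in entry 28 can be illustrated with a minimal sketch. This is an assumption-laden illustration, not AliNet's actual layer: it uses a single scalar gate computed from the distant-neighbor vector, and the names `gated_combine`, `gate_weights` and `gate_bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_combine(direct, distant, gate_weights, gate_bias):
    """Blend two same-length vectors using one scalar gate.

    Assumption: the gate is a sigmoid of a linear score of the distant
    representation; a real model would learn gate_weights and gate_bias.
    """
    score = sum(w * d for w, d in zip(gate_weights, distant)) + gate_bias
    gate = sigmoid(score)
    # gate near 1 favors distant neighbors, gate near 0 favors direct ones.
    return [gate * far + (1.0 - gate) * near for near, far in zip(direct, distant)]


direct = [1.0, 3.0]   # aggregated direct-neighbor representation
distant = [3.0, 1.0]  # aggregated distant-neighbor representation
balanced = gated_combine(direct, distant, gate_weights=[0.0, 0.0], gate_bias=0.0)
```

With zero weights and zero bias the gate is 0.5, so the output is the plain average of the two representations; a large positive bias pushes the output toward the distant-neighbor vector.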

29. Learning with Unsure Responses.

Paper Link】 【Pages】:230-237

【Authors】: Kunihiro Takeoka ; Yuyang Dong ; Masafumi Oyamada

【Abstract】: Many annotation systems allow annotators to choose an unsure option among the labels, because annotators have different expertise and may not have enough confidence to choose a label for some assigned instances. However, all the existing approaches only learn from the labels with a clear class name and ignore the unsure responses. Because unsure responses also account for a proportion of the dataset (e.g., about 10-30% in real datasets), existing approaches lead to high costs such as paying more money or taking more time to collect enough labeled data. Therefore, it is a significant issue to make use of these unsure responses. In this paper, we make the unsure responses contribute to training classifiers. We found a property that the instances corresponding to the unsure responses always appear close to the decision boundary of classification. We design a loss function called unsure loss based on this property. We extend the conventional methods for classification and learning from crowds with this unsure loss. Experimental results on real-world and synthetic data demonstrate the performance of our method and its superiority over baseline methods.

【Keywords】:

30. Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation Learning.

Paper Link】 【Pages】:238-245

【Authors】: Haiwen Wang ; Ruijie Wang ; Chuan Wen ; Shuhao Li ; Yuting Jia ; Weinan Zhang ; Xinbing Wang

【Abstract】: Author name ambiguity causes inadequacy and inconvenience in academic information retrieval, which raises the necessity of author name disambiguation (AND). Existing AND methods can be divided into two categories: the models focusing on content information to distinguish whether two papers are written by the same author, and the models focusing on relation information to represent information as edges on the network and to quantify the similarity among papers. However, the former requires adequate labeled samples and informative negative samples, and is also ineffective in measuring the high-order connections among papers, while the latter needs complicated feature engineering or supervision to construct the network. We propose a novel generative adversarial framework to grow the two categories of models together: (i) the discriminative module distinguishes whether two papers are from the same author, and (ii) the generative module selects possibly homogeneous papers directly from the heterogeneous information network, which eliminates the complicated feature engineering. In such a way, the discriminative module guides the generative module to select homogeneous papers, and the generative module generates high-quality negative samples to train the discriminative module to make it aware of high-order connections among papers. Furthermore, a self-training strategy for the discriminative module and a random walk based generating algorithm are designed to make the training stable and efficient. Extensive experiments on two real-world AND benchmarks demonstrate that our model provides significant performance improvement over the state-of-the-art methods.

【Keywords】:

31. Social Influence Does Matter: User Action Prediction for In-Feed Advertising.

Paper Link】 【Pages】:246-253

【Authors】: Hongyang Wang ; Qingfei Meng ; Ju Fan ; Yuchen Li ; Laizhong Cui ; Xiaoman Zhao ; Chong Peng ; Gong Chen ; Xiaoyong Du

【Abstract】: Social in-feed advertising delivers ads that seamlessly fit inside a user’s feed, and allows users to engage in social actions (likes or comments) with the ads. Many businesses pay higher attention to “engagement marketing” that maximizes social actions, as social actions can effectively promote brand awareness. This paper studies social action prediction for in-feed advertising. Most existing works overlook the social influence as a user’s action may be affected by her friends’ actions. This paper introduces an end-to-end approach that leverages social influence for action prediction, and focuses on addressing the high sparsity challenge for in-feed ads. We propose to learn influence structure that models who tends to be influenced. We extract a subgraph with the near neighbors a user interacts with, and learn topological features of the subgraph by developing structure-aware graph encoding methods. We also introduce graph attention networks to learn influence dynamics that models how a user is influenced by neighbors’ actions. We conduct extensive experiments on real datasets from the commercial advertising platform of WeChat and a public dataset. The experimental results demonstrate that social influence learned by our approach can significantly boost performance of social action prediction.

【Keywords】:

32. Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction.

Paper Link】 【Pages】:254-261

【Authors】: Haozhe Wu ; Zhiyuan Hu ; Jia Jia ; Yaohua Bu ; Xiangnan He ; Tat-Seng Chua

【Abstract】: Online Social Networks (OSNs) evolve through two pervasive behaviors: follow and unfollow, which respectively signify relationship creation and relationship dissolution. Research on social network evolution mainly focuses on the follow behavior, while the unfollow behavior has largely been ignored. Mining unfollow behavior is challenging because a user's decision to unfollow is not only affected by the simple combination of the user's attributes like informativeness and reciprocity, but also affected by the complex interaction among them. Meanwhile, prior datasets seldom contain sufficient records for inferring such complex interaction. To address these issues, we first construct a large-scale real-world Weibo dataset, which records detailed post content and relationship dynamics of 1.8 million Chinese users. Next, we define user's attributes as two categories: spatial attributes (e.g., social role of user) and temporal attributes (e.g., post content of user). Leveraging the constructed dataset, we systematically study how the interaction effects between user's spatial and temporal attributes contribute to the unfollow behavior. Afterwards, we propose a novel unified model with heterogeneous information (UMHI) for unfollow prediction. Specifically, our UMHI model: 1) captures user's spatial attributes through social network structure; 2) infers user's temporal attributes through user-posted content and unfollow history; and 3) models the interaction between spatial and temporal attributes by the nonlinear MLP layers. Comprehensive evaluations on the constructed dataset demonstrate that the proposed UMHI model outperforms baseline methods by 16.44 on average in terms of precision. In addition, factor analyses verify that both spatial attributes and temporal attributes are essential for mining unfollow behavior.

【Keywords】:

33. Who Likes What? - SplitLBI in Exploring Preferential Diversity of Ratings.

Paper Link】 【Pages】:262-269

【Authors】: Qianqian Xu ; Jiechao Xiong ; Zhiyong Yang ; Xiaochun Cao ; Qingming Huang ; Yuan Yao

【Abstract】: In recent years, learning user preferences has received significant attention. A shortcoming of existing learning to rank work lies in that they do not take into account the multi-level hierarchies from social choice to individuals. In this paper, we propose a multi-level model which learns both the common preference or utility function over the population based on features of alternatives to-be-compared, and preferential diversity functions conditioning on user categories. Such a multi-level model enables us to simultaneously learn a coarse-grained social preference function together with a fine-grained personalized diversity. It provides us prediction power for the choices of new users on new alternatives. The key algorithm in this paper is based on the Split Linearized Bregman Iteration (SplitLBI) algorithm, which generates a dynamic path from the common utility to personalized preferential diversity, at different levels of sparsity on personalization. A synchronized parallel version of SplitLBI is proposed to meet the needs of fast analysis of large-scale data. The validity of the methodology is supported by experiments with both simulated and real-world datasets such as movie and dining restaurant ratings, which provide us a coarse-to-fine grained preference learning.

【Keywords】:

34. Multi-Feature Discrete Collaborative Filtering for Fast Cold-Start Recommendation.

Paper Link】 【Pages】:270-278

【Authors】: Yang Xu ; Lei Zhu ; Zhiyong Cheng ; Jingjing Li ; Jiande Sun

【Abstract】: Hashing is an effective technique to address the large-scale recommendation problem, due to its high computation and storage efficiency on calculating the user preferences on items. However, existing hashing-based recommendation methods still suffer from two important problems: 1) Their recommendation process mainly relies on the user-item interactions and single specific content feature. When the interaction history or the content feature is unavailable (the cold-start problem), their performance will be seriously deteriorated. 2) Existing methods learn the hash codes with relaxed optimization or adopt discrete coordinate descent to directly solve binary hash codes, which results in significant quantization loss or consumes considerable computation time. In this paper, we propose a fast cold-start recommendation method, called Multi-Feature Discrete Collaborative Filtering (MFDCF), to solve these problems. Specifically, a low-rank self-weighted multi-feature fusion module is designed to adaptively project the multiple content features into binary yet informative hash codes by fully exploiting their complementarity. Additionally, we develop a fast discrete optimization algorithm to directly compute the binary hash codes with simple operations. Experiments on two public recommendation datasets demonstrate that MFDCF outperforms the state-of-the-arts on various aspects.

【Keywords】:

35. Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization.

Paper Link】 【Pages】:279-286

【Authors】: Hanyu Xuan ; Zhenyu Zhang ; Shuo Chen ; Jian Yang ; Yan Yan

【Abstract】: In human multi-modality perception systems, the benefits of integrating auditory and visual information are extensive as they provide plenty of supplementary cues for understanding the events. Despite some recent methods proposed for such application, they cannot deal with practical conditions with temporal inconsistency. Inspired by the human system, which puts different focuses at specific locations, time segments and media while performing multi-modality perception, we provide an attention-based method to simulate such process. Similar to the human mechanism, our network can adaptively select “where” to attend, “when” to attend and “which” to attend for audio-visual event localization. In this way, even with large temporal inconsistency between vision and audio, our network is able to adaptively trade information between different modalities and successfully achieve event localization. Our method achieves state-of-the-art performance on the AVE (Audio-Visual Event) dataset collected in real life. In addition, we also systematically investigate audio-visual event localization tasks. The visualization results also help us better understand how our model works.

【Keywords】:

36. Learning to Match on Graph for Fashion Compatibility Modeling.

Paper Link】 【Pages】:287-294

【Authors】: Xun Yang ; Xiaoyu Du ; Meng Wang

【Abstract】: Understanding the mix-and-match relationships between items receives increasing attention in the fashion industry. Existing methods have primarily learned visual compatibility from dyadic co-occurrence or co-purchase information of items to model the item-item matching interaction. Despite effectiveness, rich extra-connectivities between compatible items, e.g., user-item interactions and item-item substitutable relationships, which characterize the structural properties of items, have been largely ignored. This paper presents a graph-based fashion matching framework named Deep Relational Embedding Propagation (DREP), aiming to inject the extra-connectivities between items into the pairwise compatibility modeling. Specifically, we first build a multi-relational item-item-user graph which encodes diverse item-item and user-item relationships. Then we compute structured representations of items by an attentive relational embedding propagation rule that performs messages propagation along edges of the relational graph. This leads to expressive modeling of higher-order connectivity between items and also better representation of fashion items. Finally, we predict pairwise compatibility based on a compatibility metric learning module. Extensive experiments show that DREP can significantly improve the performance of state-of-the-art methods.

【Keywords】:

37. D2D-LSTM: LSTM-Based Path Prediction of Content Diffusion Tree in Device-to-Device Social Networks.

Paper Link】 【Pages】:295-302

【Authors】: Heng Zhang ; Xiaofei Wang ; Jiawen Chen ; Chenyang Wang ; Jianxin Li

【Abstract】: With the proliferation of mobile device users, Device-to-Device (D2D) communication has ascended to the spotlight in social networks for users to share and exchange enormous data. Different from classic online social networks (OSNs) like Twitter and Facebook, each single data file to be shared in the D2D social network is often very large in data size, e.g., video, image or document. Sometimes, a small number of interesting data files may dominate the network traffic, and lead to heavy network congestion. To reduce the traffic congestion and design an effective caching strategy, it is highly desirable to investigate how the data files are propagated in offline D2D social networks and derive the diffusion model that fits the new form of social network. However, existing works mainly concern link prediction, which cannot predict the overall diffusion path when network topology is unknown. In this article, we propose D2D-LSTM based on Long Short-Term Memory (LSTM), which aims to predict complete content propagation paths in D2D social networks. Taking the current user's time, geography and category preference into account, historical features of the previous path can be captured as well. It utilizes prototype users for prediction so as to achieve a better generalization ability. To the best of our knowledge, it is the first attempt to use a real-world large-scale dataset of a mobile social network (MSN) to predict propagation path trees in a top-down order. Experimental results corroborate that the proposed algorithm can achieve better prediction performance than state-of-the-art approaches. Furthermore, D2D-LSTM can achieve 95% average precision for terminal class and 17% accuracy for tree path hit.

【Keywords】:

38. An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos.

Paper Link】 【Pages】:303-311

【Authors】: Sicheng Zhao ; Yunsheng Ma ; Yang Gu ; Jufeng Yang ; Tengfei Xing ; Pengfei Xu ; Runbo Hu ; Hua Chai ; Kurt Keutzer

【Abstract】: Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ traditional two-stage shallow pipeline, i.e. extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e. polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.

【Keywords】:

39. Multi-Channel Reverse Dictionary Model.

Paper Link】 【Pages】:312-319

【Authors】: Lei Zheng ; Fanchao Qi ; Zhiyuan Liu ; Yasheng Wang ; Qun Liu ; Maosong Sun

【Abstract】: A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Existing reverse dictionary methods cannot deal with highly variable input queries and low-frequency target words successfully. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate the two problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves the state-of-the-art performance, and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained on https://github.com/thunlp/MultiRD.

【Keywords】:

40. Table2Analysis: Modeling and Recommendation of Common Analysis Patterns for Multi-Dimensional Data.

Paper Link】 【Pages】:320-328

【Authors】: Mengyu Zhou ; Wang Tao ; Pengxin Ji ; Han Shi ; Dongmei Zhang

【Abstract】: Given a table of multi-dimensional data, what analyses would a human create to extract information from it? From scientific exploration to business intelligence (BI), this is a key problem to solve towards automation of knowledge discovery and decision making. In this paper, we propose Table2Analysis to learn commonly conducted analysis patterns from a large number of (table, analysis) pairs, and recommend analyses for any given table, even one not seen before. Multi-dimensional data as input challenges existing model architectures and training techniques to fulfill the task. Based on deep Q-learning with heuristic search, Table2Analysis does table to sequence generation, with each sequence encoding an analysis. Table2Analysis has 0.78 recall at top-5 and 0.65 recall at top-1 in our evaluation against a large scale spreadsheet corpus on the PivotTable recommendation task.

【Keywords】:

41. A Recurrent Model for Collective Entity Linking with Adaptive Features.

Paper Link】 【Pages】:329-336

【Authors】: Xiaoling Zhou ; Yukai Miao ; Wei Wang ; Jianbin Qin

【Abstract】: The vast amount of web data enables us to build knowledge bases with unprecedented quality and coverage. Named Entity Disambiguation (NED) is an important task that automatically resolves ambiguous mentions in free text to correct target entries in the knowledge base. Traditional machine learning based methods for NED were outperformed and made obsolete by the state-of-the-art deep learning based models. However, deep learning models are more complex, requiring large amount of training data and lengthy training and parameter tuning time. In this paper, we revisit traditional machine learning techniques and propose a light-weight, tuneable and time-efficient method without using deep learning or deep learning generated features. We propose novel adaptive features that focus on extracting discriminative features to better model similarities between candidate entities and the mention's context. We learn a local ranking model based on traditional and the new adaptive features based on the learning-to-rank framework. While arriving at linking decisions individually via the local model, our method also takes into consideration the correlation between decisions by running multiple recurrent global models, which can be deemed as a learned local search method. Our method attains performances comparable to the state-of-the-art deep learning-based methods on NED benchmark datasets while being significantly faster to train.

【Keywords】:

AAAI Special Technical Track: AI for Social Impact 25

42. FairyTED: A Fair Rating Predictor for TED Talk Data.

Paper Link】 【Pages】:338-345

【Authors】: Rupam Acharyya ; Shouman Das ; Ankani Chattoraj ; Md. Iftekhar Tanveer

【Abstract】: With the recent trend of applying machine learning in every aspect of human life, it is important to incorporate fairness into the core of the predictive algorithms. We address the problem of predicting the quality of public speeches while being fair with respect to sensitive attributes of the speakers, e.g. gender and race. We use the TED talks as an input repository of public speeches because it consists of speakers from a diverse community and has a wide outreach. Utilizing the theories of Causal Models, Counterfactual Fairness and state-of-the-art neural language models, we propose a mathematical framework for fair prediction of the public speaking quality. We employ grounded assumptions to construct a causal model capturing how different attributes affect public speaking quality. This causal model contributes in generating counterfactual data to train a fair predictive model. Our framework is general enough to utilize any assumption within the causal model. Experimental results show that while prediction accuracy is comparable to recent work on this dataset, our predictions are counterfactually fair with respect to a novel metric when compared to true data labels. The FairyTED setup not only allows organizers to make informed and diverse selection of speakers from the unobserved counterfactual possibilities but it also ensures that viewers and new users are not influenced by unfair and unbalanced ratings from arbitrary visitors to the ted.com website when deciding to view a talk.

【Keywords】:

43. Crisis-DIAS: Towards Multimodal Damage Analysis - Deployment, Challenges and Assessment.

Paper Link】 【Pages】:346-353

【Authors】: Mansi Agarwal ; Maitree Leekha ; Ramit Sawhney ; Rajiv Ratn Shah

【Abstract】: In times of a disaster, the information available on social media can be useful for several humanitarian tasks as disseminating messages on social media is quick and easily accessible. Disaster damage assessment is inherently multi-modal, yet most existing work on damage identification has focused solely on building generic classification models that rely exclusively on text or image analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data. Conventionally, when information from various modalities is presented together, it often exhibits complementary insights about the application domain and facilitates better learning performance. In this work, we present Crisis-DIAS, a multi-modal sequential damage identification, and severity detection system. We aim to support disaster management and aid in planning by analyzing and exploiting the impact of linguistic cues on a unimodal visual system. Through extensive qualitative, quantitative and theoretical analysis on a real-world multi-modal social media dataset, we show that the Crisis-DIAS framework is superior to the state-of-the-art damage assessment models in terms of bias, responsiveness, computational efficiency, and assessment performance.

【Keywords】:

44. Unsupervised Detection of Sub-Events in Large Scale Disasters.

Paper Link】 【Pages】:354-361

【Authors】: Chidubem Arachie ; Manas Gaur ; Sam Anzaroot ; William Groves ; Ke Zhang ; Alejandro Jaimes

【Abstract】: Social media plays a major role during and after major natural disasters (e.g., hurricanes, large-scale fires, etc.), as people “on the ground” post useful information on what is actually happening. Given the large amounts of posts, a major challenge is identifying the information that is useful and actionable. Emergency responders are largely interested in finding out what events are taking place so they can properly plan and deploy resources. In this paper we address the problem of automatically identifying important sub-events (within a large-scale emergency “event”, such as a hurricane). In particular, we present a novel, unsupervised learning framework to detect sub-events in Tweets for retrospective crisis analysis. We first extract noun-verb pairs and phrases from raw tweets as sub-event candidates. Then, we learn a semantic embedding of extracted noun-verb pairs and phrases, and rank them against a crisis-specific ontology. We filter out noisy and irrelevant information then cluster the noun-verb pairs and phrases so that the top-ranked ones describe the most important sub-events. Through quantitative experiments on two large crisis data sets (Hurricane Harvey and the 2015 Nepal Earthquake), we demonstrate the effectiveness of our approach over the state-of-the-art. Our qualitative evaluation shows better performance compared to our baseline.

【Keywords】:

45. Spatio-Temporal Attention-Based Neural Network for Credit Card Fraud Detection.

Paper Link】 【Pages】:362-369

【Authors】: Dawei Cheng ; Sheng Xiang ; Chencheng Shang ; Yiyi Zhang ; Fangzhou Yang ; Liqing Zhang

【Abstract】: Credit card fraud is an important issue and incurs a considerable cost for both cardholders and issuing institutions. Contemporary methods apply machine learning-based approaches to detect fraudulent behavior from transaction records. But manually generating features needs domain knowledge and may lag behind the modus operandi of fraud, which means we need to automatically focus on the most relevant patterns in fraudulent behavior. Therefore, in this work, we propose a spatial-temporal attention-based neural network (STAN) for fraud detection. In particular, transaction records are modeled by attention and 3D convolution mechanisms by integrating the corresponding information, including spatial and temporal behaviors. Attentional weights are jointly learned in an end-to-end manner with 3D convolution and detection networks. Afterward, we conduct extensive experiments on a real-world fraud transaction dataset; the results show that STAN performs better than other state-of-the-art baselines in both AUC and precision-recall curves. Moreover, we conduct empirical studies with domain experts on the proposed method for fraud post-analysis; the results demonstrate the effectiveness of our proposed method in both detecting suspicious transactions and mining fraud patterns.

【Keywords】:

46. Tracking Disaster Footprints with Social Streaming Data.

Paper Link】 【Pages】:370-377

【Authors】: Lu Cheng ; Jundong Li ; K. Selçuk Candan ; Huan Liu

【Abstract】: Social media has become an indispensable tool in the face of natural disasters due to its broad appeal and ability to quickly disseminate information. For instance, Twitter is an important source for disaster responders to search for (1) topics that have been identified as being of particular interest over time, i.e., common topics such as “disaster rescue”; (2) new emerging themes of disaster-related discussions that are fast gathering in social media streams (Saha and Sindhwani 2012), i.e., distinct topics such as “the latest tsunami destruction”. To understand the status quo and allocate limited resources to the most urgent areas, emergency managers need to quickly sift through relevant topics generated over time and investigate their commonness and distinctiveness. A major obstacle to the effective usage of social media, however, is its massive amount of noisy and undesired data. Hence, a naive method, such as set intersection/difference to find common/distinct topics, is often not practical. To address this challenge, this paper studies a new topic tracking problem that seeks to effectively identify the common and distinct topics with social streaming data. The problem is important as it presents a promising new way to efficiently search for accurate information during emergency response. This is achieved by an online Nonnegative Matrix Factorization (NMF) scheme that conducts a faster update of latent factors, and a joint NMF technique that seeks the balance between the reconstruction error of topic identification and the losses induced by discovering common and distinct topics. Extensive experimental results on real-world datasets collected during Hurricanes Harvey and Florence reveal the effectiveness of our framework.

【Keywords】:
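
To make the NMF building block in the abstract above concrete, here is a minimal sketch of plain batch NMF with multiplicative updates on a toy post-term matrix. It is not the paper's online or joint NMF scheme; the matrix `V`, the rank, and all function names are assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Batch NMF with multiplicative updates; returns (W, H, error history)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Hypothetical post-term counts: rows are posts, columns are terms.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
W, H, errors = nmf(V, rank=2)
```

Each row of `H` can be read as a topic (a weighting over terms) and each row of `W` as a post's topic mixture. An online variant would update the factors as new posts arrive instead of refitting from scratch.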

47. Detecting and Tracking Communal Bird Roosts in Weather Radar Data.

Paper Link】 【Pages】:378-385

【Authors】: Zezhou Cheng ; Saadia Gabriel ; Pankaj Bhambhani ; Daniel Sheldon ; Subhransu Maji ; Andrew Laughlin ; David Winkler

【Abstract】: The US weather radar archive holds detailed information about biological phenomena in the atmosphere over the last 20 years. Communally roosting birds congregate in large numbers at nighttime roosting locations, and their morning exodus from the roost is often visible as a distinctive pattern in radar images. This paper describes a machine learning system to detect and track roost signatures in weather radar data. A significant challenge is that labels were collected opportunistically from previous research studies and there are systematic differences in labeling style. We contribute a latent-variable model and EM algorithm to learn a detection model together with models of labeling styles for individual annotators. By properly accounting for these variations we learn a significantly more accurate detector. The resulting system detects previously unknown roosting locations and provides comprehensive spatio-temporal data about roosts across the US. This data will provide biologists important information about the poorly understood phenomena of broad-scale habitat use and movements of communally roosting birds during the non-breeding season.

【Keywords】:

48. Hindi-English Hate Speech Detection: Author Profiling, Debiasing, and Practical Perspectives.

Paper Link】 【Pages】:386-393

【Authors】: Shivang Chopra ; Ramit Sawhney ; Puneet Mathur ; Rajiv Ratn Shah

【Abstract】: Code-switching in linguistically diverse, low resource languages is often semantically complex and lacks sophisticated methodologies that can be applied to real-world data for precisely detecting hate speech. In an attempt to bridge this gap, we introduce a three-tier pipeline that employs profanity modeling, deep graph embeddings, and author profiling to retrieve instances of hate speech in Hindi-English code-switched language (Hinglish) on social media platforms like Twitter. Through extensive comparison against several baselines on two real-world datasets, we demonstrate how targeted hate embeddings combined with social network-based features outperform state of the art, both quantitatively and qualitatively. Additionally, we present an expert-in-the-loop algorithm for bias elimination in the proposed model pipeline and study the prevalence and performance impact of the debiasing. Finally, we discuss the computational, practical, ethical, and reproducibility aspects of the deployment of our pipeline across the Web.

【Keywords】:

49. Inferring Nighttime Satellite Imagery from Human Mobility.

Paper Link】 【Pages】:394-402

【Authors】: Brian Dickinson ; Gourab Ghoshal ; Xerxes Dotiwalla ; Adam Sadilek ; Henry A. Kautz

【Abstract】: Nighttime lights satellite imagery has been used for decades as a uniform, global source of data for studying a wide range of socioeconomic factors. Recently, another more terrestrial source is producing data with similarly uniform global coverage: anonymous and aggregated smart phone location. This data, which measures the movement patterns of people and populations rather than the light they produce, could prove just as valuable in decades to come. In fact, since human mobility is far more directly related to the socioeconomic variables being predicted, it has an even greater potential. Additionally, since cell phone locations can be aggregated in real time while preserving individual user privacy, it will be possible to conduct studies that would previously have been impossible because they require data from the present. Of course, it will take quite some time to establish the new techniques necessary to apply human mobility data to problems traditionally studied with satellite imagery and to conceptualize and develop new real time applications. In this study we demonstrate that it is possible to accelerate this process by inferring artificial nighttime satellite imagery from human mobility data, while maintaining a strong differential privacy guarantee. We also show that these artificial maps can be used to infer socioeconomic variables, often with greater accuracy than using actual satellite imagery. Along the way, we find that the relationship between mobility and light emissions is both nonlinear and varies considerably around the globe. Finally, we show that models based on human mobility can significantly improve our understanding of society at a global scale.

【Keywords】:

50. A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning.

Paper Link】 【Pages】:403-411

【Authors】: Kevin Fauvel ; Daniel Balouek-Thomert ; Diego Melgar ; Pedro Silva ; Anthony Simonet ; Gabriel Antoniu ; Alexandru Costan ; Véronique Masson ; Manish Parashar ; Ivan Rodero ; Alexandre Termier

【Abstract】: Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective at identifying medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given as input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.

【Keywords】:

51. Faking Fairness via Stealthily Biased Sampling.

Paper Link】 【Pages】:412-419

【Authors】: Kazuto Fukuchi ; Satoshi Hara ; Takanori Maehara

【Abstract】: Auditing fairness of decision-makers is now in high demand. To respond to this social demand, several fairness auditing tools have been developed. The focus of this study is to raise awareness of the risk of malicious decision-makers who fake fairness by abusing the auditing tools and thereby deceiving the social communities. The question is whether such a fraud of the decision-maker is detectable so that the society can avoid the risk of fake fairness. In this study, we answer this question negatively. We specifically put our focus on a situation where the decision-maker publishes a benchmark dataset as the evidence of his/her fairness and attempts to deceive a person who uses an auditing tool that computes a fairness metric. To assess the (un)detectability of the fraud, we explicitly construct an algorithm, the stealthily biased sampling, that can deliberately construct an evil benchmark dataset via subsampling. We show that the fraud made by the stealthily biased sampling is indeed difficult to detect both theoretically and empirically.

【Keywords】:
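
The risk described in the abstract above can be illustrated with a deliberately naive sketch: a decision log with a large demographic parity gap is subsampled so that the published sample shows no gap at all. This is not the paper's algorithm and makes no attempt at the "stealthy" part (keeping the sample statistically close to the full data). The group names, rates, and function names are hypothetical.

```python
import random

def positive_rate(records, group):
    """Share of favourable decisions (decision == 1) within one group."""
    decisions = [d for g, d in records if g == group]
    return sum(decisions) / len(decisions)

def parity_gap(records):
    """Demographic parity gap between groups 'a' and 'b'."""
    return abs(positive_rate(records, "a") - positive_rate(records, "b"))

def biased_subsample(records, per_cell, seed=0):
    """Draw the same number of records from every (group, decision) cell.

    Equal cell sizes force both groups to a 50% positive rate, so the
    sample shows a parity gap of zero whatever the full data look like.
    """
    rng = random.Random(seed)
    sample = []
    for group in ("a", "b"):
        for decision in (0, 1):
            cell = [r for r in records if r == (group, decision)]
            sample.extend(rng.sample(cell, per_cell))
    return sample

# Hypothetical decision log: group 'a' is favoured 80% of the time, group 'b' 30%.
full = [("a", 1)] * 80 + [("a", 0)] * 20 + [("b", 1)] * 30 + [("b", 0)] * 70
published = biased_subsample(full, per_cell=20)

gap_full = parity_gap(full)            # gap in the real decisions
gap_published = parity_gap(published)  # gap an auditor would measure
```

An auditor who only sees `published` would measure a fair-looking sample. A crude sample like this one is easy to flag, because its group and outcome proportions differ sharply from the full data.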

52. Discriminating Cognitive Disequilibrium and Flow in Problem Solving: A Semi-Supervised Approach Using Involuntary Dynamic Behavioral Signals.

Paper Link】 【Pages】:420-427

【Authors】: Mononito Goswami ; Lujie Chen ; Artur Dubrawski

【Abstract】: Problem solving is one of the most important 21st century skills. However, effectively coaching young students in problem solving is challenging because teachers must continuously monitor their cognitive and affective states, and make real-time pedagogical interventions to maximize their learning outcomes. It is an even more challenging task in social environments with limited human coaching resources. To lessen the cognitive load on a teacher and enable affect-sensitive intelligent tutoring, many researchers have investigated automated cognitive and affective detection methods. However, most of the studies use culturally-sensitive indices of affect that are prone to social editing such as facial expressions, and only a few studies have explored involuntary dynamic behavioral signals such as gross body movements. In addition, most current methods rely on expensive labelled data from trained annotators for supervised learning. In this paper, we explore a semi-supervised learning framework that can learn low-dimensional representations of involuntary dynamic behavioral signals (mainly gross-body movements) from a modest number of short time series segments. Experiments on a real-world dataset reveal a significant advantage of these representations in discriminating cognitive disequilibrium and flow, as compared to traditional complexity measures from dynamical systems literature, and demonstrate their potential in transferring learned models to previously unseen subjects.

【Keywords】:

53. Lightweight and Robust Representation of Economic Scales from Satellite Imagery.

Paper Link】 【Pages】:428-436

【Authors】: Sungwon Han ; Donghyun Ahn ; Hyunji Cha ; Jeasurk Yang ; Sungwon Park ; Meeyoung Cha

【Abstract】: Satellite imagery has long been an attractive data source providing a wealth of information regarding human-inhabited areas. While high-resolution satellite images are rapidly becoming available, limited studies have focused on how to extract meaningful information regarding human habitation patterns and economic scales from such data. We present READ, a new approach for obtaining essential spatial representation for any given district from high-resolution satellite imagery based on deep neural networks. Our method combines transfer learning and embedded statistics to efficiently learn the critical spatial characteristics of arbitrary size areas and represent such characteristics in a fixed-length vector with minimal information loss. Even with a small set of labels, READ can distinguish subtle differences between rural and urban areas and infer the degree of urbanization. An extensive evaluation demonstrates that the model outperforms state-of-the-art models in predicting economic scales, such as the population density in South Korea (R²=0.9617), and shows a high use potential in developing countries where district-level economic scales are unknown.

【Keywords】:

54. The Unreasonable Effectiveness of Inverse Reinforcement Learning in Advancing Cancer Research.

Paper Link】 【Pages】:437-445

【Authors】: John Kalantari ; Heidi Nelson ; Nicholas Chia

【Abstract】: The “No Free Lunch” theorem states that for any algorithm, elevated performance over one class of problems is offset by its performance over another. Stated differently, no algorithm works for everything. Instead, designing effective algorithms often means exploiting prior knowledge of data relationships specific to a given problem. This “unreasonable efficacy” is especially desirable for complex and seemingly intractable problems in the natural sciences. One such area that is rife with the need for better algorithms is cancer biology, a field where relatively few insights are being generated from relatively large amounts of data. In part, this is due to the inability of mere statistics to reflect cancer as a genetic evolutionary process, one that involves cells actively mutating in order to navigate host barriers, outcompete neighboring cells, and expand spatially. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell's complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of “expert demonstrations” extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. Here, we introduce a Bayesian nonparametric IRL model (PUR-IRL) where the number of reward functions is a priori unbounded in order to account for uncertainty in cancer data, i.e., the existence of latent trajectories and non-uniform sampling. We show that PUR-IRL is “unreasonably effective” in gaining interpretable and intuitive insights about cancer progression from high-dimensional genome data.

【Keywords】:

55. Linguistic Fingerprints of Internet Censorship: The Case of Sina Weibo.

Paper Link】 【Pages】:446-453

【Authors】: Kei Yin Ng ; Anna Feldman ; Jing Peng

【Abstract】: This paper studies how the linguistic components of blogposts collected from Sina Weibo, a Chinese microblogging platform, might affect the blogposts' likelihood of being censored. Our results go along with King et al. (2013)'s Collective Action Potential (CAP) theory, which states that a blogpost's potential of causing riot or assembly in real life is the key determinant of it getting censored. Although there is not a definitive measure of this construct, the linguistic features that we identify as discriminatory go along with the CAP theory. We build a classifier that significantly outperforms non-expert humans in predicting whether a blogpost will be censored. The crowdsourcing results suggest that while humans tend to see censored blogposts as more controversial and more likely to trigger action in real life than the uncensored counterparts, they in general cannot make a better guess than our model when it comes to ‘reading the mind’ of the censors in deciding whether a blogpost should be censored. We do not claim that censorship is only determined by the linguistic features. There are many other factors contributing to censorship decisions. The focus of the present paper is on the linguistic form of blogposts. Our work suggests that it is possible to use linguistic properties of social media posts to automatically predict if they are going to be censored.

【Keywords】:

56. Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas.

Paper Link】 【Pages】:454-462

【Authors】: Shriphani Palakodety ; Ashiqur R. KhudaBukhsh ; Jaime G. Carbonell

【Abstract】: The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times with more than 700,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos) with an aim to analyze the possible role of AI in helping a marginalized community. Using a novel combination of multiple Active Learning strategies and a novel active sampling strategy based on nearest-neighbors in the comment-embedding space, we construct a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones. We advocate that beyond the burgeoning field of hate speech detection, automatic detection of help speech can lend voice to the voiceless people and make the internet safer for marginalized communities.

【Keywords】:
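
The nearest-neighbor sampling idea in the abstract above can be sketched as follows: starting from a few comments already labeled as supportive, pick the unlabeled comments closest to them in embedding space as the next candidates for annotation. The three-dimensional vectors, comment ids, and function names are hypothetical; a real system would use learned comment embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbour_candidates(seeds, pool, k):
    """Return the k pool ids most similar to any seed embedding."""
    scored = []
    for comment_id, vec in pool.items():
        best = max(cosine(vec, s) for s in seeds)
        scored.append((best, comment_id))
    # Highest similarity first; ties broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [comment_id for _, comment_id in scored[:k]]

# One labeled supportive comment (seed) and four unlabeled comments.
seeds = [[1.0, 0.1, 0.0]]
pool = {
    "c1": [0.9, 0.2, 0.0],
    "c2": [0.0, 1.0, 0.0],
    "c3": [0.8, 0.0, 0.1],
    "c4": [0.0, 0.1, 1.0],
}
to_label = nearest_neighbour_candidates(seeds, pool, k=2)
```

Sampling near known positives is useful when the target class is rare, because uniformly sampled comments would mostly be disparaging or neutral.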

57. Guided Weak Supervision for Action Recognition with Scarce Data to Assess Skills of Children with Autism.

Paper Link】 【Pages】:463-470

【Authors】: Prashant Pandey ; Prathosh AP ; Manu Kohli ; Josh Pritchard

【Abstract】: Diagnostic and intervention methodologies for skill assessment of autism typically require a clinician to repetitively initiate several stimuli and record the child's response. In this paper, we propose to automate the response measurement through video recording of the scene followed by the use of Deep Neural models for human action recognition from videos. However, supervised learning of neural networks demands large amounts of annotated data that are hard to come by. This issue is addressed by leveraging the ‘similarities’ between the action categories in publicly available large-scale video action (source) datasets and the dataset of interest. A technique called Guided Weak Supervision is proposed, where every class in the target data is matched to a class in the source data using the principle of posterior likelihood maximization. Subsequently, the classifier on the target data is re-trained by augmenting samples from the matched source classes, along with a new loss encouraging inter-class separability. The proposed method is evaluated on two skill assessment autism datasets, SSBD (Sundar Rajagopalan, Dhall, and Goecke 2013) and a real world Autism dataset comprising 37 children of different ages and ethnicity who are diagnosed with autism. Our proposed method is found to improve the performance of the state-of-the-art multi-class human action recognition models in spite of supervision with scarce data.

【Keywords】:
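
One plausible reading of the class-matching step in the abstract above is sketched below: run a source-trained classifier on the target samples, accumulate log-posteriors per target class, and match each target class to the source class with the highest total. This is an assumption about how "posterior likelihood maximization" could be realised, not the paper's exact procedure. The class names and probabilities are made up.

```python
import math
from collections import defaultdict

def match_classes(target_labels, source_posteriors):
    """Map each target class to the source class with the highest total log-posterior.

    target_labels[i] is the target class of sample i; source_posteriors[i] is a
    dict {source_class: probability} from a source-trained classifier on sample i.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for label, posterior in zip(target_labels, source_posteriors):
        for source_class, p in posterior.items():
            # Clamp to avoid log(0) for classes the classifier rules out.
            totals[label][source_class] += math.log(max(p, 1e-12))
    return {label: max(scores, key=scores.get) for label, scores in totals.items()}

# Hypothetical target samples and the source classifier's outputs on them.
target_labels = ["arm_flapping", "arm_flapping", "spinning"]
source_posteriors = [
    {"waving": 0.7, "turning": 0.2, "jumping": 0.1},
    {"waving": 0.6, "turning": 0.1, "jumping": 0.3},
    {"waving": 0.1, "turning": 0.8, "jumping": 0.1},
]
matching = match_classes(target_labels, source_posteriors)
```

Samples from the matched source classes could then be added to the scarce target data when re-training the classifier.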

58. The Stanford Acuity Test: A Precise Vision Test Using Bayesian Techniques and a Discovery in Human Visual Response.

Paper Link】 【Pages】:471-479

【Authors】: Chris Piech ; Ali Malik ; Laura M. Scott ; Robert T. Chang ; Charles Lin

【Abstract】: Chart-based visual acuity measurements are used by billions of people to diagnose and guide treatment of vision impairment. However, the ubiquitous eye exam has no mechanism for reasoning about uncertainty and as such, suffers from a well-documented reproducibility problem. In this paper we make two core contributions. First, we uncover a new parametric probabilistic model of visual acuity response based on detailed measurements of patients with eye disease. Then, we present an adaptive, digital eye exam using modern artificial intelligence techniques which substantially reduces acuity exam error over existing approaches, while also introducing the novel ability to model its own uncertainty and incorporate prior beliefs. Using standard evaluation metrics, we estimate a 74% reduction in prediction error compared to the ubiquitous chart-based eye exam and up to 67% reduction compared to the previous best digital exam. For patients with eye disease, the novel ability to finely measure acuity from home could be a crucial part in early diagnosis. We provide a web implementation of our algorithm for anyone in the world to use. The insights in this paper also provide interesting implications for the field of psychometric Item Response Theory.

【Keywords】:
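
The uncertainty modeling described in the abstract above can be illustrated with a generic Bayesian update over a grid of candidate acuity values. The logistic response curve, the guessing floor of 0.25, the slope, and the grid are all placeholder assumptions; the paper proposes its own parametric response model, which is not reproduced here.

```python
import math

def p_correct(size, acuity, guess=0.25, slope=10.0):
    """Hypothetical response model: logistic in (size - acuity) with a guessing floor."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (size - acuity)))

def update(prior, grid, size, correct):
    """One Bayes update of the belief over acuity after a single response."""
    likelihood = [
        p_correct(size, a) if correct else 1.0 - p_correct(size, a) for a in grid
    ]
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mean(belief, grid):
    """Posterior mean of acuity under the current belief."""
    return sum(p * a for p, a in zip(belief, grid))

grid = [i / 10 for i in range(11)]     # candidate acuity values 0.0 .. 1.0 (larger = worse)
belief = [1 / len(grid)] * len(grid)   # uniform prior
prior_mean = mean(belief, grid)

# The patient misses a letter of size 0.5, then reads a letter of size 0.8.
belief = update(belief, grid, size=0.5, correct=False)
after_miss = mean(belief, grid)
belief = update(belief, grid, size=0.8, correct=True)
```

Because the belief is a full distribution, an adaptive test can choose the next letter size where the answer is most informative, and can report a credible interval as well as a point estimate.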

59. Automatically Neutralizing Subjective Bias in Text.

Paper Link】 【Pages】:480-489

【Authors】: Reid Pryzant ; Richard Diehl Martinez ; Nathan Dass ; Sadao Kurohashi ; Dan Jurafsky ; Diyi Yang

【Abstract】: Texts like news, encyclopedias, and some social media strive for objectivity. Yet bias in the form of inappropriate subjectivity — introducing attitudes via framing, presupposing truth, and casting doubt — remains ubiquitous. This kind of bias erodes our collective trust and fuels social conflict. To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view (“neutralizing” biased text). We also offer the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs and originates from Wikipedia edits that removed various framings, presuppositions, and attitudes from biased sentences. Last, we propose two strong encoder-decoder baselines for the task. A straightforward yet opaque concurrent system uses a BERT encoder to identify subjective words as part of the generation process. An interpretable and controllable modular algorithm separates these steps, using (1) a BERT-based classifier to identify problematic words and (2) a novel join embedding through which the classifier can edit the hidden states of the encoder. Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias.

【Keywords】:

60. Capturing the Style of Fake News.

Paper Link】 【Pages】:490-497

【Authors】: Piotr Przybyla

【Abstract】: In this study we aim to explore automatic methods that can detect online documents of low credibility, especially fake news, based on the style they are written in. We show that general-purpose text classifiers, despite seemingly good performance when evaluated simplistically, in fact overfit to sources of documents in training data. In order to achieve a truly style-based prediction, we gather a corpus of 103,219 documents from 223 online sources labelled by media experts, devise realistic evaluation scenarios and design two new classifiers: a neural network and a model based on stylometric features. The evaluation shows that the proposed classifiers maintain high accuracy in case of documents on previously unseen topics (e.g. new events) and from previously unseen sources (e.g. emerging news websites). An analysis of the stylometric model indicates it indeed focuses on sensational and affective vocabulary, known to be typical for fake news.

【Keywords】:
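
The evaluation concern in the abstract above, classifiers overfitting to document sources, suggests splitting data by source instead of by document. A minimal sketch of such a source-disjoint split follows. The tuple layout, source names, and function name are assumptions for illustration.

```python
import random

def split_by_source(documents, test_fraction=0.25, seed=0):
    """Split (source, text, label) triples so that no source appears on both sides."""
    sources = sorted({source for source, _, _ in documents})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])
    train = [d for d in documents if d[0] not in test_sources]
    test = [d for d in documents if d[0] in test_sources]
    return train, test

# Hypothetical corpus: two documents from each of four sources.
documents = [
    ("site-a.example", "text 1", 1), ("site-a.example", "text 2", 1),
    ("site-b.example", "text 3", 0), ("site-b.example", "text 4", 0),
    ("site-c.example", "text 5", 1), ("site-c.example", "text 6", 1),
    ("site-d.example", "text 7", 0), ("site-d.example", "text 8", 0),
]
train, test = split_by_source(documents)
train_sources = {d[0] for d in train}
test_sources = {d[0] for d in test}
```

With a random per-document split, a model can score well by recognising each site's boilerplate and vocabulary. Holding out whole sources tests whether it has learned anything about style that transfers.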

61. On Identifying Hashtags in Disaster Twitter Data.

Paper Link】 【Pages】:498-506

【Authors】: Jishnu Ray Chowdhury ; Cornelia Caragea ; Doina Caragea

【Abstract】: Tweet hashtags have the potential to improve the search for information during disaster events. However, there is a large number of disaster-related tweets that do not have any user-provided hashtags. Moreover, only a small number of tweets that contain actionable hashtags are useful for disaster response. To facilitate progress on automatic identification (or extraction) of disaster hashtags for Twitter data, we construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information. Using this dataset, we further investigate Long Short-Term Memory-based models within a Multi-Task Learning framework. The best performing model achieves an F1-score as high as 92.22%. The dataset, code, and other resources are available on GitHub.

【Keywords】:

62. Neural Approximate Dynamic Programming for On-Demand Ride-Pooling.

Paper Link】 【Pages】:507-515

【Authors】: Sanket Shah ; Meghna Lowalekar ; Pradeep Varakantham

【Abstract】: On-demand ride-pooling (e.g., UberPool, LyftLine, GrabShare) has recently become popular because of its ability to lower costs for passengers while simultaneously increasing revenue for drivers and aggregation companies (e.g., Uber). Unlike in Taxi on Demand (ToD) services – where a vehicle is assigned one passenger at a time – in on-demand ride-pooling, each vehicle must simultaneously serve multiple passengers with heterogeneous origin and destination pairs without violating any quality constraints. To ensure near real-time response, existing solutions to the real-time ride-pooling problem are myopic in that they optimise the objective (e.g., maximise the number of passengers served) for the current time step without considering the effect such an assignment could have on assignments in future time steps. However, considering the future effects of an assignment that also has to consider what combinations of passenger requests can be assigned to vehicles adds a layer of combinatorial complexity to the already challenging problem of considering future effects in the ToD case. A popular approach that addresses the limitations of myopic assignments in ToD problems is Approximate Dynamic Programming (ADP). Existing ADP methods for ToD can only handle Linear Program (LP) based assignments, however, as the value update relies on dual values from the LP. The assignment problem in ride pooling requires an Integer Linear Program (ILP) that has bad LP relaxations. Therefore, our key technical contribution is in providing a general ADP method that can learn from the ILP based assignment found in ride-pooling. Additionally, we handle the extra combinatorial complexity from combinations of passenger requests by using a Neural Network based approximate value function and show a connection to Deep Reinforcement Learning that allows us to learn this value-function with increased stability and sample-efficiency. We show that our approach easily outperforms leading approaches for on-demand ride-pooling on a real-world dataset by up to 16%, a significant improvement in city-scale transportation problems.

【Keywords】:

63. Weak Supervision for Fake News Detection via Reinforcement Learning.

Paper Link】 【Pages】:516-523

【Authors】: Yaqing Wang ; Weifeng Yang ; Fenglong Ma ; Jin Xu ; Bin Zhong ; Qiang Deng ; Jing Gao

【Abstract】: Today social media has become the primary source for news. Via social media platforms, fake news travels at unprecedented speeds, reaches global audiences and puts users and communities at great risk. Therefore, it is extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, the training of such models requires a large amount of labeled data, but manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and cannot represent the news articles on newly emerged events. Therefore, how to obtain fresh and high-quality labeled samples is the major challenge in employing deep learning models for fake news detection. In order to tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, i.e., WeFEND, which can leverage users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector and the fake news detector. The annotator can automatically assign weak labels for unlabeled news based on users' reports. The reinforced selector using reinforcement learning techniques chooses high-quality samples from the weakly labeled data and filters out those low-quality ones that may degrade the detector's prediction performance. The fake news detector aims to identify fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with the state-of-the-art methods.

【Keywords】:

64. Protecting Geolocation Privacy of Photo Collections.

Paper Link】 【Pages】:524-531

【Authors】: Jinghan Yang ; Ayan Chakrabarti ; Yevgeniy Vorobeychik

【Abstract】: People increasingly share personal information, including their photos and photo collections, on social media. This information, however, can compromise individual privacy, particularly as social media platforms use it to infer detailed models of user behavior, including tracking their location. We consider the specific issue of location privacy as potentially revealed by posting photo collections, which facilitate accurate geolocation with the help of deep learning methods even in the absence of geotags. One means to limit associated inadvertent geolocation privacy disclosure is by carefully pruning select photos from photo collections before these are posted publicly. We study this problem formally as a combinatorial optimization problem in the context of geolocation prediction facilitated by deep learning. We first demonstrate the complexity both by showing that a natural greedy algorithm can be arbitrarily bad and by proving that the problem is NP-Hard. We then exhibit an important tractable special case, as well as a more general approach based on mixed-integer linear programming. Through extensive experiments on real photo collections, we demonstrate that our approaches are indeed highly effective at preserving geolocation privacy.

【Keywords】:

65. Weakly-Supervised Fine-Grained Event Recognition on Social Media Texts for Disaster Management.

Paper Link】 【Pages】:532-539

【Authors】: Wenlin Yao ; Cheng Zhang ; Shiva Saravanan ; Ruihong Huang ; Ali Mostafavi

【Abstract】: People increasingly use social media to report emergencies, seek help or share information during disasters, which makes social networks an important tool for disaster management. To meet these time-critical needs, we present a weakly supervised approach for rapidly building high-quality classifiers that label each individual Twitter message with fine-grained event categories. Most importantly, we propose a novel method to create high-quality labeled data in a timely manner that automatically clusters tweets containing an event keyword and asks a domain expert to disambiguate event word senses and label clusters quickly. In addition, to process extremely noisy and often rather short user-generated messages, we enrich tweet representations using preceding context tweets and reply tweets in building event recognition classifiers. The evaluation on two hurricanes, Harvey and Florence, shows that using only 1-2 person-hours of human supervision, the rapidly trained weakly supervised classifiers outperform supervised classifiers trained using more than ten thousand annotated tweets created in over 50 person-hours.

【Keywords】:

66. Interactive Learning with Proactive Cognition Enhancement for Crowd Workers.

Paper Link】 【Pages】:540-547

【Authors】: Jing Zhang ; Huihui Wang ; Shunmei Meng ; Victor S. Sheng

【Abstract】: Learning from crowds is often performed in an active learning paradigm, aiming to improve learning performance quickly as well as to reduce labeling cost by selecting proper workers to (re)label critical instances. Previous active learning methods for learning from crowds do not have any proactive mechanism to effectively improve the reliability of workers, which prevents them from obtaining steadily rising learning curves. To help workers improve their reliability while performing tasks, this paper proposes a novel Interactive Learning framework with Proactive Cognitive Enhancement (ILPCE) for crowd workers. The ILPCE framework includes an interactive learning mechanism: When crowd workers perform labeling tasks in active learning, their cognitive ability in the specific domain can be enhanced through learning the exemplars selected by a psychological model-based machine teaching method. A novel probabilistic truth inference model and an interactive labeling scheme are proposed to ensure that the effectiveness of the interactive learning mechanism and the performance of learning models can be improved simultaneously in a fast and low-cost way. Experimental results on three real-world learning tasks demonstrate that our ILPCE significantly outperforms five representative state-of-the-art methods.

【Keywords】:

AAAI Technical Track: Applications 90

67. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks.

Paper Link】 【Pages】:549-556

【Authors】: Tian Bian ; Xi Xiao ; Tingyang Xu ; Peilin Zhao ; Wenbing Huang ; Yu Rong ; Junzhou Huang

【Abstract】: Social media has been developing rapidly in public due to its nature of spreading new information, which leads to rumors being circulated. Meanwhile, detecting rumors from such massive information in social media is becoming an arduous challenge. Therefore, some deep learning methods are applied to discover rumors through the way they spread, such as Recursive Neural Network (RvNN) and so on. However, these deep learning methods only take into account the patterns of deep propagation but ignore the structures of wide dispersion in rumor detection. Actually, propagation and dispersion are two crucial characteristics of rumors. In this paper, we propose a novel bi-directional graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation; and a GCN with an opposite directed graph of rumor diffusion to capture the structures of rumor dispersion. Moreover, the information from source post is involved in each layer of GCN to enhance the influences from the roots of rumors. Encouraging empirical results on several benchmarks confirm the superiority of the proposed method over the state-of-the-art approaches.

【Keywords】:

68. Doctor2Vec: Dynamic Doctor Representation Learning for Clinical Trial Recruitment.

Paper Link】 【Pages】:557-564

【Authors】: Siddharth Biswal ; Cao Xiao ; Lucas M. Glass ; Elizabeth Milkovits ; Jimeng Sun

【Abstract】: Massive electronic health records (EHRs) enable the success of learning accurate patient representations to support various predictive health applications. In contrast, doctor representation has not been well studied even though doctors play pivotal roles in healthcare. How to construct the right doctor representations? How to use doctor representation to solve important health analytic problems? In this work, we study the problem of clinical trial recruitment, which is about identifying the right doctors to help conduct the trials based on the trial description and patient EHR data of those doctors. We propose Doctor2Vec which simultaneously learns 1) doctor representations from EHR data and 2) trial representations from the description and categorical information about the trials. In particular, Doctor2Vec utilizes a dynamic memory network where the doctor's experience with patients is stored in the memory bank and the network will dynamically assign weights based on the trial representation via an attention mechanism. Validated on large real-world trials and EHR data including 2,609 trials, 25K doctors and 430K patients, Doctor2Vec demonstrated improved performance over the best baseline by up to 8.7% in PR-AUC. We also demonstrated that the Doctor2Vec embedding can be transferred to benefit data insufficiency settings including trial recruitment in less populated/newly explored countries with 13.7% improvement or for rare diseases with 8.1% improvement in PR-AUC.

【Keywords】:

69. TrueLearn: A Family of Bayesian Algorithms to Match Lifelong Learners to Open Educational Resources.

Paper Link】 【Pages】:565-573

【Authors】: Sahan Bulathwela ; María Pérez-Ortiz ; Emine Yilmaz ; John Shawe-Taylor

【Abstract】: The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learner's knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and, ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty that would help the learner improve their knowledge about the subject and keep them engaged. We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.

【Keywords】:

Paper Link】 【Pages】:574-581

【Authors】: Lisi Chen ; Shuo Shang ; Tao Guo

【Abstract】: With the proliferation of GPS-based data (e.g., routes and trajectories), it is of great importance to enable the functionality of real-time route search and recommendations. We define and study a novel Continuous Route-Search-by-Location (C-RSL) problem to enable real-time route search by locations for a large number of users over route data streams. Given a set of C-RSL queries where each query q contains a set of places q.O to visit and a threshold q.θ, we continuously feed each query q with routes that have a similarity to q.O of no less than q.θ. We also extend our proposal to support the top-k C-RSL problem where each query continuously maintains the k most similar routes. The C-RSL problem targets a variety of applications, including real-time route planning, ridesharing, and other location-based services that have real-time demand. To enable efficient route matching on a large number of C-RSL queries, we develop novel parallel route matching algorithms with good time complexity. Extensive experiments with real data offer insight into the performance of our algorithms, indicating that our proposal is capable of achieving high efficiency and scalability.

【Keywords】:
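
The threshold and top-k matching described in the C-RSL abstract above can be illustrated with a small sequential sketch. It is not the authors' parallel algorithm: the similarity measure (the share of the query's places that a route visits) and the names `similarity`, `match_stream` and `top_k_stream` are assumptions made only for this example.

```python
import heapq


def similarity(route, places):
    """Assumed similarity: share of the query's places that the route visits."""
    if not places:
        return 0.0
    route_set = set(route)
    return sum(1 for p in places if p in route_set) / len(places)


def match_stream(routes, places, theta):
    """Yield each streamed route whose similarity to the query is at least theta."""
    for route in routes:
        if similarity(route, places) >= theta:
            yield route


def top_k_stream(routes, places, k):
    """Keep the k most similar routes seen so far using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # entries are (similarity, arrival_index, route)
    for index, route in enumerate(routes):
        entry = (similarity(route, places), index, route)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Strictly better than the weakest kept route: swap it in.
            heapq.heapreplace(heap, entry)
    return [route for _, _, route in sorted(heap, reverse=True)]
```

Here `places` plays the role of q.O and `theta` the role of q.θ. On ties, the top-k variant keeps the route that arrived first, which is a design choice of this sketch rather than something the abstract specifies.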

71. Pay Your Trip for Traffic Congestion: Dynamic Pricing in Traffic-Aware Road Networks.

Paper Link】 【Pages】:582-589

【Authors】: Lisi Chen ; Shuo Shang ; Bin Yao ; Jing Li

【Abstract】: Pricing is essential in optimizing transportation resource allocation. Congestion pricing is widely used to reduce urban traffic congestion. We propose and investigate a novel Dynamic Pricing Strategy (DPS) to price travelers' trips in intelligent transportation platforms (e.g., DiDi, Lyft, Uber). The trips are charged according to their “congestion contributions” to global urban traffic systems. The dynamic pricing strategy retrieves a matching between n travelers' trips and the potential travel routes (each trip has k potential routes) to minimize the global traffic congestion. We believe that DPS holds the potential to benefit society and the environment, such as reducing traffic congestion and enabling smarter and greener transportation. The DPS problem is challenging due to its high computational complexity (there exist k^n matching possibilities). We develop an efficient and effective approximate matching algorithm based on local search, as well as pruning techniques to further enhance the matching efficiency. The accuracy and efficiency of the dynamic pricing strategy are verified by extensive experiments on real datasets.

【Keywords】:
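
Since each of the n trips picks one of k routes, exhaustive search would examine k to the power n assignments, which is why the abstract above turns to local search. The sketch below shows one simple form of that idea. It is not the authors' algorithm and omits their pruning techniques; the congestion measure (sum of squared edge loads) and the names `congestion` and `local_search` are assumptions for illustration.

```python
from collections import Counter


def congestion(assignment, candidate_routes):
    """Sum of squared edge loads: an assumed stand-in for global congestion."""
    load = Counter()
    for trip, choice in enumerate(assignment):
        load.update(candidate_routes[trip][choice])
    return sum(count * count for count in load.values())


def local_search(candidate_routes):
    """Switch one trip's route at a time while that lowers total congestion."""
    assignment = [0] * len(candidate_routes)  # start from each trip's first route
    best = congestion(assignment, candidate_routes)
    improved = True
    while improved:
        improved = False
        for trip, routes in enumerate(candidate_routes):
            for choice in range(len(routes)):
                if choice == assignment[trip]:
                    continue
                trial = assignment[:trip] + [choice] + assignment[trip + 1:]
                cost = congestion(trial, candidate_routes)
                if cost < best:
                    # Accept the move only if it strictly reduces congestion.
                    assignment, best = trial, cost
                    improved = True
    return assignment, best
```

Each route is a list of edge identifiers, and `candidate_routes[i]` holds the k options for trip i. The loop stops at a local optimum, so the result is approximate in general.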

72. Adaptive Greedy versus Non-Adaptive Greedy for Influence Maximization.

Paper Link】 【Pages】:590-597

【Authors】: Wei Chen ; Binghui Peng ; Grant Schoenebeck ; Biaoshuai Tao

【Abstract】: We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the adaptive greedy algorithm can perform up to a (1-1/e)-fraction worse than the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1-1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1-1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular cascade model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm.

【Keywords】:
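
For orientation, the non-adaptive greedy baseline discussed in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the independent cascade simulator, the Monte Carlo estimate, and the names `simulate_ic`, `expected_spread`, `greedy_seeds` and `num_samples` are assumptions made for the example.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade and return the set of adopters.

    graph maps each node to a list of (neighbor, probability) pairs.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                # Each newly active node gets one chance per outgoing edge.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, num_samples, rng):
    """Monte Carlo estimate of the expected number of adopters."""
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(num_samples))
    return total / num_samples


def greedy_seeds(graph, nodes, k, num_samples=200, seed=0):
    """Non-adaptive greedy: pick k seeds up front by estimated marginal gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_value = None, -1.0
        for node in nodes:
            if node in chosen:
                continue
            value = expected_spread(graph, chosen + [node], num_samples, rng)
            if value > best_value:
                best_node, best_value = node, value
        chosen.append(best_node)
    return chosen
```

An adaptive variant would observe the realized adoptions after each pick (under full-adoption or myopic feedback) before choosing the next seed. That feedback loop is not implemented here.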

73. DeepVar: An End-to-End Deep Learning Approach for Genomic Variant Recognition in Biomedical Literature.

Paper Link】 【Pages】:598-605

【Authors】: Chaoran Cheng ; Fei Tan ; Zhi Wei

【Abstract】: We consider the problem of Named Entity Recognition (NER) on biomedical scientific literature, and more specifically the genomic variants recognition in this work. Significant success has been achieved for NER on canonical tasks in recent years where large data sets are generally available. However, it remains a challenging problem on many domain-specific areas, especially the domains where only small gold annotations can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity, differing much from those that have been characterized in existing canonical NER tasks. The state-of-the-art machine learning approaches heavily rely on arduous feature engineering to characterize those unique patterns. In this work, we present the first successful end-to-end deep learning approach to bridge the gap between generic NER algorithms and low-resource applications through genomic variants recognition. Our proposed model can result in promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other similar low-resource NER applications.

【Keywords】:

74. Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer.

Paper Link】 【Pages】:606-613

【Authors】: Edward Choi ; Zhen Xu ; Yujia Li ; Michael Dusenberry ; Gerardo Flores ; Emily Xue ; Andrew M. Dai

【Abstract】: Effective modeling of electronic health records (EHR) is rapidly becoming an important topic in both academia and industry. A recent study showed that using the graphical structure underlying EHR data (e.g. relationship between diagnoses and treatments) improves the performance of prediction tasks such as heart failure prediction. However, EHR data do not always contain complete structure information. Moreover, when it comes to claims data, structure information is completely unavailable to begin with. Under such circumstances, can we still do better than just treating EHR data as a flat-structured bag-of-features? In this paper, we study the possibility of jointly learning the hidden structure of EHR while performing supervised prediction tasks on EHR data. Specifically, we discuss that Transformer is a suitable basis model to learn the hidden EHR structure, and propose Graph Convolutional Transformer, which uses data statistics to guide the structure learning process. The proposed model consistently outperformed previous approaches empirically, on both synthetic data and publicly available EHR data, for various prediction tasks such as graph reconstruction and readmission prediction, indicating that it can serve as an effective general-purpose representation learning algorithm for EHR data.

【Keywords】:

75. CONAN: Complementary Pattern Augmentation for Rare Disease Detection.

Paper Link】 【Pages】:614-621

【Authors】: Limeng Cui ; Siddharth Biswal ; Lucas M. Glass ; Greg Lever ; Jimeng Sun ; Cao Xiao

【Abstract】: Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How to further leverage patients with possibly uncertain diagnosis to improve detection? In this paper, we propose a Complementary pattern Augmentation (CONAN) framework for rare disease detection. CONAN combines ideas from both adversarial training and max-margin classification. It first learns self-attentive and hierarchical embedding for patient pattern characterization. Then, we develop a complementary generative adversarial networks (GAN) model to generate candidate positive and negative samples from the uncertain patients by encouraging a max-margin between classes. In addition, CONAN has a disease detector that serves as the discriminator during the adversarial training for identifying rare diseases. We evaluated CONAN on two disease detection tasks. For low prevalence inflammatory bowel disease (IBD) detection, CONAN achieved .96 precision recall area under the curve (PR-AUC) and 50.1% relative improvement over the best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieves .22 PR-AUC with 41.3% relative improvement over the best baseline.

【Keywords】:

76. Differentially Private and Fair Classification via Calibrated Functional Mechanism.

Paper Link】 【Pages】:622-629

【Authors】: Jiahao Ding ; Xinyue Zhang ; Xiaohuan Li ; Junyi Wang ; Rong Yu ; Miao Pan

【Abstract】: Machine learning is increasingly becoming a powerful tool to make decisions in a wide variety of applications, such as medical diagnosis and autonomous driving. Privacy concerns related to the training data and unfair behaviors of some decisions with regard to certain attributes (e.g., sex, race) are becoming more critical. Thus, constructing a fair machine learning model while simultaneously providing privacy protection becomes a challenging problem. In this paper, we focus on the design of a classification model with fairness and differential privacy guarantees by jointly combining functional mechanism and decision boundary fairness. In order to enforce ϵ-differential privacy and fairness, we leverage the functional mechanism to add different amounts of Laplace noise regarding different attributes to the polynomial coefficients of the objective function in consideration of fairness constraint. We further propose a utility-enhancement scheme, called relaxed functional mechanism, by adding Gaussian noise instead of Laplace noise, hence achieving (ϵ, δ)-differential privacy. Based on the relaxed functional mechanism, we can design an (ϵ, δ)-differentially private and fair classification model. Moreover, our theoretical analysis and empirical results demonstrate that our two approaches achieve both fairness and differential privacy while preserving good utility and outperform the state-of-the-art algorithms.

【Keywords】:
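
The core step named in the abstract above, adding Laplace noise to the polynomial coefficients of an objective function, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's calibrated mechanism: it uses a single noise scale of sensitivity / epsilon for every coefficient, whereas the abstract describes different amounts of noise for different attributes. The names `laplace_noise` and `perturb_coefficients` are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution via inverse CDF."""
    u = rng.random() - 0.5  # u lies in [-0.5, 0.5)
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)  # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def perturb_coefficients(coefficients, sensitivity, epsilon, seed=None):
    """Add Laplace(sensitivity / epsilon) noise to each polynomial coefficient."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # assumption: one shared scale for all coefficients
    return [c + laplace_noise(scale, rng) for c in coefficients]
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of utility. The relaxed variant in the abstract would replace the Laplace draw with Gaussian noise.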

77. Predicting AC Optimal Power Flows: Combining Deep Learning and Lagrangian Dual Methods.

Paper Link】 【Pages】:630-637

【Authors】: Ferdinando Fioretto ; Terrence W. K. Mak ; Pascal Van Hentenryck

【Abstract】: The Optimal Power Flow (OPF) problem is a fundamental building block for the optimization of electrical power systems. It is nonlinear and nonconvex and computes the generator setpoints for power and voltage, given a set of load demands. It is often solved repeatedly under various conditions, either in real-time or in large-scale studies. This need is further exacerbated by the increasing stochasticity of power systems due to renewable energy sources in front and behind the meter. To address these challenges, this paper presents a deep learning approach to the OPF. The learning model exploits the information available in the similar states of the system (which is commonly available in practical applications), as well as a dual Lagrangian method to satisfy the physical and engineering constraints present in the OPF. The proposed model is evaluated on a large collection of realistic medium-sized power systems. The experimental results show that its predictions are highly accurate with average errors as low as 0.2%. Additionally, the proposed approach is shown to improve the accuracy of the widely adopted linear DC approximation by at least two orders of magnitude.

【Keywords】:

78. CORE: Automatic Molecule Optimization Using Copy & Refine Strategy.

Paper Link】 【Pages】:638-645

【Authors】: Tianfan Fu ; Cao Xiao ; Jimeng Sun

【Abstract】: Molecule optimization is about generating molecule Y with more desirable properties based on an input molecule X. The state-of-the-art approaches partition the molecules into a large set of substructures S and grow the new molecule structure by iteratively predicting which substructure from S to add. However, since the set of available substructures S is large, such an iterative prediction task is often inaccurate, especially for substructures that are infrequent in the training data. To address this challenge, we propose a new generating strategy called “Copy&Refine” (CORE), where at each step the generator first decides whether to copy an existing substructure from input X or to generate a new substructure, then the most promising substructure will be added to the new molecule. Combined with scaffolding tree generation and adversarial training, CORE can significantly improve several latest molecule optimization methods in various measures including drug likeness (QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and baselines using the ZINC database and CORE obtained up to 11% and 21% relative improvement over the baselines on success rate on the complete test set and the subset with infrequent substructures, respectively.

【Keywords】:

79. GAN-Based Unpaired Chinese Character Image Translation via Skeleton Transformation and Stroke Rendering.

Paper Link】 【Pages】:646-653

【Authors】: Yiming Gao ; Jiangqin Wu

【Abstract】: The automatic style translation of Chinese characters (CH-Char) is a challenging problem. Different from English or general artistic style transfer, Chinese characters contain a large number of glyphs with the complicated content and characteristic style. Early methods on CH-Char synthesis are inefficient and require manual intervention. Recently some GAN-based methods are proposed for font generation. The supervised GAN-based methods require numerous image pairs, which is difficult for many chirography styles. In addition, unsupervised methods often cause the blurred and incorrect strokes. Therefore, in this work, we propose a three-stage Generative Adversarial Network (GAN) architecture for multi-chirography image translation, which is divided into skeleton extraction, skeleton transformation and stroke rendering with unpaired training data. Specifically, we first propose a fast skeleton extraction method (ENet). Secondly, we utilize the extracted skeleton and the original image to train a GAN model, RNet (a stroke rendering network), to learn how to render the skeleton with stroke details in target style. Finally, the pre-trained model RNet is employed to assist another GAN model, TNet (a skeleton transformation network), to learn to transform the skeleton structure on the unlabeled skeleton set. We demonstrate the validity of our method on two chirography datasets we established.

【Keywords】:

80. Predictive Student Modeling in Educational Games with Multi-Task Learning.

Paper Link】 【Pages】:654-661

【Authors】: Michael Geden ; Andrew Emerson ; Jonathan P. Rowe ; Roger Azevedo ; James C. Lester

【Abstract】: Modeling student knowledge is critical in adaptive learning environments. Predictive student modeling enables formative assessment of student knowledge and skills, and it drives personalized support to create learning experiences that are both effective and engaging. Traditional approaches to predictive student modeling utilize features extracted from students’ interaction trace data to predict student test performance, aggregating student test performance as a single output label. We reformulate predictive student modeling as a multi-task learning problem, modeling questions from student test data as distinct “tasks.” We demonstrate the effectiveness of this approach by utilizing student data from a series of laboratory-based and classroom-based studies conducted with a game-based learning environment for microbiology education, Crystal Island. Using sequential representations of student gameplay, results show that multi-task stacked LSTMs with residual connections significantly outperform baseline models that do not use the multi-task formulation. Additionally, the accuracy of predictive student models is improved as the number of tasks increases. These findings have significant implications for the design and development of predictive student models in adaptive learning environments.

【Keywords】:

81. Enhancing Personalized Trip Recommendation with Attractive Routes.

Paper Link】 【Pages】:662-669

【Authors】: Jiqing Gu ; Chao Song ; Wenjun Jiang ; Xiaomin Wang ; Ming Liu

【Abstract】: Personalized trip recommendation tries to recommend a sequence of points of interest (POIs) for a user. Most existing studies search POIs only according to the popularity of POIs themselves. In fact, the routes among the POIs also have attractions to visitors, and some of these routes have high popularity. We term this kind of route as Attractive Route (AR), which brings extra user experience. In this paper, we study the attractive routes to improve personalized trip recommendation. To deal with the challenges of discovery and evaluation of ARs, we propose a personalized Trip Recommender with POIs and Attractive Route (TRAR). It discovers the attractive routes based on the popularity and the Gini coefficient of POIs, then it utilizes a gravity model in a category space to estimate the rating scores and preferences of the attractive routes. Based on that, TRAR recommends a trip with ARs to maximize user experience and leverage the tradeoff between the time cost and the user experience. The experimental results show the superiority of TRAR compared with other state-of-the-art methods.

【Keywords】:
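
The abstract above relies on the Gini coefficient of POIs. For readers unfamiliar with it, the standard definition is sketched below. Which quantity TRAR actually feeds into it is not stated in the abstract, so treating the inputs as visit counts is an assumption, as is the function name `gini`.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs, normalized by
    # 2 * n^2 * mean, which simplifies to 2 * n * total.
    diff_sum = sum(abs(x - y) for x in values for y in values)
    return diff_sum / (2 * n * total)
```

Values near 0 indicate that visits are spread evenly, while values near 1 indicate that a few items attract almost all visits. This direct pairwise form is quadratic in the number of values, which is fine for a small illustration.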

82. Graduate Employment Prediction with Bias.

Paper Link】 【Pages】:670-677

【Authors】: Teng Guo ; Feng Xia ; Shihao Zhen ; Xiaomei Bai ; Dongyu Zhang ; Zitao Liu ; Jiliang Tang

【Abstract】: The failure of landing a job for college students could cause serious social consequences such as drunkenness and suicide. In addition to academic performance, unconscious biases can become one key obstacle for hunting jobs for graduating students. Thus, it is necessary to understand these unconscious biases so that we can help these students at an early stage with more personalized intervention. In this paper, we develop a framework, i.e., MAYA (Multi-mAjor emploYment stAtus) to predict students' employment status while considering biases. The framework consists of four major components. Firstly, we solve the heterogeneity of student courses by embedding academic performance into a unified space. Then, we apply a generative adversarial network (GAN) to overcome the class imbalance problem. Thirdly, we adopt Long Short-Term Memory (LSTM) with a novel dropout mechanism to comprehensively capture sequential information among semesters. Finally, we design a bias-based regularization to capture the job market biases. We conduct extensive experiments on a large-scale educational dataset and the results demonstrate the effectiveness of our prediction framework.

【Keywords】:

83. Multi-Scale Anomaly Detection on Attributed Networks.

Paper Link】 【Pages】:678-685

【Authors】: Leonardo Gutiérrez-Gómez ; Alexandre Bovet ; Jean-Charles Delvenne

【Abstract】: Many social and economic systems can be represented as attributed networks encoding the relations between entities who are themselves described by different node attributes. Finding anomalies in these systems is crucial for detecting abuses such as credit card frauds, web spams or network intrusions. Intuitively, anomalous nodes are defined as nodes whose attributes differ starkly from the attributes of a certain set of nodes of reference, called the context of the anomaly. While some methods have been proposed to spot anomalies locally, globally or within a community context, the problem remains challenging due to the multi-scale composition of real networks and the heterogeneity of node metadata. Here, we propose a principled way to uncover outlier nodes simultaneously with the context with respect to which they are anomalous, at all relevant scales of the network. We characterize anomalous nodes in terms of the concentration retained for each node after smoothing specific signals localized on the vertices of the graph. Besides, we introduce a graph signal processing formulation of the Markov stability framework used in community detection, in order to find the context of anomalies. The performance of our method is assessed on synthetic and real-world attributed networks and shows superior results compared with state-of-the-art algorithms. Finally, we show the scalability of our approach in large networks employing Chebychev polynomial approximations.

【Keywords】:

84. Accurate Structured-Text Spotting for Arithmetical Exercise Correction.

Paper Link】 【Pages】:686-693

【Authors】: Yiqing Hu ; Yan Zheng ; Hao Liu ; Deqiang Jiang ; Yinsong Liu ; Bo Ren

【Abstract】: Correcting arithmetical exercises is a labor-intensive and time-consuming task for primary school teachers. To reduce their burden, we propose Arithmetical Exercise Checker (AEC), which is the first system that automatically evaluates all arithmetical expressions (AEs) on exercise images. The major challenge is that an AE is formed by printed and handwritten texts with particular arithmetical patterns (e.g., multi-line, fraction). Despite being part of an AE, handwritten texts usually lead to zigzag boundaries and tangled rows. What's worse, an AE may be arithmetically incorrect, which makes the contextual information less valuable for recognition. To tackle these problems, we introduce integrated detection, recognition and evaluation branches by leveraging AE's intrinsic features, namely 1) boundary indistinctive, 2) locally relevant patterns and 3) globally irrelevant symbols. Experimental results demonstrate that AEC yields a 93.72% correction accuracy on 40 kinds of mainstream primary arithmetical exercises. So far, the online service of AEC processes 75,000 arbitrary exercises on average per day, and has already reduced the burden of over 1,000,000 users. AEC shows the benefits of implementing a vision-based system as a way to aid teachers in reducing repetitive tasks.

【Keywords】:

85. Pairwise Learning with Differential Privacy Guarantees.

Paper Link】 【Pages】:694-701

【Authors】: Mengdi Huai ; Di Wang ; Chenglin Miao ; Jinhui Xu ; Aidong Zhang

【Abstract】: Pairwise learning has received much attention recently as it is more capable of modeling the relative relationship between pairs of samples. Many machine learning tasks can be categorized as pairwise learning, such as AUC maximization and metric learning. Existing techniques for pairwise learning all fail to take into consideration a critical issue in their design, i.e., the protection of sensitive information in the training set. Models learned by such algorithms can implicitly memorize the details of sensitive information, which offers an opportunity for malicious parties to infer it from the learned models. To address this challenging issue, in this paper, we propose several differentially private pairwise learning algorithms for both online and offline settings. Specifically, for the online setting, we first introduce a differentially private algorithm (called OnPairStrC) for strongly convex loss functions. Then, we extend this algorithm to general convex loss functions and give another differentially private algorithm (called OnPairC). For the offline setting, we also present two differentially private algorithms (called OffPairStrC and OffPairC) for strongly and general convex loss functions, respectively. These proposed algorithms can not only learn the model effectively from the data but also provide strong privacy protection guarantees for sensitive information in the training set. Extensive experiments on real-world datasets are conducted to evaluate the proposed algorithms and the experimental results support our theoretical analysis.

【Keywords】:
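
To make the pairwise structure mentioned in the abstract above concrete: AUC can be computed as the fraction of (positive, negative) score pairs that a model ranks correctly, so the objective is defined over pairs of samples rather than single samples. The sketch below is illustrative only. The function name `pairwise_auc` and the convention that ties count as half are assumptions, and no privacy mechanism from the paper is included.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Empirical AUC as the fraction of correctly ranked (positive, negative) pairs.

    Ties are counted as half a correct pair (an assumed convention).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0      # positive ranked above negative
            elif p == n:
                correct += 0.5      # tie
    return correct / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Four pairs in total; only (0.8, 0.85) is ranked incorrectly.
    print(pairwise_auc([0.9, 0.8], [0.1, 0.85]))
```

Because every training example appears in many pairs, a single record influences many loss terms, which is one reason privacy analysis for pairwise objectives differs from the pointwise case.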

86. CASTER: Predicting Drug Interactions with Chemical Substructure Representation.

Paper Link】 【Pages】:702-709

【Authors】: Kexin Huang ; Cao Xiao ; Trong Nghia Hoang ; Lucas Glass ; Jimeng Sun

【Abstract】: Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on limited labelled data and do not generalize well to unseen drugs or DDIs; and (3) models are characterized by a large number of parameters and are thus hard to interpret. In this work, we develop a ChemicAl SubstrucTurE Representation (CASTER) framework that predicts DDIs given chemical structures of drugs. CASTER aims to mitigate these limitations via (1) a sequential pattern mining module rooted in the DDI mechanism to efficiently characterize functional sub-structures of drugs; (2) an auto-encoding module that leverages both labelled and unlabelled chemical structure data to improve predictive accuracy and generalizability; and (3) a dictionary learning module that explains the prediction via a small set of coefficients which measure the relevance of each input sub-structure to the DDI outcome. We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions.

【Keywords】:

87. RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning.

Paper Link】 【Pages】:710-718

【Authors】: Nan Jiang ; Sheng Jin ; Zhiyao Duan ; Changshui Zhang

【Abstract】: This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Different from offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart in a sequential order. We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state). The key to this algorithm is a well-functioning reward model. Instead of defining it using music composition rules, we learn this model from monophonic and polyphonic training data. This model considers the compatibility of the machine-generated note with both the machine-generated context and the human-generated context. Experiments show that this algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part. Subjective evaluations on preferences show that the proposed algorithm generates music pieces of higher quality than the baseline method.

【Keywords】:

88. A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction.

Paper Link】 【Pages】:719-726

【Authors】: Ziqi Ke ; Haris Vikalo

【Abstract】: Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovery of the reads' origin – an NP-hard problem that the existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network which essentially trains to ignore sequencing errors and infers the posterior probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques. Source code, datasets and a supplementary document are available at https://github.com/WuLoli/GAEseq.

【Keywords】:

89. Generating Realistic Stock Market Order Streams.

Paper Link】 【Pages】:727-734

【Authors】: Junyi Li ; Xintong Wang ; Yaoyang Lin ; Arunesh Sinha ; Michael P. Wellman

【Abstract】: We propose an approach to generate realistic and high-fidelity stock market data based on generative adversarial networks (GANs). Our Stock-GAN model employs a conditional Wasserstein GAN to capture history dependence of orders. The generator design includes specially crafted aspects, including components that approximate the market's auction mechanism, augmenting the order history with order-book constructions to improve the generation task. We perform an ablation study to verify the usefulness of aspects of our network structure. We provide a mathematical characterization of the distribution learned by the generator. We also propose statistics to measure the quality of generated orders. We test our approach with synthetic and actual market data, compare it to many baseline generative models, and find the generated data to be close to real data.

【Keywords】:

90. SynSig2Vec: Learning Representations from Synthetic Dynamic Signatures for Real-World Verification.

Paper Link】 【Pages】:735-742

【Authors】: Songxuan Lai ; Lianwen Jin ; Luojun Lin ; Yecheng Zhu ; Huiyun Mao

【Abstract】: Skilled forgery attacks remain an open research problem in automatic signature verification. However, skilled forgeries are very difficult to acquire for representation learning. To tackle this issue, this paper proposes to learn dynamic signature representations through ranking synthesized signatures. First, a neuromotor inspired signature synthesis method is proposed to synthesize signatures with different distortion levels for any template signature. Then, given the templates, we construct a lightweight one-dimensional convolutional network to learn to rank the synthesized samples, and directly optimize the average precision of the ranking to exploit relative and fine-grained signature similarities. Finally, after training, fixed-length representations can be extracted from dynamic signatures of variable lengths for verification. One highlight of our method is that it requires neither skilled nor random forgeries for training, yet it surpasses the state-of-the-art by a large margin on two public benchmarks.

【Keywords】:

91. DeepAlerts: Deep Learning Based Multi-Horizon Alerts for Clinical Deterioration on Oncology Hospital Wards.

Paper Link】 【Pages】:743-750

【Authors】: Dingwen Li ; Patrick G. Lyons ; Chenyang Lu ; Marin Kollef

【Abstract】: Machine learning and data mining techniques are increasingly being applied to electronic health record (EHR) data to discover underlying patterns and make predictions for clinical use. For instance, these data may be evaluated to predict clinical deterioration events such as cardiopulmonary arrest or escalation of care to the intensive care unit (ICU). In clinical practice, early warning systems with multiple time horizons could indicate different levels of urgency, allowing clinicians to make decisions regarding triage, testing, and interventions for patients at risk of poor outcomes. These different horizon alerts are related and have intrinsic dependencies, which elicit multi-task learning. In this paper, we investigate approaches to properly train deep multi-task models for predicting clinical deterioration events via generating multi-horizon alerts for hospitalized patients outside the ICU, with particular application to oncology patients. Prior knowledge is used as a regularization to exploit the positive effects from the task relatedness. Simultaneously, we propose task-specific loss balancing to reduce the negative effects when optimizing the joint loss function of deep multi-task models. In addition, we demonstrate the effectiveness of the feature-generating techniques from prediction outcome interpretation. To evaluate the model performance of predicting multi-horizon deterioration alerts in a real world scenario, we apply our approaches to the EHR data from 20,700 hospitalizations of adult oncology patients. These patients' baseline high-risk status provides a unique opportunity: the application of an accurate model to an enriched population could produce improved positive predictive value and reduce false positive alerts. With our dataset, the model applying all proposed learning techniques achieves the best performance compared with common models previously developed for clinical deterioration warning.

【Keywords】:

92. Region Focus Network for Joint Optic Disc and Cup Segmentation.

Paper Link】 【Pages】:751-758

【Authors】: Ge Li ; Changsheng Li ; Chan Zeng ; Peng Gao ; Guotong Xie

【Abstract】: Glaucoma is one of the three leading causes of blindness in the world and is predicted to affect around 80 million people by 2020. The optic cup (OC) to optic disc (OD) ratio (CDR) in fundus images plays a pivotal role in the screening and diagnosis of glaucoma. Existing methods usually crop the optic disc region first, and subsequently perform segmentation in this region. However, these approaches incur high complexity because of the separate operations. To remedy this issue, we propose a Region Focus Network (RF-Net) that innovatively integrates detection and multi-class segmentation into a unified architecture for end-to-end joint optic disc and cup segmentation with global optimization. The key idea of our method is designing a novel multi-class mask branch which generates a high-quality segmentation in the detected region for both disc and cup. To bridge the connection between the backbone and multi-class mask branch, a Fusion Feature Pooling (FFP) structure is presented to extract features from each level of the pyramid network and fuse them into a final feature representation for segmentation. Extensive experimental results on the REFUGE-2018 challenge dataset and the Drishti-GS dataset show that the proposed method achieves the best performance, compared with competitive approaches reported in the literature and the official leaderboard. Our code will be released soon.

【Keywords】:
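
As a rough illustration of why joint disc and cup segmentation matters for the CDR mentioned in the abstract above, the sketch below derives a cup-to-disc ratio from two binary masks. It assumes the vertical-diameter form of the ratio, which is one common convention; the abstract does not say which form the paper uses. The function names and the list-of-rows mask format are hypothetical.

```python
def vertical_extent(mask):
    """Number of rows spanned by the foreground (truthy) pixels of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio: cup height divided by disc height."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height


if __name__ == "__main__":
    disc = [[0, 1, 1, 0] for _ in range(4)]          # disc spans 4 rows
    cup = [[0, 0, 0, 0], [0, 1, 1, 0],
           [0, 1, 1, 0], [0, 0, 0, 0]]               # cup spans 2 rows
    print(vertical_cdr(cup, disc))
```

Since the ratio depends on both masks, an error in either segmentation propagates directly into the screening measurement.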

93. Pose-Assisted Multi-Camera Collaboration for Active Object Tracking.

Paper Link】 【Pages】:759-766

【Authors】: Jing Li ; Jing Xu ; Fangwei Zhong ; Xiangyu Kong ; Yu Qiao ; Yizhou Wang

【Abstract】: Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance. However, there are a number of challenges when deploying active tracking in complex scenarios, e.g., the target is frequently occluded by obstacles. In this paper, we extend the single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion. To achieve effective collaboration among cameras, we propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking. In the system, each camera is equipped with two controllers and a switcher: The vision-based controller tracks targets based on observed images. The pose-based controller moves the camera in accordance with the poses of the other cameras. At each step, the switcher decides which action to take from the two controllers according to the visibility of the target. The experimental results demonstrate that our system outperforms all the baselines and is capable of generalizing to unseen environments. The code and demo videos are available on our website https://sites.google.com/view/pose-assisted-collaboration.

【Keywords】:
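
The two-controller-plus-switcher arrangement described in the abstract above can be pictured with a very small sketch. It assumes the simplest possible rule: use the vision-based action when the target is visible and fall back to the pose-based action otherwise. The abstract does not specify how the switcher is implemented, so the rule, the `CameraStep` record, and its field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CameraStep:
    """One decision step for a single camera (all field names are hypothetical)."""
    target_visible: bool   # whether this camera currently sees the target
    vision_action: str     # action proposed by the vision-based controller
    pose_action: str       # action proposed by the pose-based controller


def switch(step):
    """Pick the action to execute, based only on target visibility (assumed rule)."""
    return step.vision_action if step.target_visible else step.pose_action


if __name__ == "__main__":
    print(switch(CameraStep(True, "turn_left", "turn_right")))
    print(switch(CameraStep(False, "turn_left", "turn_right")))
```

The point of the fallback is that an occluded camera can still orient itself using the poses shared by cameras that do see the target.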

94. Robust Low-Rank Discovery of Data-Driven Partial Differential Equations.

Paper Link】 【Pages】:767-774

【Authors】: Jun Li ; Gan Sun ; Guoshuai Zhao ; Li-Wei H. Lehman

【Abstract】: Partial differential equations (PDEs) are essential foundations to model dynamic processes in natural sciences. Discovering the underlying PDEs of complex data collected from the real world is key to understanding the dynamic processes of natural laws or behaviors. However, both the collected data and their partial derivatives are often corrupted by noise, especially from sparse outlying entries, due to measurement/process noise in real-world applications. Our work is motivated by the observation that the underlying data modeled by PDEs are in fact often low rank. We thus develop a robust low-rank discovery framework to recover both the low-rank data and the sparse outlying entries by integrating double low-rank and sparse recoveries with a (group) sparse regression method, which is implemented as a minimization problem using mixed nuclear norms with ℓ1 and ℓ0 norms. We propose a low-rank sequential (grouped) threshold ridge regression algorithm to solve the minimization problem. Results from several experiments on seven canonical models (i.e., four PDEs and three parametric PDEs) verify that our framework outperforms the state-of-the-art sparse and group sparse regression methods. Code is available at https://github.com/junli2019/Robust-Discovery-of-PDEs

【Keywords】:

95. Towards Cross-Modality Medical Image Segmentation with Online Mutual Knowledge Distillation.

Paper Link】 【Pages】:775-783

【Authors】: Kang Li ; Lequan Yu ; Shujun Wang ; Pheng-Ann Heng

【Abstract】: The success of deep convolutional neural networks is partially attributed to the massive amount of annotated training data. However, in practice, medical data annotations are usually expensive and time-consuming to obtain. Considering that multi-modality data with the same anatomic structures are widely available in clinical routine, in this paper, we aim to exploit the prior knowledge (e.g., shape priors) learned from one modality (a.k.a. assistant modality) to improve the segmentation performance on another modality (a.k.a. target modality) to make up for annotation scarcity. To alleviate the learning difficulties caused by modality-specific appearance discrepancy, we first present an Image Alignment Module (IAM) to narrow the appearance gap between assistant and target modality data. We then propose a novel Mutual Knowledge Distillation (MKD) scheme to thoroughly exploit the modality-shared knowledge to facilitate the target-modality segmentation. To be specific, we formulate our framework as an integration of two individual segmentors. Each segmentor not only explicitly extracts the knowledge of one modality from the corresponding annotations, but also implicitly explores the knowledge of the other modality from its counterpart in a mutually guided manner. The ensemble of the two segmentors further integrates the knowledge from both modalities and generates reliable segmentation results on the target modality. Experimental results on the public multi-class cardiac segmentation data, i.e., MM-WHS 2017, show that our method achieves large improvements on CT segmentation by utilizing additional MRI data and outperforms other state-of-the-art multi-modality learning methods.

【Keywords】:

96. Privacy-Preserving Gradient Boosting Decision Trees.

Paper Link】 【Pages】:784-791

【Authors】: Qinbin Li ; Zhaomin Wu ; Zeyi Wen ; Bingsheng He

【Abstract】: The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.

【Keywords】:
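
The link the abstract above draws between sensitivity bounds and noise can be illustrated with the standard Laplace mechanism, where the noise scale is the sensitivity divided by the privacy budget ε. The sketch below is not the paper's algorithm. It assumes per-record gradients are clipped to [-bound, bound], so that adding or removing one record changes the gradient sum by at most `bound`; all function and parameter names are hypothetical.

```python
import random


def clip_gradients(gradients, bound):
    """Clip each gradient into [-bound, bound] (bound is an assumed parameter)."""
    return [max(-bound, min(bound, g)) for g in gradients]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_gradient_sum(gradients, bound, epsilon, rng):
    """Release a clipped gradient sum with Laplace noise of scale bound / epsilon."""
    clipped = clip_gradients(gradients, bound)
    sensitivity = bound  # one added or removed record shifts the sum by at most bound
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [-3.0, 0.5, 2.0]
    # A tighter clipping bound gives a smaller noise scale for the same epsilon.
    print(private_gradient_sum(grads, bound=1.0, epsilon=1.0, rng=rng))
```

Halving the bound halves the noise scale at a fixed ε, which is the sense in which tighter sensitivity bounds translate into better accuracy.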

97. MRI Reconstruction with Interpretable Pixel-Wise Operations Using Reinforcement Learning.

Paper Link】 【Pages】:792-799

【Authors】: Wentian Li ; Xidong Feng ; Haotian An ; Xiang Yao Ng ; Yu-Jin Zhang

【Abstract】: Compressed sensing magnetic resonance imaging (CS-MRI) is a technique aimed at accelerating the data acquisition of MRI. While down-sampling in k-space proportionally reduces the data acquisition time, it results in images corrupted by aliasing artifacts and blur. To reconstruct images from the down-sampled k-space, recent deep-learning based methods have shown better performance compared with classical optimization-based CS-MRI methods. However, they usually use deep neural networks as a black box, which directly maps the corrupted images to the target images from fully-sampled k-space data. This lack of transparency may impede practical usage of such methods. In this work, we propose a deep reinforcement learning based method to reconstruct the corrupted images with meaningful pixel-wise operations (e.g. edge enhancing filters), so that the reconstruction process is transparent to users. Specifically, MRI reconstruction is formulated as a Markov Decision Process with discrete actions and continuous action parameters. We conduct experiments on the MICCAI dataset of brain tissues and the fastMRI dataset of knee images. Our proposed method performs favorably against previous approaches. Our trained model learns to select pixel-wise operations that correspond to the anatomical structures in the MR images. This makes the reconstruction process more interpretable, which would be helpful for further medical analysis.

【Keywords】:

98. PSENet: Psoriasis Severity Evaluation Network.

Paper Link】 【Pages】:800-807

【Authors】: Yi Li ; Zhe Wu ; Shuang Zhao ; Xian Wu ; Yehong Kuang ; Yangtian Yan ; Shen Ge ; Kai Wang ; Wei Fan ; Xiang Chen ; Yong Wang

【Abstract】: Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. This disease cannot be fully cured and requires lifelong care. If the deterioration of Psoriasis is not detected and properly treated in time, it could cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track the Psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practice. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes, which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different or even the same dermatologist could give different scores for the same case. To overcome these drawbacks, we propose PSENet, which applies deep neural networks to estimate Psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses a siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amounts of training data; and (3) PSENet can not only estimate the severity, but also locate the skin lesion regions from the input image. To train and evaluate PSENet, we worked with professional dermatologists from a top hospital and spent years building a golden dataset. The experimental results show that PSENet can achieve a mean absolute error of 2.21 and an accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track Psoriasis severity in a much more convenient manner.

【Keywords】:

99. Learning Geo-Contextual Embeddings for Commuting Flow Prediction.

Paper Link】 【Pages】:808-816

【Authors】: Zhicheng Liu ; Fabio Miranda ; Weiting Xiong ; Junyan Yang ; Qiao Wang ; Cláudio T. Silva

【Abstract】: Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development. However, it is a challenging task given the complex patterns of commuting flows. Conventional models, such as the gravity model, are mainly derived from physics principles and limited in their predictive power in real-world scenarios where many factors need to be considered. Meanwhile, most existing machine learning-based methods ignore the spatial correlations and fail to model the influence of nearby regions. To address these issues, we propose Geo-contextual Multitask Embedding Learner (GMEL), a model that captures the spatial correlations from geographic contextual information for commuting flow prediction. Specifically, we first construct a geo-adjacency network containing the geographic contextual information. Then, an attention mechanism is proposed based on the framework of graph attention network (GAT) to capture the spatial correlations and encode geographic contextual information to embedding space. Two separate GATs are used to model supply and demand characteristics. To enhance the effectiveness of the embedding representation, a multitask learning framework is used to introduce stronger restrictions, forcing the embeddings to encapsulate effective representation for flow prediction. Finally, a gradient boosting machine is trained based on the learned embeddings to predict commuting flows. We evaluate our model using a real-world dataset from New York City and the experimental results demonstrate the effectiveness of our proposed method against the state of the art.

【Keywords】:
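
For context on the conventional baseline named in the abstract above, one common form of the gravity model estimates the flow between two regions as proportional to the product of their "masses" (for example population or job counts) and inversely proportional to a power of the distance between them. The sketch below uses that generic form; the function name, the default constant `g`, and the default exponent `beta` are illustrative assumptions, not values taken from the paper.

```python
def gravity_flow(origin_mass, dest_mass, distance, g=1.0, beta=2.0):
    """Generic gravity-model flow estimate: g * m_i * m_j / d_ij ** beta."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return g * origin_mass * dest_mass / distance ** beta


if __name__ == "__main__":
    # Doubling the distance with beta = 2 cuts the estimated flow to a quarter.
    print(gravity_flow(100, 200, 10))
    print(gravity_flow(100, 200, 20))
```

The estimate depends on only three quantities per pair of regions, which illustrates why such models have limited predictive power when many other factors shape commuting.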

100. Learning Multi-Modal Biomarker Representations via Globally Aligned Longitudinal Enrichments.

Paper Link】 【Pages】:817-824

【Authors】: Lyujian Lu ; Saad Elbeleidy ; Lauren Zoe Baker ; Hua Wang

【Abstract】: Alzheimer's Disease (AD) is a chronic neurodegenerative disease that severely impacts patients' thinking, memory and behavior. To aid automatic AD diagnoses, many longitudinal learning models have been proposed to predict clinical outcomes and/or disease status, which, though, often fail to consider missing temporal phenotypic records of the patients that can convey valuable information of AD progressions. Another challenge in AD studies is how to integrate heterogeneous genotypic and phenotypic biomarkers to improve diagnosis prediction. To cope with these challenges, in this paper we propose a longitudinal multi-modal method to learn enriched genotypic and phenotypic biomarker representations in the format of fixed-length vectors that can simultaneously capture the baseline neuroimaging measurements of the entire dataset and progressive variations of the varied counts of follow-up measurements over time of every participant from different biomarker sources. The learned global and local projections are aligned by a soft constraint and the structured-sparsity norm is used to uncover the multi-modal structure of heterogeneous biomarker measurements. While the proposed objective is clearly motivated to characterize the progressive information of AD developments, it is a nonsmooth objective that is difficult to efficiently optimize in general. Thus, we derive an efficient iterative algorithm, whose convergence is rigorously guaranteed in mathematics. We have conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) data using one genotypic and two phenotypic biomarkers. Empirical results have demonstrated that the learned enriched biomarker representations are more effective in predicting the outcomes of various cognitive assessments. Moreover, our model has successfully identified disease-relevant biomarkers supported by existing medical findings that additionally warrant the correctness of our method from the clinical perspective.

【Keywords】:

101. AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration.

Paper Link】 【Pages】:825-832

【Authors】: Liantao Ma ; Junyi Gao ; Yasha Wang ; Chaohe Zhang ; Jiangtao Wang ; Wenjie Ruan ; Wen Tang ; Xin Gao ; Xinyu Ma

【Abstract】: Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of the biomarker in diverse time scales plays a vital role in indicating the health status, but it has not been explicitly extracted by existing works. Second, key factors that strongly indicate the health risk are different among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as black boxes will limit their reliability in clinical practice. However, none of the existing works can provide satisfying interpretability while achieving high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long and short-term variations of biomarkers as clinical features to depict the health status in multiple time scales. It also models the correlation between clinical features to enhance the ones which strongly indicate the health status and thus can maintain a state-of-the-art performance in terms of prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experimental results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.

【Keywords】:

102. ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context.

Paper Link】 【Pages】:833-840

【Authors】: Liantao Ma ; Chaohe Zhang ; Yasha Wang ; Wenjie Ruan ; Jiangtao Wang ; Wen Tang ; Xinyu Ma ; Xin Gao ; Junyi Gao

【Abstract】: Predicting the patient's clinical outcome from the historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning the clinical visit embedding and exploring the relations between visits. Although those works have shown superior performances in healthcare prediction, they fail to explore the personal characteristics during the clinical visits thoroughly. Moreover, existing works usually assume that more recent records weigh more in the prediction, but this assumption is not suitable for all conditions. In this paper, we propose ConCare to handle the irregular EMR data and extract feature interrelationship to perform individualized healthcare prediction. Our solution can embed the feature sequences separately by modeling the time-aware distribution. ConCare further improves the multi-head self-attention via the cross-head decorrelation, so that the inter-dependencies among dynamic features and static baseline information can be effectively captured to form the personal health context. Experimental results on two real-world EMR datasets demonstrate the effectiveness of ConCare. The medical findings extracted by ConCare are also empirically confirmed by human experts and medical literature.

【Keywords】:

Paper Link】 【Pages】:841-848

【Authors】: Farzan Masrour ; Tyler Wilson ; Heng Yan ; Pang-Ning Tan ; Abdol-Hossein Esfahanian

【Abstract】: Link prediction is an important task in online social networking as it can be used to infer new or previously unknown relationships of a network. However, due to the homophily principle, current algorithms are susceptible to promoting links that may lead to increased segregation of the network, an effect known as the filter bubble. In this study, we examine the filter bubble problem from the perspective of algorithm fairness and introduce a dyadic-level fairness criterion based on the network modularity measure. We show how the criterion can be utilized as a postprocessing step to generate more heterogeneous links in order to overcome the filter bubble problem. In addition, we also present a novel framework that combines adversarial network representation learning with supervised link prediction to alleviate the filter bubble problem. Experimental results on several real-world datasets showed the effectiveness of the proposed methods compared to other baseline approaches, which include conventional link prediction and fairness-aware methods for i.i.d. data.

【Keywords】:

104. Gait Recognition for Co-Existing Multiple People Using Millimeter Wave Sensing.

Paper Link】 【Pages】:849-856

【Authors】: Zhen Meng ; Song Fu ; Jie Yan ; Hongyuan Liang ; Anfu Zhou ; Shilin Zhu ; Huadong Ma ; Jianhua Liu ; Ning Yang

【Abstract】: Gait recognition, i.e., recognizing persons from their walking postures, has found versatile applications in security checks, health monitoring, and novel human-computer interaction. Millimeter-wave (mmWave) based gait recognition represents the most recent advance. Compared with traditional camera-based solutions, mmWave based gait recognition has the unique advantage of remaining effective under non-line-of-sight scenarios, such as in darkness, weak light, or blockage conditions. Moreover, it is able to accomplish person identification while preserving privacy. Currently, there are only a few works on mmWave gait recognition, since no public data set is available. In this paper, we build a first-of-its-kind mmWave gait data set, in which we collect the gait of 95 volunteers 'seen' from two mmWave radars in two different scenarios, which together lasts about 30 hours. Using the data set, we propose a novel deep-learning driven mmWave gait recognition method called mmGaitNet, and compare it with five state-of-the-art algorithms. We find that mmGaitNet is able to achieve 90% accuracy for single-person scenarios and 88% accuracy for five co-existing persons, while the existing methods achieve less than 66% accuracy for both scenarios.

【Keywords】:

105. Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning.

Paper Link】 【Pages】:857-864

【Authors】: Xiang Ni ; Jing Li ; Mo Yu ; Wang Zhou ; Kun-Lung Wu

【Abstract】: This paper considers the problem of resource allocation in stream processing, where continuous data flows must be processed in real time in a large distributed system. To maximize system throughput, the resource allocation strategy that partitions the computation tasks of a stream processing graph onto computing devices must simultaneously balance workload distribution and minimize communication. Since this problem of graph partitioning is known to be NP-complete yet crucial to practical streaming systems, many heuristic-based algorithms have been developed to find reasonably good solutions. In this paper, we present a graph-aware encoder-decoder framework to learn a generalizable resource allocation strategy that can properly distribute computation tasks of stream processing graphs unobserved from training data. We, for the first time, propose to leverage graph embedding to learn the structural information of the stream processing graphs. Jointly trained with the graph-aware decoder using deep reinforcement learning, our approach can effectively find optimized solutions for unseen graphs. Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.

【Keywords】:

106. ActiveThief: Model Extraction Using Active Learning and Unannotated Public Data.

Paper Link】 【Pages】:865-872

【Authors】: Soham Pal ; Yash Gupta ; Aditya Shukla ; Aditya Kanade ; Shirish K. Shevade ; Vinod Ganapathy

【Abstract】: Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.

【Keywords】:

107. Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-Like Molecules.

Paper Link】 【Pages】:873-880

【Authors】: Yashaswi Pathak ; Siddhartha Laghuvarapu ; Sarvesh Mehta ; U. Deva Priyakumar

【Abstract】: Solubility of drug molecules is related to pharmacokinetic properties such as absorption and distribution, which affect the amount of drug that is available in the body for its action. Computational or experimental evaluation of solvation free energies of drug-like molecules/solutes that quantify solubilities is an arduous task, and hence the development of reliable, computationally tractable models is sought after in drug discovery tasks in the pharmaceutical industry. Here, we report a novel method based on a graph neural network to predict solvation free energies. Previous studies considered only the solute for solvation free energy prediction and ignored the nature of the solvent, limiting their practical applicability. The proposed model is an end-to-end framework comprising three phases, namely, message passing, interaction and prediction phases. In the first phase, a message passing neural network is used to compute inter-atomic interactions within both solute and solvent molecules represented as molecular graphs. In the interaction phase, features from the preceding step are used to calculate a solute-solvent interaction map, since the solvation free energy depends on how (un)favorably the solute and solvent molecules interact with each other. The calculated interaction map that captures the solute-solvent interactions, along with the features from the message passing phase, is used to predict the solvation free energies in the final phase. The model predicts solvation free energies involving a large number of solvents with high accuracy. We also show that the interaction map captures the electronic and steric factors that govern the solubility of drug-like molecules and hence is chemically interpretable.

【Keywords】:

108. FuzzE: Fuzzy Fairness Evaluation of Offensive Language Classifiers on African-American English.

Paper Link】 【Pages】:881-889

【Authors】: Anthony Rios

【Abstract】: Hate speech and offensive language are rampant on social media. Machine learning has provided a way to moderate foul language at scale. However, much of the current research focuses on overall performance. Models may perform poorly on text written in a minority dialectal language. For instance, a hate speech classifier may produce more false positives on tweets written in African-American Vernacular English (AAVE). To measure these problems, we need text written in both AAVE and Standard American English (SAE). Unfortunately, it is challenging to curate data for all linguistic styles in a timely manner, especially when we are constrained to specific problems, social media platforms, or by limited resources. In this paper, we answer the question, “How can we evaluate the performance of classifiers across minority dialectal languages when they are not present within a particular dataset?” Specifically, we propose an automated fairness fuzzing tool called FuzzE to quantify the fairness of text classifiers applied to AAVE text using a dataset that only contains text written in SAE. Overall, we find that the fairness estimates returned by our technique moderately correlate with those obtained using real ground-truth AAVE text. Warning: Offensive language is displayed in this manuscript.

【Keywords】:

109. Learning to Generate Maps from Trajectories.

Paper Link】 【Pages】:890-897

【Authors】: Sijie Ruan ; Cheng Long ; Jie Bao ; Chunyang Li ; Zisheng Yu ; Ruiyuan Li ; Yuxuan Liang ; Tianfu He ; Yu Zheng

【Abstract】: Accurate and updated road network data is vital in many urban applications, such as car-sharing, and logistics. The traditional approach to identifying the road network, i.e., field survey, requires a significant amount of time and effort. With the wide usage of GPS embedded devices, a huge amount of trajectory data has been generated by different types of mobile objects, which provides a new opportunity to extract the underlying road network. However, the existing trajectory-based map recovery approaches require many empirical parameters and do not utilize the prior knowledge in existing maps, which over-simplifies or over-complicates the reconstructed road network. To this end, we propose a deep learning-based map generation framework, i.e., DeepMG, which learns the structure of the existing road network to overcome the noisy GPS positions. More specifically, DeepMG extracts features from trajectories in both spatial view and transition view and uses a convolutional deep neural network T2RNet to infer road centerlines. After that, a trajectory-based post-processing algorithm is proposed to refine the topological connectivity of the recovered map. Extensive experiments on two real-world trajectory datasets confirm that DeepMG significantly outperforms the state-of-the-art methods.

【Keywords】:

110. Spatial Classification with Limited Observations Based on Physics-Aware Structural Constraint.

Paper Link】 【Pages】:898-905

【Authors】: Arpan Man Sainju ; Wenchong He ; Zhe Jiang ; Da Yan

【Abstract】: Spatial classification with limited feature observations has been a challenging problem in machine learning. The problem exists in applications where only a subset of sensors are deployed at certain regions or partial responses are collected in field surveys. Existing research mostly focuses on addressing incomplete or missing data, e.g., data cleaning and imputation, classification models that allow for missing feature values, or modeling missing features as hidden variables and applying the EM algorithm. These methods, however, assume that incomplete feature observations only happen on a small subset of samples, and thus cannot solve problems where the vast majority of samples have missing feature observations. To address this issue, we propose a new approach that incorporates physics-aware structural constraints into the model representation. Our approach assumes that a spatial contextual feature is observed for all sample locations and establishes spatial structural constraint from the spatial contextual feature map. We design efficient algorithms for model parameter learning and class inference. Evaluations on real-world hydrological applications show that our approach significantly outperforms several baseline methods in classification accuracy, and the proposed solution is computationally efficient on a large data volume.

【Keywords】:

111. Effective Decoding in Graph Auto-Encoder Using Triadic Closure.

Paper Link】 【Pages】:906-913

【Authors】: Han Shi ; Haozheng Fan ; James T. Kwok

【Abstract】: The (variational) graph auto-encoder and its variants have been popularly used for representation learning on graph-structured data. While the encoder is often a powerful graph convolutional network, the decoder reconstructs the graph structure by only considering two nodes at a time, thus ignoring possible interactions among edges. On the other hand, structured prediction, which considers the whole graph simultaneously, is computationally expensive. In this paper, we utilize the well-known triadic closure property, which is exhibited in many real-world networks. We propose the triad decoder, which considers and predicts the three edges involved in a local triad together. The triad decoder can be readily used in any graph-based auto-encoder. In particular, we incorporate it into the (variational) graph auto-encoder. Experiments on link prediction, node clustering and graph generation show that the use of triads leads to more accurate prediction and clustering, and better preservation of the graph characteristics.
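
The triadic closure property referred to above says that two nodes sharing a neighbor are likely to be connected themselves. The sketch below illustrates that property with a plain common-neighbors score for unconnected node pairs. It is a classical heuristic and not the paper's triad decoder; the function name `common_neighbor_scores` is an assumption for illustration.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each non-adjacent node pair by its number of shared neighbors.

    edges: list of undirected (u, v) pairs with sortable node labels.
    Returns a dict {(u, v): count} containing only pairs with count > 0,
    i.e. pairs whose connection would close at least one open triad.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared
    return scores
```

On a graph where nodes 0 and 3 are both linked to nodes 1 and 2 but not to each other, the pair (0, 3) receives a score of 2, since adding that edge would close two triads.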

【Keywords】:

112. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting.

Paper Link】 【Pages】:914-921

【Authors】: Chao Song ; Youfang Lin ; Shengnan Guo ; Huaiyu Wan

【Abstract】: Spatial-temporal network data forecasting is of great importance in a large number of applications for traffic management and urban planning. However, the underlying complex spatial-temporal correlations and heterogeneities make this problem challenging. Existing methods usually use separate components to capture spatial and temporal correlations and ignore the heterogeneities in spatial-temporal data. In this paper, we propose a novel model, named Spatial-Temporal Synchronous Graph Convolutional Networks (STSGCN), for spatial-temporal network data forecasting. The model is able to effectively capture the complex localized spatial-temporal correlations through an elaborately designed spatial-temporal synchronous modeling mechanism. Meanwhile, multiple modules for different time periods are designed in the model to effectively capture the heterogeneities in localized spatial-temporal graphs. Extensive experiments are conducted on four real-world datasets, which demonstrate that our method achieves state-of-the-art performance and consistently outperforms other baselines.

【Keywords】:

113. Continuous Multiagent Control Using Collective Behavior Entropy for Large-Scale Home Energy Management.

Paper Link】 【Pages】:922-929

【Authors】: Jianwen Sun ; Yan Zheng ; Jianye Hao ; Zhaopeng Meng ; Yang Liu

【Abstract】: With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, efficient Demand-Side Management (DSM) is urgently needed for energy savings and peak load reduction. Traditional DSM works that focus on optimizing the energy activities of a single household cannot scale up to large-scale home energy management problems. Multi-agent Deep Reinforcement Learning (MA-DRL) shows a potential way to solve the problem of scalability, where modern homes interact to reduce consumers' energy consumption while striking a balance between energy cost and peak load reduction. However, it is difficult to solve such an environment with non-stationarity, and existing MA-DRL approaches cannot effectively give incentives for expected group behavior. In this paper, we propose a collective MA-DRL algorithm with continuous action space to provide fine-grained control on a large-scale microgrid. To mitigate the non-stationarity of the microgrid environment, a novel predictive model is proposed to measure the collective market behavior. In addition, a collective behavior entropy is introduced to reduce the high peak loads incurred by the collective behaviors of all householders in the smart grid. Empirical results show that our approach significantly outperforms the state-of-the-art methods regarding power cost reduction and daily peak load optimization.

【Keywords】:

114. DATA-GRU: Dual-Attention Time-Aware Gated Recurrent Unit for Irregular Multivariate Time Series.

Paper Link】 【Pages】:930-937

【Authors】: Qingxiong Tan ; Mang Ye ; Baoyao Yang ; Siqi Liu ; Andy Jinhua Ma ; Terry Cheuk-Fung Yip ; Grace Lai-Hung Wong ; PongChi Yuen

【Abstract】: Due to the discrepancy of diseases and symptoms, patients usually visit hospitals irregularly and different physiological variables are examined at each visit, producing large amounts of irregular multivariate time series (IMTS) data with missing values and varying intervals. Existing methods process IMTS into regular data so that standard machine learning models can be employed. However, time intervals are usually determined by the status of patients, while missing values are caused by changes in symptoms. Therefore, we propose a novel end-to-end Dual-Attention Time-Aware Gated Recurrent Unit (DATA-GRU) for IMTS to predict the mortality risk of patients. In particular, DATA-GRU is able to: 1) preserve the informative varying intervals by introducing a time-aware structure to directly adjust the influence of the previous status in coordination with the elapsed time, and 2) tackle missing values by proposing a novel dual-attention structure to jointly consider data-quality and medical-knowledge. A novel unreliability-aware attention mechanism is designed to handle the diversity in the reliability of different data, while a new symptom-aware attention mechanism is proposed to extract medical reasons from original clinical records. Extensive experimental results on two real-world datasets demonstrate that DATA-GRU can significantly outperform state-of-the-art methods and provide meaningful clinical interpretation.
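
The idea of adjusting the influence of the previous status according to elapsed time can be sketched as a decay applied to a recurrent hidden state before the next update. The abstract does not give the exact form used in DATA-GRU, so the exponential decay, the `rate` parameter, and the function names below are all assumptions made for illustration.

```python
import math


def decay_factor(elapsed, rate=0.1):
    """Weight in (0, 1] that shrinks as the gap between visits grows.

    elapsed: time since the previous observation (any consistent unit).
    rate: hypothetical decay rate; a learned parameter in a real model.
    """
    return math.exp(-rate * max(0.0, elapsed))


def decay_hidden(hidden, elapsed, rate=0.1):
    """Scale a hidden-state vector (list of floats) by the decay factor."""
    g = decay_factor(elapsed, rate)
    return [g * h for h in hidden]
```

With this form, a visit immediately after the previous one leaves the hidden state unchanged, while a long gap pushes it toward zero so that stale information contributes less to the next prediction.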

【Keywords】:

115. Finding Minimum-Weight Link-Disjoint Paths with a Few Common Nodes.

Paper Link】 【Pages】:938-945

【Authors】: Binglin Tao ; Mingyu Xiao ; Jingyang Zhao

【Abstract】: Network survivability has drawn certain interest in network optimization. However, the demand for full protection of a network is usually too restrictive. To overcome the limitation of geographical environments and to save network resources, we turn to establishing backup networks that allow a few common nodes. This leads to the problem of finding k link-disjoint paths between a given pair of source and sink in a network such that the number of common nodes shared by at least two paths is bounded by a constant and the total link weight of all paths is minimized under the above constraints. For the case k = 2, where we have only one backup path, several fast algorithms have been developed in the literature. For the case k > 2, few results are known. In this paper, we first establish the NP-hardness of the problem with general k. Motivated by the situation that each node in a network may have a capability of multicasting, we also study a restricted version with one more requirement that each node can be shared by at most two paths. For the restricted version, we build an ILP model and design a fast algorithm by using the techniques of augmenting paths and splitting nodes. Furthermore, experimental results on synthetic and real networks show that our algorithm is effective in practice.

【Keywords】:

116. Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning.

Paper Link】 【Pages】:946-953

【Authors】: Liang Tong ; Aron Laszka ; Chao Yan ; Ning Zhang ; Yevgeniy Vorobeychik

【Abstract】: Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activities. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attacker knows the full state of the detection system and the defender's alert prioritization policy, and will dynamically choose an optimal attack. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. We use case studies in network intrusion and fraud detection to demonstrate that our approach is effective in creating robust alert prioritization policies.

【Keywords】:

117. Robust Adversarial Objects against Deep Learning Models.

Paper Link】 【Pages】:954-962

【Authors】: Tzungyu Tsai ; Kaichen Yang ; Tsung-Yi Ho ; Yier Jin

【Abstract】: Previous work has shown that Deep Neural Networks (DNNs), including those currently in use in many fields, are extremely vulnerable to maliciously crafted inputs, known as adversarial examples. Despite extensive and thorough research of adversarial examples in many areas, adversarial 3D data, such as point clouds, remain comparatively unexplored. The study of adversarial 3D data is crucial considering its impact in real-life, high-stakes scenarios including autonomous driving. In this paper, we propose a novel adversarial attack against PointNet++, a deep neural network that performs classification and segmentation tasks using features learned directly from raw 3D points. In comparison to existing works, our attack generates not only adversarial point clouds, but also robust adversarial objects that in turn generate adversarial point clouds when sampled both in simulation and after construction in real world. We also demonstrate that our objects can bypass existing defense mechanisms designed especially against adversarial 3D data.

【Keywords】:

118. OMuLeT: Online Multi-Lead Time Location Prediction for Hurricane Trajectory Forecasting.

Paper Link】 【Pages】:963-970

【Authors】: Ding Wang ; Boyang Liu ; Pang-Ning Tan ; Lifeng Luo

【Abstract】: Hurricanes are powerful tropical cyclones with sustained wind speeds ranging from at least 74 mph (for category 1 storms) to more than 157 mph (for category 5 storms). Accurate prediction of the storm tracks is essential for hurricane preparedness and mitigation of storm impacts. In this paper, we cast the hurricane trajectory forecasting task as an online multi-lead time location prediction problem and present a framework called OMuLeT to improve path prediction by combining the 6-hourly and 12-hourly forecasts generated from an ensemble of dynamical (physical) hurricane models. OMuLeT employs an online learning with restart strategy to incrementally update the weights of the ensemble model combination as new observation data become available. It can also handle the varying dynamical models available for predicting the trajectories of different hurricanes. Experimental results using the Atlantic and Eastern Pacific hurricane data showed that OMuLeT significantly outperforms various baseline methods, including the official forecasts produced by the U.S. National Hurricane Center (NHC), by more than 10% in terms of its 48-hour lead time forecasts.
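
The incremental weight update described above can be illustrated with a generic online gradient step on the squared location error of a weighted forecast combination. This is a sketch only: it omits the restart strategy and the multi-lead time structure of OMuLeT, and the function names, the learning rate `lr`, and the (lat, lon) tuple layout are assumptions for illustration.

```python
def combine(weights, forecasts):
    """Weighted combination of per-model (lat, lon) forecasts."""
    lat = sum(w * f[0] for w, f in zip(weights, forecasts))
    lon = sum(w * f[1] for w, f in zip(weights, forecasts))
    return (lat, lon)


def squared_error(predicted, observed):
    """Squared Euclidean distance between two (lat, lon) points."""
    return (predicted[0] - observed[0]) ** 2 + (predicted[1] - observed[1]) ** 2


def update_weights(weights, forecasts, observed, lr=0.01):
    """One online gradient-descent step once the true location is observed."""
    predicted = combine(weights, forecasts)
    err = (predicted[0] - observed[0], predicted[1] - observed[1])
    # Gradient of the squared error with respect to weight i is
    # 2 * (err . forecast_i).
    return [w - lr * 2.0 * (err[0] * f[0] + err[1] * f[1])
            for w, f in zip(weights, forecasts)]
```

Each time a new observation arrives, the step shifts weight toward the models whose forecasts would have reduced the error, so the combination adapts as the storm evolves.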

【Keywords】:

119. Incorporating Expert-Based Investment Opinion Signals in Stock Prediction: A Deep Learning Framework.

Paper Link】 【Pages】:971-978

【Authors】: Heyuan Wang ; Tengjiao Wang ; Yi Li

【Abstract】: Investment messages published on social media platforms are highly valuable for stock prediction. Most previous work regards overall message sentiments as forecast indicators and relies on shallow features (bag-of-words, noun phrases, etc.) to determine the investment opinion signals. These methods neither capture the time-sensitive and target-aware characteristics of stock investment reviews, nor consider the impact of investor's reliability. In this study, we provide an in-depth analysis of public stock reviews and their application in stock movement prediction. Specifically, we propose a novel framework which includes the following three key components: time-sensitive and target-aware investment stance detection, expert-based dynamic stance aggregation, and stock movement prediction. We first introduce our stance detection model named MFN, which learns the representation of each review by integrating multi-view textual features and extended knowledge in financial domain to distill bullish/bearish investment opinions. Then we show how to identify the validity of each review, and enhance stock movement prediction by incorporating expert-based aggregated opinion signals. Experiments on real datasets show our framework can effectively improve the performance of both investment opinion mining and individual stock forecasting.

【Keywords】:

120. Graph-Driven Generative Models for Heterogeneous Multi-Task Learning.

Paper Link】 【Pages】:979-988

【Authors】: Wenlin Wang ; Hongteng Xu ; Zhe Gan ; Bai Li ; Guoyin Wang ; Liqun Chen ; Qian Yang ; Wenqi Wang ; Lawrence Carin

【Abstract】: We propose a novel graph-driven generative model, that unifies multiple heterogeneous learning tasks into the same framework. The proposed model is based on the fact that heterogeneous learning tasks, which correspond to different generative processes, often rely on data with a shared graph structure. Accordingly, our model combines a graph convolutional network (GCN) with multiple variational autoencoders, thus embedding the nodes of the graph (i.e., samples for the tasks) in a uniform manner, while specializing their organization and usage to different tasks. With a focus on healthcare applications (tasks), including clinical topic modeling, procedure recommendation and admission-type prediction, we demonstrate that our method successfully leverages information across different tasks, boosting performance in all tasks and outperforming existing state-of-the-art approaches.

【Keywords】:

121. Topic Enhanced Sentiment Spreading Model in Social Networks Considering User Interest.

Paper Link】 【Pages】:989-996

【Authors】: Xiaobao Wang ; Di Jin ; Katarzyna Musial ; Jianwu Dang

【Abstract】: Emotion is a complex emotional state, which can affect our physiology and psychology and lead to behavior changes. The spreading process of emotions in text-based social networks is referred to as sentiment spreading. In this paper, we study an interesting problem of sentiment spreading in social networks. In particular, by employing a text-based social network (Twitter), we try to unveil the correlation between users' sentimental statuses and topic distributions embedded in the tweets, and then to automatically learn the influence strength between linked users. Furthermore, we introduce user interest to refine the influence strength. We develop a unified probabilistic framework to formalize the problem into a topic-enhanced sentiment spreading model. The model can predict users' sentimental statuses based on their historical emotional status, topic distributions in tweets and social structures. Experiments on the Twitter dataset show that the proposed model significantly outperforms several alternative methods in predicting users' sentimental status. We also discover an intriguing phenomenon that positive and negative sentiments are more relevant to user interest than neutral ones. Our method offers a new opportunity to understand the underlying mechanism of sentiment spreading in online social networks.

【Keywords】:

122. HDK: Toward High-Performance Deep-Learning-Based Kirchhoff Analysis.

Paper Link】 【Pages】:997-1004

【Authors】: Xinying Wang ; Olamide Timothy Tawose ; Feng Yan ; Dongfang Zhao

【Abstract】: The Kirchhoff law is one of the most widely used physical laws in many engineering principles, e.g., biomedical engineering, electrical engineering, and computer engineering. One challenge of applying the Kirchhoff law to real-world applications at scale lies in the high, if not prohibitive, computational cost to solve a large number of nonlinear equations. Despite recent advances in leveraging a convolutional neural network (CNN) to estimate the solutions of Kirchhoff equations, the low performance is still significantly hindering the broad adoption of CNN-based approaches. This paper proposes a high-performance deep-learning-based approach for Kirchhoff analysis, namely HDK. HDK employs two techniques to improve the performance: (i) early pruning of unqualified input candidates and (ii) parallelization of forward labelling. To retain high accuracy, HDK also applies various optimizations to the data such as randomized augmentation and dimension reduction. Collectively, the aforementioned techniques improve the analysis speed by 8× with accuracy as high as 99.6%.

【Keywords】:

123. Actor Critic Deep Reinforcement Learning for Neural Malware Control.

Paper Link】 【Pages】:1005-1012

【Authors】: Yu Wang ; Jack W. Stokes ; Mady Marinescu

【Abstract】: In addition to using signatures, antimalware products also detect malicious attacks by evaluating unknown files in an emulated environment, i.e. sandbox, prior to execution on a computer's native operating system. During emulation, a file cannot be scanned indefinitely, and antimalware engines often set the number of instructions to be executed based on a set of heuristics. These heuristics only make the decision of when to halt emulation using partial information leading to the execution of the file for either too many or too few instructions. Also this method is vulnerable if the attackers learn this set of heuristics. Recent research uses a deep reinforcement learning (DRL) model employing a Deep Q-Network (DQN) to learn when to halt the emulation of a file. In this paper, we propose a new DRL-based system which instead employs a modified actor critic (AC) framework for the emulation halting task. This AC model dynamically predicts the best time to halt the file's execution based on a sequence of system API calls. Compared to the earlier models, the new model is capable of handling adversarial attacks by simulating their behaviors using the critic model. The new AC model demonstrates much better performance than both the DQN model and antimalware engine's heuristics. In terms of execution speed (evaluated by the halting decision), the new model halts the execution of unknown files by up to 2.5% earlier than the DQN model and 93.6% earlier than the heuristics. For the task of detecting malicious files, the proposed AC model increases the true positive rate by 9.9% from 69.5% to 76.4% at a false positive rate of 1% compared to the DQN model, and by 83.4% from 41.2% to 76.4% at a false positive rate of 1% compared to a recently proposed LSTM model.

【Keywords】:

124. Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding.

Paper Link】 【Pages】:1013-1020

【Authors】: Zhecheng Wang ; Haoyuan Li ; Ram Rajagopal

【Abstract】: Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities require a comprehensive representation of urban neighborhoods. Existing works relied on either inter- or intra-region connectivities to generate neighborhood representations but failed to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework which incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag-of-words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish the semantic similarity between a neighborhood (“document”) and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec can achieve performance better than baseline models and comparable to fully-supervised methods in downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model's interpretability, generalization capability, and its value in neighborhood similarity analysis.

【Keywords】:

125. Hiding in Multilayer Networks.

Paper Link】 【Pages】:1021-1028

【Authors】: Marcin Waniek ; Tomasz P. Michalak ; Talal Rahwan

【Abstract】: Multilayer networks allow for modeling complex relationships, where individuals are embedded in multiple social networks at the same time. Given the ubiquity of such relationships, these networks have been increasingly gaining attention in the literature. This paper presents the first analysis of the robustness of centrality measures against strategic manipulation in multilayer networks. More specifically, we consider an “evader” who strategically chooses which connections to form in a multilayer network in order to obtain a low centrality-based ranking—thereby reducing the chance of being highlighted as a key figure in the network—while ensuring that she remains connected to a certain group of people. We prove that determining an optimal way to “hide” is NP-complete and hard to approximate for most centrality measures considered in our study. Moreover, we empirically evaluate a number of heuristics that the evader can use. Our results suggest that the centrality measures that are functions of the entire network topology are more robust to such a strategic evader than their counterparts which consider each layer separately.

【Keywords】:

126. A Deep Neural Network Model of Particle Thermal Radiation in Packed Bed.

Paper Link】 【Pages】:1029-1036

【Authors】: Hao Wu ; Shuang Hao

【Abstract】: Prediction of particle radiative heat transfer flux is an important task in large discrete granular systems, such as pebble beds in power plants and industrial fluidized beds. For particle motion and packing, the discrete element method (DEM) is now widely accepted as an excellent Lagrangian approach. For thermal radiation, traditional methods focus on calculating the obstructed view factor directly by numerical algorithms. The major challenge for the simulation is that the method has proven to be time-consuming and not feasible to apply in practical cases. In this work, we propose an analytical model to calculate macroscopic effective conductivity from particle packing structures. Then, we develop a deep neural network (DNN) model used as a predictor of the complex view factor function. The DNN model is trained on a large dataset, and the computational speed is greatly improved with good accuracy. It is feasible to perform real-time simulation with the DNN model for radiative heat transfer in a large pebble bed. The trained model can also be coupled with DEM and used to efficiently analyze the directional radiative conductivity, anisotropic factor and wall effect of the particle thermal radiation.

【Keywords】:

127. DeepDualMapper: A Gated Fusion Network for Automatic Map Extraction Using Aerial Images and Trajectories.

【Pages】:1037-1045

【Authors】: Hao Wu ; Hanyuan Zhang ; Xinyu Zhang ; Weiwei Sun ; Baihua Zheng ; Yuning Jiang

【Abstract】: Automatic map extraction is of great importance to urban computing and location-based services. Aerial image and GPS trajectory data refer to two different data sources that could be leveraged to generate the map, although they carry different types of information. Most previous works on data fusion between aerial images and data from auxiliary sensors do not fully utilize the information of both modalities and hence suffer from the issue of information loss. We propose a deep convolutional neural network called DeepDualMapper which fuses the aerial image and trajectory data in a more seamless manner to extract the digital map. We design a gated fusion module to explicitly control the information flows from both modalities in a complementary-aware manner. Moreover, we propose a novel densely supervised refinement decoder to generate the prediction in a coarse-to-fine way. Our comprehensive experiments demonstrate that DeepDualMapper can fuse the information of images and trajectories much more effectively than existing approaches, and is able to generate maps with higher accuracy.
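
Editorial sketch (not taken from the paper): the gated fusion idea in the abstract above can be illustrated with a per-channel gate that decides how much of each modality to keep. The function name `gated_fusion`, the single linear gate, and the random features are all assumptions made for illustration; the actual DeepDualMapper module operates on convolutional feature maps and may differ substantially.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_img, f_traj, w_gate, b_gate):
    """Blend two feature vectors with a per-channel gate (hypothetical form)."""
    # The gate sees both modalities and outputs one weight in (0, 1) per channel.
    gate = sigmoid(w_gate @ np.concatenate([f_img, f_traj]) + b_gate)
    # Complementary weighting: whatever is not taken from the image
    # features is taken from the trajectory features.
    fused = gate * f_img + (1.0 - gate) * f_traj
    return fused, gate

rng = np.random.default_rng(0)
channels = 4
f_img = rng.normal(size=channels)    # stand-in for aerial image features
f_traj = rng.normal(size=channels)   # stand-in for trajectory features
w_gate = rng.normal(size=(channels, 2 * channels))
b_gate = np.zeros(channels)

fused, gate = gated_fusion(f_img, f_traj, w_gate, b_gate)
print(gate, fused)
```

Because the two weights sum to one, each fused channel always lies between the corresponding image and trajectory values, which is one simple way to make the contributions complementary.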

【Keywords】:

128. Accelerating and Improving AlphaZero Using Population Based Training.

【Pages】:1046-1053

【Authors】: Ti-Rong Wu ; Ting-Han Wei ; I-Chen Wu

【Abstract】: AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play. Hyperparameter tuning exacerbates the training cost since each hyperparameter configuration requires its own training run, during which it generates its own self-play records. As a result, multiple runs are usually needed for different hyperparameter configurations. This paper proposes using population based training (PBT) to help tune hyperparameters dynamically and improve strength during training time. Another significant advantage is that this method requires only a single run, while incurring a small additional time cost, since the time for generating self-play records remains unchanged, though the time for optimization is increased following the AlphaZero training algorithm. In our experiments for 9x9 Go, the PBT method achieves a higher win rate than the baselines, each with its own hyperparameter configuration and trained individually. For 19x19 Go, with PBT, we are able to obtain improvements in playing strength. Specifically, the PBT agent can obtain up to a 74% win rate against ELF OpenGo, an open-source state-of-the-art AlphaZero program using a neural network of comparable capacity. This is compared to a saturated non-PBT agent, which achieves a win rate of 47% against ELF OpenGo under the same circumstances.
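
Editorial sketch (not taken from the paper): population based training is usually described as an exploit/explore loop, where weak members copy a strong member and then perturb its hyperparameters. The code below shows one such step on a toy population. The dictionary layout, the half-and-half split, and the perturbation factors 0.8 and 1.2 are assumptions for illustration, not the settings used by the authors.

```python
import random

def pbt_step(population, rng, perturb=(0.8, 1.2)):
    """One exploit/explore step over a list of member dicts (modified in place)."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[len(ranked) - half:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy the weights of a better-performing member.
        loser["weights"] = list(winner["weights"])
        # Explore: perturb the copied hyperparameter.
        loser["lr"] = winner["lr"] * rng.choice(perturb)
    return population

population = [
    {"name": "a", "score": 0.61, "lr": 0.02, "weights": [1.0]},
    {"name": "b", "score": 0.55, "lr": 0.01, "weights": [2.0]},
    {"name": "c", "score": 0.48, "lr": 0.10, "weights": [3.0]},
    {"name": "d", "score": 0.40, "lr": 0.001, "weights": [4.0]},
]
pbt_step(population, random.Random(0))
for member in population:
    print(member["name"], member["lr"], member["weights"])
```

In a real training loop the scores would be re-measured after further training (for example, win rate in evaluation games) before the next step. All members keep training within the same run, which is what lets hyperparameters change dynamically.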

【Keywords】:

129. Graph Convolutional Networks with Markov Random Field Reasoning for Social Spammer Detection.

【Pages】:1054-1061

【Authors】: Yongji Wu ; Defu Lian ; Yiheng Xu ; Le Wu ; Enhong Chen

【Abstract】: The recent growth of social networking platforms has also led to the emergence of social spammers, who overwhelm legitimate users with unwanted content. The existing social spammer detection methods can be divided into two categories: feature-based ones and propagation-based ones. Feature-based methods mainly rely on matrix factorization using tweet text features, with regularization using social graphs incorporated. However, these methods are fully supervised and can only utilize the labeled part of social graphs, so they fail to work in a real-world semi-supervised setting. The propagation-based methods primarily employ Markov Random Fields (MRFs) to capture human intuitions in user following relations, but they cannot take advantage of rich text features. In this paper, we propose a novel social spammer detection model based on Graph Convolutional Networks (GCNs) that operate on directed social graphs by explicitly considering three types of neighbors. Furthermore, inspired by the propagation-based methods, we propose an MRF layer with refining effects to encapsulate these human insights in social relations, which can be formulated as an RNN through mean-field approximate inference and stacked on top of GCN layers to enable end-to-end training. We evaluate our proposed method on two real-world social network datasets, and the results demonstrate that our method outperforms the state-of-the-art approaches.
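
Editorial sketch (not taken from the paper): mean-field inference in a pairwise MRF repeats the same update several times, which is why it can be unrolled like a recurrent network. The code below applies a generic mean-field update to per-user class scores (standing in for GCN outputs). The undirected toy graph, the identity compatibility matrix, and the function names are assumptions; the paper's layer works on directed graphs with three neighbor types.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary_logits, adjacency, compatibility, n_iters=5):
    """unary_logits: (n, k); adjacency: (n, n); compatibility: (k, k)."""
    q = softmax(unary_logits)
    for _ in range(n_iters):
        # Message to node i: sum over neighbours j of q[j] @ compatibility.
        messages = adjacency @ q @ compatibility
        # Each iteration reuses the same parameters, like one recurrent step.
        q = softmax(unary_logits + messages)
    return q

# Classes: column 0 = legitimate, column 1 = spammer.
unary = np.array([[-2.0, 2.0],   # user 0: confidently a spammer
                  [0.0, 0.0],    # user 1: undecided, linked to user 0
                  [0.0, 0.0]])   # user 2: undecided, isolated
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
compatibility = np.eye(2)        # assumed homophily: linked users agree

q = mean_field(unary, adjacency, compatibility)
print(q)
```

User 1 is pulled toward the spammer class by its neighbor, while the isolated user 2 stays undecided, which is the refining effect a propagation step is meant to provide.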

【Keywords】:

130. Generative Adversarial Regularized Mutual Information Policy Gradient Framework for Automatic Diagnosis.

【Pages】:1062-1069

【Authors】: Yuan Xia ; Jingbo Zhou ; Zhenhui Shi ; Chao Lu ; Haifeng Huang

【Abstract】: Automatic diagnosis systems have attracted increasing attention in recent years. Reinforcement learning (RL) is an attractive technique for building an automatic diagnosis system due to its advantages in handling sequential decision-making problems. However, the RL method still cannot achieve good enough prediction accuracy. In this paper, we propose a Generative Adversarial regularized Mutual information Policy gradient framework (GAMP) for automatic diagnosis, which aims to make a diagnosis rapidly and accurately. We first propose a new policy gradient framework based on the Generative Adversarial Network (GAN) to optimize the RL model for automatic diagnosis. In our framework, we take the generator of the GAN as a policy network, and also use the discriminator of the GAN as a part of the reward function. This generative adversarial regularized policy gradient framework aims to avoid generating randomized trials of symptom inquiries that deviate from the common diagnosis paradigm. In addition, we add mutual information to enhance the reward function to encourage the model to select the most discriminative symptoms to make a diagnosis. Experimental evaluations on two public datasets show that our method beats the state-of-the-art methods: it not only achieves higher diagnosis accuracy, but also uses a smaller number of inquiries to make a diagnosis decision.

【Keywords】:

131. Generate (Non-Software) Bugs to Fool Classifiers.

【Pages】:1070-1078

【Authors】: Hiromu Yakura ; Youhei Akimoto ; Jun Sakuma

【Abstract】: In adversarial attacks intended to confound deep learning models, most studies have focused on limiting the magnitude of the modification so that humans do not notice the attack. On the other hand, during an attack against autonomous cars, for example, most drivers would not find it strange if a small insect image were placed on a stop sign, or they may overlook it. In this paper, we present a systematic approach to generate natural adversarial examples against classification models by employing such natural-appearing perturbations that imitate a certain object or signal. We first show the feasibility of this approach in an attack against an image classifier by employing generative adversarial networks that produce image patches that have the appearance of a natural object to fool the target model. We also introduce an algorithm to optimize placement of the perturbation in accordance with the input image, which makes the generation of adversarial examples fast and likely to succeed. Moreover, we experimentally show that the proposed approach can be extended to the audio domain, for example, to generate perturbations that sound like the chirping of birds to fool a speech classifier.

【Keywords】:

132. Fairness-Aware Demand Prediction for New Mobility.

【Pages】:1079-1087

【Authors】: An Yan ; Bill Howe

【Abstract】: Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility yet have been shown to reinforce socioeconomic inequity. These services rely on accurate demand prediction, but the demand data on which these models are trained reflect biases around demographics, socioeconomic conditions, and entrenched geographic patterns. To address these biases and improve fairness, we present FairST, a fairness-aware demand prediction model for spatiotemporal urban applications, with emphasis on new mobility. We use 1D (time-varying, space-constant), 2D (space-varying, time-constant) and 3D (both time- and space-varying) convolutional branches to integrate heterogeneous features, while including fairness metrics as a form of regularization to improve equity across demographic groups. We propose two spatiotemporal fairness metrics, region-based fairness gap (RFG), applicable when demographic information is provided as a constant for a region, and individual-based fairness gap (IFG), applicable when a continuous distribution of demographic information is available. Experimental results on bike share and ride share datasets show that FairST can reduce inequity in demand prediction for multiple sensitive attributes (i.e. race, age, and education level), while achieving better accuracy than even state-of-the-art fairness-oblivious methods.
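
Editorial sketch (not taken from the paper): the abstract describes fairness metrics used as regularizers but does not give their formulas. The code below shows one plausible reading of a region-based gap, namely the absolute difference in mean per-capita predicted demand between two groups of regions, added to a squared-error loss. The function names, the per-capita normalization, and the weight `lam` are assumptions, not the paper's definition of RFG.

```python
import numpy as np

def region_fairness_gap(pred_demand, population, group):
    """Absolute gap in mean per-capita predicted demand between two region groups."""
    per_capita = pred_demand / population
    return abs(per_capita[group == 1].mean() - per_capita[group == 0].mean())

def regularized_loss(pred, target, population, group, lam=0.1):
    """Prediction error plus a fairness penalty (hypothetical weighting)."""
    mse = np.mean((pred - target) ** 2)
    return mse + lam * region_fairness_gap(pred, population, group)

pred = np.array([10.0, 20.0, 30.0, 40.0])          # predicted demand per region
population = np.array([100.0, 100.0, 100.0, 100.0])
group = np.array([1, 1, 0, 0])                      # one demographic label per region

gap = region_fairness_gap(pred, population, group)
loss = regularized_loss(pred, pred, population, group)
print(gap, loss)
```

With a perfect fit to the target the error term vanishes and only the fairness penalty remains, which shows how the regularizer trades accuracy against equity across groups.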

【Keywords】:

133. Beyond Digital Domain: Fooling Deep Learning Based Recognition System in Physical World.

【Pages】:1088-1095

【Authors】: Kaichen Yang ; Tzungyu Tsai ; Honggang Yu ; Tsung-Yi Ho ; Yier Jin

【Abstract】: Adversarial examples that can fool deep neural network (DNN) models in computer vision present a growing threat. The current methods of launching adversarial attacks concentrate on attacking image classifiers by adding noise to digital inputs. The problems of attacking object detection models and of mounting adversarial attacks in the physical world are rarely addressed. Some prior works have proposed physical adversarial attacks against object detection models, but they are limited in certain aspects. In this paper, we propose a novel physical adversarial attack targeting object detection models. Instead of simply printing images, we manufacture real metal objects that can achieve the adversarial effect. In both indoor and outdoor experiments we show that our physical adversarial objects can fool widely applied object detection models including SSD, YOLO and Faster R-CNN in various environments. We also test our attack on a variety of commercial platforms for object detection and demonstrate that our attack is still valid on these platforms. Considering the potential defense mechanisms our adversarial objects may encounter, we conduct a series of experiments to evaluate the effect of existing defense methods on our physical attack.

【Keywords】:

134. Scalable and Generalizable Social Bot Detection through Data Selection.

【Pages】:1096-1103

【Authors】: Kai-Cheng Yang ; Onur Varol ; Pik-Mai Hui ; Filippo Menczer

【Abstract】: Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applications. In this paper we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets of Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. We deploy a strict validation system so that model performance on unseen datasets is also optimized, in addition to traditional cross-validation. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.

【Keywords】:

135. Instance-Wise Dynamic Sensor Selection for Human Activity Recognition.

【Pages】:1104-1111

【Authors】: Xiaodong Yang ; Yiqiang Chen ; Hanchao Yu ; Yingwei Zhang ; Wang Lu ; Ruizhe Sun

【Abstract】: Human Activity Recognition (HAR) is an important application of smart wearable/mobile systems for many human-centric problems such as healthcare. Multi-sensor synchronous measurement has shown better performance for HAR than a single sensor. However, the multi-sensor setting increases the costs of data transmission, computation and energy. Therefore, efficient sensor selection that balances recognition accuracy and sensor cost is a critical challenge. In this paper, we propose an Instance-wise Dynamic Sensor Selection (IDSS) method for HAR. Firstly, we formalize this problem as minimizing both activity classification loss and sensor number by dynamically selecting a sparse subset for each instance. Then, IDSS solves the above minimization problem via a Markov Decision Process whose policy for sensor selection is learned by exploiting the instance-wise states using Imitation Learning. In order to optimize the parameters of the activity classification model and the sensor selection policy, an algorithm named Mutual DAgger is proposed to alternately enhance their learning processes. To evaluate the performance of IDSS, we conduct experiments on three real-world HAR datasets. The experimental results show that IDSS can effectively reduce the overall sensor number without losing accuracy and outperforms the state-of-the-art methods regarding the combined measurement of accuracy and sensor number.

【Keywords】:

136. Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States.

【Pages】:1112-1119

【Authors】: Yunan Ye ; Hengzhi Pei ; Boxin Wang ; Pin-Yu Chen ; Yada Zhu ; Ju Xiao ; Bo Li

【Abstract】: Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity – the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty – the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines.

【Keywords】:

137. Attention Based Data Hiding with Generative Adversarial Networks.

【Pages】:1120-1128

【Authors】: Chong Yu

【Abstract】: Generative adversarial networks have recently become a hotspot in research and industry, most commonly for data generation. In this paper, we propose a novel end-to-end framework that extends their application to data hiding. The discriminative model simulates the detection process, which helps us understand the sensitivity of the cover image to semantic changes. The generative model generates the target image, which is aligned with the original cover image. An attention model is introduced to generate an attention mask. This mask helps to generate a better target image without perturbing the spotlight. The introduction of a cycle discriminative model and an inconsistent loss helps to enhance the quality of the generated target image in the iterative training process. The training dataset mixes intact images and attacked images. This mixed training process further improves robustness. Through qualitative and quantitative experiments and analysis, this novel framework shows compelling performance and advantages over the current state-of-the-art methods in data hiding applications.

【Keywords】:

138. AirNet: A Calibration Model for Low-Cost Air Monitoring Sensors Using Dual Sequence Encoder Networks.

【Pages】:1129-1136

【Authors】: Haomin Yu ; Qingyong Li ; Yangli-ao Geng ; Yingjun Zhang ; Zhi Wei

【Abstract】: Air pollution monitoring has attracted much attention in recent years. However, accurate and high-resolution monitoring of atmospheric pollution remains challenging. There are two types of devices for air pollution monitoring, i.e., static stations and mobile stations. Static stations can provide accurate pollution measurements but their spatial distribution is sparse because of their high expense. In contrast, mobile stations offer an effective solution for dense placement by utilizing low-cost air monitoring sensors, whereas their measurements are less accurate. In this work, we propose a data-driven model based on deep neural networks, referred to as AirNet, for calibrating low-cost air monitoring sensors. Unlike traditional methods, which treat the calibration task as a point-to-point regression problem, we model it as a sequence-to-point mapping problem by introducing historical data sequences from both a mobile station (to be calibrated) and the referred static station. Specifically, AirNet first extracts an observation trend feature of the mobile station and a reference trend feature of the static station via dual encoder neural networks. Then, a social-based guidance mechanism is designed to select periodic and adjacent features. Finally, the features are fused and fed into a decoder to obtain a calibrated measurement. We evaluate the proposed method on two real-world datasets and compare it with six baselines. The experimental results demonstrate that our method yields the best performance.

【Keywords】:

139. Towards Hands-Free Visual Dialog Interactive Recommendation.

【Pages】:1137-1144

【Authors】: Tong Yu ; Yilin Shen ; Hongxia Jin

【Abstract】: With the recent advances of multimodal interactive recommendations, the users are able to express their preference by natural language feedback to the item images, to find the desired items. However, the existing systems either retrieve only one item or require the user to specify (e.g., by click or touch) the commented items from a list of recommendations in each user interaction. As a result, the users are not hands-free and the recommendations may be impractical. We propose a hands-free visual dialog recommender system to interactively recommend a list of items. At each time, the system shows a list of items with visual appearance. The user can comment on the list in natural language, to describe the desired features they further want. With these multimodal data, the system chooses another list of items to recommend. To understand the user preference from these multimodal data, we develop neural network models which identify the described items among the list and further predict the desired attributes. To achieve efficient interactive recommendations, we leverage the inferred user preference and further develop a novel bandit algorithm. Specifically, to avoid the system exploring more than needed, the desired attributes are utilized to reduce the exploration space. More importantly, to achieve sample efficient learning in this hands-free setting, we derive additional samples from the user's relative preference expressed in natural language and design a pairwise logistic loss in bandit learning. Our bandit model is jointly updated by the pairwise logistic loss on the additional samples derived from natural language feedback and the traditional logistic loss. The empirical results show that the probability of finding the desired items by our system is about 3 times as high as that by the traditional interactive recommenders, after a few user interactions.
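
Editorial sketch (not taken from the paper): a pairwise logistic loss turns a relative preference ("this item rather than that one") into a training signal by penalizing the model when the preferred item does not score higher. The code below uses a linear scoring model; the names `theta`, `x_preferred` and `x_other`, the feature vectors, and the step size are assumptions for illustration and do not reproduce the paper's bandit algorithm.

```python
import numpy as np

def pairwise_logistic_loss(theta, x_preferred, x_other):
    """log(1 + exp(-(score_preferred - score_other))) for a linear scorer."""
    margin = theta @ (x_preferred - x_other)
    return np.logaddexp(0.0, -margin)

def pairwise_gradient(theta, x_preferred, x_other):
    """Gradient of the loss above with respect to theta."""
    diff = x_preferred - x_other
    return -diff / (1.0 + np.exp(theta @ diff))

theta = np.zeros(3)
x_preferred = np.array([1.0, 0.0, 1.0])   # item the user's comment favours
x_other = np.array([0.0, 1.0, 1.0])       # item shown in the same list

loss_before = pairwise_logistic_loss(theta, x_preferred, x_other)
theta_new = theta - 0.5 * pairwise_gradient(theta, x_preferred, x_other)
loss_after = pairwise_logistic_loss(theta_new, x_preferred, x_other)
print(loss_before, loss_after)
```

Each natural-language comment can yield several such pairs from a single interaction, which is one way extra samples could be obtained without extra clicks.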

【Keywords】:

140. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection.

【Pages】:1145-1152

【Authors】: Zeping Yu ; Rui Cao ; Qiyi Tang ; Sen Nie ; Junzhou Huang ; Shi Wu

【Abstract】: Binary code similarity detection, whose goal is to detect similar binary functions without having access to the source code, is an essential task in computer security. Traditional methods usually use graph matching algorithms, which are slow and inaccurate. Recently, neural network-based approaches have made great achievements. A binary function is first represented as a control-flow graph (CFG) with manually selected block features, and then a graph neural network (GNN) is adopted to compute the graph embedding. While these methods are effective and efficient, they cannot capture enough semantic information of the binary code. In this paper we propose semantic-aware neural networks to extract the semantic information of the binary code. Specifically, we use BERT to pre-train the binary code on one token-level task, one block-level task, and two graph-level tasks. Moreover, we find that the order of the CFG's nodes is important for graph similarity detection, so we adopt a convolutional neural network (CNN) on adjacency matrices to extract the order information. We conduct experiments on two tasks with four datasets. The results demonstrate that our method outperforms the state-of-the-art models.
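
Editorial sketch (not taken from the paper): the claim that node order matters can be seen from the adjacency matrix itself. Relabeling the nodes of the same graph gives a different matrix, so a model that reads the matrix as an image (such as a CNN) can pick up ordering information that order-invariant summaries discard. The four-block graph and the permutation below are made up for illustration.

```python
import numpy as np

# A small control-flow-like graph: block 0 branches to 1 and 2, both reach 3.
adjacency = np.array([[0, 1, 1, 0],
                      [0, 0, 0, 1],
                      [0, 0, 0, 1],
                      [0, 0, 0, 0]])

# Relabel the blocks: new index i corresponds to old index perm[i].
perm = [2, 0, 3, 1]
P = np.eye(4, dtype=int)[perm]
permuted = P @ adjacency @ P.T

# Order-invariant summaries agree ...
same_edge_count = adjacency.sum() == permuted.sum()
same_out_degrees = sorted(adjacency.sum(axis=1)) == sorted(permuted.sum(axis=1))
# ... but the matrices, as 2D arrays, differ.
same_matrix = np.array_equal(adjacency, permuted)
print(same_edge_count, same_out_degrees, same_matrix)
```

If the block order produced by a compiler is itself informative, treating the adjacency matrix as a 2D input keeps that signal available to the model.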

【Keywords】:

141. MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control.

【Pages】:1153-1160

【Authors】: Xinshi Zang ; Huaxiu Yao ; Guanjie Zheng ; Nan Xu ; Kai Xu ; Zhenhui Li

【Abstract】: Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance than traditional transportation methods. However, current reinforcement learning models rely on tremendous amounts of training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to empower quick learning from scratch, but little attention is paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which includes periodically alternating individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.
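
Editorial sketch (not taken from the paper): the two adaptation levels mentioned in the abstract echo the inner and outer loops of MAML. The toy below runs plain MAML on scalar tasks with loss (theta - c)^2, where each task has its own target c. It is not MetaLight: there is no Q-network, no traffic environment, and no periodic alternation schedule, and the learning rates and targets are arbitrary assumptions.

```python
def maml_train(task_targets, inner_lr=0.1, outer_lr=0.05, steps=500):
    """Learn an initialisation theta that adapts well to every task in one step."""
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            # Inner loop (task-level adaptation): one gradient step on (theta - c)^2.
            adapted = theta - inner_lr * 2.0 * (theta - c)
            # Gradient of the post-adaptation loss (adapted - c)^2 with respect
            # to theta, using the chain rule through the inner step.
            meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
        # Outer loop (shared initialisation update), averaged over tasks.
        theta -= outer_lr * meta_grad / len(task_targets)
    return theta

task_targets = [1.0, 2.0, 6.0]
theta = maml_train(task_targets)
print(theta)
```

For these quadratic tasks the learned initialisation settles at the mean of the task targets, the point from which a single adaptation step does best on average.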

【Keywords】:

142. Geometry-Constrained Car Recognition Using a 3D Perspective Network.

【Pages】:1161-1168

【Authors】: Rui Zeng ; Zongyuan Ge ; Simon Denman ; Sridha Sridharan ; Clinton Fookes

【Abstract】: We present a novel learning framework for vehicle recognition from a single RGB image. Unlike existing methods which only use attention mechanisms to locate 2D discriminative information, our work learns a novel 3D perspective feature representation of a vehicle, which is then fused with the 2D appearance feature to predict the category. The framework is composed of a global network (GN), a 3D perspective network (3DPN), and a fusion network. The GN is used to locate the region of interest (RoI) and generate the 2D global feature. With the assistance of the RoI, the 3DPN estimates the 3D bounding box under the guidance of the proposed vanishing point loss, which provides a perspective geometry constraint. Then the proposed 3D representation is generated by eliminating the viewpoint variance of the 3D bounding box using perspective transformation. Finally, the 3D and 2D features are fused to predict the category of the vehicle. We present qualitative and quantitative results on the vehicle classification and verification tasks in the BoxCars dataset. The results demonstrate that, by learning such a concise 3D representation, we can achieve superior performance to methods that only use 2D information while retaining meaningful 3D information without the challenge of requiring a 3D CAD model.
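
Editorial sketch (not taken from the paper): the perspective constraint behind a vanishing point loss rests on the fact that edges parallel in 3D project to image lines meeting at a common point. The code below computes that point for two image edges using homogeneous coordinates. The helper names and the sample edges are assumptions; the paper's actual loss is not reproduced here.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(edge_a, edge_b):
    """Intersection of the lines supporting two edges, or None if parallel."""
    v = np.cross(line_through(*edge_a), line_through(*edge_b))
    if abs(v[2]) < 1e-12:
        return None  # parallel in the image: the vanishing point is at infinity
    return v[:2] / v[2]

# Two projected box edges that converge to the right of the image.
top_edge = ((0.0, 4.0), (4.0, 3.0))
bottom_edge = ((0.0, 0.0), (4.0, 2.0))
vp = vanishing_point(top_edge, bottom_edge)
parallel = vanishing_point(((0.0, 0.0), (1.0, 0.0)), ((0.0, 1.0), (1.0, 1.0)))
print(vp, parallel)
```

A geometric penalty could then measure how far the vanishing points implied by different edge pairs of a predicted box disagree, though that specific formulation is only a guess at how such a constraint might be used.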

【Keywords】:

143. Generating Adversarial Examples for Holding Robustness of Source Code Processing Models.

【Pages】:1169-1176

【Authors】: Huangzhao Zhang ; Zhuo Li ; Ge Li ; Lei Ma ; Yang Liu ; Zhi Jin

【Abstract】: Automated processing, analysis, and generation of source code are among the key activities in the software and system lifecycle. To this end, while deep learning (DL) exhibits a certain level of capability in handling these tasks, the current state-of-the-art DL models still suffer from robustness issues and can be easily fooled by adversarial attacks. Different from adversarial attacks for image, audio, and natural languages, the structured nature of programming languages brings new challenges. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique, named Metropolis-Hastings Modifier (MHM), which generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance achieved through our adversarial training with MHM further confirm the usefulness of DL model-based methods for future fully automated source code processing.
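
Editorial sketch (not taken from the paper): a Metropolis-Hastings sampler proposes a change, then accepts or rejects it according to a ratio of target densities. The toy below applies that idea to renaming one identifier in a token list so that a stand-in classifier becomes less confident. The function `victim_confidence`, its scores, the candidate names, and the target density (one minus confidence) are all hypothetical; the paper's proposal and acceptance rules may differ.

```python
import random

def victim_confidence(tokens):
    """Hypothetical stand-in for a model's confidence in the true label."""
    suspicious = {"tmp": 0.30, "foo": 0.15, "data": 0.05}
    return max(0.05, 0.9 - sum(suspicious.get(t, 0.0) for t in tokens))

def mh_rename(tokens, target, candidates, n_steps, rng):
    """Sample renamings of `target`; candidates must not clash with other tokens."""
    current, name = list(tokens), target
    for _ in range(n_steps):
        proposal_name = rng.choice(candidates)  # symmetric (uniform) proposal
        proposal = [proposal_name if t == name else t for t in current]
        # Target density favours low confidence: pi(x) is proportional to
        # 1 - confidence(x), so the acceptance ratio is pi(new) / pi(old).
        ratio = (1.0 - victim_confidence(proposal)) / (1.0 - victim_confidence(current))
        if rng.random() < min(1.0, ratio):
            current, name = proposal, proposal_name
    return current, name

tokens = ["int", "count", "=", "count", "+", "1"]
candidates = ["tmp", "foo", "data", "count"]
adv_tokens, adv_name = mh_rename(tokens, "count", candidates, 50, random.Random(0))
print(adv_tokens, victim_confidence(tokens), victim_confidence(adv_tokens))
```

Only the chosen identifier is rewritten, and every occurrence is renamed consistently, so the program's behavior is unchanged while the model's input differs.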

【Keywords】:

144. Spatio-Temporal Graph Structure Learning for Traffic Forecasting.

【Pages】:1177-1185

【Authors】: Qi Zhang ; Jianlong Chang ; Gaofeng Meng ; Shiming Xiang ; Chunhong Pan

【Abstract】: As an indispensable part of an Intelligent Traffic System (ITS), the task of traffic forecasting is inherently subject to the following three challenges. First, traffic data are physically associated with road networks, and thus should be formatted as traffic graphs rather than regular grid-like tensors. Second, traffic data exhibit strong spatial dependence, which implies that the nodes in the traffic graphs usually have complex and dynamic relationships with each other. Third, traffic data demonstrate strong temporal dependence, which is crucial for traffic time series modeling. To address these issues, we propose a novel framework named Structure Learning Convolution (SLC) that extends the traditional convolutional neural network (CNN) to graph domains and learns the graph structure for traffic forecasting. Technically, SLC explicitly models the structure information in the convolutional operation. Under this framework, various non-Euclidean CNN methods can be considered as particular instances of our formulation, yielding a flexible mechanism for learning on the graph. Along this technical line, two SLC modules are proposed to capture the global and local structures respectively, and they are integrated to construct an end-to-end network for traffic forecasting. Additionally, in this process, Pseudo three Dimensional convolution (P3D) networks are combined with SLC to capture the temporal dependencies in traffic data. Extensive comparative experiments on six real-world datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art ones.

【Keywords】:

145. Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction.

【Pages】:1186-1193

【Authors】: Weijia Zhang ; Hao Liu ; Yanchi Liu ; Jingbo Zhou ; Hui Xiong

【Abstract】: The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation inside of and between parking lots, and 3) the scarcity of information about real-time parking availability obtained from real-time sensors (e.g., camera, ultrasonic sensor, and GPS). To this end, we propose the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are respectively proposed to capture local and global spatial dependencies between parking lots. Additionally, we adopt a recurrent neural network to incorporate dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both the spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that the prediction performance of SHARE outperforms seven state-of-the-art baselines.

【Keywords】:

146. Shoreline: Data-Driven Threshold Estimation of Online Reserves of Cryptocurrency Trading Platforms.

【Pages】:1194-1201

【Authors】: Xitong Zhang ; He Zhu ; Jiayu Zhou

【Abstract】: With the proliferation of blockchain projects and applications, cryptocurrency exchanges, which provide exchange services among different types of cryptocurrencies, have become pivotal platforms that allow customers to trade digital assets on different blockchains. Because of the anonymous and trustless nature of cryptocurrency, one major challenge of crypto-exchanges is asset safety, and the all-time amount hacked from crypto-exchanges until 2018 is over $1.5 billion, even with carefully maintained secure trading systems. The most critical vulnerability of crypto-exchanges is the so-called hot wallet, which is used to store a certain portion of the total assets of an exchange and programmatically sign transactions when a withdrawal happens. Whenever hackers manage to gain control over the computing infrastructure of the exchange, they usually immediately obtain all the assets in the hot wallet. It is important to develop network security mechanisms. However, there is no guarantee that the system can defend against all attacks. Thus, accurately controlling the available assets in the hot wallets becomes the key to minimizing the risk of running an exchange. However, determining such an optimal threshold remains a challenging task because of the complicated dynamics inside exchanges. In this paper, we propose Shoreline, a deep learning-based threshold estimation framework that estimates the optimal threshold of hot wallets from historical wallet activities and dynamic trading networks. We conduct extensive empirical studies on real trading data from a trading platform and demonstrate the effectiveness of the proposed approach.

【Keywords】:

147. A Novel Learning Framework for Sampling-Based Motion Planning in Autonomous Driving.

【Pages】:1202-1209

【Authors】: Yifan Zhang ; Jinghuai Zhang ; Jindi Zhang ; Jianping Wang ; Kejie Lu ; Jeff Hong

【Abstract】: Sampling-based motion planning (SBMP) is a major trajectory planning approach in autonomous driving given its high efficiency in practice. As the core of SBMP schemes, sampling strategy holds the key to whether a smooth and collision-free trajectory can be found in real-time. Although some bias sampling strategies have been explored in the literature to accelerate SBMP, the trajectory generated under existing bias sampling strategies may lead to sharp lane changing. To address this issue, we propose a new learning framework for SBMP. Specifically, we develop a novel automatic labeling scheme and a 2-Stage prediction model to improve the accuracy in predicting the intention of surrounding vehicles. We then develop an imitation learning scheme to generate sample points based on the experience of human drivers. Using the prediction results, we design a new bias sampling strategy to accelerate the SBMP algorithm by strategically selecting necessary sample points that can generate a smooth and collision-free trajectory and avoid sharp lane changing. Data-driven experiments show that the proposed sampling strategy outperforms existing sampling strategies, in terms of the computing time, traveling time, and smoothness of the trajectory. The results also show that our scheme is even better than human drivers.

【Keywords】:

148. Dynamic Malware Analysis with Feature Engineering and Feature Learning.

Paper Link】 【Pages】:1210-1217

【Authors】: Zhaoqi Zhang ; Panpan Qi ; Wei Wang

【Abstract】: Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released (“zero-day”) malware. However, existing works typically only consider the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through bidirectional LSTM (long-short term memory networks) to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.

【Keywords】:
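
The feature hashing trick mentioned in the abstract of paper 148 can be illustrated with a minimal sketch. This is not the authors' implementation: the bucket count, the `name:key=value` token format, the choice of MD5 as the hash, and the example arguments are all assumptions made for illustration. The idea shown is only that arbitrary API-call arguments can be folded into a fixed-length vector without building a vocabulary.

```python
import hashlib


def hash_bucket(token: str, num_buckets: int) -> int:
    """Map a string to a bucket index that is stable across runs."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def encode_api_call(name: str, args: dict, num_buckets: int = 16) -> list:
    """Return a fixed-length count vector for one API call.

    The token format below is a hypothetical choice: each argument is
    hashed together with the API name so that the same value under
    different APIs can land in different buckets.
    """
    vec = [0] * num_buckets
    vec[hash_bucket("name=" + name, num_buckets)] += 1
    for key, value in sorted(args.items()):
        vec[hash_bucket(f"{name}:{key}={value}", num_buckets)] += 1
    return vec


# Hypothetical example call; the argument names are illustrative only.
call = encode_api_call(
    "CreateFileW",
    {"path": "C:\\temp\\a.exe", "access": "GENERIC_WRITE"},
)
```

A sequence of such vectors, one per API call, is the kind of input a sequence model could consume. Collisions between buckets are expected and are the price of the fixed length.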

149. OF-MSRN: Optical Flow-Auxiliary Multi-Task Regression Network for Direct Quantitative Measurement, Segmentation and Motion Estimation.

Paper Link】 【Pages】:1218-1225

【Authors】: Chengqian Zhao ; Cheng Feng ; Dengwang Li ; Shuo Li

【Abstract】: Comprehensively analyzing the carotid artery is critically significant to diagnosing and treating cardiovascular diseases. The objective of this work is to simultaneously achieve direct quantitative measurement and automated segmentation of the lumen diameter and intima-media thickness as well as the motion estimation of the carotid wall. No work has simultaneously achieved the comprehensive analysis of the carotid artery due to three intractable challenges: 1) the tiny intima-media is more challenging to measure and segment; 2) artifacts generated by radial motion restrict the accuracy of measurement and segmentation; 3) occlusions on diseased carotid walls generate dynamic complexity and indeterminacy. In this paper, we propose a novel optical flow-auxiliary multi-task regression network named OF-MSRN to overcome these challenges. We concatenate multi-scale features to a regression network to simultaneously achieve measurement and segmentation, which makes full use of the potential correlation between the two tasks. More importantly, we explore an optical flow auxiliary module to take advantage of the co-promotion of segmentation and motion estimation to overcome the restrictions of the radial motion. In addition, we evaluate consistency between forward and backward optical flow to improve the accuracy of motion estimation of the diseased carotid wall. Extensive experiments on US sequences of 101 patients demonstrate the superior performance of OF-MSRN on the comprehensive analysis of the carotid artery by utilizing the dual optimization of the optical flow auxiliary module.

【Keywords】:

150. MaskGEC: Improving Neural Grammatical Error Correction via Dynamic Masking.

Paper Link】 【Pages】:1226-1233

【Authors】: Zewei Zhao ; Houfeng Wang

【Abstract】: Grammatical error correction (GEC) is a promising natural language processing (NLP) application, whose goal is to change the sentences with grammatical errors into the correct ones. Neural machine translation (NMT) approaches have been widely applied to this translation-like task. However, such methods need a fairly large parallel corpus of error-annotated sentence pairs, which is not easy to get especially in the field of Chinese grammatical error correction. In this paper, we propose a simple yet effective method to improve the NMT-based GEC models by dynamic masking. By adding random masks to the original source sentences dynamically in the training procedure, more diverse instances of error-corrected sentence pairs are generated to enhance the generalization ability of the grammatical error correction model without additional data. The experiments on NLPCC 2018 Task 2 show that our MaskGEC model improves the performance of the neural GEC models. Besides, our single model for Chinese GEC outperforms the current state-of-the-art ensemble system in NLPCC 2018 Task 2 without any extra knowledge.

【Keywords】:
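
The dynamic masking idea in the abstract of paper 150 can be sketched as follows. This is a minimal illustration and not the authors' code: the single `<mask>` token, the masking probability, and the English example sentence are assumptions (the paper targets Chinese GEC). The point shown is that the source side is re-noised every time a pair is drawn, while the target stays fixed, so each epoch sees a different training pair without extra data.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def dynamic_mask(tokens, mask_prob, rng):
    """Return a copy of tokens with each one independently replaced by MASK."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


rng = random.Random(0)  # seeded only to make the sketch reproducible
source = ["I", "goes", "to", "school", "yesterday"]
target = ["I", "went", "to", "school", "yesterday"]

# A fresh noisy source is sampled per epoch; the target is never masked.
epochs = [(dynamic_mask(source, 0.3, rng), target) for _ in range(3)]
```

Because the mask is sampled inside the training loop and not once during preprocessing, the number of distinct source variants grows with training time.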

151. GMAN: A Graph Multi-Attention Network for Traffic Prediction.

Paper Link】 【Pages】:1234-1241

【Authors】: Chuanpan Zheng ; Xiaoliang Fan ; Cheng Wang ; Jianzhong Qi

【Abstract】: Long-term traffic prediction is highly challenging due to the complexity of traffic systems and the constantly changing nature of many impacting factors. In this paper, we focus on the spatio-temporal factors, and propose a graph multi-attention network (GMAN) to predict traffic conditions for time steps ahead at different locations on a road network graph. GMAN adopts an encoder-decoder architecture, where both the encoder and the decoder consist of multiple spatio-temporal attention blocks to model the impact of the spatio-temporal factors on traffic conditions. The encoder encodes the input traffic features and the decoder predicts the output sequence. Between the encoder and the decoder, a transform attention layer is applied to convert the encoded traffic features to generate the sequence representations of future time steps as the input of the decoder. The transform attention mechanism models the direct relationships between historical and future time steps, which helps to alleviate the error propagation problem among prediction time steps. Experimental results on two real-world traffic prediction tasks (i.e., traffic volume prediction and traffic speed prediction) demonstrate the superiority of GMAN. In particular, in the 1 hour ahead prediction, GMAN outperforms state-of-the-art methods by up to 4% improvement in MAE measure. The source code is available at https://github.com/zhengchuanpan/GMAN.

【Keywords】:

152. Index Tracking with Cardinality Constraints: A Stochastic Neural Networks Approach.

Paper Link】 【Pages】:1242-1249

【Authors】: Yu Zheng ; Bowei Chen ; Timothy M. Hospedales ; Yongxin Yang

【Abstract】: Partial (replication) index tracking is a popular passive investment strategy. It aims to replicate the performance of a given index by constructing a tracking portfolio which contains some constituents of the index. The tracking error optimisation is quadratic and NP-hard when taking the ℓ0 constraint into account so it is usually solved by heuristic methods such as evolutionary algorithms. This paper introduces a simple, efficient and scalable connectionist model as an alternative. We propose a novel reparametrisation method and then solve the optimisation problem with stochastic neural networks. The proposed approach is examined with S&P 500 index data for more than 10 years and compared with widely used index tracking approaches such as forward and backward selection and the largest market capitalisation methods. The empirical results show our model achieves excellent performance. Compared with the benchmarked models, our model has the lowest tracking error, across a range of portfolio sizes. Meanwhile it offers comparable performance to the others on secondary criteria such as volatility, Sharpe ratio and maximum drawdown.

【Keywords】:
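
For readers unfamiliar with the problem in paper 152, one common way to write cardinality-constrained index tracking is given below. The symbols are assumptions introduced here and are not taken from the paper: $r^{I}_{t}$ is the index return at time $t$, $r_{t,i}$ the return of constituent $i$, $w$ the portfolio weights over $N$ constituents, and $K$ the maximum number of assets held. The long-only and full-investment constraints are also a typical choice, not something the abstract states.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^{N}} \quad & \frac{1}{T} \sum_{t=1}^{T} \Bigl( r^{I}_{t} - \sum_{i=1}^{N} w_i \, r_{t,i} \Bigr)^{2} \\
\text{s.t.} \quad & \sum_{i=1}^{N} w_i = 1, \qquad w_i \ge 0 \quad (i = 1, \dots, N), \\
& \lVert w \rVert_{0} \le K .
\end{aligned}
```

The objective is quadratic in $w$; the $\ell_0$ constraint, which counts the nonzero weights, is what makes the problem combinatorial.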

153. Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction.

Paper Link】 【Pages】:1250-1257

【Authors】: Haoxi Zhong ; Yuzhong Wang ; Cunchao Tu ; Tianyang Zhang ; Zhiyuan Liu ; Maosong Sun

【Abstract】: Legal Judgment Prediction (LJP) aims to predict judgment results according to the facts of cases. In recent years, LJP has rapidly drawn increasing attention from both academia and the legal industry, as it can provide references for legal practitioners and is expected to promote judicial justice. However, the research to date usually suffers from a lack of interpretability, which may lead to ethical issues like inconsistent judgments or gender bias. In this paper, we present QAjudge, a model based on reinforcement learning to visualize the prediction process and give interpretable judgments. QAjudge follows two essential principles in legal systems across the world: Presumption of Innocence and Elemental Trial. During inference, a Question Net will select questions from the given set and an Answer Net will answer the question according to the fact description. Finally, a Predict Net will produce judgment results based on the answers. Reward functions are designed to minimize the number of questions asked. We conduct extensive experiments on several real-world datasets. Experimental results show that QAjudge can provide interpretable judgments while maintaining comparable performance with other state-of-the-art LJP models. The code is available at https://github.com/thunlp/QAjudge.

【Keywords】:

154. RiskOracle: A Minute-Level Citywide Traffic Accident Forecasting Framework.

Paper Link】 【Pages】:1258-1265

【Authors】: Zhengyang Zhou ; Yang Wang ; Xike Xie ; Lianliang Chen ; Hengchang Liu

【Abstract】: Real-time traffic accident forecasting is increasingly important for public safety and urban management (e.g., real-time safe route planning and emergency response deployment). Previous works on accident forecasting are often performed at the hour level, utilizing existing neural networks with static region-wise correlations taken into account. However, forecasting remains challenging as the granularity of the forecasting step becomes finer, because of the highly dynamic nature of the road network and the inherent rareness of accident records in one training sample, which lead to biased results and the zero-inflated issue. In this work, we propose a novel framework, RiskOracle, to improve the prediction granularity to minute levels. Specifically, we first transform the zero-risk values in labels to fit the training network. Then, we propose the Differential Time-varying Graph neural network (DTGN) to capture the immediate changes of traffic status and dynamic inter-subregion correlations. Furthermore, we adopt multi-task and region selection schemes to highlight citywide most-likely accident subregions, bridging the gap between biased risk values and sporadic accident distribution. Extensive experiments on two real-world datasets demonstrate the effectiveness and scalability of our RiskOracle framework.

【Keywords】:

155. Deep Reservoir Computing Meets 5G MIMO-OFDM Systems in Symbol Detection.

Paper Link】 【Pages】:1266-1273

【Authors】: Zhou Zhou ; Lingjia Liu ; Vikram Chandrasekhar ; Jianzhong Zhang ; Yang Yi

【Abstract】: Conventional reservoir computing (RC) is a shallow recurrent neural network (RNN) with fixed high dimensional hidden dynamics and one trainable output layer. It has the nice feature of requiring limited training which is critical for certain applications where training data is extremely limited and costly to obtain. In this paper, we consider two ways to extend the shallow architecture to deep RC to improve the performance without sacrificing the underlying benefit: (1) Extend the output layer to a three layer structure which promotes a joint time-frequency processing to neuron states; (2) Sequentially stack RCs to form a deep neural network. Using the new structure of the deep RC we redesign the physical layer receiver for multiple-input multiple-output with orthogonal frequency division multiplexing (MIMO-OFDM) signals since MIMO-OFDM is a key enabling technology in the 5th generation (5G) cellular network. The combination of RNN dynamics and the time-frequency structure of MIMO-OFDM signals allows deep RC to handle miscellaneous interference in nonlinear MIMO-OFDM channels to achieve improved performance compared to existing techniques. Meanwhile, rather than deep feedforward neural networks which rely on a massive amount of training, our introduced deep RC framework can provide a decent generalization performance using the same amount of pilots as conventional model-based methods in 5G systems. Numerical experiments show that the deep RC based receiver can offer a faster learning convergence and effectively mitigate unknown non-linear radio frequency (RF) distortion yielding twenty percent gain in terms of bit error rate (BER) over the shallow RC structure.

【Keywords】:

156. MixedAD: A Scalable Algorithm for Detecting Mixed Anomalies in Attributed Graphs.

Paper Link】 【Pages】:1274-1281

【Authors】: Mengxiao Zhu ; Haogang Zhu

【Abstract】: Attributed graphs, where nodes are associated with a rich set of attributes, have been widely used in various domains. Among all the nodes, those with patterns that deviate significantly from others are of particular interest. There are mainly two challenges for anomaly detection. First, we often encounter large graphs with many nodes and attributes in real-life scenarios, which requires a scalable algorithm. Second, there are anomalies w.r.t. both the structure and the attributes in a mixed manner, and the algorithm should identify all of them simultaneously. State-of-the-art algorithms often fail in some respects. In this paper, we propose a scalable algorithm called MixedAD. Theoretical analysis is provided to prove its superiority. Extensive experiments are also conducted on both synthetic and real-life datasets. Specifically, the results show that MixedAD often achieves F1 scores greater than those of the others by at least 25% and runs at least 10 times faster than the others.

【Keywords】:

AAAI Technical Track: Cognitive Modeling 7

157. Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning.

Paper Link】 【Pages】:1283-1291

【Authors】: Mark Edmonds ; Xiaojian Ma ; Siyuan Qi ; Yixin Zhu ; Hongjing Lu ; Song-Chun Zhu

【Abstract】: Learning transferable knowledge across similar but different settings is a fundamental component of generalized intelligence. In this paper, we approach the transfer learning challenge from a causal theory perspective. Our agent is endowed with two basic yet general theories for transfer learning: (i) a task shares a common abstract structure that is invariant across domains, and (ii) the behavior of specific features of the environment remains constant across domains. We adopt a Bayesian perspective of causal theory induction and use these theories to transfer knowledge between environments. Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment. A hierarchy of Bayesian structures is used to model abstract-level structural causal knowledge, and an instance-level associative learning scheme learns which specific objects can be used to induce state changes through interaction. This model-learning scheme is then integrated with a model-based planner to achieve a task in the OpenLock environment, a virtual “escape room” with a complex hierarchy that requires agents to reason about an abstract, generalized causal structure. We compare performances against a set of predominant model-free reinforcement learning (RL) algorithms. RL agents showed poor ability in transferring learned knowledge across different trials, whereas the proposed model revealed similar performance trends as human learners and, more importantly, demonstrated transfer behavior across trials and learning situations.

【Keywords】:

158. Deep Spiking Delayed Feedback Reservoirs and Its Application in Spectrum Sensing of MIMO-OFDM Dynamic Spectrum Sharing.

Paper Link】 【Pages】:1292-1299

【Authors】: Kian Hamedani ; Lingjia Liu ; Shiya Liu ; Haibo He ; Yang Yi

【Abstract】: In this paper, we introduce a deep spiking delayed feedback reservoir (DFR) model to combine DFR with spiking neurons: DFRs are a new type of recurrent neural network (RNN) able to capture the temporal correlations in time series, while spiking neurons are energy-efficient and biologically plausible neuron models. The introduced deep spiking DFR model is energy-efficient and has the capability of analyzing time series signals. The corresponding field-programmable gate array (FPGA)-based hardware implementation of such a deep spiking DFR model is introduced, and the underlying energy efficiency and resource utilization are evaluated. Various spike encoding schemes are explored and the optimal spike encoding scheme to analyze the time series has been identified. To be specific, we evaluate the performance of the introduced model using the spectrum occupancy time series data in MIMO-OFDM based cognitive radio (CR) in dynamic spectrum sharing (DSS) networks. In a MIMO-OFDM DSS system, available spectrum is very scarce and efficient utilization of spectrum is essential. To improve the spectrum efficiency, the first step is to identify the frequency bands that are not utilized by the existing users so that a secondary user (SU) can use them for transmission. Due to the channel correlation as well as users' activities, there is a significant temporal correlation in the spectrum occupancy behavior of the frequency bands in different time slots. The introduced deep spiking DFR model is used to capture the temporal correlation of the spectrum occupancy time series and predict the idle/busy subcarriers in future time slots for potential spectrum access. Evaluation results suggest that our introduced model achieves higher area under curve (AUC) in the receiver operating characteristic (ROC) curve compared with the traditional energy detection-based strategies and the learning-based support vector machines (SVMs).

【Keywords】:

159. People Do Not Just Plan, They Plan to Plan.

Paper Link】 【Pages】:1300-1307

【Authors】: Mark K. Ho ; David Abel ; Jonathan D. Cohen ; Michael L. Littman ; Thomas L. Griffiths

【Abstract】: Planning is useful. It lets people take actions that have desirable long-term consequences. But, planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy resources used for planning their actions. Put another way, people should also “plan their plans”. Here, we formulate this aspect of planning as a meta-reasoning problem and formalize it in terms of a recursive Bellman objective that incorporates both task rewards and information-theoretic planning costs. Our account makes quantitative predictions about how people should plan and meta-plan as a function of the overall structure of a task, which we test in two experiments with human participants. We find that people's reaction times reflect a planned use of information processing, consistent with our account. This formulation of planning to plan provides new insight into the function of hierarchical planning, state abstraction, and cognitive control in both humans and machines.

【Keywords】:

160. Effective AER Object Classification Using Segmented Probability-Maximization Learning in Spiking Neural Networks.

Paper Link】 【Pages】:1308-1315

【Authors】: Qianhui Liu ; Haibo Ruan ; Dong Xing ; Huajin Tang ; Gang Pan

【Abstract】: Address event representation (AER) cameras have recently attracted more attention due to the advantages of high temporal resolution and low power consumption, compared with traditional frame-based cameras. Since AER cameras record the visual input as asynchronous discrete events, they are inherently suitable to coordinate with the spiking neural network (SNN), which is biologically plausible and energy-efficient on neuromorphic hardware. However, using SNN to perform the AER object classification is still challenging, due to the lack of effective learning algorithms for this new representation. To tackle this issue, we propose an AER object classification model using a novel segmented probability-maximization (SPA) learning algorithm. Technically, 1) the SPA learning algorithm iteratively maximizes the probability of the classes that samples belong to, in order to improve the reliability of neuron responses and effectiveness of learning; 2) a peak detection (PD) mechanism is introduced in SPA to locate informative time points segment by segment, based on which information within the whole event stream can be fully utilized by the learning. Extensive experimental results show that, compared to state-of-the-art methods, our model is not only more effective but also requires less information to reach a certain level of accuracy.

【Keywords】:

161. Biologically Plausible Sequence Learning with Spiking Neural Networks.

Paper Link】 【Pages】:1316-1323

【Authors】: Zuozhu Liu ; Thiparat Chotibut ; Christopher Hillar ; Shaowei Lin

【Abstract】: Motivated by the celebrated discrete-time model of nervous activity outlined by McCulloch and Pitts in 1943, we propose a novel continuous-time model, the McCulloch-Pitts network (MPN), for sequence learning in spiking neural networks. Our model has a local learning rule, such that the synaptic weight updates depend only on the information directly accessible by the synapse. By exploiting asymmetry in the connections between binary neurons, we show that MPN can be trained to robustly memorize multiple spatiotemporal patterns of binary vectors, generalizing the ability of the symmetric Hopfield network to memorize static spatial patterns. In addition, we demonstrate that the model can efficiently learn sequences of binary pictures as well as generative models for experimental neural spike-train data. Our learning rule is consistent with spike-timing-dependent plasticity (STDP), thus providing a theoretical ground for the systematic design of biologically inspired networks with large and robust long-range sequence storage capacity.

【Keywords】:
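
The abstract of paper 161 contrasts its model with the symmetric Hopfield network, which memorizes static patterns. As background, here is a minimal sketch of that classical baseline only; it is not the McCulloch-Pitts network proposed in the paper. The Hebbian outer-product rule, the ±1 encoding, and the 8-unit example pattern are standard textbook choices assumed here for illustration.

```python
def train_hopfield(patterns):
    """Build symmetric weights with the Hebbian outer-product rule."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronously update units in a fixed order for a few sweeps."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state


pattern = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([pattern])

noisy = list(pattern)
noisy[0] = -noisy[0]  # corrupt one unit
restored = recall(weights, noisy)
```

With a single stored pattern and one flipped unit, the local field on every unit has the sign of the stored value, so `restored` equals `pattern`. The weight matrix is symmetric by construction, which is the property the abstract says the proposed model relaxes in order to store sequences.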

162. Transfer Reinforcement Learning Using Output-Gated Working Memory.

Paper Link】 【Pages】:1324-1331

【Authors】: Arthur Williams ; Joshua Phillips

【Abstract】: Transfer learning allows for knowledge to generalize across tasks, resulting in increased learning speed and/or performance. These tasks must have commonalities that allow for knowledge to be transferred. The main goal of transfer learning in the reinforcement learning domain is to train and learn on one or more source tasks in order to learn a target task that exhibits better performance than if transfer was not used (Taylor and Stone 2009). Furthermore, the use of output-gated neural network models of working memory has been shown to increase generalization for supervised learning tasks (Kriete and Noelle 2011; Kriete et al. 2013). We propose that working memory-based generalization plays a significant role in a model's ability to transfer knowledge successfully across tasks. Thus, we extended the Holographic Working Memory Toolkit (HWMtk) (Dubois and Phillips 2017; Phillips and Noelle 2005) to utilize the generalization benefits of output gating within a working memory system. Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn 3 tasks over the duration of the training period. The results indicate that the addition of output gating increases the initial learning performance of an agent in target tasks and decreases the learning time required to reach a fixed performance threshold.

【Keywords】:

163. Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning.

Paper Link】 【Pages】:1332-1340

【Authors】: Wenhe Zhang ; Chi Zhang ; Yixin Zhu ; Song-Chun Zhu

【Abstract】: As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene 2011) bridges the induction of symbolic concepts and the competence of problem-solving. To endow such a crucial cognitive ability to machine intelligence, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model—And-Or Graph (AOG). These visual arithmetic problems are in the form of geometric figures: each problem has a set of geometric shapes as its context and embedded number symbols. Solving such problems is not trivial; the machine not only has to recognize the number, but also to interpret the number with its contexts, shapes, and relations (e.g., symmetry) together with proper operations. We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task. Comprehensive experiments show that current neural-network-based models still struggle to understand number concepts and relational operations. We show that a simple brute-force search algorithm could work out some of the problems without context information. Crucially, taking geometric context into account by an additional perception module would provide a sharp performance gain with fewer search steps. Altogether, we call for attention in fusing the classic search-based algorithms with modern neural networks to discover the essential number concepts in future research.

【Keywords】:

AAAI Technical Track: Cognitive Systems 3

164. STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits.

Paper Link】 【Pages】:1342-1350

【Authors】: Uttaran Bhattacharya ; Trisha Mittal ; Rohan Chandra ; Tanmay Randhavane ; Aniket Bera ; Dinesh Manocha

【Abstract】: We present a novel classifier network called STEP, to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the perceived emotion of the human into one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.

【Keywords】:

165. Synch-Graph: Multisensory Emotion Recognition Through Neural Synchrony via Graph Convolutional Networks.

Paper Link】 【Pages】:1351-1358

【Authors】: Esma Mansouri-Benssassi ; Juan Ye

【Abstract】: Human emotions are essentially multisensory, where emotional states are conveyed through multiple modalities such as facial expression, body language, and non-verbal and verbal signals. Therefore having multimodal or multisensory learning is crucial for recognising emotions and interpreting social signals. Existing multisensory emotion recognition approaches focus on extracting features on each modality, while ignoring the importance of constant interaction and co-learning between modalities. In this paper, we present a novel bio-inspired approach based on neural synchrony in audio-visual multisensory integration in the brain, named Synch-Graph. We model multisensory interaction using spiking neural networks (SNN) and explore the use of Graph Convolutional Networks (GCN) to represent and learn neural synchrony patterns. We hypothesise that modelling interactions between modalities will improve the accuracy of emotion recognition. We have evaluated Synch-Graph on two state-of-the-art datasets and achieved an overall accuracy of 98.3% and 96.82%, which are significantly higher than the existing techniques.

【Keywords】:

166. M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues.

Paper Link】 【Pages】:1359-1367

【Authors】: Trisha Mittal ; Uttaran Bhattacharya ; Rohan Chandra ; Aniket Bera ; Dinesh Manocha

【Abstract】: We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step which uses Canonical Correlational Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.

【Keywords】:

AAAI Technical Track: Computational Sustainability 5

167. To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability.

Paper Link】 【Pages】:1369-1377

【Authors】: Elizabeth Bondi ; Hoon Oh ; Haifeng Xu ; Fei Fang ; Bistra Dilkina ; Milind Tambe

【Abstract】: Motivated by real-world deployment of drones for conservation, this paper advances the state-of-the-art in security games with signaling. The well-known defender-attacker security games framework can help in planning for such strategic deployments of sensors and human patrollers, and warning signals to ward off adversaries. However, we show that defenders can suffer significant losses when ignoring real-world uncertainties despite carefully planned security game strategies with signaling. In fact, defenders may perform worse than forgoing drones completely in this case. We address this shortcoming by proposing a novel game model that integrates signaling and sensor uncertainty; perhaps surprisingly, we show that defenders can still perform well via a signaling strategy that exploits uncertain real-time information. For example, even in the presence of uncertainty, the defender still has an informational advantage in knowing that she has or has not actually detected the attacker; and she can design a signaling scheme to “mislead” the attacker who is uncertain as to whether he has been detected. We provide theoretical results, a novel algorithm, scale-up techniques, and experimental results from simulation based on our ongoing deployment of a conservation drone system in South Africa.

【Keywords】:

168. End-to-End Game-Focused Learning of Adversary Behavior in Security Games.

Paper Link】 【Pages】:1378-1386

【Authors】: Andrew Perrault ; Bryan Wilder ; Eric Ewing ; Aditya Mate ; Bistra Dilkina ; Milind Tambe

【Abstract】: Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary's response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature values and combinations. Traditionally, this problem has been addressed via a two-stage approach where an adversary model is trained to maximize predictive accuracy without considering the defender's optimization problem. We develop an end-to-end game-focused approach, where the adversary model is trained to maximize a surrogate for the defender's expected utility. We show both in theory and experimental results that our game-focused approach achieves higher defender expected utility than the two-stage alternative when there is limited data.

【Keywords】:

169. Modeling Electrical Motor Dynamics Using Encoder-Decoder with Recurrent Skip Connection.

Paper Link】 【Pages】:1387-1394

【Authors】: Sagar Verma ; Nicolas Henwood ; Marc Castella ; François Malrait ; Jean-Christophe Pesquet

【Abstract】: Electrical motors are the most important source of mechanical energy in the industrial world. Their modeling traditionally relies on a physics-based approach, which aims at taking their complex internal dynamics into account. In this paper, we explore the feasibility of modeling the dynamics of an electrical motor by following a data-driven approach, which uses only its inputs and outputs and does not make any assumption on its internal behaviour. We propose a novel encoder-decoder architecture which benefits from recurrent skip connections. We also propose a novel loss function that takes into account the complexity of electrical motor quantities and helps in avoiding model bias. We show that the proposed architecture can achieve a good learning performance on our high-frequency high-variance datasets. Two datasets are considered: the first one is generated using a simulator based on the physics of an induction motor and the second one is recorded from an industrial electrical motor. We benchmark our solution using variants of traditional neural networks like feedforward, convolutional, and recurrent networks. We evaluate various design choices of our architecture and compare it to the baselines. We show the domain adaptation capability of our model to learn dynamics just from simulated data by testing it on the raw sensor data. We finally show the effect of signal complexity on the proposed method's ability to model temporal dynamics.

【Keywords】:

Paper Link】 【Pages】:1395-1402

【Authors】: Dongkuan Xu ; Wei Cheng ; Bo Zong ; Dongjin Song ; Jingchao Ni ; Wenchao Yu ; Yanchi Liu ; Haifeng Chen ; Xiang Zhang

【Abstract】: The problem of learning and forecasting underlying trends in time series data arises in a variety of applications, such as traffic management, energy optimization, etc. In the literature, a trend in a time series is characterized by its slope and duration, and trend prediction is then to forecast these two values for the subsequent trend given historical data of the time series. For this problem, existing approaches mainly deal with the univariate case. However, in many real-world applications, there are multiple variables at play, and handling all of them at the same time is crucial for an accurate prediction. A natural way is to employ multi-task learning (MTL) techniques in which the trend learning of each time series is treated as a task. The key point of MTL is to learn task relatedness to achieve better parameter sharing, which however is challenging in the trend prediction task. First, effectively modeling the complex temporal patterns in different tasks is hard as the temporal and spatial dimensions are entangled. Second, the relatedness among tasks may change over time. In this paper, we propose a neural network, DeepTrends, for multivariate time series trend prediction. The core module of DeepTrends is a tensorized LSTM with adaptive shared memory (TLASM). TLASM employs the tensorized LSTM to model the temporal patterns of long-term trend sequences in an MTL setting. With an adaptive shared memory, TLASM is able to learn the relatedness among tasks adaptively, based upon which it can dynamically vary degrees of parameter sharing among tasks. To further consider short-term patterns, DeepTrends utilizes a multi-task 1dCNN to learn the local time series features, and employs a task-specific sub-network to learn a mixture of long-term and short-term patterns for trend prediction. Extensive experiments on real datasets demonstrate the effectiveness of the proposed model.

【Keywords】:

171. Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval.

Paper Link】 【Pages】:1403-1411

【Authors】: Dixian Zhu ; Dongjin Song ; Yuncong Chen ; Cristian Lumezanu ; Wei Cheng ; Bo Zong ; Jingchao Ni ; Takehiko Mizoguchi ; Tianbao Yang ; Haifeng Chen

【Abstract】: Multivariate time series data are becoming increasingly ubiquitous in various real-world applications such as smart cities, power plant monitoring, wearable devices, etc. Given the current time series segment, retrieving similar segments within the historical data in an efficient and effective manner is becoming increasingly important, as it can facilitate underlying applications such as system status identification, anomaly detection, etc. Despite the fact that various binary coding techniques can be applied to this task, few of them are specially designed for multivariate time series data in an unsupervised setting. To this end, we present Deep Unsupervised Binary Coding Networks (DUBCNs) to perform multivariate time series retrieval. DUBCNs employ the Long Short-Term Memory (LSTM) encoder-decoder framework to capture the temporal dynamics within the input segment and consist of three key components, i.e., a temporal encoding mechanism to capture the temporal order of different segments within a mini-batch, a clustering loss on the hidden feature space to capture the hidden feature structure, and an adversarial loss based upon Generative Adversarial Networks (GANs) to enhance the generalization capability of the generated binary codes. Thorough empirical studies on three public datasets demonstrated that the proposed DUBCNs can outperform state-of-the-art unsupervised binary coding techniques.

【Keywords】:

AAAI Technical Track: Constraint Satisfaction and Optimization 34

172. Improved Filtering for the Euclidean Traveling Salesperson Problem in CLP(FD).

Paper Link】 【Pages】:1412-1419

【Authors】: Alessandro Bertagnon ; Marco Gavanelli

【Abstract】: The Traveling Salesperson Problem (TSP) is one of the best-known problems in computer science. The Euclidean TSP is a special case in which each node is identified by its coordinates on the plane and the Euclidean distance is used as cost function. Many works in the Constraint Programming (CP) literature have addressed the TSP and use Euclidean instances as benchmarks; however, the usual approach is to build a distance matrix from the points' coordinates, and then address the problem as a TSP, disregarding the information carried by the points' coordinates for constraint propagation. In this work, we propose to use geometric information, present in Euclidean TSP instances, to improve the filtering power. In order to have a declarative approach, we implemented the filtering algorithms in Constraint Logic Programming on Finite Domains (CLP(FD)).

【Keywords】:

173. Chain Length and CSPs Learnable with Few Queries.

Paper Link】 【Pages】:1420-1427

【Authors】: Christian Bessiere ; Clément Carbonnel ; George Katsirelos

【Abstract】: The goal of constraint acquisition is to learn exactly a constraint network given access to an oracle that answers truthfully certain types of queries. In this paper we focus on partial membership queries and initiate a systematic investigation of the learning complexity of constraint languages. First, we use the notion of chain length to show that a wide class of languages can be learned with as few as O(n log(n)) queries. Then, we combine this result with generic lower bounds to derive a dichotomy in the learning complexity of binary languages. Finally, we identify a class of ternary languages that eludes our framework and hints at new research directions.

【Keywords】:

Paper Link】 【Pages】:1428-1435

【Authors】: Md. Solimul Chowdhury ; Martin Müller ; Jia-Huai You

【Abstract】: The efficiency of Conflict Driven Clause Learning (CDCL) SAT solving depends crucially on finding conflicts at a fast rate. State-of-the-art CDCL branching heuristics such as VSIDS, CHB and LRB conform to this goal. We take a closer look at the way in which conflicts are generated over the course of a CDCL SAT search. Our study of the VSIDS branching heuristic shows that conflicts are typically generated in short bursts, followed by what we call a conflict depression phase in which the search fails to generate any conflicts in a span of decisions. The lack of conflict indicates that the variables that are currently ranked highest by the branching heuristic fail to generate conflicts. Based on this analysis, we propose an exploration strategy, called expSAT, which randomly samples variable selection sequences in order to learn an updated heuristic from the generated conflicts. The goal is to escape from conflict depressions expeditiously. The branching heuristic deployed in expSAT combines these updates with the standard VSIDS activity scores. An extensive empirical evaluation with four state-of-the-art CDCL SAT solvers demonstrates good-to-strong performance gains with the expSAT approach.

【Keywords】:

175. Representative Solutions for Bi-Objective Optimisation.

Paper Link】 【Pages】:1436-1443

【Authors】: Emir Demirovic ; Nicolas Schwind

【Abstract】: Bi-objective optimisation aims to optimise two generally competing objective functions. Typically, it consists in computing the set of nondominated solutions, called the Pareto front. This raises two issues: 1) time complexity, as the Pareto front in general can be infinite for continuous problems and exponentially large for discrete problems, and 2) lack of decisiveness. This paper focusses on the computation of a small, “relevant” subset of the Pareto front called the representative set, which provides meaningful trade-offs between the two objectives. We introduce a procedure which, given a pre-computed Pareto front, computes a representative set in polynomial time, and then we show how to adapt it to the case where the Pareto front is not provided. This has three important consequences: computing the representative set 1) does not require the whole Pareto front to be provided explicitly, 2) can be done in polynomial time for bi-objective mixed-integer linear programs, and 3) only requires a polynomial number of solver calls for bi-objective problems, as opposed to the case where a higher number of objectives is involved. We implement our algorithm and empirically illustrate the efficiency on two families of benchmarks.

【Keywords】:
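As a small illustration of the Pareto front mentioned in the abstract above, the following sketch filters a finite list of bi-objective points down to the nondominated ones. It assumes both objectives are minimised; the function name `pareto_front` and the sample data are hypothetical, and this is not the paper's representative-set procedure, only the underlying dominance notion.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the nondominated points, assuming both objectives are minimised."""
    front: List[Point] = []
    best_second = float("inf")
    # Sort by the first objective, breaking ties on the second objective.
    for p in sorted(set(points)):
        # Every earlier point is at least as good on the first objective,
        # so p survives only if it strictly improves the second objective.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


if __name__ == "__main__":
    sample = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 1)]
    print(pareto_front(sample))
```

The sort makes the filter run in O(n log n) time for n points, which is only possible in this simple form because there are exactly two objectives.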

176. Dynamic Programming for Predict+Optimise.

Paper Link】 【Pages】:1444-1451

【Authors】: Emir Demirovic ; Peter J. Stuckey ; Tias Guns ; James Bailey ; Christopher Leckie ; Kotagiri Ramamohanarao ; Jeffrey Chan

【Abstract】: We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. We provide a novel learning technique for predict+optimise to directly reason about the underlying combinatorial optimisation problem, offering a meaningful integration of machine learning and optimisation. This is done by representing the combinatorial problem as a piecewise linear function parameterised by the coefficients of the learning model and then iteratively performing coordinate descent on the learning coefficients. Our approach is applicable to linear learning functions and any optimisation problem solvable by dynamic programming. We illustrate the effectiveness of our approach on benchmarks from the literature.

【Keywords】:

177. Accelerating Primal Solution Findings for Mixed Integer Programs Based on Solution Prediction.

Paper Link】 【Pages】:1452-1459

【Authors】: Jian-Ya Ding ; Chao Zhang ; Lei Shen ; Shengyin Li ; Bing Wang ; Yinghui Xu ; Le Song

【Abstract】: Mixed Integer Programming (MIP) is one of the most widely used modeling techniques for combinatorial optimization problems. In many applications, a similar MIP model is solved on a regular basis, maintaining remarkable similarities in model structures and solution appearances but differing in formulation coefficients. This offers the opportunity for machine learning methods to explore the correlations between model structures and the resulting solution values. To address this issue, we propose to represent a MIP instance using a tripartite graph, based on which a Graph Convolutional Network (GCN) is constructed to predict solution values for binary variables. The predicted solutions are used to generate a local branching type cut which can be either treated as a global (invalid) inequality in the formulation resulting in a heuristic approach to solve the MIP, or as a root branching rule resulting in an exact approach. Computational evaluations on 8 distinct types of MIP problems show that the proposed framework improves the primal solution finding performance significantly on a state-of-the-art open-source MIP solver.

【Keywords】:

178. Optimization of Chance-Constrained Submodular Functions.

Paper Link】 【Pages】:1460-1467

【Authors】: Benjamin Doerr ; Carola Doerr ; Aneta Neumann ; Frank Neumann ; Andrew M. Sutton

【Abstract】: Submodular optimization plays a key role in many real-world problems. In many real-world scenarios, it is also necessary to handle uncertainty, and potentially disruptive events that violate constraints in stochastic settings need to be avoided. In this paper, we investigate submodular optimization problems with chance constraints. We provide a first analysis on the approximation behavior of popular greedy algorithms for submodular problems with chance constraints. Our results show that these algorithms are highly effective when using surrogate functions that estimate constraint violations based on Chernoff bounds. Furthermore, we investigate the behavior of the algorithms on popular social network problems and show that high quality solutions can still be obtained even if there are strong restrictions imposed by the chance constraint.

【Keywords】:

179. ADDMC: Weighted Model Counting with Algebraic Decision Diagrams.

Paper Link】 【Pages】:1468-1476

【Authors】: Jeffrey M. Dudek ; Vu Phan ; Moshe Y. Vardi

【Abstract】: We present an algorithm to compute exact literal-weighted model counts of Boolean formulas in Conjunctive Normal Form. Our algorithm employs dynamic programming and uses Algebraic Decision Diagrams as the main data structure. We implement this technique in ADDMC, a new model counter. We empirically evaluate various heuristics that can be used with ADDMC. We then compare ADDMC to four state-of-the-art weighted model counters (Cachet, c2d, d4, and miniC2D) on 1914 standard model counting benchmarks and show that ADDMC significantly improves the virtual best solver.

【Keywords】:
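To make the quantity in the abstract above concrete, here is a brute-force sketch of a literal-weighted model count: the sum, over all satisfying assignments of a CNF formula, of the product of the weights of the literals that hold. The DIMACS-style clause encoding, the function name and the weights are assumptions for illustration; ADDMC itself uses dynamic programming over Algebraic Decision Diagrams rather than enumeration.

```python
from itertools import product
from typing import Dict, List


def weighted_model_count(clauses: List[List[int]], num_vars: int,
                         weight: Dict[int, float]) -> float:
    """Enumerate all assignments; exponential in num_vars, for tiny inputs only.

    Literal v (positive int) means variable v is true, -v means it is false.
    weight maps every literal v and -v to its weight.
    """
    total = 0.0
    for bits in product([False, True], repeat=num_vars):
        def holds(lit: int) -> bool:
            return bits[abs(lit) - 1] == (lit > 0)

        # Skip assignments that falsify some clause.
        if not all(any(holds(lit) for lit in clause) for clause in clauses):
            continue
        # Weight of a model is the product of the weights of its true literals.
        w = 1.0
        for v in range(1, num_vars + 1):
            w *= weight[v] if bits[v - 1] else weight[-v]
        total += w
    return total


if __name__ == "__main__":
    cnf = [[1, 2]]  # the single clause (x1 OR x2)
    weights = {1: 0.3, -1: 0.7, 2: 0.6, -2: 0.4}
    print(weighted_model_count(cnf, 2, weights))
```

With all literal weights set to 1 the same function returns the ordinary (unweighted) number of models.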

180. Modelling and Solving Online Optimisation Problems.

Paper Link】 【Pages】:1477-1485

【Authors】: Alexander Ek ; Maria Garcia de la Banda ; Andreas Schutt ; Peter J. Stuckey ; Guido Tack

【Abstract】: Many optimisation problems are of an online—also called dynamic—nature, where new information is expected to arrive and the problem must be resolved in an ongoing fashion to (a) improve or revise previous decisions and (b) take new ones. Typically, building an online decision-making system requires substantial ad-hoc coding to ensure the offline version of the optimisation problem is continually adjusted and resolved. This paper defines a general framework for automatically solving online optimisation problems. This is achieved by extending a model of the offline optimisation problem, from which an online version is automatically constructed, thus requiring no further modelling effort. In doing so, it formalises many of the aspects that arise in online optimisation problems. The same framework can be applied for automatically creating sliding-window solving approaches for problems that have a large time horizon. Experiments show we can automatically create efficient online and sliding-window solutions to optimisation problems.

【Keywords】:

181. Justifying All Differences Using Pseudo-Boolean Reasoning.

Paper Link】 【Pages】:1486-1494

【Authors】: Jan Elffers ; Stephan Gocht ; Ciaran McCreesh ; Jakob Nordström

【Abstract】: Constraint programming solvers support rich global constraints and propagators, which make them both powerful and hard to debug. In the Boolean satisfiability community, proof-logging is the standard solution for generating trustworthy outputs, and this has become key to the social acceptability of computer-generated proofs. However, reusing this technology for constraint programming requires either much weaker propagation, or an impractical blowup in proof length. This paper demonstrates that simple, clean, and efficient proof logging is still possible for the all-different constraint, through pseudo-Boolean reasoning. We explain how such proofs can be expressed and verified mechanistically, describe an implementation, and discuss the broader implications for proof logging in constraint programming.

【Keywords】:

182. A Cardinal Improvement to Pseudo-Boolean Solving.

Paper Link】 【Pages】:1495-1503

【Authors】: Jan Elffers ; Jakob Nordström

【Abstract】: Pseudo-Boolean solvers hold out the theoretical potential of exponential improvements over conflict-driven clause learning (CDCL) SAT solvers, but in practice perform very poorly if the input is given in the standard conjunctive normal form (CNF) format. We present a technique to remedy this problem by recovering cardinality constraints from CNF on the fly during search. This is done by collecting potential building blocks of cardinality constraints during propagation and combining these blocks during conflict analysis. Our implementation has a non-negligible but manageable overhead when detection is not successful, and yields significant gains for some SAT competition and crafted benchmarks for which pseudo-Boolean reasoning is stronger than CDCL. It also boosts performance for some native pseudo-Boolean formulas where this approach helps to improve learned constraints.

【Keywords】:
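The idea of a cardinality constraint hidden inside CNF can be illustrated with the common pairwise at-most-one encoding: the clauses (¬a ∨ ¬b) for every pair of variables say exactly that the sum of the variables is at most 1. The sketch below checks this equivalence by enumeration; the helper names are hypothetical, and it does not reproduce the paper's on-the-fly detection during propagation and conflict analysis.

```python
from itertools import combinations, product
from typing import Dict, List


def pairwise_at_most_one(variables: List[int]) -> List[List[int]]:
    """CNF clauses (NOT a OR NOT b) for every pair of variables."""
    return [[-a, -b] for a, b in combinations(variables, 2)]


def satisfies_cnf(clauses: List[List[int]], assignment: Dict[int, bool]) -> bool:
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def satisfies_at_most(variables: List[int], assignment: Dict[int, bool],
                      bound: int) -> bool:
    """The cardinality constraint: sum of the variables <= bound."""
    return sum(assignment[v] for v in variables) <= bound


def encodings_agree(variables: List[int]) -> bool:
    """Check by enumeration that the clauses and the constraint coincide."""
    clauses = pairwise_at_most_one(variables)
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfies_cnf(clauses, assignment) != satisfies_at_most(variables, assignment, 1):
            return False
    return True


if __name__ == "__main__":
    print(len(pairwise_at_most_one([1, 2, 3, 4])), encodings_agree([1, 2, 3, 4]))
```

The encoding needs a quadratic number of clauses for one linear inequality, which gives some intuition for why a solver that reasons on the inequality directly can be stronger.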

183. MIPaaL: Mixed Integer Program as a Layer.

Paper Link】 【Pages】:1504-1511

【Authors】: Aaron Ferber ; Bryan Wilder ; Bistra Dilkina ; Milind Tambe

【Abstract】: Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures average accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a mixed integer linear program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, an algorithm that iteratively tightens the continuous relaxation by adding constraints removing fractional solutions. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and optimization separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP. Lastly, we demonstrate generalization performance in several transfer learning tasks.

【Keywords】:

184. Using Approximation within Constraint Programming to Solve the Parallel Machine Scheduling Problem with Additional Unit Resources.

Paper Link】 【Pages】:1512-1519

【Authors】: Arthur Godet ; Xavier Lorca ; Emmanuel Hebrard ; Gilles Simonin

【Abstract】: In this paper, we consider the Parallel Machine Scheduling Problem with Additional Unit Resources, which consists in scheduling a set of n jobs on m parallel unrelated machines and subject to exactly one of r unit resources. This problem arises from the download of acquisitions from satellites to ground stations. We first introduce two baseline constraint models for this problem. Then, we build on an approximation algorithm for this problem, and we discuss the efficiency of designing an improved constraint model based on these approximation results. In particular, we introduce new constraints that restrict search to executions of the approximation algorithm. Finally, we report experimental data demonstrating that this model significantly outperforms the two reference models.

【Keywords】:

185. SPAN: A Stochastic Projected Approximate Newton Method.

Paper Link】 【Pages】:1520-1527

【Authors】: Xunpeng Huang ; Xianfeng Liang ; Zhengyang Liu ; Lei Li ; Yue Yu ; Yitan Li

【Abstract】: Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation for the Hessian and its inverse. In this paper, we propose SPAN, a novel approximate and fast Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and the per-iteration efficiency.

【Keywords】:

186. Modelling Diversity of Solutions.

Paper Link】 【Pages】:1528-1535

【Authors】: Linnea Ingmar ; Maria Garcia de la Banda ; Peter J. Stuckey ; Guido Tack

【Abstract】: For many combinatorial problems, finding a single solution is not enough. This is clearly the case for multi-objective optimization problems, as they have no single “best solution” and, thus, it is useful to find a representation of the non-dominated solutions (the Pareto frontier). However, it also applies to single objective optimization problems, where one may be interested in finding several (close to) optimal solutions that illustrate some form of diversity. The same applies to satisfaction problems. This is because models usually idealize the problem in some way, and a diverse pool of solutions may provide a better choice with respect to considerations that are omitted or simplified in the model. This paper describes a general framework for finding k diverse solutions to a combinatorial problem (be it satisfaction, single-objective or multi-objective), various approaches to solve problems in the framework, their implementations, and an experimental evaluation of their practicality.

【Keywords】:

Paper Link】 【Pages】:1536-1543

【Authors】: Avraham Itzhakov ; Michael Codish

【Abstract】: This paper introduces incremental symmetry breaking constraints for graph search problems which are complete and compact. We show that these constraints can be computed incrementally: A symmetry breaking constraint for order n graphs can be extended to one for order n + 1 graphs. Moreover, these constraints induce a special property on their canonical solutions: An order n canonical graph contains a canonical subgraph on the first k vertices for every 1 ≤ k ≤ n. This facilitates a “generate and extend” paradigm for parallel graph search problem solving: To solve a graph search problem φ on order n graphs, first generate the canonical graphs of some order k < n. Then, compute canonical solutions for φ by extending, in parallel, each canonical order k graph together with suitable symmetry breaking constraints. The contribution is that the proposed symmetry breaking constraints enable us to extend the order k canonical graphs to order n canonical solutions. We demonstrate our approach through its application on two hard graph search problems.

【Keywords】:

188. Finding Most Compatible Phylogenetic Trees over Multi-State Characters.

Paper Link】 【Pages】:1544-1551

【Authors】: Tuukka Korhonen ; Matti Järvisalo

【Abstract】: The reconstruction of the evolutionary tree of a set of species based on qualitative attributes is a central problem in phylogenetics. In the NP-hard perfect phylogeny problem the input is a set of taxa (species) and characters (attributes) on them, and the task is to find an evolutionary tree that describes the evolution of the taxa so that each character state evolves only once. However, in practical situations a perfect phylogeny rarely exists, motivating the maximum compatibility problem of finding the largest subset of characters admitting a perfect phylogeny. Various declarative approaches, based on applying integer programming (IP), answer set programming (ASP) and pseudo-Boolean optimization (PBO) solvers, have been proposed for maximum compatibility. In this work we develop a new hybrid approach to solving maximum compatibility for multi-state characters, making use of both declarative optimization techniques (specifically maximum satisfiability, MaxSAT) and an adaptation of the Bouchitté-Todinca approach to triangulation-based graph optimization problems. Empirically our approach outperforms in scalability the earlier proposed approaches w.r.t. various parameters underlying the problem.

【Keywords】:

189. FourierSAT: A Fourier Expansion-Based Algebraic Framework for Solving Hybrid Boolean Constraints.

Paper Link】 【Pages】:1552-1560

【Authors】: Anastasios Kyrillidis ; Anshumali Shrivastava ; Moshe Y. Vardi ; Zhiwei Zhang

【Abstract】: The Boolean SATisfiability problem (SAT) is of central importance in computer science. Although SAT is known to be NP-complete, progress on the engineering side, especially that of Conflict-Driven Clause Learning (CDCL) and Local Search SAT solvers, has been remarkable. Yet, while SAT solvers, aimed at solving industrial-scale benchmarks in Conjunctive Normal Form (CNF), have become quite mature, SAT solvers that are effective on other types of constraints (e.g., cardinality constraints and XORs) are less well-studied; a general approach to handling non-CNF constraints is still lacking. In addition, previous work indicated that for specific classes of benchmarks, the running time of extant SAT solvers depends heavily on properties of the formula and details of encoding, instead of the scale of the benchmarks, which adds uncertainty to expectations of running time. To address the issues above, we design FourierSAT, an incomplete SAT solver based on Fourier analysis of Boolean functions, a technique to represent Boolean functions by multilinear polynomials. By such a reduction to continuous optimization, we propose an algebraic framework for solving systems consisting of different types of constraints. The idea is to leverage gradient information to guide the search process in the direction of local improvements. Empirical results demonstrate that FourierSAT is more robust than other solvers on certain classes of benchmarks.

【Keywords】:
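The representation of a Boolean function by a multilinear polynomial, as mentioned in the abstract above, can be sketched by brute force for small functions. The code assumes the common convention that +1 encodes false and -1 encodes true (an assumption here, not taken from the listing), and computes each Fourier coefficient as the average of f(x) times the product of the selected coordinates. The function names are hypothetical and this is not the FourierSAT solver.

```python
from itertools import combinations, product
from math import prod
from typing import Callable, Dict, Sequence, Tuple

Subset = Tuple[int, ...]


def fourier_coefficients(f: Callable[[Sequence[int]], int],
                         n: int) -> Dict[Subset, float]:
    """Coefficient of subset S is the mean of f(x) * prod_{i in S} x_i over {-1, 1}^n."""
    points = list(product([-1, 1], repeat=n))
    coeffs: Dict[Subset, float] = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def evaluate(coeffs: Dict[Subset, float], x: Sequence[float]) -> float:
    """Evaluate the multilinear polynomial, also at non-Boolean (real) points."""
    return sum(c * prod(x[i] for i in subset) for subset, c in coeffs.items())


def or2(x: Sequence[int]) -> int:
    """Two-input OR with -1 as true and +1 as false."""
    return -1 if -1 in x else 1


if __name__ == "__main__":
    coeffs = fourier_coefficients(or2, 2)
    print(coeffs)
    print(evaluate(coeffs, (0.0, 0.0)))
```

Because the polynomial is defined on all real points, not only on the corners of the cube, it can be handed to a continuous optimiser, which is the kind of reduction the abstract describes.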

190. Augmenting the Power of (Partial) MaxSat Resolution with Extension.

Paper Link】 【Pages】:1561-1568

【Authors】: Javier Larrosa ; Emma Rollon

【Abstract】: The refutation power of SAT and MaxSAT resolution is challenged by problems like the soft and hard Pigeon Hole Problem PHP for which short refutations do not exist. In this paper we augment the MaxSAT resolution proof system with an extension rule. The new proof system MaxResE is sound and complete, and more powerful than plain MaxSAT resolution, since it can refute the soft and hard PHP in polynomial time. We show that MaxResE refutations actually subtract lower bounds from the objective function encoded by the formulas. The resulting formula is the residual after the lower bound extraction. We experimentally show that the residual of the soft PHP (once its necessary cost of 1 has been efficiently subtracted with MaxResE) is a concise, easy to solve, satisfiable problem.

【Keywords】:

191. Solving Set Cover and Dominating Set via Maximum Satisfiability.

Paper Link】 【Pages】:1569-1576

【Authors】: Zhendong Lei ; Shaowei Cai

【Abstract】: The Set Covering Problem (SCP) and Dominating Set Problem (DSP) are NP-hard and have many real world applications. SCP and DSP can be encoded into Maximum Satisfiability (MaxSAT) naturally and the resulting instances share a special structure. In this paper, we develop an efficient local search solver for MaxSAT instances of this kind. Our algorithm contains three phases: construction, local search and recovery. In the construction phase, we simplify the instance by three reduction rules and construct an initial solution by a greedy heuristic. The initial solution is improved during the local search phase, which exploits the features of such instances in the scoring function and the variable selection heuristic. Finally, the corresponding solution of the original instance is recovered in the recovery phase. Experimental results on a broad range of large scale instances of SCP and DSP show that our algorithm significantly outperforms state of the art solvers for SCP, DSP and MaxSAT.

【Keywords】:
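One natural way to encode set cover as (partial, weighted) MaxSAT, sketched here under stated assumptions, uses one Boolean variable per subset, a hard clause per element requiring that some subset containing it is chosen, and a soft unit clause per subset asking that it is not chosen. The function names and the tiny exhaustive checker are hypothetical illustrations; the paper's solver is a local search algorithm, which this sketch does not reproduce.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Clause = List[int]


def set_cover_to_maxsat(universe: Sequence[int], subsets: Sequence[Set[int]],
                        weights: Optional[Sequence[int]] = None
                        ) -> Tuple[List[Clause], List[Tuple[int, Clause]]]:
    """Variable j+1 is true iff subsets[j] is selected."""
    if weights is None:
        weights = [1] * len(subsets)
    # Hard clause per element: at least one subset covering it is selected.
    hard = [[j + 1 for j, s in enumerate(subsets) if e in s] for e in universe]
    # Soft clause per subset: prefer not selecting it (violation costs its weight).
    soft = [(w, [-(j + 1)]) for j, w in enumerate(weights)]
    return hard, soft


def brute_force_maxsat(hard: List[Clause], soft: List[Tuple[int, Clause]],
                       num_vars: int):
    """Exhaustive search, only meant to sanity-check the encoding on tiny inputs."""
    best_cost, best_bits = None, None
    for bits in product([False, True], repeat=num_vars):
        def holds(lit: int) -> bool:
            return bits[abs(lit) - 1] == (lit > 0)

        if not all(any(holds(lit) for lit in clause) for clause in hard):
            continue
        cost = sum(w for w, clause in soft if not any(holds(lit) for lit in clause))
        if best_cost is None or cost < best_cost:
            best_cost, best_bits = cost, bits
    return best_cost, best_bits


if __name__ == "__main__":
    subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    hard, soft = set_cover_to_maxsat(range(1, 6), subsets)
    print(brute_force_maxsat(hard, soft, len(subsets)))
```

Every hard clause contains only positive literals and every soft clause is a negative unit clause, which is an example of the special structure the abstract refers to.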

192. Finding Good Subtrees for Constraint Optimization Problems Using Frequent Pattern Mining.

Paper Link】 【Pages】:1577-1584

【Authors】: Hongbo Li ; Jimmy Lee ; He Mi ; Minghao Yin

【Abstract】: Making good decisions at the top of a search tree is important for finding good solutions early in constraint optimization. In this paper, we propose a method employing frequent pattern mining (FPM), a classic data mining technique, to find good subtrees for solving constraint optimization problems. We demonstrate that applying FPM to a small number of random high-quality feasible solutions enables us to identify subtrees containing optimal solutions in more than 55% of problem instances for four real world benchmark problems. The method works as a plugin that can be combined with any search strategy for branch-and-bound search. Exploring the identified subtrees first, the method brings substantial improvements for four efficient search strategies in both total runtime and runtime of finding optimal solutions.

【Keywords】:

193. An Effective Hard Thresholding Method Based on Stochastic Variance Reduction for Nonconvex Sparse Learning.

Paper Link】 【Pages】:1585-1592

【Authors】: Guannan Liang ; Qianqian Tong ; Chun Jiang Zhu ; Jinbo Bi

【Abstract】: We propose a hard thresholding method based on stochastically controlled stochastic gradients (SCSG-HT) to solve a family of sparsity-constrained empirical risk minimization problems. The SCSG-HT uses batch gradients where batch size is pre-determined by the desirable precision tolerance rather than full gradients to reduce the variance in stochastic gradients. It also employs the geometric distribution to determine the number of loops per epoch. We prove that, similar to the latest methods based on stochastic gradient descent or stochastic variance reduction methods, SCSG-HT enjoys a linear convergence rate. However, SCSG-HT now has a strong guarantee to recover the optimal sparse estimator. The computational complexity of SCSG-HT is independent of sample size n when n is larger than 1/ε, which enhances the scalability to massive-scale problems. Empirical results demonstrate that SCSG-HT outperforms several competitors and decreases the objective value the most with the same computational costs.

【Keywords】:
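
The hard thresholding step named in the abstract above can be illustrated with a minimal sketch. This is the generic projection onto k-sparse vectors followed by a plain gradient step, not the authors' SCSG-HT procedure; the names `hard_threshold`, `ht_gradient_step`, and the parameter `lr` are hypothetical and chosen only for illustration.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices sorted by decreasing magnitude; ties are broken by lower index.
    order = sorted(range(len(x)), key=lambda i: (-abs(x[i]), i))
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def ht_gradient_step(x, grad, lr, k):
    """One gradient step followed by hard thresholding (illustrative only)."""
    moved = [xi - lr * gi for xi, gi in zip(x, grad)]
    return hard_threshold(moved, k)


print(hard_threshold([0.5, -3.0, 2.0, 0.1], 2))  # [0.0, -3.0, 2.0, 0.0]
```

In a variance-reduced method, the vector `grad` would be replaced by a batch or corrected stochastic gradient; how that estimate is formed is exactly where the methods compared in the abstract differ.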

194. Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution.

Paper Link】 【Pages】:1593-1602

【Authors】: Vishnu Suresh Lokhande ; Shaofei Wang ; Maneesh Singh ; Julian Yarkony

【Abstract】: In this paper, we introduce a new optimization approach to Entity Resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation-clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this case, sources in the input data correspond to elements and entities in output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. The set of potential sets/clusters can not be explicitly enumerated, thus motivating optimization via column generation. In addition to the novel formulation, we also introduce new dual optimal inequalities (DOI), that we call flexible dual optimal inequalities, which tightly lower-bound dual variables during optimization and accelerate column generation. We apply our formulation to entity resolution (also called de-duplication of records), and achieve state-of-the-art accuracy on two popular benchmark datasets. Our F-DOI can be extended to other weighted set-packing problems.

【Keywords】:

195. Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems.

Paper Link】 【Pages】:1603-1610

【Authors】: Jayanta Mandi ; Emir Demirovic ; Peter J. Stuckey ; Tias Guns

【Abstract】: Combinatorial optimization assumes that all parameters of the optimization problem, e.g. the weights in the objective function, are fixed. Often, these weights are mere estimates, and increasingly machine learning techniques are used for their estimation. Recently, Smart Predict and Optimize (SPO) has been proposed for problems with a linear objective function over the predictions, more specifically linear programming problems. It takes the regret of the predictions on the linear problem into account, by repeatedly solving it during learning. We investigate the use of SPO to solve more realistic discrete optimization problems. The main challenge is the repeated solving of the optimization problem. To this end, we investigate ways to relax the problem as well as warm-starting the learning and the solving. Our results show that even for discrete problems it often suffices to train by solving the relaxation in the SPO loss. Furthermore, this approach outperforms the state-of-the-art approach of Wilder, Dilkina, and Tambe. We experiment with weighted knapsack problems as well as complex scheduling problems, and show for the first time that a predict-and-optimize approach can successfully be used on large-scale combinatorial optimization problems.

【Keywords】:
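
The notion of regret mentioned in the abstract above can be made concrete on a tiny knapsack instance. The sketch below is an assumption-laden illustration, not the SPO loss or the authors' training loop: it solves the knapsack by brute force, and the names `best_selection` and `regret` are hypothetical. Regret is taken here as the true objective value lost by optimizing over predicted item values instead of the true ones.

```python
from itertools import product


def best_selection(values, weights, capacity):
    """Brute-force 0/1 knapsack: return the feasible selection of maximum value."""
    best, best_value = None, float("-inf")
    for selection in product([0, 1], repeat=len(values)):
        weight = sum(w * s for w, s in zip(weights, selection))
        if weight > capacity:
            continue
        value = sum(v * s for v, s in zip(values, selection))
        if value > best_value:
            best, best_value = selection, value
    return best


def regret(true_values, predicted_values, weights, capacity):
    """True value lost by optimizing over predicted values instead of true ones."""
    chosen = best_selection(predicted_values, weights, capacity)
    optimal = best_selection(true_values, weights, capacity)

    def true_value(selection):
        return sum(v * s for v, s in zip(true_values, selection))

    return true_value(optimal) - true_value(chosen)


# Over-estimating item 0 makes the solver pick it alone and lose 1 unit of value.
print(regret([10, 6, 5], [12, 5, 5], [5, 3, 3], 6))  # 1
```

A prediction can have large error on individual values and still incur zero regret, as long as it leads to the same selection; that gap between prediction error and decision quality is what predict-and-optimize methods target.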

196. Grammar Filtering for Syntax-Guided Synthesis.

Paper Link】 【Pages】:1611-1618

【Authors】: Kairo Morton ; William T. Hallahan ; Elven Shum ; Ruzica Piskac ; Mark Santolucito

【Abstract】: Programming-by-example (PBE) is a synthesis paradigm that allows users to generate functions by simply providing input-output examples. While a promising interaction paradigm, synthesis is still too slow for real-time interaction and more widespread adoption. Existing approaches to PBE synthesis have used automated reasoning tools, such as SMT solvers, as well as machine learning techniques. At its core, the automated reasoning approach relies on highly domain-specific knowledge of programming languages. On the other hand, the machine learning approaches utilize the fact that when working with program code, it is possible to generate arbitrarily large training datasets. In this work, we propose a system for using machine learning in tandem with automated reasoning techniques to solve Syntax Guided Synthesis (SyGuS) style PBE problems. By preprocessing SyGuS PBE problems with a neural network, we can use a data-driven approach to reduce the size of the search space, then allow automated reasoning-based solvers to more quickly find a solution analytically. Our system is able to run atop existing SyGuS PBE synthesis tools, decreasing the runtime of the winner of the 2019 SyGuS Competition for the PBE Strings track by 47.65% to outperform all of the competing tools.

【Keywords】:

197. D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems.

Paper Link】 【Pages】:1619-1626

【Authors】: Taoxing Pan ; Jun Liu ; Jie Wang

【Abstract】: Decentralized optimization algorithms have attracted intense interest recently, as they have a balanced communication pattern, especially when solving large-scale machine learning problems. Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a gradient computation cost similar to that of its centralized counterpart, namely O(ε^{-3}) for finding an ε-approximate first-order stationary point. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.

【Keywords】:

198. Estimating the Density of States of Boolean Satisfiability Problems on Classical and Quantum Computing Platforms.

Paper Link】 【Pages】:1627-1635

【Authors】: Tuhin Sahai ; Anurag Mishra ; Jose Miguel Pasini ; Susmit Jha

【Abstract】: Given a Boolean formula ϕ(x) in conjunctive normal form (CNF), the density of states counts the number of variable assignments that violate exactly e clauses, for all values of e. Thus, the density of states is a histogram of the number of unsatisfied clauses over all possible assignments. This computation generalizes both maximum-satisfiability (MAX-SAT) and model counting problems and not only provides insight into the entire solution space, but also yields a measure for the hardness of the problem instance. Consequently, in real-world scenarios, this problem is typically infeasible even when using state-of-the-art algorithms. While finding an exact answer to this problem is a computationally intensive task, we propose a novel approach for estimating density of states based on the concentration of measure inequalities. The methodology results in a quadratic unconstrained binary optimization (QUBO), which is particularly amenable to quantum annealing-based solutions. We present the overall approach and compare results from the D-Wave quantum annealer against the best-known classical algorithms such as the Hamze-de Freitas-Selby (HFS) algorithm and satisfiability modulo theory (SMT) solvers.

【Keywords】:
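
The definition of the density of states given in the abstract above can be checked on small formulas by exhaustive enumeration. The sketch below is a brute-force illustration only (exponential in the number of variables), not the estimation method the paper proposes; the function name `density_of_states` and the DIMACS-style clause encoding (positive integer for a variable, negative for its negation) are assumptions made for this example.

```python
from itertools import product


def density_of_states(clauses, num_vars):
    """Return counts where counts[e] is the number of assignments violating exactly e clauses."""
    counts = [0] * (len(clauses) + 1)
    for assignment in product([False, True], repeat=num_vars):
        violated = 0
        for clause in clauses:
            # A literal is true when the variable's value matches its sign.
            satisfied = any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            if not satisfied:
                violated += 1
        counts[violated] += 1
    return counts


# (x1 or x2) and (not x1 or x2) and (not x2): every assignment violates exactly one clause.
print(density_of_states([[1, 2], [-1, 2], [-2]], 2))  # [0, 4, 0, 0]
```

Here `counts[0]` is the model count and the smallest index with a nonzero entry gives the MAX-SAT optimum, which is how the histogram generalizes both problems.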

199. Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability.

Paper Link】 【Pages】:1636-1643

【Authors】: Prathyush Sambaturu ; Aparna Gupta ; Ian Davidson ; S. S. Ravi ; Anil Vullikanti ; Andrew Warren

【Abstract】: Improving the explainability of the results from machine learning methods has become an important research goal. Here, we study the problem of making clusters more interpretable by extending a recent approach of [Davidson et al., NeurIPS 2018] for constructing succinct representations for clusters. Given a set of objects S, a partition π of S (into clusters), and a universe T of tags such that each element in S is associated with a subset of tags, the goal is to find a representative set of tags for each cluster such that those sets are pairwise-disjoint and the total size of all the representatives is minimized. Since this problem is NP-hard in general, we develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.

【Keywords】:

200. Probabilistic Inference for Predicate Constraint Satisfaction.

Paper Link】 【Pages】:1644-1651

【Authors】: Yuki Satake ; Hiroshi Unno ; Hinata Yanagi

【Abstract】: In this paper, we present a novel constraint solving method for a class of predicate Constraint Satisfaction Problems (pCSP) where each constraint is represented by an arbitrary clause of first-order predicate logic over predicate variables. The class of pCSP properly subsumes the well-studied class of Constrained Horn Clauses (CHCs) where each constraint is restricted to a Horn clause. The class of CHCs has been widely applied to verification of linear-time safety properties of programs in different paradigms. In this paper, we show that pCSP further widens the applicability to verification of branching-time safety properties of programs that exhibit finitely-branching non-determinism. Solving pCSP (and CHCs) however is challenging because the search space of solutions is often very large (or unbounded), high-dimensional, and non-smooth. To address these challenges, our method naturally combines techniques studied separately in different literatures: counterexample guided inductive synthesis (CEGIS) and probabilistic inference in graphical models. We have implemented the presented method and obtained promising results on existing benchmarks as well as new ones that are beyond the scope of existing CHC solvers.

【Keywords】:

201. Hard Examples for Common Variable Decision Heuristics.

Paper Link】 【Pages】:1652-1659

【Authors】: Marc Vinyals

【Abstract】: The CDCL algorithm for SAT is equivalent to the resolution proof system under a few assumptions, one of them being an optimal non-deterministic procedure for choosing the next variable to branch on. In practice this task is left to a variable decision heuristic, and since the so-called VSIDS decision heuristic is considered an integral part of CDCL, whether CDCL with a VSIDS-like heuristic is also equivalent to resolution remained a significant open question. We give a negative answer by building a family of formulas that have resolution proofs of polynomial size but require exponential time to decide in CDCL with common heuristics such as VMTF, CHB, and certain implementations of VSIDS and LRB.

【Keywords】:

202. Multiple Graph Matching and Clustering via Decayed Pairwise Matching Composition.

Paper Link】 【Pages】:1660-1667

【Authors】: Tianzhe Wang ; Zetian Jiang ; Junchi Yan

【Abstract】: Joint matching of multiple graphs is challenging and has recently been an active topic in machine learning and computer vision. State-of-the-art methods have been devised; however, to the best of our knowledge there is no effective mechanism that can explicitly deal with the matching of a mixture of graphs belonging to multiple clusters, e.g., a collection of bikes and bottles. Seeing its practical importance, we propose a novel approach for multiple graph matching and clustering. Firstly, for the traditional multi-graph matching setting, we devise a composition scheme based on a tree structure, which can be seen as lying in between two strong multi-graph matching solvers, i.e., MatchOpt (Yan et al. 2015a) and CAO (Yan et al. 2016a). In particular, it can be more robust than MatchOpt against a set of diverse graphs and more efficient than CAO. Then we further extend the algorithm to the multiple graph matching and clustering setting, by adopting a decaying technique along the composition path, to discount the meaningless matching between graphs in different clusters. Experimental results show that the proposed methods achieve an excellent trade-off in the traditional multi-graph matching case, and outperform in both matching and clustering accuracy, as well as time efficiency.

【Keywords】:

203. Constructing Minimal Perfect Hash Functions Using SAT Technology.

Paper Link】 【Pages】:1668-1675

【Authors】: Sean A. Weaver ; Marijn Heule

【Abstract】: Minimal perfect hash functions (MPHFs) are used to provide efficient access to values of large dictionaries (sets of key-value pairs). Discovering new algorithms for building MPHFs is an area of active research, especially from the perspective of storage efficiency. The information-theoretic limit for MPHFs is 1/ln 2 ≈ 1.44 bits per key. The current best practical algorithms range between 2 and 4 bits per key. In this article, we propose two SAT-based constructions of MPHFs. Our first construction yields MPHFs near the information-theoretic limit. For this construction, current state-of-the-art SAT solvers can handle instances where the dictionaries contain up to 40 elements, thereby outperforming the existing (brute-force) methods. Our second construction uses XORSAT filters to realize a practical approach with long-term storage of approximately 1.83 bits per key.

【Keywords】:
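
The figure of 1/ln 2 ≈ 1.44 bits per key quoted in the abstract above follows from a standard counting argument: of the n^n functions from n keys to n slots, only n! are bijections, so roughly log2(n^n / n!) bits are needed to single one out. The sketch below evaluates that bound numerically and also checks the defining property of an MPHF on a toy key set. The function names are hypothetical, and the code is unrelated to the SAT encodings the paper proposes.

```python
import math


def mphf_bits_per_key_lower_bound(n):
    """Evaluate log2(n^n / n!) / n, using lgamma to avoid huge integers."""
    log_ratio = n * math.log(n) - math.lgamma(n + 1)  # natural log of n^n / n!
    return log_ratio / (n * math.log(2))


def is_minimal_perfect(hash_fn, keys):
    """True if hash_fn maps the keys bijectively onto 0 .. len(keys) - 1."""
    values = sorted(hash_fn(key) for key in keys)
    return values == list(range(len(keys)))


for n in (10, 1000, 10**6):
    print(n, mphf_bits_per_key_lower_bound(n))
print("limit:", 1 / math.log(2))

print(is_minimal_perfect(lambda key: ord(key) - ord("a"), ["a", "b", "c"]))  # True
```

The bound approaches 1/ln 2 from below as n grows, which is why small dictionaries can in principle be stored with fewer bits per key than the asymptotic limit.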

204. Explaining Propagators for String Edit Distance Constraints.

Paper Link】 【Pages】:1676-1683

【Authors】: Felix Winter ; Nysret Musliu ; Peter J. Stuckey

【Abstract】: The computation of string similarity measures has been thoroughly studied in the scientific literature and has applications in a wide variety of areas. One of the most widely used measures is the so-called string edit distance, which captures the number of edit operations required to transform a string into another given string. Although polynomial-time algorithms are known for calculating the edit distance between two strings, there also exist NP-hard problems from practical applications like scheduling or computational biology that constrain the minimum edit distance between arrays of decision variables. In this work, we propose a novel global constraint to formulate restrictions on the minimum edit distance for such problems. Furthermore, we describe a propagation algorithm and investigate an explanation strategy for an edit distance constraint propagator that can be incorporated into state-of-the-art lazy clause generation solvers. Experimental results show that the proposed propagator is able to significantly improve the performance of existing exact methods regarding solution quality and computation speed for benchmark problems from the literature.

【Keywords】:
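
The polynomial-time computation of edit distance between two fixed strings, which the abstract above contrasts with the constrained setting, is the classic dynamic program sketched below. It assumes unit costs for insertion, deletion, and substitution (the Levenshtein variant); the paper's constraint may use a different operation set, and the propagator itself is not shown here.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b with unit-cost operations."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # Distance from a[:i] to the empty prefix of b is i deletions.
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete item_a
                curr[j - 1] + 1,                  # insert item_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

The function accepts any sequences, so it also applies to lists of integers, which is closer to the arrays of decision variables mentioned in the abstract once all variables are fixed.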

205. Deep Neural Network Approximated Dynamic Programming for Combinatorial Optimization.

Paper Link】 【Pages】:1684-1691

【Authors】: Shenghe Xu ; Shivendra S. Panwar ; Murali S. Kodialam ; T. V. Lakshman

【Abstract】: In this paper, we propose a general framework for combining deep neural networks (DNNs) with dynamic programming to solve combinatorial optimization problems. For problems that can be broken into smaller subproblems and solved by dynamic programming, we train a set of neural networks to replace value or policy functions at each decision step. Two variants of the neural network approximated dynamic programming (NDP) methods are proposed; in the value-based NDP method, the networks learn to estimate the value of each choice at the corresponding step, while in the policy-based NDP method the DNNs only estimate the best decision at each step. The training procedure of the NDP starts from the smallest problem size and a new DNN for the next size is trained to cooperate with previous DNNs. After all the DNNs are trained, the networks are fine-tuned together to further improve overall performance. We test NDP on the linear sum assignment problem, the traveling salesman problem and the talent scheduling problem. Experimental results show that NDP can achieve considerable computation time reduction on hard problems with reasonable performance loss. In general, NDP can be applied to reducible combinatorial optimization problems for the purpose of computation time reduction.

【Keywords】:

AAAI Technical Track: Game Playing and Interactive Entertainment 7

206. Generating Interactive Worlds with Text.

Paper Link】 【Pages】:1693-1700

【Authors】: Angela Fan ; Jack Urbanek ; Pratik Ringshia ; Emily Dinan ; Emma Qian ; Siddharth Karamcheti ; Shrimai Prabhumoye ; Douwe Kiela ; Tim Rocktäschel ; Arthur Szlam ; Jason Weston

【Abstract】: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT (Urbanek et al. 2019). We introduce neural network based models to compositionally arrange locations, characters, and objects into a coherent whole. In addition to creating worlds based on existing elements, our models can generate new game content. Humans can also leverage our models to interactively aid in worldbuilding. We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.

【Keywords】:

207. Deep Reinforcement Learning for General Game Playing.

Paper Link】 【Pages】:1701-1708

【Authors】: Adrian Goldwaser ; Michael Thielscher

【Abstract】: General Game Playing agents are required to play games they have never seen before simply by looking at a formal description of the rules of the game at runtime. Previous successful agents have been based on search with generic heuristics, with almost no work done into using machine learning. Recent advances in deep reinforcement learning have shown it to be successful in some two-player zero-sum board games such as Chess and Go. This work applies deep reinforcement learning to General Game Playing, extending the AlphaZero algorithm and finds that it can provide competitive results.

【Keywords】:

208. Narrative Planning Model Acquisition from Text Summaries and Descriptions.

Paper Link】 【Pages】:1709-1716

【Authors】: Thomas Hayton ; Julie Porteous ; João Fernando Ferreira ; Alan Lindsay

【Abstract】: AI Planning has been shown to be a useful approach for the generation of narrative in interactive entertainment systems and games. However, the creation of the underlying narrative domain models is challenging: the well-documented AI planning modelling bottleneck is further compounded by the need for authors, who tend to be non-technical, to create content. We seek to support authors in this task by allowing natural language (NL) plot synopses to be used as a starting point from which planning domain models can be automatically acquired. We present a solution which analyses input NL text summaries, and builds structured representations from which a PDDL model is output (fully automated or author in-the-loop). We introduce a novel sieve-based approach to pronoun resolution that demonstrates consistently high performance across domains. In the paper we focus on authoring of narrative planning models for use in interactive entertainment systems and games. We show that our approach exhibits comprehensive detection of both actions and objects in the system-extracted domain models, in combination with significant improvement in the accuracy of pronoun resolution due to the use of contextual object information. Our results and an expert user assessment show that our approach enables a reduction in authoring effort required to generate baseline narrative domain models from which variants can be built.

【Keywords】:

209. FET-GAN: Font and Effect Transfer via K-shot Adaptive Instance Normalization.

Paper Link】 【Pages】:1717-1724

【Authors】: Wei Li ; Yongxing He ; Yanwei Qi ; Zejian Li ; Yongchuan Tang

【Abstract】: Text effect transfer aims at learning the mapping between text visual effects while maintaining the text content. While remarkably successful, existing methods have limited robustness in font transfer and weak generalization ability to unseen effects. To address these problems, we propose FET-GAN, a novel end-to-end framework to implement visual effects transfer with font variation among multiple text effects domains. Our model achieves remarkable results both on arbitrary effect transfer between texts and effect translation from text to graphic objects. By a few-shot fine-tuning strategy, FET-GAN can generalize the transfer of the pre-trained model to the new effect. Through extensive experimental validation and comparison, our model advances the state-of-the-art in the text effect transfer task. Besides, we have collected a font dataset including 100 fonts of more than 800 Chinese and English characters. Based on this dataset, we demonstrated the generalization ability of our model by the application that complements the font library automatically by few-shot samples. This application is significant in reducing the labor cost for the font designer.

【Keywords】:

210. A Character-Centric Neural Model for Automated Story Generation.

Paper Link】 【Pages】:1725-1732

【Authors】: Danyang Liu ; Juntao Li ; Meng-Hsuan Yu ; Ziming Huang ; Gongshen Liu ; Dongyan Zhao ; Rui Yan

【Abstract】: Automated story generation is a challenging task which aims to automatically generate convincing stories composed of successive plots correlated with consistent characters. Most recent generation models are built upon advanced neural networks, e.g., variational autoencoder, generative adversarial network, convolutional sequence to sequence model. Although these models have achieved promising results on learning linguistic patterns, very few methods consider the attributes and prior knowledge of the story genre, especially from the perspectives of explainability and consistency. To fill this gap, we propose a character-centric neural storytelling model, where a story is created around a given character, i.e., each part of a story is conditioned on a given character and the corresponding context environment. In this way, we explicitly capture the character information and the relations between plots and characters to improve explainability and consistency. Experimental results on an open dataset indicate that our model yields meaningful improvements over several strong baselines on both human and automatic evaluations.

【Keywords】:

211. Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation.

Paper Link】 【Pages】:1733-1740

【Authors】: Tianyang Shi ; Zhengxia Zou ; Yi Yuan ; Changjie Fan

【Abstract】: With the rapid development of Role-Playing Games (RPGs), players are now allowed to edit the facial appearance of their in-game characters according to their preferences rather than using default templates. This paper proposes a game character auto-creation framework that generates in-game characters according to a player's input face photo. Different from the previous methods that are designed based on neural style transfer or monocular 3D face reconstruction, we re-formulate the character auto-creation process from a different point of view: by predicting a large set of physically meaningful facial parameters under a self-supervised learning paradigm. Instead of updating facial parameters iteratively at the input end of the renderer as suggested by previous methods, which is time-consuming, we introduce a facial parameter translator so that the creation can be done efficiently through a single forward propagation from the face embeddings to parameters, with a considerable 1000x computational speedup. Despite its high efficiency, the interactivity is preserved in our method, where users are allowed to optionally fine-tune the facial parameters on our creation according to their needs. Our approach also shows better robustness than previous methods, especially for those photos with head-pose variance. Comparison results and ablation analysis on seven public face verification datasets suggest the effectiveness of our method.

【Keywords】:

212. Draft and Edit: Automatic Storytelling Through Multi-Pass Hierarchical Conditional Variational Autoencoder.

Paper Link】 【Pages】:1741-1748

【Authors】: Meng-Hsuan Yu ; Juntao Li ; Danyang Liu ; Bo Tang ; Haisong Zhang ; Dongyan Zhao ; Rui Yan

【Abstract】: Automatic Storytelling has consistently been a challenging area in the field of natural language processing. Although considerable achievements have been made, the gap between automatically generated stories and human-written stories is still significant. Moreover, the limitations of existing automatic storytelling methods are obvious, e.g., the consistency of content, wording diversity. In this paper, we propose a multi-pass hierarchical conditional variational autoencoder model to overcome the challenges and limitations in existing automatic storytelling models. While the conditional variational autoencoder (CVAE) model has been employed to generate diversified content, the hierarchical structure and multi-pass editing scheme allow the model to create more consistent content. We conduct extensive experiments on the ROCStories Dataset. The results verify the validity and effectiveness of our proposed model and show substantial improvement over the existing state-of-the-art approaches.

【Keywords】:

AAAI Technical Track: Game Theory and Economic Paradigms 71

213. Distance-Based Equilibria in Normal-Form Games.

Paper Link】 【Pages】:1750-1757

【Authors】: Erman Acar ; Reshef Meir

【Abstract】: We propose a simple uncertainty modification for the agent model in normal-form games; at any given strategy profile, the agent can access only a set of “possible profiles” that are within a certain distance from the actual action profile. We investigate the various instantiations in which the agent chooses her strategy using well-known rationales e.g., considering the worst case, or trying to minimize the regret, to cope with such uncertainty. Any such modification in the behavioral model naturally induces a corresponding notion of equilibrium; a distance-based equilibrium. We characterize the relationships between the various equilibria, and also their connections to well-known existing solution concepts such as Trembling-hand perfection. Furthermore, we deliver existence results, and show that for some class of games, such solution concepts can actually lead to better outcomes.

【Keywords】:

214. Swap Stability in Schelling Games on Graphs.

Paper Link】 【Pages】:1758-1765

【Authors】: Aishwarya Agarwal ; Edith Elkind ; Jiarui Gan ; Alexandros A. Voudouris

【Abstract】: We study a recently introduced class of strategic games that is motivated by and generalizes Schelling's well-known residential segregation model. These games are played on undirected graphs, with the set of agents partitioned into multiple types; each agent either occupies a node of the graph and never moves away or aims to maximize the fraction of her neighbors who are of her own type. We consider a variant of this model that we call swap Schelling games, where the number of agents is equal to the number of nodes of the graph, and agents may swap positions with other agents to increase their utility. We study the existence, computational complexity and quality of equilibrium assignments in these games, both from a social welfare perspective and from a diversity perspective.

【Keywords】:
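
The utility and swap notions described in the abstract above can be sketched on a small graph. The code below rests on stated assumptions that go beyond the abstract: every node hosts exactly one agent, an agent with no neighbors gets utility 0, and a swap happens only if it strictly improves the utility of both agents involved. The names `utility` and `is_swap_stable` are hypothetical.

```python
from itertools import combinations


def utility(graph, node_type, node):
    """Fraction of the neighbors of `node` whose agent has the same type."""
    neighbors = graph[node]
    if not neighbors:
        return 0.0  # Assumption: isolated agents get utility 0.
    same = sum(1 for v in neighbors if node_type[v] == node_type[node])
    return same / len(neighbors)


def is_swap_stable(graph, node_type):
    """True if no pair of agents of different types can both strictly gain by swapping."""
    for u, v in combinations(graph, 2):
        if node_type[u] == node_type[v]:
            continue  # Swapping agents of the same type changes nothing.
        before_u = utility(graph, node_type, u)
        before_v = utility(graph, node_type, v)
        swapped = dict(node_type)
        swapped[u], swapped[v] = node_type[v], node_type[u]
        # After the swap, the agent that was at u sits at v, and vice versa.
        if utility(graph, swapped, v) > before_u and utility(graph, swapped, u) > before_v:
            return False
    return True


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # Path graph 0 - 1 - 2 - 3
print(is_swap_stable(path, {0: "R", 1: "B", 2: "R", 3: "B"}))  # False
print(is_swap_stable(path, {0: "R", 1: "R", 2: "B", 3: "B"}))  # True
```

In the alternating assignment every agent has utility 0, and swapping the two middle agents raises both to 1/2; the segregated assignment admits no mutually improving swap.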

215. The Impact of Selfishness in Hypergraph Hedonic Games.

Paper Link】 【Pages】:1766-1773

【Authors】: Alessandro Aloisio ; Michele Flammini ; Cosimo Vinci

【Abstract】: We consider a class of coalition formation games that can be succinctly represented by means of hypergraphs and properly generalizes symmetric additively separable hedonic games. More precisely, an instance of a hypergraph hedonic game consists of a weighted hypergraph, in which each agent is associated to a distinct node and her utility for being in a given coalition is equal to the sum of the weights of all the hyperedges included in the coalition. We study the performance of stable outcomes in such games, investigating the degradation of their social welfare under two different metrics, the k-Nash price of anarchy and k-core price of anarchy, where k is the maximum size of a deviating coalition. Such prices are defined as the worst-case ratio between the optimal social welfare and the social welfare obtained when the agents reach an outcome satisfying the respective stability criteria. We provide asymptotically tight upper and lower bounds on the values of these metrics for several classes of hypergraph hedonic games, parametrized according to the integer k, the hypergraph arity r and the number of agents n. Furthermore, we show that the problem of computing the exact value of such prices for a given instance is computationally hard, even in case of non-negative hyperedge weights.

【Keywords】:

216. Multiagent Evaluation Mechanisms.

Paper Link】 【Pages】:1774-1781

【Authors】: Tal Alon ; Magdalen Dobson ; Ariel D. Procaccia ; Inbal Talgam-Cohen ; Jamie Tucker-Foltz

【Abstract】: We consider settings where agents are evaluated based on observed features, and assume they seek to achieve feature values that bring about good evaluations. Our goal is to craft evaluation mechanisms that incentivize the agents to invest effort in desirable actions; a notable application is the design of course grading schemes. Previous work has studied this problem in the case of a single agent. By contrast, we investigate the general, multi-agent model, and provide a complete characterization of its computational complexity.

【Keywords】:

217. Peeking Behind the Ordinal Curtain: Improving Distortion via Cardinal Queries.

Paper Link】 【Pages】:1782-1789

【Authors】: Georgios Amanatidis ; Georgios Birmpas ; Aris Filos-Ratsikas ; Alexandros A. Voudouris

【Abstract】: The notion of distortion was introduced by Procaccia and Rosenschein (2006) to quantify the inefficiency of using only ordinal information when trying to maximize the social welfare. Since then, this research area has flourished and bounds on the distortion have been obtained for a wide variety of fundamental scenarios. However, the vast majority of the existing literature is focused on the case where nothing is known beyond the ordinal preferences of the agents over the alternatives. In this paper, we take a more expressive approach, and consider mechanisms that are allowed to further ask a few cardinal queries in order to gain partial access to the underlying values that the agents have for the alternatives. With this extra power, we design new deterministic mechanisms that achieve significantly improved distortion bounds and outperform the best-known randomized ordinal mechanisms. We draw an almost complete picture of the number of queries required to achieve specific distortion bounds.

【Keywords】:

218. Multiple Birds with One Stone: Beating 1/2 for EFX and GMMS via Envy Cycle Elimination.

Paper Link】 【Pages】:1790-1797

【Authors】: Georgios Amanatidis ; Evangelos Markakis ; Apostolos Ntokos

【Abstract】: Several relaxations of envy-freeness, tailored to fair division in settings with indivisible goods, have been introduced within the last decade. Due to the lack of general existence results for most of these concepts, great attention has been paid to establishing approximation guarantees. In this work, we propose a simple algorithm that is universally fair in the sense that it returns allocations that have good approximation guarantees with respect to four such fairness notions at once. In particular, this is the first algorithm achieving a (φ−1)-approximation of envy-freeness up to any good (EFX) and a 2/(φ+2)-approximation of groupwise maximin share fairness (GMMS), where φ is the golden ratio. The best known approximation factor, in polynomial time, for either one of these fairness notions prior to this work was 1/2. Moreover, the returned allocation achieves envy-freeness up to one good (EF1) and a 2/3-approximation of pairwise maximin share fairness (PMMS). While EFX is our primary focus, we also exhibit how to fine-tune our algorithm and improve further the guarantees for GMMS or PMMS. Finally, we show that GMMS (and thus PMMS and EFX) allocations always exist when the number of goods does not exceed the number of agents by more than two.

【Keywords】:

219. All-Pay Bidding Games on Graphs.

Paper Link】 【Pages】:1798-1805

【Authors】: Guy Avni ; Rasmus Ibsen-Jensen ; Josef Tkadlec

【Abstract】: In this paper we introduce and study all-pay bidding games, a class of two-player, zero-sum games on graphs. The game proceeds as follows. We place a token on some vertex in the graph and assign budgets to the two players. Each turn, each player submits a sealed legal bid (non-negative and below their remaining budget), which is deducted from their budget, and the highest bidder moves the token onto an adjacent vertex. The game ends once a sink is reached, and Player 1 pays Player 2 the outcome that is associated with the sink. The players attempt to maximize their expected outcome. Our games model settings where effort (of no inherent value) needs to be invested in an ongoing and stateful manner. On the negative side, we show that even in simple games on DAGs, optimal strategies may require a distribution over bids with infinite support. A central quantity in bidding games is the ratio of the players' budgets. On the positive side, we show a simple FPTAS for DAGs that, for each budget ratio, outputs an approximation for the optimal strategy for that ratio. We also implement it and show that it performs well and suggests interesting properties of these games. Then, given an outcome c, we show an algorithm for finding the necessary and sufficient initial ratio for guaranteeing outcome c with probability 1 and a strategy ensuring such. Finally, while the general case has not previously been studied, solving the specific game in which Player 1 wins iff he wins the first two auctions has long been stated as an open question, which we solve.

【Keywords】:

220. Facility Location Problem with Capacity Constraints: Algorithmic and Mechanism Design Perspectives.

Paper Link】 【Pages】:1806-1813

【Authors】: Haris Aziz ; Hau Chan ; Barton Lee ; Bo Li ; Toby Walsh

【Abstract】: We consider the facility location problem in the one-dimensional setting where each facility can serve a limited number of agents, from both the algorithmic and mechanism design perspectives. From the algorithmic perspective, we prove that the corresponding optimization problem, where the goal is to locate facilities to minimize either the total cost to all agents or the maximum cost of any agent, is NP-hard. However, we show that the problem is fixed-parameter tractable, and the optimal solution can be computed in polynomial time whenever the number of facilities is bounded, or when all facilities have identical capacities. We then consider the problem from a mechanism design perspective where the agents are strategic and need not reveal their true locations. We show that several natural mechanisms studied in the uncapacitated setting either lose strategyproofness or a bound on the solution quality for the total or maximum cost objective. We then propose new mechanisms that are strategyproof and achieve approximation guarantees that almost match the lower bounds.

【Keywords】:

221. Fair Division of Mixed Divisible and Indivisible Goods.

Paper Link】 【Pages】:1814-1821

【Authors】: Xiaohui Bei ; Zihao Li ; Jinyan Liu ; Shengxin Liu ; Xinhang Lu

【Abstract】: We study the problem of fair division when the resources contain both divisible and indivisible goods. Classic fairness notions such as envy-freeness (EF) and envy-freeness up to one good (EF1) cannot be directly applied to the mixed goods setting. In this work, we propose a new fairness notion envy-freeness for mixed goods (EFM), which is a direct generalization of both EF and EF1 to the mixed goods setting. We prove that an EFM allocation always exists for any number of agents. We also propose efficient algorithms to compute an EFM allocation for two agents and for n agents with piecewise linear valuations over the divisible goods. Finally, we relax the envy-free requirement, instead asking for ϵ-envy-freeness for mixed goods (ϵ-EFM), and present an algorithm that finds an ϵ-EFM allocation in time polynomial in the number of agents, the number of indivisible goods, and 1/ϵ.

【Keywords】:

222. Individual-Based Stability in Hedonic Diversity Games.

Paper Link】 【Pages】:1822-1829

【Authors】: Niclas Boehmer ; Edith Elkind

【Abstract】: In hedonic diversity games (HDGs), recently introduced by Bredereck, Elkind, and Igarashi (2019), each agent belongs to one of two classes (men and women, vegetarians and meat-eaters, junior and senior researchers), and agents' preferences over coalitions are determined by the fraction of agents from their class in each coalition. Bredereck et al. show that while an HDG may fail to have a Nash stable (NS) or a core stable (CS) outcome, every HDG in which all agents have single-peaked preferences admits an individually stable (IS) outcome, which can be computed in polynomial time. In this work, we extend and strengthen these results in several ways. First, we establish that the problem of deciding if an HDG has an NS outcome is NP-complete, but admits an XP algorithm with respect to the size of the smaller class. Second, we show that, in fact, all HDGs admit IS outcomes that can be computed in polynomial time; our algorithm for finding such outcomes is considerably simpler than that of Bredereck et al. We also consider two ways of generalizing the model of Bredereck et al. to k ≥ 2 classes. We complement our theoretical results by empirical analysis, comparing the IS outcomes found by our algorithm, the algorithm of Bredereck et al. and a natural better-response dynamics.

【Keywords】:

223. Adapting Stable Matchings to Evolving Preferences.

Paper Link】 【Pages】:1830-1837

【Authors】: Robert Bredereck ; Jiehua Chen ; Dusan Knop ; Junjie Luo ; Rolf Niedermeier

【Abstract】: Adaptivity to changing environments and constraints is key to success in modern society. We address this by proposing “incrementalized versions” of Stable Marriage and Stable Roommates. That is, we try to answer the following question: for both problems, what is the computational cost of adapting an existing stable matching after some of the preferences of the agents have changed? While doing so, we also model the constraint that the new stable matching should not be too different from the old one. After formalizing these incremental versions, we provide a fairly comprehensive picture of the computational complexity landscape of Incremental Stable Marriage and Incremental Stable Roommates. To this end, we exploit the parameters “degree of change” both in the input (difference between old and new preference profile) and in the output (difference between old and new stable matching). We obtain both hardness and tractability results, in particular showing a fixed-parameter tractability result with respect to the parameter “distance between old and new stable matching”.

【Keywords】:

224. Parameterized Algorithms for Finding a Collective Set of Items.

Paper Link】 【Pages】:1838-1845

【Authors】: Robert Bredereck ; Piotr Faliszewski ; Andrzej Kaczmarczyk ; Dusan Knop ; Rolf Niedermeier

【Abstract】: We extend the work of Skowron et al. (AIJ, 2016) by considering the parameterized complexity of the following problem. We are given a set of items and a set of agents, where each agent assigns an integer utility value to each item. The goal is to find a set of k items that these agents would collectively use. For each such collective set of items, each agent provides a score that can be described using an OWA (ordered weighted average) operator and we seek a set with the highest total score. We focus on the parameterization by the number of agents and we find numerous fixed-parameter tractability results (however, we also find some W[1]-hardness results). It turns out that most of our algorithms even apply to the setting where each agent has an integer weight.

【Keywords】:

225. Electing Successive Committees: Complexity and Algorithms.

Paper Link】 【Pages】:1846-1853

【Authors】: Robert Bredereck ; Andrzej Kaczmarczyk ; Rolf Niedermeier

【Abstract】: We introduce successive committees elections. The point is that our new model additionally takes into account that “committee members” shall have a short term of office possibly over a consecutive time period (e.g., to limit the influence of elitist power cartels or to keep the social costs of overloading committees as small as possible) but at the same time overly frequent elections are to be avoided (e.g., for the sake of long-term planning). Thus, given voter preferences over a set of candidates, a desired committee size, a number of committees to be elected, and an upper bound on the number of committees that each candidate can participate in, the goal is to find a “best possible” series of committees representing the electorate. We show a sharp complexity dichotomy between computing series of committees of size at most two (mostly in polynomial time) and of committees of size at least three (mostly NP-hard). Depending on the voting rule, however, even for larger committee sizes we can spot some tractable cases.

【Keywords】:

226. Approval-Based Apportionment.

Paper Link】 【Pages】:1854-1861

【Authors】: Markus Brill ; Paul Gölz ; Dominik Peters ; Ulrike Schmidt-Kraepelin ; Kai Wilker

【Abstract】: In the apportionment problem, a fixed number of seats must be distributed among parties in proportion to the number of voters supporting each party. We study a generalization of this setting, in which voters cast approval ballots over parties, such that each voter can support multiple parties. This approval-based apportionment setting generalizes traditional apportionment and is a natural restriction of approval-based multiwinner elections, where approval ballots range over individual candidates. Using techniques from both apportionment and multiwinner elections, we are able to provide representation guarantees that are currently out of reach in the general setting of multiwinner elections: First, we show that core-stable committees are guaranteed to exist and can be found in polynomial time. Second, we demonstrate that extended justified representation is compatible with committee monotonicity.

【Keywords】:
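
As a hedged illustration of the traditional apportionment problem that the abstract of entry 226 generalizes, the sketch below distributes seats with the classical D'Hondt divisor method. It is not the approval-based procedure studied in the paper; the function name `dhondt`, the party labels, and the alphabetical tie-breaking are assumptions made for this example.

```python
from fractions import Fraction


def dhondt(votes, seats):
    """Allocate `seats` among parties with the D'Hondt divisor method.

    `votes` maps a party label to its (integer) number of supporters.
    """
    alloc = {party: 0 for party in votes}
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient
        # votes / (seats already won + 1). Ties go to the alphabetically
        # first party, an arbitrary choice made for this sketch.
        winner = max(sorted(votes),
                     key=lambda p: Fraction(votes[p], alloc[p] + 1))
        alloc[winner] += 1
    return alloc


if __name__ == "__main__":
    print(dhondt({"A": 100, "B": 80, "C": 30}, 8))
```

Exact fractions are used for the quotients so that ties are detected reliably rather than being decided by floating-point rounding.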

227. Refining Tournament Solutions via Margin of Victory.

Paper Link】 【Pages】:1862-1869

【Authors】: Markus Brill ; Ulrike Schmidt-Kraepelin ; Warut Suksompong

【Abstract】: Tournament solutions are frequently used to select winners from a set of alternatives based on pairwise comparisons between alternatives. Prior work has shown that several common tournament solutions tend to select large winner sets and therefore have low discriminative power. In this paper, we propose a general framework for refining tournament solutions. In order to distinguish between winning alternatives, and also between non-winning ones, we introduce the notion of margin of victory (MoV) for tournament solutions. MoV is a robustness measure for individual alternatives: For winners, the MoV captures the distance from dropping out of the winner set, and for non-winners, the distance from entering the set. In each case, distance is measured in terms of which pairwise comparisons would have to be reversed in order to achieve the desired outcome. For common tournament solutions, including the top cycle, the uncovered set, and the Banks set, we determine the complexity of computing the MoV and provide worst-case bounds on the MoV for both winners and non-winners. Our results can also be viewed from the perspective of bribery and manipulation.

【Keywords】:
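
A minimal sketch of the ideas in the abstract of entry 227, under stated assumptions: it computes the top cycle of a small tournament and then finds, by brute force, the fewest pairwise comparisons that must be reversed to change whether an alternative is in the top cycle. Counting reversals without weights, the dictionary representation `beats`, and all function names are assumptions of this example and are not taken from the paper.

```python
from itertools import combinations


def top_cycle(beats):
    """Alternatives that reach every alternative along 'beats' edges.

    `beats[x]` is the set of alternatives that x defeats.
    """
    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for y in beats[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    everyone = set(beats)
    return {x for x in beats if reachable(x) == everyone}


def reverse_edges(beats, edges):
    """Return a copy of the tournament with the given (winner, loser) pairs flipped."""
    new = {x: set(ys) for x, ys in beats.items()}
    for winner, loser in edges:
        new[winner].discard(loser)
        new[loser].add(winner)
    return new


def reversals_to_change_status(beats, x):
    """Fewest reversals that flip x's membership in the top cycle (exponential time)."""
    edges = [(a, b) for a in beats for b in beats[a]]
    was_in = x in top_cycle(beats)
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            if (x in top_cycle(reverse_edges(beats, subset))) != was_in:
                return r
    return None  # no set of reversals changes the status


if __name__ == "__main__":
    transitive = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
    print(top_cycle(transitive), reversals_to_change_status(transitive, "a"))
```

The brute-force search is only meant for tiny tournaments; the paper itself studies the complexity of this kind of computation for several tournament solutions.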

228. Persuading Voters: It's Easy to Whisper, It's Hard to Speak Loud.

Paper Link】 【Pages】:1870-1877

【Authors】: Matteo Castiglioni ; Andrea Celli ; Nicola Gatti

【Abstract】: We focus on the following natural question: is it possible to influence the outcome of a voting process through the strategic provision of information to voters who update their beliefs rationally? We investigate whether it is computationally tractable to design a signaling scheme maximizing the probability with which the sender's preferred candidate is elected. We resort to the model recently introduced by Arieli and Babichenko (2019) (i.e., without inter-agent externalities), and focus on, as illustrative examples, k-voting rules and plurality voting. There is a sharp contrast between the case in which private signals are allowed and the more restrictive setting in which only public signals are allowed. In the former, we show that an optimal signaling scheme can be computed efficiently both under a k-voting rule and plurality voting. In establishing these results, we provide two contributions applicable to general settings beyond voting. Specifically, we extend a well-known result by Dughmi and Xu (2017) to more general settings and prove that, when the sender's utility function is anonymous, computing an optimal signaling scheme is fixed-parameter tractable in the number of receivers' actions. In the public signaling case, we show that the sender's optimal expected return cannot be approximated to within any factor under a k-voting rule. This negative result easily extends to plurality voting and problems where utility functions are anonymous.

【Keywords】:

229. Election Control in Social Networks via Edge Addition or Removal.

Paper Link】 【Pages】:1878-1885

【Authors】: Matteo Castiglioni ; Diodato Ferraioli ; Nicola Gatti

【Abstract】: We focus on the scenario in which messages pro and/or against one or multiple candidates are spread through a social network in order to affect the votes of the receivers. Several results are known in the literature when the manipulator can seed the network by buying influencers. In this paper, instead, we assume the set of influencers and their messages to be given, and we ask whether a manipulator (e.g., the platform) can alter the outcome of the election by adding or removing edges in the social network. We study a wide range of cases, distinguished by the number of candidates or by the kind of messages spread over the network. We provide a positive result, showing that, except for trivial cases, manipulation is not affordable, the optimization problem being hard even if the manipulator has an unlimited budget (i.e., he can add or remove as many edges as desired). Furthermore, we prove that our hardness results still hold in a reoptimization variant, where the manipulator already knows an optimal solution to the problem and needs to compute a new solution once a local modification occurs (e.g., in bandit scenarios where estimations related to random variables change over time).

【Keywords】:

230. Private Bayesian Persuasion with Sequential Games.

Paper Link】 【Pages】:1886-1893

【Authors】: Andrea Celli ; Stefano Coniglio ; Nicola Gatti

【Abstract】: We study an information-structure design problem (a.k.a. a persuasion problem) with a single sender and multiple receivers with actions of a priori unknown types, independently drawn from action-specific marginal probability distributions. As in the standard Bayesian persuasion model, the sender has access to additional information regarding the action types, which she can exploit when committing to a (noisy) signaling scheme through which she sends a private signal to each receiver. The novelty of our model is in considering the much more expressive case in which the receivers interact in a sequential game with imperfect information, with utilities depending on the game outcome and the realized action types. After formalizing the notions of ex ante and ex interim persuasiveness (which differ by the time at which the receivers commit to following the sender's signaling scheme), we investigate the continuous optimization problem of computing a signaling scheme which maximizes the sender's expected revenue. We show that computing an optimal ex ante persuasive signaling scheme is NP-hard when there are three or more receivers. Instead, in contrast with previous hardness results for ex interim persuasion, we show that, for games with two receivers, an optimal ex ante persuasive signaling scheme can be computed in polynomial time thanks to the novel algorithm we propose, based on the ellipsoid method.

【Keywords】:

Paper Link】 【Pages】:1894-1901

【Authors】: Xujin Chen ; Minming Li ; Chenhao Wang

【Abstract】: We study single-candidate voting embedded in a metric space, where both voters and candidates are points in the space, and the distances between voters and candidates specify the voters' preferences over candidates. In the voting, each voter is asked to submit her favorite candidate. Given the collection of favorite candidates, a mechanism for eliminating the least popular candidate finds a committee containing all candidates but the one to be eliminated. Each committee is associated with a social value that is the sum of the costs (utilities) it imposes (provides) to the voters. We design mechanisms for finding a committee to optimize the social value. We measure the quality of a mechanism by its distortion, defined as the worst-case ratio between the social value of the committee found by the mechanism and the optimal one. We establish new upper and lower bounds on the distortion of mechanisms in this single-candidate voting, for both general metrics and well-motivated special cases.

【Keywords】:

232. Manipulating Districts to Win Elections: Fine-Grained Complexity.

Paper Link】 【Pages】:1902-1909

【Authors】: Eduard Eiben ; Fedor V. Fomin ; Fahad Panolan ; Kirill Simonov

【Abstract】: Gerrymandering is a practice of manipulating district boundaries and locations in order to achieve a political advantage for a particular party. Lewenberg, Lev, and Rosenschein [AAMAS 2017] initiated the algorithmic study of a geographically-based manipulation problem, where voters must vote at the ballot box closest to them. In this variant of gerrymandering, for a given set of possible locations of ballot boxes and known political preferences of n voters, the task is to identify locations for k boxes out of m possible locations to guarantee victory of a certain party in at least ℓ districts. Here the integers k and ℓ are selected parameters. It is known that the problem is NP-complete already for 4 political parties, and prior to our work only heuristic algorithms for this problem were developed. We initiate the rigorous study of the gerrymandering problem from the perspectives of parameterized and fine-grained complexity and provide asymptotically matching lower and upper bounds on its computational complexity. We prove that the problem is W[1]-hard parameterized by k + n and that it does not admit an f(n,k) · m^{o(√k)} algorithm for any function f of k and n only, unless the Exponential Time Hypothesis (ETH) fails. Our lower bounds hold already for 2 parties. On the other hand, we give an algorithm that solves the problem for a constant number of parties in time (m+n)^{O(√k)}.

【Keywords】:

233. On Swap Convexity of Voting Rules.

Paper Link】 【Pages】:1910-1917

【Authors】: Svetlana Obraztsova ; Edith Elkind ; Piotr Faliszewski

【Abstract】: Obraztsova et al. (2013) have recently proposed an intriguing convexity axiom for voting rules. This axiom imposes conditions on the shape of the sets of elections with a given candidate as a winner. However, this new axiom is both too weak and too strong: it is too weak because it defines a set to be convex if for any two elements of the set some shortest path between them lies within the set, whereas the standard definition of convexity requires all shortest paths between two elements to lie within the set, and it is too strong because common voting rules do not satisfy this axiom. In this paper, we (1) propose several families of voting rules that are convex in the sense of Obraztsova et al.; (2) put forward a weaker notion of convexity that is satisfied by most common voting rules; (3) prove impossibility results for a variant of this definition that considers all, rather than some shortest paths.

【Keywords】:

234. Analysis of One-to-One Matching Mechanisms via SAT Solving: Impossibilities for Universal Axioms.

Paper Link】 【Pages】:1918-1925

【Authors】: Ulle Endriss

【Abstract】: We develop a powerful approach that makes modern SAT solving techniques available as a tool to support the axiomatic analysis of economic matching mechanisms. Our central result is a preservation theorem, establishing sufficient conditions under which the possibility of designing a matching mechanism meeting certain axiomatic requirements for a given number of agents carries over to all scenarios with strictly fewer agents. This allows us to obtain general results about matching by verifying claims for specific instances using a SAT solver. We use our approach to automatically derive elementary proofs for two new impossibility theorems: (i) a strong form of Roth's classical result regarding the impossibility of designing mechanisms that are both stable and strategyproof and (ii) a result establishing the impossibility of guaranteeing stability while also respecting a basic notion of cross-group fairness (so-called gender-indifference).

【Keywords】:

235. Iterative Delegations in Liquid Democracy with Restricted Preferences.

Paper Link】 【Pages】:1926-1933

【Authors】: Bruno Escoffier ; Hugo Gilbert ; Adèle Pass-Lanneau

【Abstract】: Liquid democracy is a collective decision making paradigm which lies between direct and representative democracy. One main feature of liquid democracy is that voters can delegate their votes in a transitive manner: if A delegates to B and B delegates to C, then A in effect delegates to C. Unfortunately, because voters' preferences over delegates may be conflicting, this process may not converge. There may not even exist a stable state (also called equilibrium). In this paper, we investigate the stability of the delegation process in liquid democracy when voters have restricted types of preference on the agent representing them (e.g., single-peaked preferences). We show that various natural structures of preference guarantee the existence of an equilibrium and we obtain both tractability and hardness results for the problem of computing several equilibria with some desirable properties.

【Keywords】:
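
The transitive delegation rule mentioned in the abstract of entry 235 can be sketched as follows. This only resolves a fixed delegation profile to final representatives; it does not model the iterative preference-driven process or the equilibria the paper analyzes. The function name `resolve_delegations`, the convention that a direct voter delegates to herself, and the use of `None` for chains that end in a cycle are assumptions of this example.

```python
def resolve_delegations(delegate):
    """Map each voter to the voter who finally casts her vote.

    `delegate[v]` is the voter that v delegates to, or v itself if v
    votes directly. Voters whose chain runs into a delegation cycle
    are mapped to None.
    """
    final = {}
    for voter in delegate:
        seen = set()
        current = voter
        # Follow the chain until a direct voter or a repeated voter is met.
        while delegate[current] != current and current not in seen:
            seen.add(current)
            current = delegate[current]
        final[voter] = current if delegate[current] == current else None
    return final


if __name__ == "__main__":
    profile = {"A": "B", "B": "C", "C": "C", "D": "E", "E": "D"}
    print(resolve_delegations(profile))
```

In the example profile, A and B both end up represented by C, while D and E delegate to each other and therefore have no final representative.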

236. Coarse Correlation in Extensive-Form Games.

Paper Link】 【Pages】:1934-1941

【Authors】: Gabriele Farina ; Tommaso Bianchi ; Tuomas Sandholm

【Abstract】: Coarse correlation models strategic interactions of rational agents complemented by a correlation device which is a mediator that can recommend behavior but not enforce it. Despite being a classical concept in the theory of normal-form games since 1978, not much is known about the merits of coarse correlation in extensive-form settings. In this paper, we consider two instantiations of the idea of coarse correlation in extensive-form games: normal-form coarse-correlated equilibrium (NFCCE), already defined in the literature, and extensive-form coarse-correlated equilibrium (EFCCE), a new solution concept that we introduce. We show that EFCCEs are a subset of NFCCEs and a superset of the related extensive-form correlated equilibria. We also show that, in n-player extensive-form games, social-welfare-maximizing EFCCEs and NFCCEs are bilinear saddle points, and give new efficient algorithms for the special case of two-player games with no chance moves. Experimentally, our proposed algorithm for NFCCE is two to four orders of magnitude faster than the prior state of the art.

【Keywords】:

237. Designing Committees for Mitigating Biases.

Paper Link】 【Pages】:1942-1949

【Authors】: Michal Feldman ; Yishay Mansour ; Noam Nisan ; Sigal Oren ; Moshe Tennenholtz

【Abstract】: It is widely observed that individuals prefer to interact with others who are more similar to them (this phenomenon is termed homophily). This similarity manifests itself in various ways such as beliefs, values and education. Thus, it should not come as a surprise that when people make hiring choices, for example, their similarity to the candidate plays a role in their choice. In this paper, we suggest that putting the decision in the hands of a committee instead of a single person can reduce this bias. We study a novel model of voting in which a committee of experts is constructed to reduce the biases of its members. We first present voting rules that optimally reduce the biases of a given committee. Our main results include the design of committees, for several settings, that are able to reach a nearly optimal (unbiased) choice. We also provide a thorough analysis of the trade-offs between the committee size and the obtained error. Our model is inherently different from the well-studied models of voting that focus on aggregation of preferences or on aggregation of information due to the introduction of similarity biases.

【Keywords】:

238. Strategyproof Mechanisms for Friends and Enemies Games.

Paper Link】 【Pages】:1950-1957

【Authors】: Michele Flammini ; Bojana Kodric ; Giovanna Varricchio

【Abstract】: We investigate strategyproof mechanisms for Friends and Enemies Games, a subclass of Hedonic Games in which every agent classifies any other one as a friend or as an enemy. In this setting, we consider the two classical scenarios proposed in the literature, called Friends Appreciation (FA) and Enemies Aversion (EA). Roughly speaking, in the former each agent gives priority to the number of friends in her coalition, while in the latter to the number of enemies. We provide strategyproof mechanisms for both settings. More precisely, for FA we first present a deterministic n-approximation mechanism, and then show that a much better result can be accomplished by resorting to randomization. Namely, we provide a randomized mechanism whose expected approximation ratio is 4, and arbitrarily close to 4 with high probability. For EA, we give a simple (1+√2)n-approximation mechanism, and show that its performance is asymptotically tight by proving that it is NP-hard to approximate the optimal solution within O(n^{1−ɛ}) for any fixed ɛ > 0. Finally, we show how to extend our results in the presence of neutrals, i.e., when agents can also be indifferent about other agents, and we discuss anonymity.

【Keywords】:

239. Preventing Arbitrage from Collusion When Eliciting Probabilities.

Paper Link】 【Pages】:1958-1965

【Authors】: Rupert Freeman ; David M. Pennock ; Dominik Peters ; Bo Waggoner

【Abstract】: We consider the design of mechanisms to elicit probabilistic forecasts when agents are strategic and may collude with one another. Chun and Shachter (2011) have shown that when agents may form coalitions, many known mechanisms for elicitation permit arbitrage, allowing the coalition members to guarantee themselves higher payments by misreporting their beliefs. We consider two approaches to protect against colluding agents. First, we present a novel strictly proper mechanism that does not admit arbitrage provided that the reports of the agents are bounded away from 0 and 1, a common assumption in many settings. Second, we discover strictly arbitrage-free mechanisms that satisfy an intermediate guarantee between weak and strict properness.

【Keywords】:

240. VCG under Sybil (False-Name) Attacks - A Bayesian Analysis.

Paper Link】 【Pages】:1966-1973

【Authors】: Yotam Gafni ; Ron Lavi ; Moshe Tennenholtz

【Abstract】: VCG is a classical combinatorial auction that maximizes social welfare. However, while the standard single-item Vickrey auction is false-name-proof, a major failure of multi-item VCG is its vulnerability to false-name attacks. This occurs already in the natural bare minimum model in which there are two identical items and bidders are single-minded. Previous solutions to this challenge focused on developing alternative mechanisms that compromise social welfare. We re-visit the VCG auction vulnerability and consider the bidder behavior in Bayesian settings. In service of that we introduce a novel notion, termed the granularity threshold, that characterizes VCG Bayesian resilience to false-name attacks as a function of the bidder type distribution. Using this notion we show a large class of cases in which VCG indeed obtains Bayesian resilience for the two-item single-minded setting.

【Keywords】:

241. Bidding in Smart Grid PDAs: Theory, Analysis and Strategy.

Paper Link】 【Pages】:1974-1981

【Authors】: Susobhan Ghosh ; Sujit Gujar ; Praveen Paruchuri ; Easwar Subramanian ; Sanjay P. Bhat

【Abstract】: Periodic Double Auctions (PDAs) are commonly used in the real world for trading, e.g. in stock markets to determine stock opening prices, and energy markets to trade energy in order to balance net demand in smart grids, involving trillions of dollars in the process. A bidder participating in such PDAs has to plan for bids in the current auction as well as for the future auctions, which highlights the necessity of good bidding strategies. In this paper, we perform an equilibrium analysis of single unit single-shot double auctions with a certain clearing price and payment rule, which we refer to as ACPR, and find it intractable to analyze as the number of participating agents increases. We further derive the best response for a bidder with complete information in a single-shot double auction with ACPR. Leveraging the theory developed for single-shot double auctions and taking the PowerTAC wholesale market PDA as our testbed, we proceed by modeling the PDA of PowerTAC as an MDP. We propose a novel bidding strategy, namely MDPLCPBS. We empirically show that MDPLCPBS follows the equilibrium strategy for double auctions that we previously analyzed. In addition, we benchmark our strategy against the baseline and the state-of-the-art bidding strategies for the PowerTAC wholesale market PDAs, and show that MDPLCPBS outperforms most of them consistently.

【Keywords】:

242. Beyond Pairwise Comparisons in Social Choice: A Setwise Kemeny Aggregation Problem.

Paper Link】 【Pages】:1982-1989

【Authors】: Hugo Gilbert ; Tom Portoleau ; Olivier Spanjaard

【Abstract】: In this paper, we advocate the use of setwise contests for aggregating a set of input rankings into an output ranking. We propose a generalization of the Kemeny rule where one minimizes the number of k-wise disagreements instead of pairwise disagreements (one counts 1 disagreement each time the top choice in a subset of alternatives of cardinality at most k differs between an input ranking and the output ranking). After an algorithmic study of this k-wise Kemeny aggregation problem, we introduce a k-wise counterpart of the majority graph. It proves useful for dividing the aggregation problem into several sub-problems. We conclude with numerical tests.

【Keywords】:
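
The k-wise disagreement count defined in the abstract of entry 242 can be sketched directly from its description. The code below is an illustrative brute-force version, not the authors' algorithm: the function names, the list representation of rankings (most preferred first), and the exhaustive search over all output rankings are assumptions of this example. Subsets of size 1 are skipped because their top choice can never differ.

```python
from itertools import combinations, permutations


def kwise_disagreements(ranking_a, ranking_b, k):
    """Count subsets of size 2..k whose top choice differs between two rankings."""
    pos_a = {x: i for i, x in enumerate(ranking_a)}
    pos_b = {x: i for i, x in enumerate(ranking_b)}
    count = 0
    for size in range(2, k + 1):
        for subset in combinations(ranking_a, size):
            top_a = min(subset, key=pos_a.__getitem__)
            top_b = min(subset, key=pos_b.__getitem__)
            count += top_a != top_b
    return count


def kwise_kemeny_brute_force(profile, k):
    """Return an output ranking minimizing the total k-wise disagreements.

    Tries every permutation, so it is only usable for a handful of alternatives.
    """
    return min(
        permutations(profile[0]),
        key=lambda r: sum(kwise_disagreements(v, r, k) for v in profile),
    )


if __name__ == "__main__":
    print(kwise_disagreements(["a", "b", "c"], ["b", "a", "c"], 3))
```

With k = 2 the count coincides with the usual pairwise (Kendall tau) distance, which is why the rule generalizes the Kemeny rule.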

243. Contiguous Cake Cutting: Hardness Results and Approximation Algorithms.

Paper Link】 【Pages】:1990-1997

【Authors】: Paul W. Goldberg ; Alexandros Hollender ; Warut Suksompong

【Abstract】: We study the fair allocation of a cake, which serves as a metaphor for a divisible resource, under the requirement that each agent should receive a contiguous piece of the cake. While it is known that no finite envy-free algorithm exists in this setting, we exhibit efficient algorithms that produce allocations with low envy among the agents. We then establish NP-hardness results for various decision problems on the existence of envy-free allocations, such as when we fix the ordering of the agents or constrain the positions of certain cuts. In addition, we consider a discretized setting where indivisible items lie on a line and show a number of hardness results strengthening those from prior work.

【Keywords】:

244. Strongly Budget Balanced Auctions for Multi-Sided Markets.

Paper Link】 【Pages】:1998-2005

【Authors】: Rica Gonen ; Erel Segal-Halevi

【Abstract】: In two-sided markets, Myerson and Satterthwaite's impossibility theorem states that one cannot maximize the gain-from-trade while also satisfying truthfulness, individual-rationality and no deficit. Attempts have been made to circumvent Myerson and Satterthwaite's result by attaining approximately-maximum gain-from-trade: the double-sided auction of McAfee (1992) is truthful and has no deficit, and the one by Segal-Halevi et al. (2016) additionally has no surplus, i.e., it is strongly-budget-balanced. They consider two categories of agents, buyers and sellers, where each trade set is composed of a single buyer and a single seller. The practical complexity of applications such as supply chains requires one to look beyond two-sided markets. Common requirements are for: buyers trading with multiple sellers of different or identical items, buyers trading with sellers through transporters and mediators, and sellers trading with multiple buyers. We attempt to address these settings. We generalize Segal-Halevi et al. (2016)'s strongly-budget-balanced double-sided auction setting to a multilateral market where each trade set is composed of any number of agent categories. Our generalization refines the notion of competition in multi-sided auctions by introducing the concepts of external competition and trade reduction. We also show an obviously-truthful implementation of our auction using multiple ascending prices. Full version, including omitted proofs and simulation experiments, is available at https://arxiv.org/abs/1911.08094.

【Keywords】:

245. The Complexity of Computing Maximin Share Allocations on Graphs.

Paper Link】 【Pages】:2006-2013

【Authors】: Gianluigi Greco ; Francesco Scarcello

【Abstract】: Maximin share is a compelling notion of fairness proposed by Budish as a relaxation of more traditional concepts for fair allocations of indivisible goods. In this paper we consider this notion within a setting where bundles of goods must induce connected subsets over an underlying graph. This setting received much attention in earlier literature, and our study answers a number of questions that were left open. First, we show that computing maximin share allocations is FΔ^P_2-complete, even when focusing on consistent scenarios, that is, where such allocations are a-priori guaranteed to exist. Moreover, the problem remains intractable if all agents have the same type, i.e., have the same utility functions, and if either the values returned by the utility functions are polynomially bounded, or the underlying graphs have a low degree of cyclicity (more precisely, have bounded treewidth). However, if these conditions hold all together, then computing maximin share allocations (or checking that none exists) becomes tractable. The result is established via machinery based on logspace alternating machines that use partial representations of connected bundles, which are interesting in their own right.
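To make the maximin share notion concrete, here is a brute-force sketch for the simplest connected setting, a path graph, where connected bundles are contiguous segments. It assumes additive, non-negative values. The function name and the restriction to paths are illustrative choices and not the paper's algorithm, which addresses general graphs.

```python
from itertools import combinations
from typing import Sequence


def maximin_share_on_path(values: Sequence[float], n_agents: int) -> float:
    """Maximin share of one agent when goods lie on a path.

    The agent splits the path into n_agents contiguous bundles and receives the
    worst one; the maximin share is the best value it can guarantee this way.
    Assumes additive, non-negative values (hypothetical helper for illustration).
    """
    m = len(values)
    if n_agents > m:
        # Some bundle must be empty, so the guaranteed value is 0.
        return 0
    best = float("-inf")
    # Choose n_agents - 1 cut positions among the m - 1 gaps between goods.
    for cuts in combinations(range(1, m), n_agents - 1):
        bounds = (0,) + cuts + (m,)
        worst = min(sum(values[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = max(best, worst)
    return best
```

For values `[3, 1, 2, 4]` and two agents, the best split is `[3, 1] | [2, 4]`, giving a maximin share of 4. The enumeration is exponential in the number of agents, which is acceptable only for toy instances.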

【Keywords】:

246. Fair Division Through Information Withholding.

Paper Link】 【Pages】:2014-2021

【Authors】: Hadi Hosseini ; Sujoy Sikdar ; Rohit Vaish ; Hejun Wang ; Lirong Xia

【Abstract】: Envy-freeness up to one good (EF1) is a well-studied fairness notion for indivisible goods that addresses pairwise envy by the removal of at most one good. In the worst case, each pair of agents might require the (hypothetical) removal of a different good, resulting in a weak aggregate guarantee. We study allocations that are nearly envy-free in aggregate, and define a novel fairness notion based on information withholding. Under this notion, an agent can withhold (or hide) some of the goods in its bundle and reveal the remaining goods to the other agents. We observe that in practice, envy-freeness can be achieved by withholding only a small number of goods overall. We show that finding allocations that withhold an optimal number of goods is computationally hard even for highly restricted classes of valuations. In contrast to the worst-case results, our experiments on synthetic and real-world preference data show that existing algorithms for finding EF1 allocations withhold a close-to-optimal amount of information.
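The EF1 condition described above can be checked directly. The sketch below assumes additive, non-negative valuations given as a matrix `valuations[i][g]`; the function name and data layout are illustrative assumptions. It covers only the standard EF1 test, not the information-withholding notion the paper introduces.

```python
from typing import Sequence


def is_ef1(
    valuations: Sequence[Sequence[float]], bundles: Sequence[Sequence[int]]
) -> bool:
    """Check envy-freeness up to one good for additive, non-negative valuations."""
    n = len(bundles)
    for i in range(n):
        own = sum(valuations[i][g] for g in bundles[i])
        for j in range(n):
            if i == j or not bundles[j]:
                continue  # nobody envies an empty bundle
            other = sum(valuations[i][g] for g in bundles[j])
            # Hypothetically remove the good in j's bundle that i values most.
            most_valued = max(valuations[i][g] for g in bundles[j])
            if own < other - most_valued:
                return False
    return True
```

Each ordered pair of agents may rely on removing a different good, which is the weak aggregate guarantee the abstract points out.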

【Keywords】:

247. Model and Reinforcement Learning for Markov Games with Risk Preferences.

Paper Link】 【Pages】:2022-2029

【Authors】: Wenjie Huang ; Pham Viet Hai ; William Benjamin Haskell

【Abstract】: We motivate and propose a new model for non-cooperative Markov games which considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic “risk” from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed and the existence of such equilibria is demonstrated in stationary strategies by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures which can naturally be written as saddle-point stochastic optimization problems, and covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.

【Keywords】:

248. A Simple, Fast, and Safe Mediator for Congestion Management.

Paper Link】 【Pages】:2030-2037

【Authors】: Kei Ikegami ; Kyohei Okumura ; Takumi Yoshikawa

【Abstract】: Congestion is a severe problem in cities. A large population with little information about each other's preferences hardly reaches equilibrium and causes unexpected congestion. Controlling such congestion requires us to collect information dispersed in the market and to coordinate actions among agents. We aim to design a mediator that a) induces a game with high social welfare in equilibrium, b) computes an equilibrium efficiently, c) works without common prior, and d) performs well even when only some of the agents in the market use the mediator. We propose a mediator based on a version of best response dynamics (BRD). We prove that, in a simple setting with two resources, “good behavior” (reporting truthfully and following the recommendation) forms an (approximate) ex-post Nash equilibrium in the mediated game; in the equilibrium, the welfare is close to the first-best when preferences diverge enough. Furthermore, under a certain behavioral assumption, those who are not using the mediator can always enjoy non-negative payoff gain by joining it even without the full participation of others. Additionally, our experimental results suggest that such results remain valid for more general settings.

【Keywords】:

249. Repeated Multimarket Contact with Private Monitoring: A Belief-Free Approach.

Paper Link】 【Pages】:2038-2045

【Authors】: Atsushi Iwasaki ; Tadashi Sekiguchi ; Shun Yamamoto ; Makoto Yokoo

【Abstract】: This paper studies repeated games where two players play multiple duopolistic games simultaneously (multimarket contact). A key assumption is that each player receives a noisy and private signal about the other's actions (private monitoring or observation errors). There has been no game-theoretic support for whether or not multimarket contact facilitates collusion, in the sense that more collusive equilibria in terms of per-market profits exist than those under a benchmark case of one market. An equilibrium candidate under the benchmark case is belief-free strategies. We are the first to construct a non-trivial class of strategies that exhibits the effect of multimarket contact from the perspectives of simplicity and mild punishment. Strategies must be simple because firms in a cartel must coordinate with each other with no communication. Punishment must be mild to the extent that it does not hurt even the minimum required profits in the cartel. We thus focus on two-state automaton strategies such that a player is cooperative in at least one market even when he or she punishes a traitor. Furthermore, we identify an additional condition (partial indifference), under which the collusive equilibrium yields the optimal payoff.

【Keywords】:

250. A Multiarmed Bandit Based Incentive Mechanism for a Subset Selection of Customers for Demand Response in Smart Grids.

Paper Link】 【Pages】:2046-2053

【Authors】: Shweta Jain ; Sujit Gujar

【Abstract】: Demand response is a crucial tool to maintain the stability of smart grids. With the upcoming research trends in the area of electricity markets, it has become possible to design a dynamic pricing system in which consumers are made aware of what they are going to pay. Though a dynamic pricing system (pricing based on the total demand a distributor company is facing) seems to be one possible solution, the current dynamic pricing approaches are either too complex for a consumer to understand or too naive, leading to inefficiencies in the system (on either the consumer side or the distributor side). Due to these limitations, the recent literature focuses on providing incentives to consumers to reduce their electricity consumption, especially in peak hours. For each round, the goal is to select a subset of consumers to whom the distributor should offer incentives so as to minimize the loss, which comprises the cost of buying electricity from the market, uncertainties at the consumer end, and the cost incurred by consumers to reduce consumption, which is private information of the consumers. Due to the uncertainties in the loss function (arising from renewable energy resources as well as consumption needs), traditional auction theory-based incentives face manipulation challenges. Towards this, we propose a novel combinatorial multi-armed bandit (MAB) algorithm to learn the uncertainties, along with an auction to elicit the true costs incurred by the consumers. We prove that our mechanism is regret optimal and incentive compatible. We further demonstrate the efficacy of our algorithms via simulations.

【Keywords】:

251. Double-Oracle Sampling Method for Stackelberg Equilibrium Approximation in General-Sum Extensive-Form Games.

Paper Link】 【Pages】:2054-2061

【Authors】: Jan Karwowski ; Jacek Mandziuk

【Abstract】: The paper presents a new method for approximating Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic as it does not rely on any specific properties of a particular game model. The method is based on iterative interleaving of the following two phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy is an optimal response. The above solution scheme is evaluated with respect to the Leader's expected utility and time requirements on three sets of interception games with variable characteristics, played on graphs. A comparison with three state-of-the-art MILP/LP-based methods shows that in the vast majority of test cases the proposed simulation-based approach leads to optimal Leader's strategies, while outperforming the competing methods in terms of time scalability and memory requirements.

【Keywords】:

252. Strategy-Proof and Non-Wasteful Multi-Unit Auction via Social Network.

Paper Link】 【Pages】:2062-2069

【Authors】: Takehiro Kawasaki ; Nathanaël Barrot ; Seiji Takanashi ; Taiki Todo ; Makoto Yokoo

【Abstract】: Auctions via social network, pioneered by Li et al. (2017), have been attracting considerable attention in the literature of mechanism design for auctions. However, no known mechanism has satisfied strategy-proofness, non-deficit, non-wastefulness, and individual rationality for the multi-unit unit-demand auction, except for some naïve ones. In this paper, we first propose a mechanism that satisfies all the above properties. We then make a comprehensive comparison with two naïve mechanisms, showing that the proposed mechanism dominates them in social surplus, seller's revenue, and incentive of buyers for truth-telling. We also analyze the characteristics of the social surplus and the revenue achieved by the proposed mechanism, including the constant approximability of the worst-case efficiency loss and the complexity of optimizing revenue from the seller's perspective.

【Keywords】:

253. On the Max-Min Fair Stochastic Allocation of Indivisible Goods.

Paper Link】 【Pages】:2070-2078

【Authors】: Yasushi Kawase ; Hanna Sumita

【Abstract】: We study the problem of fairly allocating a set of indivisible goods to risk-neutral agents in a stochastic setting. We propose an (approximation) algorithm to find a stochastic allocation that maximizes the minimum utility among the agents. The algorithm runs by repeatedly finding an (approximate) allocation to maximize the total virtual utility of the agents. This implies that the problem is solvable in polynomial time when the utilities are gross-substitutes (which is a subclass of submodular). When the utilities are submodular, we can find a (1 − 1/e)-approximate solution for the problem and this is best possible unless P=NP. We also extend the problem to the case where a stochastic allocation must satisfy (ex ante) envy-freeness. Under this condition, we demonstrate that the problem is NP-hard even when every agent has an additive utility with a matroid constraint (which is a subclass of gross-substitutes). Furthermore, we propose a polynomial-time algorithm for the setting with a restriction that the matroid constraint is common to all agents.

【Keywords】:

254. An Analysis Framework for Metric Voting based on LP Duality.

Paper Link】 【Pages】:2079-2086

【Authors】: David Kempe

【Abstract】: Distortion-based analysis has established itself as a fruitful framework for comparing voting mechanisms. m voters and n candidates are jointly embedded in an (unknown) metric space, and the voters submit rankings of candidates by non-decreasing distance from themselves. Based on the submitted rankings, the social choice rule chooses a winning candidate; the quality of the winner is the sum of the (unknown) distances to the voters. The rule's choice will in general be suboptimal, and the worst-case ratio between the cost of its chosen candidate and the optimal candidate is called the rule's distortion. It was shown in prior work that every deterministic rule has distortion at least 3, while the Copeland rule and related rules guarantee distortion at most 5; a very recent result gave a rule with distortion 2 + √5 ≈ 4.236. We provide a framework based on LP-duality and flow interpretations of the dual which provides a simpler and more unified way for proving upper bounds on the distortion of social choice rules. We illustrate the utility of this approach with three examples. First, we show that the Ranked Pairs and Schulze rules have distortion Θ(√n). Second, we give a fairly simple proof of a strong generalization of the upper bound of 5 on the distortion of Copeland, to social choice rules with short paths from the winning candidate to the optimal candidate in generalized weak preference graphs. A special case of this result recovers the recent 2 + √5 guarantee. Finally, our framework naturally suggests a combinatorial rule that is a strong candidate for achieving distortion 3, which had also been proposed in recent work. We prove that the distortion bound of 3 would follow from any of three combinatorial conjectures we formulate.
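The setup in this abstract can be illustrated on a one-dimensional metric. The sketch below derives rankings from positions on a line, applies one common variant of the Copeland rule (pairwise ties counted as wins, remaining ties broken by lowest index), and computes the cost ratio for that single instance. All names are hypothetical. The ratio for one known metric is only a lower bound on the rule's distortion, which is a worst case over all metrics consistent with the rankings. The sketch also assumes the optimal cost is non-zero.

```python
from typing import List, Sequence


def rankings_from_positions(
    voters: Sequence[float], candidates: Sequence[float]
) -> List[List[int]]:
    """Each voter ranks candidates by non-decreasing distance (ties by index)."""
    return [
        sorted(range(len(candidates)), key=lambda c: (abs(v - candidates[c]), c))
        for v in voters
    ]


def copeland_winner(rankings: Sequence[Sequence[int]], m: int) -> int:
    """Copeland variant: a candidate scores a point for each pairwise (weak) win."""
    wins = [0] * m
    for a in range(m):
        for b in range(m):
            if a == b:
                continue
            prefer_a = sum(r.index(a) < r.index(b) for r in rankings)
            if 2 * prefer_a >= len(rankings):
                wins[a] += 1
    return max(range(m), key=lambda c: (wins[c], -c))


def cost_ratio(
    voters: Sequence[float], candidates: Sequence[float], winner: int
) -> float:
    """Social cost of the winner divided by the optimal social cost."""
    costs = [sum(abs(v - c) for v in voters) for c in candidates]
    return costs[winner] / min(costs)
```

With voters at 0.0, 0.4 and 1.0 and candidates at 0.0, 0.5 and 1.0, the middle candidate wins every pairwise contest and is also the cost minimizer, so the ratio on this instance is 1.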

【Keywords】:

255. Communication, Distortion, and Randomness in Metric Voting.

Paper Link】 【Pages】:2087-2094

【Authors】: David Kempe

【Abstract】: In distortion-based analysis of social choice rules over metric spaces, voters and candidates are jointly embedded in a metric space. Voters rank candidates by non-decreasing distance. The mechanism, receiving only this ordinal (comparison) information, must select a candidate approximately minimizing the sum of distances from all voters to the chosen candidate. It is known that while the Copeland rule and related rules guarantee distortion at most 5, the distortion of many other standard voting rules, such as Plurality, Veto, or k-approval, grows unboundedly in the number n of candidates. An advantage of Plurality, Veto, or k-approval with small k is that they require less communication from the voters; all deterministic social choice rules known to achieve constant distortion require voters to transmit their complete rankings of all candidates. This motivates our study of the tradeoff between the distortion and the amount of communication in deterministic social choice rules. We show that any one-round deterministic voting mechanism in which each voter communicates only the candidates she ranks in a given set of k positions must have distortion at least (2n − k)/k; we give a mechanism achieving an upper bound of O(n/k), which matches the lower bound up to a constant. For more general communication-bounded voting mechanisms, in which each voter communicates b bits of information about her ranking, we show a slightly weaker lower bound of Ω(n/b) on the distortion. For randomized mechanisms, Random Dictatorship achieves expected distortion strictly smaller than 3, almost matching a lower bound of 3 − 2/n for any randomized mechanism that only receives each voter's top choice. We close this gap, by giving a simple randomized social choice rule which only uses each voter's first choice, and achieves expected distortion 3 − 2/n.
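As a small illustration of a mechanism that uses only each voter's top choice, the sketch below evaluates Random Dictatorship (pick a uniformly random voter and elect that voter's favourite candidate) on a line metric. It computes the expected cost ratio exactly for one instance. The function name, the line metric, and the tie-breaking by lowest index are assumptions of the example. It is not the new rule proposed in the paper.

```python
from typing import Sequence


def random_dictatorship_ratio(
    voters: Sequence[float], candidates: Sequence[float]
) -> float:
    """Expected social cost of Random Dictatorship divided by the optimal cost.

    Voters and candidates are points on a line; assumes the optimal cost is > 0.
    """
    costs = [sum(abs(v - c) for v in voters) for c in candidates]
    # Top choice of each voter: the nearest candidate (ties broken by index).
    tops = [
        min(range(len(candidates)), key=lambda c: (abs(v - candidates[c]), c))
        for v in voters
    ]
    # Each voter is the dictator with probability 1 / len(voters).
    expected_cost = sum(costs[t] for t in tops) / len(voters)
    return expected_cost / min(costs)
```

For voters at 0.0, 0.0 and 1.1 with candidates at 0.0 and 2.0, two voters favour the left candidate (cost 1.1) and one favours the right candidate (cost 4.9), so the expected ratio is about 2.15. This is consistent with the bound of strictly less than 3 stated in the abstract.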

【Keywords】:

256. Information Elicitation Mechanisms for Statistical Estimation.

Paper Link】 【Pages】:2095-2102

【Authors】: Yuqing Kong ; Grant Schoenebeck ; Biaoshuai Tao ; Fang-Yi Yu

【Abstract】: We study learning statistical properties from strategic agents with private information. In this problem, agents must be incentivized to truthfully reveal their information even when it cannot be directly verified. Moreover, the information reported by the agents must be aggregated into a statistical estimate. We study two fundamental statistical properties: estimating the mean of an unknown Gaussian, and linear regression with Gaussian error. The information of each agent is one point in a Euclidean space. Our main results are two mechanisms for each of these problems which optimally aggregate the information of agents in the truth-telling equilibrium:
• A minimal (non-revelation) mechanism for large populations: agents only need to report one value, but that value need not be their point.
• A mechanism for small populations that is non-minimal: agents need to answer more than one question.
These mechanisms are “informed truthful” mechanisms where reporting unaltered data (truth-telling) 1) forms a strict Bayesian Nash equilibrium and 2) has strictly higher welfare than any oblivious equilibrium where agents' strategies are independent of their private signals. We also show a minimal revelation mechanism (each agent only reports her signal) for a restricted setting and use an impossibility result to prove the necessity of this restriction. We build upon the peer prediction literature in the single-question setting; however, most previous work in this area focuses on discrete signals, whereas our setting is inherently continuous, and we further simplify the agents' reports.

【Keywords】:

257. Perpetual Voting: Fairness in Long-Term Decision Making.

Paper Link】 【Pages】:2103-2110

【Authors】: Martin Lackner

【Abstract】: In this paper we introduce a new voting formalism to support long-term collective decision making: perpetual voting rules. These are voting rules that take the history of previous decisions into account. Due to this additional information, perpetual voting rules may offer temporal fairness guarantees that cannot be achieved in singular decisions. In particular, such rules may enable minorities to have a fair (proportional) influence on the decision process and thus foster long-term participation of minorities. This paper explores the proposed voting rules via an axiomatic analysis as well as a quantitative evaluation by computer simulations. We identify two perpetual voting rules as particularly recommendable in long-term collective decision making.

【Keywords】:

258. Defending with Shared Resources on a Network.

Paper Link】 【Pages】:2111-2118

【Authors】: Minming Li ; Long Tran-Thanh ; Xiaowei Wu

【Abstract】: In this paper we consider a defending problem on a network. In the model, the defender holds a total defending resource of R, which can be distributed to the nodes of the network. The defending resource allocated to a node can be shared by its neighbors. There is a weight associated with every edge that represents the efficiency with which defending resources are shared between neighboring nodes. We consider the setting when each attack can affect not only the target node, but its neighbors as well. Assuming that nodes in the network have different treasures to defend and different defending requirements, the defender aims at allocating the defending resource to the nodes to minimize the loss due to attack. We give polynomial time exact algorithms for two important special cases of the network defending problem. For the case when an attack can only affect the target node, we present an LP-based exact algorithm. For the case when defending resources cannot be shared, we present a max-flow-based exact algorithm. We show that the general problem is NP-hard, and we give a 2-approximation algorithm based on LP-rounding. Moreover, by giving a matching lower bound of 2 on the integrality gap of the LP relaxation, we show that our rounding is tight.

【Keywords】:

259. Structure Learning for Approximate Solution of Many-Player Games.

Paper Link】 【Pages】:2119-2127

【Authors】: Zun Li ; Michael P. Wellman

【Abstract】: Games with many players are difficult to solve or even specify without adopting structural assumptions that enable representation in compact form. Such structure is generally not given and will not hold exactly for particular games of interest. We introduce an iterative structure-learning approach to search for approximate solutions of many-player games, assuming only black-box simulation access to noisy payoff samples. Our first algorithm, K-Roles, exploits symmetry by learning a role assignment for players of the game through unsupervised learning (clustering) methods. Our second algorithm, G3L, seeks sparsity by greedy search over local interactions to learn a graphical game model. Both algorithms use supervised learning (regression) to fit payoff values to the learned structures, in compact representations that facilitate equilibrium calculation. We experimentally demonstrate the efficacy of both methods in reaching quality solutions and uncovering hidden structure, on both perfectly and approximately structured game instances.

【Keywords】:

260. Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach.

Paper Link】 【Pages】:2128-2135

【Authors】: Yang Liu ; Qi Liu ; Hongke Zhao ; Zhen Pan ; Chuanren Liu

【Abstract】: In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and finding the balance between exploration and exploitation of the trading agent with AI techniques. To address the challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies by an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and be adaptive in different markets.

【Keywords】:

261. Limitations of Incentive Compatibility on Discrete Type Spaces.

Paper Link】 【Pages】:2136-2143

【Authors】: Taylor Lundy ; Hu Fu

【Abstract】: In the design of incentive compatible mechanisms, a common approach is to enforce incentive compatibility as constraints in programs that optimize over feasible mechanisms. Such constraints are often imposed on sparsified representations of the type spaces, such as their discretizations or samples, in order for the program to be manageable. In this work, we explore limitations of this approach, by studying whether all dominant strategy incentive compatible mechanisms on a set T of discrete types can be extended to the convex hull of T. Dobzinski, Fu and Kleinberg (2015) answered the question affirmatively for all settings where types are single dimensional. It is not difficult to show that the same holds when the set of feasible outcomes is downward closed. In this work we show that the question has a negative answer for certain non-downward-closed settings with multi-dimensional types. This result should call for caution in the use of the said approach to enforcing incentive compatibility beyond single-dimensional preferences and downward closed feasible outcomes.

【Keywords】:

262. Mechanism Design with Predicted Task Revenue for Bike Sharing Systems.

Paper Link】 【Pages】:2144-2151

【Authors】: Hongtao Lv ; Chaoli Zhang ; Zhenzhe Zheng ; Tie Luo ; Fan Wu ; Guihai Chen

【Abstract】: Bike sharing systems have been widely deployed around the world in recent years. A core problem in such systems is to reposition the bikes so that the distribution of bike supply is reshaped to better match the dynamic bike demand. When the bike-sharing company or platform is able to predict the revenue of each reposition task based on historic data, an additional constraint is to cap the payment for each task below its predicted revenue. In this paper, we propose an incentive mechanism called TruPreTar to incentivize users to park bicycles at locations desired by the platform toward rebalancing supply and demand. TruPreTar possesses four important economic and computational properties such as truthfulness and budget feasibility. Furthermore, we prove that even when the payment budget is tight, the total revenue still exceeds or equals the budget. Otherwise, TruPreTar achieves 2-approximation as compared to the optimal (revenue-maximizing) solution, which is close to the lower bound of at least √2 that we also prove. Using an industrial dataset obtained from a large bike-sharing company, our experiments show that TruPreTar is effective in rebalancing bike supply and demand and, as a result, generates high revenue that outperforms several benchmark mechanisms.

【Keywords】:

263. Lifting Preferences over Alternatives to Preferences over Sets of Alternatives: The Complexity of Recognizing Desirable Families of Sets.

Paper Link】 【Pages】:2152-2159

【Authors】: Jan Maly

【Abstract】: The problem of lifting a preference order on a set of objects to a preference order on a family of subsets of this set is a fundamental problem with a wide variety of applications in AI. The process is often guided by axioms postulating properties the lifted order should have. Well-known impossibility results by Kannai and Peleg and by Barberà and Pattanaik tell us that some desirable axioms, namely dominance and (strict) independence, are not jointly satisfiable for any linear order on the objects if all non-empty sets of objects are to be ordered. On the other hand, if not all non-empty sets of objects are to be ordered, the axioms are jointly satisfiable for all linear orders on the objects for some families of sets. Such families are very important for applications as they allow for the use of lifted orders, for example, in combinatorial voting. In this paper, we determine the computational complexity of recognizing such families. We show that it is Π^p_2-complete to decide for a given family of subsets whether dominance and independence or dominance and strict independence are jointly satisfiable for all linear orders on the objects if the lifted order needs to be total. Furthermore, we show that the problem remains coNP-complete if the lifted order can be incomplete. Additionally, we show that the complexity of these problems can increase exponentially if the family of sets is not given explicitly but via a succinct domain restriction.

【Keywords】:

264. The Effectiveness of Peer Prediction in Long-Term Forecasting.

Paper Link】 【Pages】:2160-2167

【Authors】: Debmalya Mandal ; Goran Radanovic ; David C. Parkes

【Abstract】:

【Keywords】:

265. The Surprising Power of Hiding Information in Facility Location.

Paper Link】 【Pages】:2168-2175

【Authors】: Safwan Hossain ; Evi Micha ; Nisarg Shah

【Abstract】: Facility location is the problem of locating a public facility based on the preferences of multiple agents. In the classic framework, where each agent holds a single location on a line and can misreport it, strategyproof mechanisms for choosing the location of the facility are well-understood. We revisit this problem in a more general framework. We assume that each agent may hold several locations on the line with different degrees of importance to the agent. We study mechanisms which elicit the locations of the agents and different levels of information about their importance. Further, in addition to the classic manipulation of misreporting locations, we introduce and study a new manipulation, whereby agents may hide some of their locations. We argue for its novelty in facility location and applicability in practice. Our results provide a complete picture of the power of strategyproof mechanisms eliciting different levels of information and with respect to each type of manipulation. Surprisingly, we show that in some cases hiding locations can be a strictly more powerful manipulation than misreporting locations.

【Keywords】:

266. Can We Predict the Election Outcome from Sampled Votes?

Paper Link】 【Pages】:2176-2183

【Authors】: Evi Micha ; Nisarg Shah

【Abstract】: In the standard model of voting, it is assumed that a voting rule observes the ranked preferences of each individual over a set of alternatives and makes a collective decision. In practice, however, not every individual votes. Is it possible to make a good collective decision for a group given the preferences of only a few of its members? We propose a framework in which we are given the ranked preferences of k out of n individuals sampled from a distribution, and the goal is to predict what a given voting rule would output if applied on the underlying preferences of all n individuals. We focus on the family of positional scoring rules, derive a strong negative result when the underlying preferences can be arbitrary, and discover interesting phenomena when they are generated from a known distribution.
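The prediction task in this abstract can be sketched in a few lines: apply a positional scoring rule to the full profile, then to a random sample of k rankings, and compare the two winners. The function names, the uniform sampling without replacement, and the tie-breaking by lowest index are assumptions of this illustration. The paper's framework and its distributional assumptions are more general.

```python
import random
from typing import List, Sequence


def scoring_winner(
    rankings: Sequence[Sequence[int]], score_vector: Sequence[float]
) -> int:
    """Winner under a positional scoring rule; ties go to the lowest index."""
    m = len(score_vector)
    totals = [0.0] * m
    for ranking in rankings:
        for position, alternative in enumerate(ranking):
            totals[alternative] += score_vector[position]
    return max(range(m), key=lambda a: (totals[a], -a))


def predict_from_sample(
    rankings: Sequence[Sequence[int]],
    score_vector: Sequence[float],
    k: int,
    seed: int = 0,
) -> int:
    """Apply the same rule to k rankings sampled uniformly without replacement."""
    rng = random.Random(seed)
    sample: List[Sequence[int]] = rng.sample(list(rankings), k)
    return scoring_winner(sample, score_vector)
```

Plurality corresponds to the score vector `[1, 0, ..., 0]` and Borda to `[m-1, m-2, ..., 0]`. Comparing `predict_from_sample(profile, scores, k)` with `scoring_winner(profile, scores)` over many seeds gives an empirical estimate of how often a sample of size k recovers the true outcome.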

【Keywords】:

267. Price of Fairness in Budget Division and Probabilistic Social Choice.

Paper Link】 【Pages】:2184-2191

【Authors】: Marcin Michorzewski ; Dominik Peters ; Piotr Skowron

【Abstract】: A group of agents needs to divide a divisible common resource (such as a monetary budget) among several uses or projects. We assume that agents have approval preferences over projects, and their utility is the fraction of the budget spent on approved projects. If we maximize utilitarian social welfare, the entire budget will be spent on a single popular project, even if a substantial fraction of the agents disapprove of it. This violates the individual fair share axiom (IFS) which requires that for each agent, at least 1/n of the budget is spent on approved projects. We study the price of imposing such fairness axioms on utilitarian social welfare. We show that no division rule satisfying IFS can guarantee to achieve more than an O(1/√m) fraction of maximum utilitarian welfare, in the worst case. However, imposing stronger group fairness conditions (such as the core) does not come with an increased price, since both the conditional utilitarian rule and the Nash rule match this bound and guarantee an Ω(1/√m) fraction. The same guarantee is attained by the rule under which the spending on a project is proportional to its approval score. We also study a family of rules interpolating between the utilitarian and the Nash rule, quantifying a trade-off between welfare and group fairness. An experimental analysis by sampling using several probabilistic models shows that the conditional utilitarian rule achieves very high welfare on average.
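The definitions in this abstract translate directly into code. The sketch below computes agent utilities for a budget distribution, checks the individual fair share (IFS) condition, and implements two of the rules mentioned: the welfare-maximizing rule that funds a single most-approved project, and the rule that spends in proportion to approval scores. The data layout (a list of approval sets) and all function names are assumptions. The sketch further assumes every agent approves at least one project.

```python
from typing import List, Sequence, Set


def utilities(approvals: Sequence[Set[int]], distribution: Sequence[float]) -> List[float]:
    """Utility of each agent: fraction of the budget spent on approved projects."""
    return [sum(distribution[p] for p in approved) for approved in approvals]


def satisfies_ifs(
    approvals: Sequence[Set[int]], distribution: Sequence[float], tol: float = 1e-9
) -> bool:
    """Individual fair share: every agent gets utility at least 1/n."""
    n = len(approvals)
    return all(u >= 1.0 / n - tol for u in utilities(approvals, distribution))


def approval_scores(approvals: Sequence[Set[int]], m: int) -> List[int]:
    return [sum(p in approved for approved in approvals) for p in range(m)]


def utilitarian_rule(approvals: Sequence[Set[int]], m: int) -> List[float]:
    """Spend the whole budget on one most-approved project (ties: lowest index)."""
    scores = approval_scores(approvals, m)
    best = max(range(m), key=lambda p: (scores[p], -p))
    return [1.0 if p == best else 0.0 for p in range(m)]


def approval_proportional_rule(approvals: Sequence[Set[int]], m: int) -> List[float]:
    """Spend on each project in proportion to its approval score."""
    scores = approval_scores(approvals, m)
    total = sum(scores)
    return [s / total for s in scores]
```

With three agents approving project 0 and one agent approving project 1, the utilitarian rule returns `[1.0, 0.0]` and leaves the fourth agent with utility 0, violating IFS. The proportional rule returns `[0.75, 0.25]`, which meets the 1/4 threshold on this instance.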

【Keywords】:

268. Robust Market Equilibria with Uncertain Preferences.

Paper Link】 【Pages】:2192-2199

【Authors】: Riley Murray ; Christian Kroer ; Alex Peysakhovich ; Parikshit Shah

【Abstract】: The problem of allocating scarce items to individuals is an important practical question in market design. An increasingly popular set of mechanisms for this task uses the concept of market equilibrium: individuals report their preferences, have a budget of real or fake currency, and a set of prices for items and allocations is computed that sets demand equal to supply. An important real-world issue with such mechanisms is that individual valuations are often only imperfectly known. In this paper, we show how concepts from classical market equilibrium can be extended to reflect such uncertainty. We show that in linear, divisible Fisher markets a robust market equilibrium (RME) always exists; this also holds in settings where buyers may retain unspent money. We provide theoretical analysis of the allocative properties of RME in terms of envy and regret. Though RME are hard to compute for general uncertainty sets, we consider some natural and tractable uncertainty sets which lead to well-behaved formulations of the problem that can be solved via modern convex programming methods. Finally, we show that very mild uncertainty about valuations can cause RME allocations to outperform those which take estimates as having no underlying uncertainty.

【Keywords】:

269. Practical Frank-Wolfe Method with Decision Diagrams for Computing Wardrop Equilibrium of Combinatorial Congestion Games.

Paper Link】 【Pages】:2200-2209

【Authors】: Kengo Nakamura ; Shinsaku Sakaue ; Norihito Yasuda

【Abstract】: Computation of equilibria for congestion games has been an important research subject. In many realistic scenarios, each strategy of congestion games is given by a combination of elements that satisfies certain constraints; such games are called combinatorial congestion games. For example, given a road network with some toll roads, each strategy of routing games is a path (a combination of edges) whose total toll satisfies a certain budget constraint. Generally, given a ground set of n elements, the set of all such strategies, called the strategy set, can be exponentially large in n, and it often has a complicated structure; these issues make equilibrium computation very hard. In this paper, we propose a practical algorithm for such hard equilibrium computation problems. We use data structures, called zero-suppressed binary decision diagrams (ZDDs), to compactly represent strategy sets, and we develop a Frank–Wolfe-style iterative equilibrium computation algorithm whose per-iteration complexity is linear in the size of the ZDD representation. We prove that an ϵ-approximate Wardrop equilibrium can be computed in O(poly(n)/ϵ) iterations, and we improve the result to O(poly(n) log(1/ϵ)) for some special cases. Experiments confirm the practical utility of our method.

【Keywords】:
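
To make the Frank–Wolfe idea in the abstract above concrete, here is a minimal sketch on a toy network with two parallel links. It does not use ZDDs and is not the paper's algorithm; the cost functions, the step size 2/(t+2), and the function name are assumptions chosen for illustration. Each iteration sends all demand to the currently cheapest link (the linear minimization step) and then moves the current flow a fraction of the way toward that all-or-nothing flow.

```python
def frank_wolfe_two_links(cost1, cost2, demand=1.0, iterations=2000):
    """Approximate the Wardrop equilibrium of a two-link routing game.

    cost1 and cost2 map the flow on a link to its per-unit cost.
    Returns the flows (x1, x2) with x1 + x2 == demand.
    """
    x1 = demand / 2  # arbitrary feasible starting flow
    for t in range(iterations):
        c1, c2 = cost1(x1), cost2(demand - x1)
        # Linear minimization step: route everything on the cheaper link.
        target = demand if c1 <= c2 else 0.0
        # Standard diminishing Frank-Wolfe step size (an assumption here).
        step = 2.0 / (t + 2)
        x1 += step * (target - x1)
    return x1, demand - x1

# Hypothetical costs: link 1 costs 2x, link 2 costs 1 + x, total demand 1.
# At equilibrium both used links cost the same: 2*x1 = 1 + (1 - x1), so x1 = 2/3.
x1, x2 = frank_wolfe_two_links(lambda x: 2 * x, lambda x: 1 + x)
print(round(x1, 2), round(x2, 2))
```

In the combinatorial setting of the paper the set of strategies is far too large to enumerate, which is where a compact representation of the strategy set becomes necessary; this toy version only shows the shape of the iteration.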

270. Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms during High-Demand Hours.

Paper Link】 【Pages】:2210-2217

【Authors】: Vedant Nanda ; Pan Xu ; Karthik Abinav Sankararaman ; John P. Dickerson ; Aravind Srinivasan

【Abstract】: Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g., from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm, NAdap, that allows the platform designer to control the profit and fairness of the system via parameters α and β respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use NAdap to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using NAdap, the competitive ratios for profit and fairness measures would be no worse than α/e and β/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that NAdap under some choice of (α, β) can beat two natural heuristics, Greedy and Uniform, on both fairness and profit. Code is available at: https://github.com/nvedant07/rideshare-fairness-peak/.

【Keywords】:

271. Comparing Election Methods Where Each Voter Ranks Only Few Candidates.

Paper Link】 【Pages】:2218-2225

【Authors】: Matthias Bentert ; Piotr Skowron

【Abstract】: Election rules are formal processes that aggregate voters' preferences, typically to select a single winning candidate. Most of the election rules studied in the literature require the voters to rank the candidates from the most to the least preferred one. This method of eliciting preferences is impractical when the number of candidates to be ranked is large. We ask how well certain election rules (focusing on positional scoring rules and the Minimax rule) can be approximated from partial preferences collected through one of the following procedures: (i) randomized—we ask each voter to rank a random subset of ℓ candidates, and (ii) deterministic—we ask each voter to provide a ranking of her ℓ most preferred candidates (the ℓ-truncated ballot). We establish theoretical bounds on the approximation ratios and complement our theoretical analysis with computer simulations. We find that it is usually better to use the randomized approach.

【Keywords】:
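
The deterministic procedure (ii) in the abstract above can be sketched for the Borda rule, one positional scoring rule. This is an illustrative sketch only: giving zero points to every candidate outside a voter's top ℓ is an assumption, and the ballots are made up. It compares the winner computed from full rankings with the winner computed from ℓ-truncated ballots.

```python
def borda_scores(ballots, candidates, depth=None):
    """Borda scores from rankings (best first).

    With depth set, only each voter's top `depth` candidates are scored;
    unranked candidates receive 0 points (an assumption of this sketch).
    """
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        ranked = ballot if depth is None else ballot[:depth]
        for position, candidate in enumerate(ranked):
            scores[candidate] += m - 1 - position
    return scores

def winner(scores):
    """Candidate with the highest score (no ties occur in the example)."""
    return max(scores, key=scores.get)

candidates = ["a", "b", "c", "d"]
ballots = [["a", "b", "c", "d"]] * 3 + [["b", "c", "d", "a"]] * 2

full = borda_scores(ballots, candidates)
top1 = borda_scores(ballots, candidates, depth=1)

print(winner(full))   # b, with full Borda score 12
print(winner(top1))   # a, whose full Borda score is 9
```

Here the winner elected from 1-truncated ballots attains 9/12 of the best full Borda score, which is the kind of approximation ratio the paper bounds.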

272. Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning.

Paper Link】 【Pages】:2226-2235

【Authors】: Sanket Shah ; Arunesh Sinha ; Pradeep Varakantham ; Andrew Perrault ; Milind Tambe

【Abstract】: Large-scale screening for potential threats with limited resources and capacity for screening is a problem of interest at airports, seaports, and other ports of entry. Adversaries can observe screening procedures and arrive at a time when there will be gaps in screening due to limited resource capacities. To capture this game between ports and adversaries, this problem has been previously represented as a Stackelberg game, referred to as a Threat Screening Game (TSG). Given the significant complexity associated with solving TSGs and uncertainty in arrivals of customers, existing work has assumed that screenees arrive and are allocated security resources at the beginning of the time-window. In practice, screenees such as airport passengers arrive in bursts correlated with flight time and are not bound by fixed time-windows. To address this, we propose an online threat screening model in which the screening strategy is determined adaptively as a passenger arrives while satisfying a hard bound on acceptable risk of not screening a threat. To solve the online problem, we first reformulate it as a Markov Decision Process (MDP) in which the hard bound on risk translates to a constraint on the action space and then solve the resultant MDP using Deep Reinforcement Learning (DRL). To this end, we provide a novel way to efficiently enforce linear inequality constraints on the action output in DRL. We show that our solution allows us to significantly reduce screenee wait time without compromising on the risk.

【Keywords】:

Paper Link】 【Pages】:2236-2243

【Authors】: Weiran Shen ; Binghui Peng ; Hanpeng Liu ; Michael Zhang ; Ruohan Qian ; Yan Hong ; Zhi Guo ; Zongyao Ding ; Pengjun Lu ; Pingzhong Tang

【Abstract】: In many social systems in which individuals and organizations interact with each other, there can be no easy laws to govern the rules of the environment, and agents' payoffs are often influenced by other agents' actions. We examine such a social system in the setting of sponsored search auctions and tackle the search engine's dynamic pricing problem by combining the tools from both mechanism design and the AI domain. In this setting, the environment not only changes over time, but also behaves strategically. Over repeated interactions with bidders, the search engine can dynamically change the reserve prices and determine the optimal strategy that maximizes the profit. We first train a buyer behavior model, with a real bidding data set from a major search engine, that predicts bids given information disclosed by the search engine and the bidders' performance data from previous rounds. We then formulate the dynamic pricing problem as an MDP and apply a reinforcement-based algorithm that optimizes reserve prices over time. Experiments demonstrate that our model outperforms static optimization strategies including the ones that are currently in use as well as several other dynamic ones.

【Keywords】:

274. Complexity of Computing the Shapley Value in Games with Externalities.

Paper Link】 【Pages】:2244-2251

【Authors】: Oskar Skibski

【Abstract】: We study the complexity of computing the Shapley value in games with externalities. We focus on two representations based on marginal contribution nets (embedded MC-nets and weighted MC-nets) and five extensions of the Shapley value to games with externalities. Our results show that while weighted MC-nets are more concise than embedded MC-nets, they have slightly worse computational properties when it comes to computing the Shapley value: two out of five extensions can be computed in polynomial time for embedded MC-nets and only one for weighted MC-nets.

【Keywords】:

275. Path Planning Problems with Side Observations - When Colonels Play Hide-and-Seek.

Paper Link】 【Pages】:2252-2259

【Authors】: Dong Quan Vu ; Patrick Loiseau ; Alonso Silva ; Long Tran-Thanh

【Abstract】: Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, Exp3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of Exp3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of Exp3-OE. We illustrate the benefit of using Exp3-OE in SOPPP by applying it to the online CB and HS games.

【Keywords】:

276. Multi-Type Resource Allocation with Partial Preferences.

Paper Link】 【Pages】:2260-2267

【Authors】: Haibin Wang ; Sujoy Sikdar ; Xiaoxi Guo ; Lirong Xia ; Yongzhi Cao ; Hanpin Wang

【Abstract】: We propose multi-type probabilistic serial (MPS) and multi-type random priority (MRP) as extensions of the well-known PS and RP mechanisms to the multi-type resource allocation problems (MTRAs) with partial preferences. In our setting, there are multiple types of divisible items, and a group of agents who have partial order preferences over bundles consisting of one item of each type. We show that for the unrestricted domain of partial order preferences, no mechanism satisfies both sd-efficiency and sd-envy-freeness. Notwithstanding this impossibility result, our main message is positive: When agents' preferences are represented by acyclic CP-nets, MPS satisfies sd-efficiency, sd-envy-freeness, ordinal fairness, and upper invariance, while MRP satisfies ex-post-efficiency, sd-strategyproofness, and upper invariance, recovering the properties of PS and RP. Besides, we propose a hybrid mechanism, multi-type general dictatorship (MGD), combining the ideas of MPS and MRP, which satisfies sd-efficiency, equal treatment of equals and decomposability under the unrestricted domain of partial order preferences.

【Keywords】:

277. Nice Invincible Strategy for the Average-Payoff IPD.

Paper Link】 【Pages】:2268-2275

【Authors】: Shiheng Wang ; Fangzhen Lin

【Abstract】: The Iterated Prisoner's Dilemma (IPD) is a well-known benchmark for studying the long term behaviours of rational agents. Many well-known strategies have been studied, from the simple tit-for-tat (TFT) to more involved ones like zero determinant and extortionate strategies studied recently by Press and Dyson. In this paper, we consider what we call invincible strategies. These are ones that will never lose against any other strategy in terms of average payoff in the limit. We provide a simple characterization of this class of strategies, and show that invincible strategies can also be nice. We discuss its relationship with some important strategies and generalize our results to some typical repeated 2x2 games. It's known that experimentally, nice strategies like the TFT and extortionate ones can act as catalysts for the evolution of cooperation. Our experiments show that this is also the case for some invincible strategies that are neither nice nor extortionate.

【Keywords】:
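
The notion of never losing in terms of average payoff in the limit, described in the abstract above, can be illustrated with a short simulation. This is a sketch under assumptions: the payoff values 5, 3, 1, 0 are the commonly used ones and are not taken from the paper, and tit-for-tat against always-defect is just one convenient pairing. Tit-for-tat falls behind only in the first round, so the per-round gap shrinks as the game gets longer.

```python
# Payoffs (row player, column player) for moves C = cooperate, D = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Return the total payoffs of both strategies after `rounds` rounds."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b

rounds = 1000
tft_total, alld_total = play(tit_for_tat, always_defect, rounds)
print(tft_total, alld_total)                   # 999 1004
print((alld_total - tft_total) / rounds)       # 0.005
```

The total gap stays at 5 however many rounds are played, so the difference in average payoff tends to zero.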

278. Bounded Incentives in Manipulating the Probabilistic Serial Rule.

Paper Link】 【Pages】:2276-2283

【Authors】: Zihe Wang ; Zhide Wei ; Jie Zhang

【Abstract】: The Probabilistic Serial mechanism is well-known for its desirable fairness and efficiency properties. It is one of the most prominent protocols for the random assignment problem. However, Probabilistic Serial is not incentive-compatible, so these desirable properties only hold for the agents' declared preferences, rather than their genuine preferences. A substantial utility gain through strategic behavior would lead self-interested agents to manipulate the mechanism and would subvert the very foundation of adopting the mechanism in practice. In this paper, we characterize the extent to which an individual agent can increase its utility by strategic manipulation. We show that the incentive ratio of the mechanism is 3/2. That is, no agent can misreport its preferences such that its utility becomes more than 1.5 times what it is when the agent reports truthfully. This ratio is a worst-case guarantee that allows an agent to have complete information about other agents' reports and to figure out the best response strategy even if it is computationally intractable in general. To complement this worst-case study, we further evaluate an agent's utility gain on average by experiments. The experiments show that an agent's incentive to manipulate the rule is very limited. These results shed some light on the robustness of Probabilistic Serial against strategic manipulation, which is one step further than knowing that it is not incentive-compatible.

【Keywords】:
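
For readers unfamiliar with the Probabilistic Serial rule discussed in the abstract above, the following sketch implements its usual "simultaneous eating" description: all agents eat their most preferred remaining item at unit speed until everything is consumed. The function name and example preferences are assumptions for illustration, and every agent is assumed to rank all items strictly.

```python
from fractions import Fraction

def probabilistic_serial(prefs):
    """prefs[i] is agent i's strict ranking of all items, best first.

    Returns a list of dicts: alloc[i][item] is the fraction of `item`
    (equivalently, the probability) assigned to agent i.
    """
    items = list(prefs[0])
    remaining = {item: Fraction(1) for item in items}
    alloc = [{item: Fraction(0) for item in items} for _ in prefs]
    while any(remaining.values()):
        # Each agent eats its best item that still has supply left.
        target = [next(x for x in p if remaining[x] > 0) for p in prefs]
        eaters = {}
        for x in target:
            eaters[x] = eaters.get(x, 0) + 1
        # Advance time until the first targeted item runs out.
        dt = min(remaining[x] / k for x, k in eaters.items())
        for agent, x in enumerate(target):
            alloc[agent][x] += dt
        for x, k in eaters.items():
            remaining[x] -= dt * k
    return alloc

prefs = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
for row in probabilistic_serial(prefs):
    print({item: str(share) for item, share in row.items()})
```

Exact fractions are used so that an item's supply reaches zero exactly and the loop terminates after at most one iteration per item.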

279. Deep Learning-Powered Iterative Combinatorial Auctions.

Paper Link】 【Pages】:2284-2293

【Authors】: Jakob Weissteiner ; Sven Seuken

【Abstract】: In this paper, we study the design of deep learning-powered iterative combinatorial auctions (ICAs). We build on prior work where preference elicitation was done via kernelized support vector regressions (SVRs). However, the SVR-based approach has limitations because it requires solving a machine learning (ML)-based winner determination problem (WDP). With expressive kernels (such as Gaussian kernels), the ML-based WDP cannot be solved for large domains. While linear or quadratic kernels have better computational scalability, these kernels have limited expressiveness. In this work, we address these shortcomings by using deep neural networks (DNNs) instead of SVRs. We first show how the DNN-based WDP can be reformulated into a mixed integer program (MIP). Second, we experimentally compare the prediction performance of DNNs against SVRs. Third, we present experimental evaluations in two medium-sized domains which show that even ICAs based on relatively small-sized DNNs lead to higher economic efficiency than ICAs based on kernelized SVRs. Finally, we show that our DNN-powered ICA also scales well to very large CA domains.

【Keywords】:

280. A Multi-Unit Profit Competitive Mechanism for Cellular Traffic Offloading.

Paper Link】 【Pages】:2294-2301

【Authors】: Jun Wu ; Yu Qiao ; Lei Zhang ; Chongjun Wang ; Meilin Liu

【Abstract】: Cellular traffic offloading is nowadays an important problem in mobile networking. We model it as a procurement problem where each agent sells multi-units of a homogeneous item with privately known capacity and unit cost, and the auctioneer's demand valuation function is symmetric submodular. Based on the framework of random sampling and profit extraction, we aim to design a prior-free mechanism which guarantees a profit competitive to the omniscient single-price auction. However, the symmetric submodular demand valuation function and 2-parameter setting present new challenges. By adopting the highest feasible clear price, we successfully design a truthful profit extractor, and then we propose a mechanism which is proved to be truthful, individually rational and constant-factor competitive in a fixed market.

【Keywords】:

281. Algorithms for Manipulating Sequential Allocation.

Paper Link】 【Pages】:2302-2309

【Authors】: Mingyu Xiao ; Jiaxing Ling

【Abstract】: Sequential allocation is a simple and widely studied mechanism to allocate indivisible items in turns to agents according to a pre-specified picking sequence of agents. At each turn, the current agent in the picking sequence picks its most preferred item among all items that have not been allocated yet. This mechanism is well known not to be strategyproof, i.e., an agent may get more utility by reporting an untruthful preference ranking of items. This raises the question of how to find the best response of an agent. It is known that this problem is polynomially solvable for only two agents and NP-complete for an arbitrary number of agents. The computational complexity of this problem with three agents was left as an open problem. In this paper, we give a novel algorithm that solves the problem in polynomial time for each fixed number of agents. We also show that an agent can always get at least half of its optimal utility by simply using its truthful preference as the response.

【Keywords】:
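
The manipulation problem in the abstract above can be made concrete with a small simulation. The sketch below is not the paper's polynomial-time algorithm: it finds a best response by brute force over all rankings, which is only feasible for a handful of items. Additive utilities, the function names, and the example instance are assumptions for illustration.

```python
from itertools import permutations

def sequential_allocation(reports, sequence):
    """Each agent in `sequence` picks its best reported item still available."""
    taken = set()
    bundles = [[] for _ in reports]
    for agent in sequence:
        pick = next(item for item in reports[agent] if item not in taken)
        taken.add(pick)
        bundles[agent].append(pick)
    return bundles

def bundle_value(agent, utilities, reports, sequence):
    """True (additive) utility the agent obtains under the given reports."""
    bundle = sequential_allocation(reports, sequence)[agent]
    return sum(utilities[item] for item in bundle)

def best_response(agent, utilities, reports, sequence):
    """Brute-force search over every ranking the agent could report."""
    best_value, best_report = None, None
    for candidate in permutations(reports[agent]):
        trial = list(reports)
        trial[agent] = list(candidate)
        value = bundle_value(agent, utilities, trial, sequence)
        if best_value is None or value > best_value:
            best_value, best_report = value, list(candidate)
    return best_value, best_report

# Hypothetical instance: two agents alternate picks over four items.
truthful = [["a", "b", "c", "d"], ["b", "c", "a", "d"]]
sequence = [0, 1, 0, 1]
utilities_agent0 = {"a": 4, "b": 3, "c": 2, "d": 1}

honest_value = bundle_value(0, utilities_agent0, truthful, sequence)
best_value, best_report = best_response(0, utilities_agent0, truthful, sequence)
print(honest_value, best_value)   # 6 7
```

Agent 0 gains by picking "b" first, because the other agent does not compete for "a". On this instance truthful reporting still yields 6 of the optimal 7, consistent with the one-half guarantee stated in the abstract.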

282. Computing Equilibria in Binary Networked Public Goods Games.

Paper Link】 【Pages】:2310-2317

【Authors】: Sixie Yu ; Kai Zhou ; P. Jeffrey Brantingham ; Yevgeniy Vorobeychik

【Abstract】: Public goods games study the incentives of individuals to contribute to a public good and their behaviors in equilibria. In this paper, we examine a specific type of public goods game where players are networked and each has binary actions, and focus on the algorithmic aspects of such games. First, we show that checking the existence of a pure-strategy Nash equilibrium is NP-complete. We then identify tractable instances based on restrictions of either utility functions or of the underlying graphical structure. In certain cases, we also show that we can efficiently compute a socially optimal Nash equilibrium. Finally, we propose a heuristic approach for computing approximate equilibria in general binary networked public goods games, and experimentally demonstrate its effectiveness. Due to space limitations, some proofs are deferred to the extended version.

【Keywords】:

283. Computing Team-Maxmin Equilibria in Zero-Sum Multiplayer Extensive-Form Games.

Paper Link】 【Pages】:2318-2325

【Authors】: Youzhi Zhang ; Bo An

【Abstract】: Finding equilibria in multiplayer games is challenging. This paper focuses on computing Team-Maxmin Equilibria (TMEs) in zero-sum multiplayer Extensive-Form Games (EFGs), which describe the optimal strategies for a team of players who share the same goal but take actions independently against an adversary. TMEs can capture many realistic scenarios, including: 1) a team of players playing against a target player in poker games; and 2) defense resources that are scheduled and patrol independently in security games. However, the study of efficiently finding TMEs within any given accuracy in EFGs is almost completely unexplored. To fill this gap, we first study the inefficiency caused by computing the equilibrium where team players correlate their strategies and then transforming it into the mixed strategy profile of the team and show that this inefficiency can be arbitrarily large. Second, to efficiently solve the non-convex program for finding TMEs directly, we develop the Associated Recursive Asynchronous Multiparametric Disaggregation Technique (ARAMDT) to approximate multilinear terms in the program with two novel techniques: 1) an asynchronous precision method to reduce the number of constraints and variables for approximation by using different precision levels to approximate these terms; and 2) an associated constraint method to reduce the feasible solution space of the mixed-integer linear program resulting from ARAMDT by exploiting the relation between these terms. Third, we develop a novel iterative algorithm to efficiently compute TMEs within any given accuracy based on ARAMDT. Our algorithm is orders of magnitude faster than baselines in the experimental evaluation.

【Keywords】:

284. A Unifying View on Individual Bounds and Heuristic Inaccuracies in Bidirectional Search.

Paper Link】 【Pages】:2327-2334

【Authors】: Vidal Alcázar ; Patricia J. Riddle ; Mike Barley

【Abstract】: In the past few years, new very successful bidirectional heuristic search algorithms have been proposed. Their key novelty is a lower bound on the cost of a solution that includes information from the g values in both directions. Kaindl and Kainz (1997) proposed measuring how inaccurate a heuristic is while expanding nodes in the opposite direction, and using this information to raise the f value of the evaluated nodes. However, this comes with a set of disadvantages and remains yet to be exploited to its full potential. Additionally, Sadhukhan (2013) presented BAE∗, a bidirectional best-first search algorithm based on the accumulated heuristic inaccuracy along a path. However, no complete comparison in regards to other bidirectional algorithms has yet been done, neither theoretical nor empirical. In this paper we define individual bounds within the lower-bound framework and show how both Kaindl and Kainz's and Sadhukhan's methods can be generalized thus creating new bounds. This overcomes previous shortcomings and allows newer algorithms to benefit from these techniques as well. Experimental results show a substantial improvement, up to an order of magnitude in the number of necessarily-expanded nodes compared to state-of-the-art near-optimal algorithms in common benchmarks.

【Keywords】:

285. An Interactive Regret-Based Genetic Algorithm for Solving Multi-Objective Combinatorial Optimization Problems.

Paper Link】 【Pages】:2335-2342

【Authors】: Nawal Benabbou ; Cassandre Leroy ; Thibaut Lust

【Abstract】: We propose a new approach consisting in combining genetic algorithms and regret-based incremental preference elicitation for solving multi-objective combinatorial optimization problems with unknown preferences. For the purpose of elicitation, we assume that the decision maker's preferences can be represented by a parameterized scalarizing function but the parameters are initially not known. Instead, the parameter imprecision is progressively reduced by asking preference queries to the decision maker during the search to help identify the best solutions within a population. Our algorithm, called RIGA, can be applied to any multi-objective combinatorial optimization problem provided that the scalarizing function is linear in its parameters and that a (near-)optimal solution can be efficiently determined when preferences are known. Moreover, RIGA runs in polynomial time while asking no more than a polynomial number of queries. For the multi-objective traveling salesman problem, we provide numerical results showing its practical efficiency in terms of number of queries, computation time and gap to optimality.

【Keywords】:

Paper Link】 【Pages】:2343-2350

【Authors】: Peilin Chen ; Hai Wan ; Shaowei Cai ; Jia Li ; Haicheng Chen

【Abstract】: The Maximum k-plex Problem is an important combinatorial optimization problem with increasingly wide applications. In this paper, we propose a novel strategy, named Dynamic-threshold Configuration Checking (DCC), to reduce the cycling problem of local search. Due to the complicated neighborhood relations, all the previous local search algorithms for this problem spend a large amount of time identifying feasible neighbors in each step. To further improve the performance on dense and challenging instances, we propose the Double-attributes Incremental Neighborhood Updating (DINU) scheme, which reduces the worst-case time complexity per iteration from O(|V| · Δ_G) to O(k · Δ̄_G). Based on the DCC strategy and the DINU scheme, we develop a local search algorithm named DCCplex. According to the experimental results, DCCplex shows promising results on the DIMACS and BHOSLIB benchmarks as well as on real-world massive graphs. In particular, DCCplex improves the lower bound of the maximum k-plex for most dense and challenging instances.

【Keywords】:
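
The abstract above does not restate the definition of a k-plex. Under the standard definition, a vertex set S is a k-plex if every vertex in S is adjacent to at least |S| − k other vertices of S, so a 1-plex is a clique. The sketch below checks this condition and finds a maximum k-plex by brute force on a tiny graph. It is meant only to fix the definition and is unrelated to the local search in the paper; the graph and function names are assumptions.

```python
from itertools import combinations

def is_k_plex(adjacency, subset, k):
    """True if every vertex in `subset` has at least |subset| - k neighbours in it.

    adjacency maps each vertex to the set of its neighbours (no self-loops).
    """
    subset = set(subset)
    return all(len(adjacency[v] & subset) >= len(subset) - k for v in subset)

def max_k_plex_bruteforce(adjacency, k):
    """Largest k-plex by exhaustive search; exponential, for tiny graphs only."""
    vertices = list(adjacency)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_k_plex(adjacency, subset, k):
                return set(subset)
    return set()

# Hypothetical graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
adjacency = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

print(is_k_plex(adjacency, {1, 2, 3}, 1))       # True: the triangle is a clique
print(is_k_plex(adjacency, {1, 2, 3, 4}, 2))    # False: vertex 4 has one neighbour
print(len(max_k_plex_bruteforce(adjacency, 2))) # 3
```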

287. Envelope-Based Approaches to Real-Time Heuristic Search.

Paper Link】 【Pages】:2351-2358

【Authors】: Kevin C. Gall ; Bence Cserna ; Wheeler Ruml

【Abstract】: In real-time heuristic search, the planner must return the next action for the agent within a pre-specified time bound. Many algorithms for this setting are ‘agent-centered’ in that, at every iteration, they only expand states near the agent's current state, discarding the search frontier afterwards. In this paper, we investigate the alternative paradigm in which the search expands a single ever-growing envelope of states. Previous work on envelope-based methods restricts the agent to move along the generated search tree. We propose a more flexible approach in which an auxiliary search is performed within the envelope to guide the agent toward a promising frontier node. Experimental results indicate that intra-envelope search is beneficial in state spaces that are highly interconnected, such as those for grid pathfinding.

【Keywords】:

288. Runtime Analysis of Somatic Contiguous Hypermutation Operators in MOEA/D Framework.

Paper Link】 【Pages】:2359-2366

【Authors】: Zhengxin Huang ; Yuren Zhou

【Abstract】: Somatic contiguous hypermutation (CHM) operators are important variation operators in artificial immune systems. The few existing theoretical studies are only concerned with understanding the optimization behavior of CHM operators on solving single-objective optimization problems. The MOEA/D framework is one of the most popular strategies for solving multi-objective optimization problems (MOPs). In this paper, we present a runtime analysis of using two CHM operators in MOEA/D framework for solving five benchmark MOPs, including four bi-objective and one many-objective problems. Our analyses show that the expected runtimes of CHM operators on the four bi-objective problems are better than or as good as that of the well-studied standard bit mutation operator. Moreover, using CHM operators in MOEA/D framework can improve the best known upper bound on the many-objective problem by a factor of n. This paper provides insight into understanding the optimization behavior of CHM operators in the well-known MOEA/D framework, and indicates that using the CHM operator in MOEA/D framework is a promising method for handling MOPs.

【Keywords】:

289. Learning to Optimize Variational Quantum Circuits to Solve Combinatorial Problems.

Paper Link】 【Pages】:2367-2375

【Authors】: Sami Khairy ; Ruslan Shaydulin ; Lukasz Cincio ; Yuri Alexeev ; Prasanna Balaprakash

【Abstract】: Quantum computing is a computational paradigm with the potential to outperform classical methods for a variety of problems. Proposed recently, the Quantum Approximate Optimization Algorithm (QAOA) is considered as one of the leading candidates for demonstrating quantum advantage in the near term. QAOA is a variational hybrid quantum-classical algorithm for approximately solving combinatorial optimization problems. The quality of the solution obtained by QAOA for a given problem instance depends on the performance of the classical optimizer used to optimize the variational parameters. In this paper, we formulate the problem of finding optimal QAOA parameters as a learning task in which the knowledge gained from solving training instances can be leveraged to find high-quality solutions for unseen test instances. To this end, we develop two machine-learning-based approaches. Our first approach adopts a reinforcement learning (RL) framework to learn a policy network to optimize QAOA circuits. Our second approach adopts a kernel density estimation (KDE) technique to learn a generative model of optimal QAOA parameters. In both approaches, the training procedure is performed on small-sized problem instances that can be simulated on a classical computer; yet the learned RL policy and the generative model can be used to efficiently solve larger problems. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our proposed RL- and KDE-based approaches reduce the optimality gap by factors up to 30.15 when compared with other commonly used off-the-shelf optimizers.

【Keywords】:

290. How the Duration of the Learning Period Affects the Performance of Random Gradient Selection Hyper-Heuristics.

Paper Link】 【Pages】:2376-2383

【Authors】: Andrei Lissovoi ; Pietro S. Oliveto ; John Alasdair Warwicker

【Abstract】: Recent analyses have shown that a random gradient hyper-heuristic (HH) using randomised local search (RLS_k) low-level heuristics with different neighbourhood sizes k can optimise the unimodal benchmark function LeadingOnes in the best expected time achievable with the available heuristics, if sufficiently long learning periods τ are employed. In this paper, we examine the impact of the learning period on the performance of the hyper-heuristic for standard unimodal benchmark functions with different characteristics: Ridge, where the HH has to learn that RLS_1 is always the best low-level heuristic, and OneMax, where different low-level heuristics are preferable in different areas of the search space. We rigorously prove that super-linear learning periods τ are required for the HH to achieve optimal expected runtime for Ridge. Conversely, a sub-logarithmic learning period is the best static choice for OneMax, while using super-linear values for τ increases the expected runtime above the asymptotic unary unbiased black box complexity of the problem. We prove that a random gradient HH which automatically adapts the learning period throughout the run has optimal asymptotic expected runtime for both OneMax and Ridge. Additionally, we show experimentally that it outperforms any static learning period for realistic problem sizes.

【Keywords】:

291. On Performance Estimation in Automatic Algorithm Configuration.

Paper Link】 【Pages】:2384-2391

【Authors】: Shengcai Liu ; Ke Tang ; Yunwei Lei ; Xin Yao

【Abstract】: Over the last decade, research on automated parameter tuning, often referred to as automatic algorithm configuration (AAC), has made significant progress. Although the usefulness of such tools has been widely recognized in real world applications, the theoretical foundations of AAC are still very weak. This paper addresses this gap by studying the performance estimation problem in AAC. More specifically, this paper first proves the universal best performance estimator in a practical setting, and then establishes theoretical bounds on the estimation error, i.e., the difference between the training performance and the true performance for a parameter configuration, considering finite and infinite configuration spaces respectively. These findings were verified in extensive experiments conducted on four algorithm configuration scenarios involving different problem domains. Moreover, insights for enhancing existing AAC methods are also identified.

【Keywords】:

Paper Link】 【Pages】:2392-2399

【Authors】: Yanli Liu ; Chu-Min Li ; Hua Jiang ; Kun He

【Abstract】: The performance of a branch-and-bound (BnB) algorithm for the maximum common subgraph (MCS) problem and its related problems, like maximum common connected subgraph (MCCS) and induced Subgraph Isomorphism (SI), crucially depends on the branching heuristic. We propose a branching heuristic inspired by reinforcement learning with the goal of reaching a tree leaf as early as possible to greatly reduce the search tree size. Experimental results show that the proposed heuristic consistently and significantly improves the current best BnB algorithm for the MCS, MCCS and SI problems. An analysis is carried out to give insight into why and how reinforcement learning is useful in the new branching heuristic.

【Keywords】:

293. Cakewalk Sampling.

Paper Link】 【Pages】:2400-2407

【Authors】: Uri Patish ; Shimon Ullman

【Abstract】: We study the task of finding good local optima in combinatorial optimization problems. Although combinatorial optimization is NP-hard in general, locally optimal solutions are frequently used in practice. Local search methods however typically converge to a limited set of optima that depend on their initialization. Sampling methods on the other hand can access any valid solution, and thus can be used either directly or alongside methods of the former type as a way for finding good local optima. Since the effectiveness of this strategy depends on the sampling distribution, we derive a robust learning algorithm that adapts sampling distributions towards good local optima of arbitrary objective functions. As a first use case, we empirically study the efficiency in which sampling methods can recover locally maximal cliques in undirected graphs. Not only do we show how our adaptive sampler outperforms related methods, we also show how it can even approach the performance of established clique algorithms. As a second use case, we consider how greedy algorithms can be combined with our adaptive sampler, and we demonstrate how this leads to superior performance in k-medoid clustering. Together, these findings suggest that our adaptive sampler can provide an effective strategy to combinatorial optimization problems that arise in practice.

【Keywords】:

294. Subset Selection by Pareto Optimization with Recombination.

Paper Link】 【Pages】:2408-2415

【Authors】: Chao Qian ; Chao Bian ; Chao Feng

【Abstract】: Subset selection, i.e., to select a limited number of items optimizing some given objective function, is a fundamental problem with various applications such as unsupervised feature selection and sparse regression. By employing a multi-objective evolutionary algorithm (EA) with mutation only to optimize the given objective function and minimize the number of selected items simultaneously, the recently proposed POSS algorithm achieves state-of-the-art performance for subset selection. In this paper, we propose the PORSS algorithm by incorporating recombination, a characterizing feature of EAs, into POSS. We prove that PORSS can achieve the optimal polynomial-time approximation guarantee as POSS when the objective function is monotone, and can find an optimal solution efficiently in some cases whereas POSS cannot. Extensive experiments on unsupervised feature selection and sparse regression show the superiority of PORSS over POSS. Our analysis also theoretically discloses that recombination from diverse solutions can be more likely than mutation alone to generate various variations, thereby leading to better exploration; this may be of independent interest for understanding the influence of recombination.

【Keywords】:

295. Asymptotic Risk of Bézier Simplex Fitting.

Paper Link】 【Pages】:2416-2424

【Authors】: Akinori Tanaka ; Akiyoshi Sannai ; Ken Kobayashi ; Naoki Hamada

【Abstract】: The Bézier simplex fitting is a novel data modeling technique which utilizes geometric structures of data to approximate the Pareto set of multi-objective optimization problems. There are two fitting methods based on different sampling strategies. The inductive skeleton fitting employs a stratified subsampling from skeletons of a simplex, whereas the all-at-once fitting uses a non-stratified sampling which treats a simplex as a single object. In this paper, we analyze the asymptotic risks of those Bézier simplex fitting methods and derive the optimal subsample ratio for the inductive skeleton fitting. It is shown that the inductive skeleton fitting with the optimal ratio has a smaller risk when the degree of a Bézier simplex is less than three. Those results are verified numerically under small to moderate sample sizes. In addition, we provide two complementary applications of our theory: a generalized location problem and a multi-objective hyper-parameter tuning of the group lasso. The former can be represented by a Bézier simplex of degree two where the inductive skeleton fitting outperforms. The latter can be represented by a Bézier simplex of degree three where the all-at-once fitting gets an advantage.

【Keywords】:

296. Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization.

Paper Link】 【Pages】:2425-2432

【Authors】: Hung Tran-The ; Sunil Gupta ; Santu Rana ; Svetha Venkatesh

【Abstract】: Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on function structure. The main reason is that at each iteration, BO requires finding the global maximum of an acquisition function, which is itself a non-convex optimization problem in the original search space. With growing dimensions, the computational budget for this maximisation gets increasingly short, leading to inaccurate solutions of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low dimensional subspaces embedded in the original high-dimensional search space. Our method is free of any low dimensional structure assumption on the function, unlike many recent high-dimensional BO methods. Optimising the acquisition function in low dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm only grows sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate with the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most O(√(T γ_T)) as opposed to O(√(D T γ_T)) for the GP-UCB of Srinivas et al. (2012), removing a crucial factor of √D, where D is the dimensionality of the input space. We perform empirical experiments to evaluate our method extensively, showing that its sample efficiency is better than the existing methods for many optimisation problems involving dimensions up to 5000.

【Keywords】:

Paper Link】 【Pages】:2433-2441

【Authors】: Yiyuan Wang ; Shaowei Cai ; Shiwei Pan ; Ximing Li ; Minghao Yin

【Abstract】: The weighted graph coloring problem (WGCP) is an important extension of the graph coloring problem (GCP) with wide applications. Compared to GCP, where numerous methods have been developed and even massive graphs with millions of vertices can be solved well, fewer works have been done for WGCP, and no solution is available for solving WGCP for massive graphs. This paper explores techniques for solving WGCP, including a lower bound and a reduction rule based on clique sampling, and a local search algorithm based on two selection rules and a new variant of configuration checking. This results in our algorithm RedLS (Reduction plus Local Search). Experiments are conducted to compare RedLS with the state-of-the-art algorithms on massive graphs as well as conventional benchmarks studied in previous works. RedLS exhibits very good performance and robustness. It significantly outperforms previous algorithms on all benchmarks.

【Keywords】:

298. Enumerating Maximal k-Plexes with Worst-Case Time Guarantee.

Paper Link】 【Pages】:2442-2449

【Authors】: Yi Zhou ; Jingwei Xu ; Zhenyu Guo ; Mingyu Xiao ; Yan Jin

【Abstract】: The problem of enumerating all maximal cliques in a graph is a key primitive in a variety of real-world applications such as community detection. However, in practice, communities are rarely formed as cliques due to data noise. Hence, k-plex, a subgraph in which any vertex is adjacent to all but at most k vertices, is introduced as a relaxation of clique. In this paper, we investigate the problem of enumerating all maximal k-plexes and present FaPlexen, an enumeration algorithm which integrates the “pivot” heuristic and new branching schemes. To the best of our knowledge, for the first time, FaPlexen lists all maximal k-plexes with provably worst-case running time O(n^2 γ^n) in a graph with n vertices, where γ < 2. Then, we propose another algorithm CommuPlex which non-trivially extends FaPlexen to find all maximal k-plexes of prescribed size for community detection in massive real-life networks. We finally carry out experiments on both real and synthetic graphs and demonstrate that our algorithms run much faster than the state-of-the-art algorithms.

【Keywords】:
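
The k-plex definition quoted in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper and is not FaPlexen: it only checks whether a given vertex set is a k-plex. The function name `is_k_plex` and the adjacency-dictionary input are hypothetical, and the sketch assumes the common convention that a vertex counts itself among its (at most k) non-neighbours, so that a clique is a 1-plex.

```python
def is_k_plex(adj, vertices, k):
    """Return True if `vertices` induces a k-plex in the graph `adj`.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    Assumed convention: every vertex of the set must be adjacent to at least
    |S| - k vertices of the set, i.e. the vertex itself counts among the k.
    """
    s = set(vertices)
    # Degree inside the induced subgraph must be at least |S| - k.
    return all(len(adj[v] & s) >= len(s) - k for v in s)


# Hypothetical toy graphs: a triangle and a path a - b - c.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(is_k_plex(triangle, ["a", "b", "c"], 1))  # clique, so a 1-plex
print(is_k_plex(path, ["a", "b", "c"], 1))      # endpoints miss one neighbour
print(is_k_plex(path, ["a", "b", "c"], 2))      # allowed once k = 2
```

Under this convention the path on three vertices is a 2-plex but not a 1-plex, which shows how increasing k relaxes the clique requirement.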

AAAI Technical Track: Human-AI Collaboration 13

299. A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection.

Paper Link】 【Pages】:2451-2458

【Authors】: Akansha Bhardwaj ; Jie Yang ; Philippe Cudré-Mauroux

【Abstract】: Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.

【Keywords】:

Paper Link】 【Pages】:2459-2466

【Authors】: Ta-Chung Chi ; Minmin Shen ; Mihail Eric ; Seokhwan Kim ; Dilek Hakkani-Tür

【Abstract】: In the vision and language navigation task (Anderson et al. 2018), the agent may encounter ambiguous situations that are hard to interpret by just relying on visual information and natural language instructions. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent with different levels of complexity. The simplest model-confusion-based method lets the agent ask questions based on its confusion, relying on the predefined confidence threshold of a next action prediction model. To build on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning such that it discovers the timing and locations to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during the navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further utilizing its interaction history with a human. We demonstrate the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques.

【Keywords】:

301. Asymptotically Unambitious Artificial General Intelligence.

Paper Link】 【Pages】:2467-2476

【Authors】: Michael K. Cohen ; Badri N. Vellambi ; Marcus Hutter

【Abstract】: General intelligence, the ability to solve arbitrary solvable problems, is supposed by many to be artificially constructible. Narrow intelligence, the ability to solve a given particularly difficult problem, has seen impressive recent development. Notable examples include self-driving cars, Go engines, image classifiers, and translators. Artificial General Intelligence (AGI) presents dangers that narrow intelligence does not: if something smarter than us across every domain were indifferent to our concerns, it would be an existential threat to humanity, just as we threaten many species despite no ill will. Even the theory of how to maintain the alignment of an AGI's goals with our own has proven highly elusive. We present the first algorithm we are aware of for asymptotically unambitious AGI, where “unambitiousness” includes not seeking arbitrary power. Thus, we identify an exception to the Instrumental Convergence Thesis, which is roughly that by default, an AGI would seek power, including over us.

【Keywords】:

302. A Framework for Engineering Human/Agent Teaming Systems.

Paper Link】 【Pages】:2477-2484

【Authors】: Rick Evertsz ; John Thangarajah

【Abstract】: The increasing capabilities of autonomous systems offer the potential for more effective teaming with humans. Effective human/agent teaming is facilitated by a mutual understanding of the team objective and how that objective is decomposed into team roles. This paper presents a framework for engineering human/agent teams that delineates the key human/agent teaming components, using TDF-T diagrams to design the agents/teams and then present contextualised team cognition to the human team members at runtime. Our hypothesis is that this facilitates effective human/agent teaming by enhancing the human's understanding of their role in the team and their coordination requirements. To evaluate this hypothesis we conducted a study with human participants using our user interface for the StarCraft strategy game, which presents pertinent, instantiated TDF-T diagrams to the human at runtime. The performance of human participants in the study indicates that their ability to work in concert with the non-player characters in the game is significantly enhanced by the timely presentation of a diagrammatic representation of team cognition.

【Keywords】:

303. What Is It You Really Want of Me? Generalized Reward Learning with Biased Beliefs about Domain Dynamics.

Paper Link】 【Pages】:2485-2492

【Authors】: Ze Gong ; Yu Zhang

【Abstract】: Reward learning as a method for inferring human intent and preferences has been studied extensively. Prior approaches make an implicit assumption that the human maintains a correct belief about the robot's domain dynamics. However, this may not always hold since the human's belief may be biased, which can ultimately lead to a misguided estimation of the human's intent and preferences, which is often derived from human feedback on the robot's behaviors. In this paper, we remove this restrictive assumption by considering that the human may have an inaccurate understanding of the robot. We propose a method called Generalized Reward Learning with biased beliefs about domain dynamics (GeReL) to infer both the reward function and human's belief about the robot in a Bayesian setting based on human ratings. Due to the complex forms of the posteriors, we formulate it as a variational inference problem to infer the posteriors of the parameters that govern the reward function and human's belief about the robot simultaneously. We evaluate our method in a simulated domain and with a user study where the user has a bias based on the robot's appearances. The results show that our method can recover the true human preferences while subject to such biased beliefs, in contrast to prior approaches that could have misinterpreted them completely.

【Keywords】:

304. Explainable Reinforcement Learning through a Causal Lens.

Paper Link】 【Pages】:2493-2500

【Authors】: Prashan Madumal ; Tim Miller ; Liz Sonenberg ; Frank Vetere

【Abstract】: Prominent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen by referring to counterfactuals — things that did not happen. In this paper, we use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigate: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.

【Keywords】:

305. Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks.

Paper Link】 【Pages】:2501-2508

【Authors】: Woo-Jeoung Nam ; Shir Gur ; Jaesik Choi ; Lior Wolf ; Seong-Whan Lee

【Abstract】: As Deep Neural Networks (DNNs) have demonstrated superhuman performance in a variety of fields, there is an increasing interest in understanding the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective of separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. The relevance of each neuron is identified with respect to its degree of contribution, separated into positive and negative, while preserving the conservation rule. Considering the relevance assigned to neurons in terms of relative priority, RAP allows each neuron to be assigned with a bi-polar importance score concerning the output: from highly relevant to highly irrelevant. Therefore, our method makes it possible to interpret DNNs with much clearer and attentive visualizations of the separated attributions than the conventional explaining methods. To verify that the attributions propagated by RAP correctly account for each meaning, we utilize the evaluation metrics: (i) Outside-inside relevance ratio, (ii) Segmentation mIOU and (iii) Region perturbation. In all experiments and metrics, we present a sizable gap in comparison to the existing literature.

【Keywords】:

306. Human-Machine Collaboration for Fast Land Cover Mapping.

Paper Link】 【Pages】:2509-2517

【Authors】: Caleb Robinson ; Anthony Ortiz ; Kolya Malkin ; Blake Elias ; Andi Peng ; Dan Morris ; Bistra Dilkina ; Nebojsa Jojic

【Abstract】: We propose incorporating human labelers in a model fine-tuning system that provides immediate user feedback. In our framework, human labelers can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model's predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. We implement this framework for fine-tuning high-resolution land cover segmentation models and compare human-selected points to points selected using standard active learning methods. Specifically, we fine-tune a deep neural network – trained to segment high-resolution aerial imagery into different land cover classes in Maryland, USA – to a new spatial area in New York, USA using both our human-in-the-loop method and traditional active learning methods. The tight loop in our proposed system turns the algorithm and the human operator into a hybrid system that can produce land cover maps of large areas more efficiently than the traditional workflows. Our framework has applications in machine learning settings where there is a practically limitless supply of unlabeled data, of which only a small fraction can feasibly be labeled through human efforts, such as geospatial and medical image-based applications.

【Keywords】:

307. Hierarchical Expertise-Level Modeling for User Specific Robot-Behavior Explanations.

Paper Link】 【Pages】:2518-2526

【Authors】: Sarath Sreedharan ; Tathagata Chakraborti ; Christian Muise ; Subbarao Kambhampati

【Abstract】: In this work, we present a new planning formalism called Expectation-Aware planning for decision making with humans in the loop where the human's expectations about an agent may differ from the agent's own model. We show how this formulation allows agents to not only leverage existing strategies for handling model differences like explanations (Chakraborti et al. 2017) and explicability (Kulkarni et al. 2019), but can also exhibit novel behaviors that are generated through the combination of these different strategies. Our formulation also reveals a deep connection to existing approaches in epistemic planning. Specifically, we show how we can leverage classical planning compilations for epistemic planning to solve Expectation-Aware planning problems. To the best of our knowledge, the proposed formulation is the first complete solution to planning with diverging user expectations that is amenable to a classical planning compilation while successfully combining previous works on explanation and explicability. We empirically show how our approach provides a computational advantage over our earlier approaches that rely on search in the space of models.

【Keywords】:

308. Corpus-Level End-to-End Exploration for Interactive Systems.

Paper Link】 【Pages】:2527-2534

【Authors】: Zhiwen Tang ; Grace Hui Yang

【Abstract】: A core interest in building Artificial Intelligence (AI) agents is to let them interact with and assist humans. One example is Dynamic Search (DS), which models the process that a human works with a search engine agent to accomplish a complex and goal-oriented task. Early DS agents using Reinforcement Learning (RL) have only achieved limited success for (1) their lack of direct control over which documents to return and (2) the difficulty to recover from wrong search trajectories. In this paper, we present a novel corpus-level end-to-end exploration (CE3) method to address these issues. In our method, an entire text corpus is compressed into a global low-dimensional representation, which enables the agent to gain access to the full state and action spaces, including the under-explored areas. We also propose a new form of retrieval function, whose linear approximation allows end-to-end manipulation of documents. Experiments on the Text REtrieval Conference (TREC) Dynamic Domain (DD) Track show that CE3 outperforms the state-of-the-art DS systems.

【Keywords】:

309. Learning to Interactively Learn and Assist.

Paper Link】 【Pages】:2535-2543

【Authors】: Mark Woodward ; Chelsea Finn ; Karol Hausman

【Abstract】: When deploying autonomous agents in the real world, we need effective ways of communicating objectives to them. Traditional skill learning has revolved around reinforcement and imitation learning, each with rigid constraints on the format of information exchanged between the human and the agent. While scalar rewards carry little information, demonstrations require significant effort to provide and may carry more information than is necessary. Furthermore, rewards and demonstrations are often defined and collected before training begins, when the human is most uncertain about what information would help the agent. In contrast, when humans communicate objectives with each other, they make use of a large vocabulary of informative behaviors, including non-verbal communication, and often communicate throughout learning, responding to observed behavior. In this way, humans communicate intent with minimal effort. In this paper, we propose such interactive learning as an alternative to reward or demonstration-driven learning. To accomplish this, we introduce a multi-agent training framework that enables an agent to learn from another agent who knows the current task. Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering. Most importantly, we find that our approach produces an agent that is capable of learning interactively from a human user, without a set of explicit demonstrations or a reward function, and achieving significantly better performance cooperatively with a human than a human performing the task alone.

【Keywords】:

310. CG-GAN: An Interactive Evolutionary GAN-Based Approach for Facial Composite Generation.

Paper Link】 【Pages】:2544-2551

【Authors】: Nicola Zaltron ; Luisa Zurlo ; Sebastian Risi

【Abstract】: Facial composites are graphical representations of an eyewitness's memory of a face. Many digital systems are available for the creation of such composites but are either unable to reproduce features unless previously designed or do not allow holistic changes to the image. In this paper, we improve the efficiency of composite creation by removing the reliance on expert knowledge and letting the system learn to represent faces from examples. The novel approach, Composite Generating GAN (CG-GAN), applies generative and evolutionary computation to allow casual users to easily create facial composites. Specifically, CG-GAN utilizes the generator network of a pg-GAN to create high-resolution human faces. Users are provided with several functions to interactively breed and edit faces. CG-GAN offers a novel way of generating and handling static and animated photo-realistic facial composites, with the possibility of combining multiple representations of the same perpetrator, generated by different eyewitnesses.

【Keywords】:

311. Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes.

Paper Link】 【Pages】:2552-2559

【Authors】: Shun Zhang ; Edmund H. Durfee ; Satinder P. Singh

【Abstract】: An autonomous agent acting on behalf of a human user has the potential of causing side-effects that surprise the user in unsafe ways. When the agent cannot formulate a policy with only side-effects it knows are safe, it needs to selectively query the user about whether other useful side-effects are safe. Our goal is an algorithm that queries about as few potential side-effects as possible to find a safe policy, or to prove that none exists. We extend prior work on irreducible infeasible sets to also handle our problem's complication that a constraint to avoid a side-effect cannot be relaxed without user permission. By proving that our objectives are also adaptive submodular, we devise a querying algorithm that we empirically show finds nearly-optimal queries with much less computation than a guaranteed-optimal approach, and outperforms competing approximate approaches.

【Keywords】:

AAAI Technical Track: Human-Computation and Crowd Sourcing 4

312. BAR - A Reinforcement Learning Agent for Bounding-Box Automated Refinement.

Paper Link】 【Pages】:2561-2568

【Authors】: Morgane Ayle ; Jimmy Tekli ; Julia El Zini ; Boulos El Asmar ; Mariette Awad

【Abstract】: Research has shown that deep neural networks are able to help and assist human workers throughout the industrial sector via different computer vision applications. However, such data-driven learning approaches require a very large number of labeled training images in order to generalize well and achieve high accuracies that meet industry standards. Gathering and labeling large amounts of images is both expensive and time consuming, specifically for industrial use-cases. In this work, we introduce BAR (Bounding-box Automated Refinement), a reinforcement learning agent that learns to correct inaccurate bounding-boxes that are weakly generated by certain detection methods, or wrongly annotated by a human, using either an offline training method with Deep Reinforcement Learning (BAR-DRL), or an online one using Contextual Bandits (BAR-CB). Our agent limits the human intervention to correcting or verifying a subset of bounding-boxes instead of re-drawing new ones. Results on a car industry-related dataset and on the PASCAL VOC dataset show a consistent increase of up to 0.28 in the Intersection-over-Union of bounding-boxes with their desired ground-truths, while saving 30%-82% of human intervention time in either correcting or re-drawing inaccurate proposals.

【Keywords】:
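
The Intersection-over-Union measure reported in the abstract above can be sketched as follows. This is a generic illustration, not the BAR authors' code; the function name `iou` and the `(x1, y1, x2, y2)` corner format for boxes are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes.

    Each box is assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


# Overlap area 1, union area 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A refinement agent of the kind described would aim to raise this value between a proposed box and its ground truth; an increase of 0.28 corresponds, for instance, to moving from 0.50 to 0.78.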

313. Cost-Accuracy Aware Adaptive Labeling for Active Learning.

Paper Link】 【Pages】:2569-2576

【Authors】: Ruijiang Gao ; Maytal Saar-Tsechansky

【Abstract】: Conventional active learning algorithms assume a single labeler that produces noiseless labels at a given, fixed cost, and aim to achieve the best generalization performance for a given classifier under a budget constraint. However, in many real settings, different labelers have different labeling costs and can yield different labeling accuracies. Moreover, a given labeler may exhibit different labeling accuracies for different instances. This setting can be referred to as active learning with diverse labelers with varying costs and accuracies, and it arises in many important real settings. It is therefore beneficial to understand how to effectively trade off between labeling accuracy for different instances, labeling costs, and the informativeness of training instances, so as to achieve the best generalization performance at the lowest labeling cost. In this paper, we propose a new algorithm for selecting instances and labelers (with their corresponding costs and labeling accuracies) that employs a generalization bound for learning with label noise to select informative instances and labelers so as to achieve higher generalization accuracy at a lower cost. Our proposed algorithm demonstrates state-of-the-art performance on five UCI datasets and a real crowdsourcing dataset.

【Keywords】:

314. HirePeer: Impartial Peer-Assessed Hiring at Scale in Expert Crowdsourcing Markets.

Paper Link】 【Pages】:2577-2584

【Authors】: Yasmine Kotturi ; Anson Kahng ; Ariel D. Procaccia ; Chinmay Kulkarni

【Abstract】: Expert crowdsourcing (e.g., Upwork.com) provides promising benefits such as productivity improvements for employers, and flexible working arrangements for workers. Yet to realize these benefits, a key persistent challenge is effective hiring at scale. Current approaches, such as reputation systems and standardized competency tests, develop weaknesses such as score inflation over time, thus degrading market quality. This paper presents HirePeer, a novel alternative approach to hiring at scale that leverages peer assessment to elicit honest assessments of fellow workers' job application materials, which it then aggregates using an impartial ranking algorithm. This paper reports on three studies that investigate both the costs and the benefits to workers and employers of impartial peer-assessed hiring. We find, to solicit honest assessments, algorithms must be communicated in terms of their impartial effects. Second, in practice, peer assessment is highly accurate, and impartial rank aggregation algorithms incur a small accuracy cost for their impartiality guarantee. Third, workers report finding peer-assessed hiring useful for receiving targeted feedback on their job materials.

【Keywords】:

315. Fine-Grained Machine Teaching with Attention Modeling.

Paper Link】 【Pages】:2585-2592

【Authors】: Jiacheng Liu ; Xiaofeng Hou ; Feilong Tang

【Abstract】: The state-of-the-art machine teaching techniques overestimate the ability of learners in grasping a complex concept. On one side, since a complicated concept always contains multiple fine-grained concepts, students can only grasp parts of them during a practical teaching process. On the other side, because a single teaching sample contains unequal information in terms of various fine-grained concepts, learners accept them at different levels. Thus, with more and more complicated datasets, it is challenging for us to rethink the machine teaching frameworks. In this work, we propose a new machine teaching framework called Attentive Machine Teaching (AMT). Specifically, we argue that a complicated concept always consists of multiple features, which we call fine-grained concepts. We define attention to represent the learning level of a learner in studying a fine-grained concept. Afterwards, we propose AMT, an adaptive teaching framework to construct the personalized optimal teaching dataset for learners. During each iteration, we estimate the workers' ability with a Graph Neural Network (GNN) and select the best sample using a pool-based searching approach. To corroborate our theoretical findings, we conduct extensive experiments with both synthetic datasets and real datasets. Our experimental results verify the effectiveness of AMT algorithms.

【Keywords】:

AAAI Technical Track: Humans and AI 16

316. CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines.

Paper Link】 【Pages】:2594-2601

【Authors】: Arjun R. Akula ; Shuai Wang ; Song-Chun Zhu

【Abstract】: We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class c_pred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms the state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX

【Keywords】:

317. Towards Awareness of Human Relational Strategies in Virtual Agents.

Paper Link】 【Pages】:2602-2610

【Authors】: Ian Beaver ; Cynthia Freeman ; Abdullah Mueen

【Abstract】: As Intelligent Virtual Agents (IVAs) increase in adoption and further emulate human personalities, we are interested in how humans apply relational strategies to them compared to other humans in a service environment. Human-computer data from three live customer service IVAs was collected, and annotators marked all text that was deemed unnecessary to the determination of user intention as well as the presence of multiple intents. After merging the selections of multiple annotators, a second round of annotation determined the classes of relational language present in the unnecessary sections such as Greetings, Backstory, Justification, Gratitude, Rants, or Expressing Emotions. We compare the usage of such language in human-human service interactions. We show that removal of this language from task-based inputs has a positive effect by both an increase in confidence and improvement in responses, as evaluated by humans, demonstrating the need for IVAs to anticipate relational language injection. This work provides a methodology to identify relational segments and a baseline of human performance in this task as well as laying the groundwork for IVAs to reciprocate relational strategies in order to improve their believability.

【Keywords】:

318. Regression under Human Assistance.

Paper Link】 【Pages】:2611-2620

【Authors】: Abir De ; Paramita Koley ; Niloy Ganguly ; Manuel Gomez-Rodriguez

【Abstract】: Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation—they are not aware that some of the decisions may still be taken by humans. In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels. More specifically, we first introduce the problem of ridge regression under human assistance and show that it is NP-hard. Then, we derive an alternative representation of the corresponding objective function as a difference of nondecreasing submodular functions. Building on this representation, we further show that the objective is nondecreasing and satisfies α-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm to enjoy approximation guarantees at solving the problem. Experiments on synthetic and real-world data from two important applications—medical diagnosis and content moderation—demonstrate that the greedy algorithm beats several competitive baselines.

【Keywords】:
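
The greedy strategy mentioned in the abstract of paper 318 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the objective (per-sample human error on the outsourced set plus ridge loss on the remaining samples), the restriction to a single scalar feature, the names `ridge_loss_1d`, `total_loss` and `greedy_outsource`, and the early-stopping rule.

```python
def ridge_loss_1d(xs, ys, lam):
    """Minimum over scalar w of sum((y - w*x)^2) + lam*w^2 (closed form, lam > 0)."""
    if not xs:
        return 0.0
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w * w


def total_loss(xs, ys, human_err, lam, outsourced):
    """Hypothetical objective: human error on outsourced samples, ridge loss on the rest."""
    kept = [i for i in range(len(xs)) if i not in outsourced]
    machine = ridge_loss_1d([xs[i] for i in kept], [ys[i] for i in kept], lam)
    return sum(human_err[i] for i in outsourced) + machine


def greedy_outsource(xs, ys, human_err, lam, budget):
    """Greedily hand at most `budget` samples to humans, one at a time."""
    outsourced = set()
    best = total_loss(xs, ys, human_err, lam, outsourced)
    for _ in range(budget):
        candidates = [
            (total_loss(xs, ys, human_err, lam, outsourced | {i}), i)
            for i in range(len(xs)) if i not in outsourced
        ]
        if not candidates:
            break
        loss, i = min(candidates)
        if loss >= best:  # no candidate improves the objective: stop early
            break
        outsourced.add(i)
        best = loss
    return outsourced, best


# Toy data: three points on the line y = x and one outlier at index 3.
xs, ys = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 10.0]
chosen, loss = greedy_outsource(xs, ys, human_err=[0.01] * 4, lam=0.1, budget=1)
```

On this toy input the outlier is the sample whose removal from the machine's training set reduces the objective the most, so it is the one handed to the human.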

319. MIMAMO Net: Integrating Micro- and Macro-Motion for Video Emotion Recognition.

Paper Link】 【Pages】:2621-2628

【Authors】: Didan Deng ; Zhaokang Chen ; Yuqian Zhou ; Bertram E. Shi

【Abstract】: Spatial-temporal feature learning is of vital importance for video emotion recognition. Previous deep network structures often focused on macro-motion which extends over long time scales, e.g., on the order of seconds. We believe integrating structures capturing information about both micro- and macro-motion will benefit emotion prediction, because humans perceive both micro- and macro-expressions. In this paper, we propose to combine micro- and macro-motion features to improve video emotion recognition with a two-stream recurrent network, named MIMAMO (Micro-Macro-Motion) Net. Specifically, smaller and shorter micro-motions are analyzed by a two-stream network, while larger and more sustained macro-motions can be well captured by a subsequent recurrent network. Assigning specific interpretations to the roles of different parts of the network enables us to make choices of parameters based on prior knowledge: choices that turn out to be optimal. One of the important innovations in our model is the use of interframe phase differences rather than optical flow as input to the temporal stream. Compared with the optical flow, phase differences require less computation and are more robust to illumination changes. Our proposed network achieves state-of-the-art performance on two video emotion datasets, the OMG emotion dataset and the Aff-Wild dataset. The most significant gains are for arousal prediction, for which motion information is intuitively more informative. Source code is available at https://github.com/wtomin/MIMAMO-Net.

【Keywords】:

320. Conditional Generative Neural Decoding with Structured CNN Feature Prediction.

Paper Link】 【Pages】:2629-2636

【Authors】: Changde Du ; Changying Du ; Lijie Huang ; Huiguang He

【Abstract】: Decoding visual contents from human brain activity is a challenging task with great scientific value. Two main facts that hinder existing methods from producing satisfactory results are 1) typically small paired training data; 2) under-exploitation of the structural information underlying the data. In this paper, we present a novel conditional deep generative neural decoding approach with structured intermediate feature prediction. Specifically, our approach first decodes the brain activity to the multilayer intermediate features of a pretrained convolutional neural network (CNN) with a structured multi-output regression (SMR) model, and then inverts the decoded CNN features to the visual images with an introspective conditional generation (ICG) model. The proposed SMR model can simultaneously leverage the covariance structures underlying the brain activities, the CNN features and the prediction tasks to improve the decoding accuracy and interpretability. Further, our ICG model can 1) leverage abundant unpaired images to augment the training data; 2) self-evaluate the quality of its conditionally generated images; and 3) adversarially improve itself without extra discriminator. Experimental results show that our approach yields state-of-the-art visual reconstructions from brain activities.

【Keywords】:

321. GaSPing for Utility.

Paper Link】 【Pages】:2637-2644

【Authors】: Mengyang Gu ; Debarun Bhattacharjya ; Dharmashankar Subramanian

【Abstract】: High-consequence decisions often require a detailed investigation of a decision maker's preferences, as represented by a utility function. Inferring a decision maker's utility function through assessments typically involves an elicitation phase where the decision maker responds to a series of elicitation queries, followed by an estimation phase where the state-of-the-art for direct elicitation approaches in practice is to either fit responses to a parametric form or perform linear interpolation. We introduce a Bayesian nonparametric method involving Gaussian stochastic processes for estimating a utility function from direct elicitation responses. Advantages include the flexibility to fit a large class of functions, favorable theoretical properties, and a fully probabilistic view of the decision maker's preference properties including risk attitude. Through extensive simulation experiments as well as two real datasets from management science, we demonstrate that the proposed approach results in better function fitting.

【Keywords】:

322. Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition.

Paper Link】 【Pages】:2645-2652

【Authors】: Yaman Kumar ; Dhruva Sahrawat ; Shubham Maheshwari ; Debanjan Mahata ; Amanda Stent ; Yifang Yin ; Rajiv Ratn Shah ; Roger Zimmermann

【Abstract】: Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.

【Keywords】:

323. Graph-Based Decoding Model for Functional Alignment of Unaligned fMRI Data.

Paper Link】 【Pages】:2653-2660

【Authors】: Weida Li ; Mingxia Liu ; Fang Chen ; Daoqiang Zhang

【Abstract】: Aggregating multi-subject functional magnetic resonance imaging (fMRI) data is indispensable for generating valid and general inferences from patterns distributed across human brains. The disparities in anatomical structures and functional topographies of human brains warrant aligning fMRI data across subjects. However, the existing functional alignment methods cannot adequately handle the various kinds of fMRI datasets available today, especially when they are not temporally-aligned, i.e., some of the subjects probably lack the responses to some stimuli, or different subjects might follow different sequences of stimuli. In this paper, a cross-subject graph that depicts the (dis)similarities between samples across subjects is used as a prior for developing a more flexible framework that suits an assortment of fMRI datasets. However, the high dimension of fMRI data and the use of multiple subjects make the crude framework time-consuming or impractical. To address this issue, we further regularize the framework, so that a novel feasible kernel-based optimization, which permits non-linear feature extraction, could be theoretically developed. Specifically, a low-dimension assumption is imposed on each new feature space to avoid overfitting caused by the high-spatial-low-temporal resolution of fMRI data. Experimental results on five datasets suggest that the proposed method is not only superior to several state-of-the-art methods on temporally-aligned fMRI data, but also suitable for dealing with temporally-unaligned fMRI data.

【Keywords】:

324. Multi-Source Domain Adaptation for Visual Sentiment Classification.

Paper Link】 【Pages】:2661-2668

【Authors】: Chuang Lin ; Sicheng Zhao ; Lei Meng ; Tat-Seng Chua

【Abstract】: Existing domain adaptation methods on visual sentiment classification typically are investigated under the single-source scenario, where the knowledge learned from a source domain of sufficient labeled data is transferred to the target domain of loosely labeled or unlabeled data. However, in practice, data from a single source domain usually have a limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, it learns to find a unified sentiment latent space where data from both the source and target domains share a similar distribution. This is achieved via cycle consistent adversarial learning in an end-to-end manner. Extensive experiments conducted on four benchmark datasets demonstrate that MSGAN significantly outperforms the state-of-the-art MDA approaches for visual sentiment classification.

【Keywords】:

325. Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching.

Paper Link】 【Pages】:2669-2676

【Authors】: Wei Peng ; Xiaopeng Hong ; Haoyu Chen ; Guoying Zhao

【Abstract】: Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability of modeling non-Euclidean data, has attracted lots of attention. However, many existing GCNs provide a pre-defined graph structure and share it through the entire network, which can lose implicit joint correlations especially for the higher-level features. Besides, the mainstream spectral GCN is approximated by one-order hop such that higher-order connections are not well involved. All of these require huge efforts to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules and expect to break the limitation of representational capacity caused by one-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulting architecture proves the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large scale skeleton-based action recognition datasets. The results show that our model gets the state-of-the-art results in terms of the given metrics.

【Keywords】:

326. UCF-STAR: A Large Scale Still Image Dataset for Understanding Human Actions.

Paper Link】 【Pages】:2677-2684

【Authors】: Marjaneh Safaei ; Pooyan Balouchian ; Hassan Foroosh

【Abstract】: Action recognition in still images poses a great challenge due to (i) less available training data and (ii) the absence of temporal information. To address the first challenge, we introduce a dataset for STill image Action Recognition (STAR), containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. The key characteristics of UCF-STAR include (1) focusing on human body-motion rather than relatively static human-object interaction categories, (2) collecting images from the wild to benefit from a varied set of action representations, (3) appending multiple human-annotated labels per image rather than just the action label, and (4) including a rich, structured and multi-modal set of metadata for each image. This departs from existing datasets, which typically provide a single annotation in a smaller number of images and categories, with no metadata. UCF-STAR exposes the intrinsic difficulty of action recognition through its realistic scene and action complexity. To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of “latent” motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely-used datasets.

【Keywords】:

327. Towards Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning.

Paper Link】 【Pages】:2685-2692

【Authors】: Procheta Sen ; Debasis Ganguly

【Abstract】: Human society has a long history of suffering from cognitive biases leading to social prejudices and mass injustice. The prevalent existence of cognitive biases in large volumes of historical data can pose a threat of being manifested as unethical and seemingly inhumane predictions as outputs of AI systems trained on such data. To alleviate this problem, we propose a bias-aware multi-objective learning framework that, given a set of identity attributes (e.g. gender, ethnicity etc.) and a subset of sensitive categories of the possible classes of prediction outputs, learns to reduce the frequency of predicting certain combinations of them, e.g. predicting stereotypes such as ‘most blacks use abusive language’, or ‘fear is a virtue of women’. Our experiments conducted on an emotion prediction task with balanced class priors show that a set of baseline bias-agnostic models exhibit cognitive biases with respect to gender, such as women are prone to be afraid whereas men are more prone to be angry. In contrast, our proposed bias-aware multi-objective learning methodology is shown to reduce such biases in the predicted emotions.

【Keywords】:

328. Reinforcing an Image Caption Generator Using Off-Line Human Feedback.

Paper Link】 【Pages】:2693-2700

【Authors】: Paul Hongsuck Seo ; Piyush Sharma ; Tomer Levinboim ; Bohyung Han ; Radu Soricut

【Abstract】: Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet most often the only used outcome of an expensive human rating evaluation is a few overall statistics over the evaluation dataset. In this paper, we show that the signal from instance-level human caption ratings can be leveraged to improve captioning models, even when the amount of caption ratings is several orders of magnitude less than the caption training data. We employ a policy gradient method to maximize the human ratings as rewards in an off-policy reinforcement learning setting, where policy gradients are estimated by samples from a distribution that focuses on the captions in a caption ratings dataset. Our empirical evidence indicates that the proposed method learns to generalize the human raters' judgments to a previously unseen set of images, as judged by a different set of human judges, and additionally on a different, multi-dimensional side-by-side human evaluation procedure.

【Keywords】:

329. Instance-Adaptive Graph for EEG Emotion Recognition.

Paper Link】 【Pages】:2701-2708

【Authors】: Tengfei Song ; Suyuan Liu ; Wenming Zheng ; Yuan Zong ; Zhen Cui

【Abstract】: To tackle the individual differences and characterize the dynamic relationships among different EEG regions for EEG emotion recognition, in this paper, we propose a novel instance-adaptive graph method (IAG), which employs a more flexible way to construct graphic connections so as to present different graphic representations determined by different input instances. To fit different EEG patterns, we employ an additional branch to characterize the intrinsic dynamic relationships between different EEG channels. To give a more precise graphic representation, we design the multi-level and multi-graph convolutional operation and the graph coarsening. Furthermore, we present a type of sparse graphic representation to extract more discriminative features. Experiments on two widely-used EEG emotion recognition datasets are conducted to evaluate the proposed model and the experimental results show that our method achieves the state-of-the-art performance.

【Keywords】:

330. Variational Pathway Reasoning for EEG Emotion Recognition.

Paper Link】 【Pages】:2709-2716

【Authors】: Tong Zhang ; Zhen Cui ; Chunyan Xu ; Wenming Zheng ; Jian Yang

【Abstract】: Research on human emotion cognition revealed that connections and pathways exist between spatially-adjacent and functional-related areas during emotion expression (Adolphs 2002a; Bullmore and Sporns 2009). Deeply inspired by this mechanism, we propose a heuristic Variational Pathway Reasoning (VPR) method to deal with EEG-based emotion recognition. We introduce random walk to generate a large number of candidate pathways along electrodes. To encode each pathway, the dynamic sequence model is further used to learn between-electrode dependencies. The encoded pathways around each electrode are aggregated to produce a pseudo maximum-energy pathway, which consists of the most important pair-wise connections. To find those most salient connections, we propose a sparse variational scaling (SVS) module to learn scaling factors of pseudo pathways by using the Bayesian probabilistic process and sparsity constraint, where the former endows good generalization ability while the latter favors adaptive pathway selection. Finally, the salient pathways from those candidates are jointly decided by the pseudo pathways and scaling factors. Extensive experiments on EEG emotion recognition demonstrate that the proposed VPR is superior to those state-of-the-art methods, and could find some interesting pathways w.r.t. different emotions.

【Keywords】:
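
The first step described in the abstract of paper 330, generating candidate pathways by random walks along electrodes, can be sketched as follows. This is a minimal illustration under stated assumptions: the adjacency map between electrode labels, the uniform choice of start node and next hop, and the name `sample_pathways` are hypothetical and are not taken from the paper.

```python
import random


def sample_pathways(adjacency, num_walks, walk_length, seed=0):
    """Draw `num_walks` uniform random walks of `walk_length` nodes each.

    `adjacency` maps every node to a non-empty list of neighbouring nodes.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    pathways = []
    for _ in range(num_walks):
        current = rng.choice(nodes)          # uniform start electrode
        path = [current]
        while len(path) < walk_length:
            current = rng.choice(adjacency[current])  # uniform next hop
            path.append(current)
        pathways.append(path)
    return pathways


# Hypothetical adjacency between four electrode labels (illustration only).
adjacency = {
    "Fp1": ["F3", "Fp2"],
    "Fp2": ["Fp1", "F4"],
    "F3": ["Fp1", "F4"],
    "F4": ["Fp2", "F3"],
}
paths = sample_pathways(adjacency, num_walks=5, walk_length=4)
```

Each returned pathway is an ordered list of electrodes in which consecutive entries are neighbours, which is the kind of sequence a downstream sequence model could then encode.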

331. Crowd-Assisted Disaster Scene Assessment with Human-AI Interactive Attention.

Paper Link】 【Pages】:2717-2724

【Authors】: Daniel Yue Zhang ; Yifeng Huang ; Yang Zhang ; Dong Wang

【Abstract】: The recent advances of mobile sensing and artificial intelligence (AI) have brought new revolutions in disaster response applications. One example is disaster scene assessment (DSA) which leverages computer vision techniques to assess the level of damage severity of the disaster events from images provided by eyewitnesses on social media. The assessment results are critical in prioritizing the rescue operations of the response teams. While AI algorithms can significantly reduce the detection time and manual labeling cost in such applications, their performance often falls short of the desired accuracy. Our work is motivated by the emergence of crowdsourcing platforms (e.g., Amazon Mechanical Turk, Waze) that provide unprecedented opportunities for acquiring human intelligence for AI applications. In this paper, we develop an interactive Disaster Scene Assessment (iDSA) scheme that allows AI algorithms to directly interact with humans to identify the salient regions of the disaster images in DSA applications. We also develop new incentive designs and active learning techniques to ensure reliable, timely, and cost-efficient responses from the crowdsourcing platforms. Our evaluation results on real-world case studies during Nepal and Ecuador earthquake events demonstrate that iDSA can significantly outperform state-of-the-art baselines in accurately assessing the damage of disaster scenes.

【Keywords】:

AAAI Technical Track: Knowledge Representation and Reasoning 46

332. Learning and Reasoning for Robot Sequential Decision Making under Uncertainty.

Paper Link】 【Pages】:2726-2733

【Authors】: Saeid Amiri ; Mohammad Shokrolah Shirazi ; Shiqi Zhang

【Abstract】: Robots frequently face complex tasks that require more than one action, where sequential decision-making (sdm) capabilities become necessary. The key contribution of this work is a robot sdm framework, called lcorpp, that supports the simultaneous capabilities of supervised learning for passive state estimation, automated reasoning with declarative human knowledge, and planning under uncertainty toward achieving long-term goals. In particular, we use a hybrid reasoning paradigm to refine the state estimator, and provide informative priors for the probabilistic planner. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that, in efficiency and accuracy, our framework performs better than its no-learning and no-reasoning counterparts in an office environment.

【Keywords】:

333. Query Rewriting for Ontology-Mediated Conditional Answers.

Paper Link】 【Pages】:2734-2741

【Authors】: Medina Andresel ; Magdalena Ortiz ; Mantas Simkus

【Abstract】: Among many solutions for extracting useful answers from incomplete data, ontology-mediated queries (OMQs) use domain knowledge to infer missing facts. We propose an extension of OMQs that allows us to make certain assumptions—for example, about parts of the data that may be unavailable at query time, or costly to query—and retrieve conditional answers, that is, tuples that become certain query answers when the assumptions hold. We show that querying in this powerful formalism often has no higher worst-case complexity than in plain OMQs, and that these queries are first-order rewritable for DL-Liteℛ. Rewritability is preserved even if we allow some use of closed predicates to combine the (partial) closed- and open-world assumptions. This is remarkable, as closed predicates are a very useful extension of OMQs, but they usually make query answering intractable in data complexity, even in very restricted settings.

【Keywords】:

334. Revisiting the Foundations of Abstract Argumentation - Semantics Based on Weak Admissibility and Weak Defense.

Paper Link】 【Pages】:2742-2749

【Authors】: Ringo Baumann ; Gerhard Brewka ; Markus Ulbricht

【Abstract】: In his seminal 1995 paper, Dung paved the way for abstract argumentation, a by now major research area in knowledge representation. He pointed out that there is a problematic issue with self-defeating arguments underlying all traditional semantics. A self-defeat occurs if an argument attacks itself either directly or indirectly via an odd attack loop, unless the loop is broken up by some argument attacking the loop from outside. Motivated by the fact that such arguments represent self-contradictory or paradoxical arguments, he asked for reasonable semantics which overcome the problem that such arguments may indeed invalidate any argument they attack. This paper tackles this problem from scratch. More precisely, instead of continuing to use previous concepts defined by Dung we provide new foundations for abstract argumentation, so-called weak admissibility and weak defense. After showing that these key concepts are compatible as in the classical case we introduce new versions of the classical Dung-style semantics including complete, preferred and grounded semantics. We provide a rigorous study of these new concepts including interrelationships as well as the relations to their Dung-style counterparts. The newly introduced semantics overcome the issue with self-defeating arguments, and they are semantically insensitive to syntactic deletions of self-attacking arguments, a special case of self-defeat.

【Keywords】:
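
The problem with self-defeating arguments that the abstract of paper 334 describes can be seen in a small sketch of the classical (Dung-style) grounded semantics. Note that this is the traditional semantics the paper sets out to revise, not the weak-admissibility semantics it introduces; the function name and the encoding of an argumentation framework as a set of arguments plus a set of attack pairs are assumptions made for illustration.

```python
def grounded_extension(arguments, attacks):
    """Classical grounded extension: least fixed point of the characteristic function.

    `attacks` is a set of (attacker, target) pairs over `arguments`.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable if each of its attackers is itself
        # attacked by some member of the current extension.
        acceptable = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension) for b in attackers[a])
        }
        if acceptable == extension:
            return extension
        extension = acceptable


# A self-attacking argument "a" that also attacks "b".
self_defeat = grounded_extension({"a", "b"}, {("a", "a"), ("a", "b")})

# A plain chain a -> b -> c for comparison.
chain = grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})
```

In the first framework the self-attacking argument is never accepted, yet it still prevents `b` from being accepted, so the grounded extension is empty. In the chain, `a` is unattacked and defends `c`, giving `{"a", "c"}`.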

335. Forgetting an Argument.

Paper Link】 【Pages】:2750-2757

【Authors】: Ringo Baumann ; Dov M. Gabbay ; Odinaldo Rodrigues

【Abstract】: The notion of forgetting, as considered in the famous paper by Lin and Reiter in 1994 has been extensively studied in classical logic and more recently, in non-monotonic formalisms like logic programming. In this paper, we convey the idea of forgetting to another major AI formalism, namely Dung-style argumentation frameworks. Our approach is axiomatic-driven and not limited to any specific semantics: we propose semantical and syntactical desiderata encoding different criteria for what forgetting an argument might mean; analyze how these criteria relate to each other; and check whether the criteria can be satisfied in general. The analysis is done for a number of widely used argumentation semantics. Our investigation shows that almost all desiderata are individually satisfiable. However, combinations of semantical and/or syntactical conditions reveal a much more interesting landscape. For instance, we found that the ad hoc approach to forgetting an argument, i.e., by the syntactical removal of the argument and all of its associated attacks, is too restrictive and only compatible with the two weakest semantical desiderata. Amongst the several interesting combinations identified, we showed that one satisfies a notion of minimal change and presented an algorithm that given an AF F and argument x, constructs a suitable AF G satisfying the conditions in the combination.

【Keywords】:
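
The "ad hoc approach" that the abstract of paper 335 refers to, syntactically removing an argument together with all of its associated attacks, can be written in a few lines. This is only a sketch of that baseline operation under an assumed encoding (a set of arguments and a set of attack pairs); it is not the algorithm the authors present, and the name `forget_syntactically` is hypothetical.

```python
def forget_syntactically(arguments, attacks, x):
    """Remove argument `x` and every attack in which it takes part."""
    remaining = set(arguments) - {x}
    kept = {(a, b) for (a, b) in attacks if a != x and b != x}
    return remaining, kept


# Forgetting "b" in the chain a -> b -> c leaves two unrelated arguments.
args_after, attacks_after = forget_syntactically(
    {"a", "b", "c"}, {("a", "b"), ("b", "c")}, "b"
)
```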

336. Checking Chase Termination over Ontologies of Existential Rules with Equality.

Paper Link】 【Pages】:2758-2765

【Authors】: David Carral ; Jacopo Urbani

【Abstract】: The chase is a sound and complete algorithm for conjunctive query answering over ontologies of existential rules with equality. To enable its effective use, we can apply acyclicity notions; that is, sufficient conditions that guarantee chase termination. Unfortunately, most of these notions have only been defined for existential rule sets without equality. A proposed solution to circumvent this issue is to treat equality as an ordinary predicate with an explicit axiomatisation. We empirically show that this solution is not efficient in practice and propose an alternative approach. More precisely, we show that, if the chase terminates for any equality axiomatisation of an ontology, then it terminates for the original ontology (which may contain equality). Therefore, one can apply existing acyclicity notions to check chase termination over an axiomatisation of an ontology and then use the original ontology for reasoning. We show that, in practice, doing so results in a more efficient reasoning procedure. Furthermore, we present equality model-faithful acyclicity, a general acyclicity notion that can be directly applied to ontologies with equality.

【Keywords】:

337. Model-Based Diagnosis with Uncertain Observations.

Paper Link】 【Pages】:2766-2773

【Authors】: Dean Cazes ; Meir Kalech

【Abstract】: Classical model-based diagnosis uses a model of the system to infer diagnoses – explanations – of a given abnormal observation. In this work, we explore how to address the case where there is uncertainty over a given observation. This can happen, for example, when the observations are collected by noisy sensors, that are known to return incorrect observations with some probability. We formally define this common scenario for consistency-based and abductive models. In addition, we analyze the complexity of two complete algorithms we propose for finding all diagnoses and correctly ranking them. Finally, we propose a third algorithm that returns the most probable diagnosis without finding all possible diagnoses. Experimental evaluation shows that this third algorithm can be very effective in cases where the number of faults is small and the uncertainty over the observations is not large. If, however, all possible diagnoses are desired, then the choice between the first two algorithms depends on whether the domain's diagnosis form is abductive or consistent.

【Keywords】:

338. ParamE: Regarding Neural Network Parameters as Relation Embeddings for Knowledge Graph Completion.

【Paper Link】 【Pages】:2774-2781

【Authors】: Feihu Che ; Dawei Zhang ; Jianhua Tao ; Mingyue Niu ; Bocheng Zhao

【Abstract】: We study the task of learning entity and relation embeddings in knowledge graphs for predicting missing links. Previous translational models on link prediction make use of translational properties but lack enough expressiveness, while the convolution neural network based model (ConvE) takes advantage of the great nonlinearity fitting ability of neural networks but overlooks translational properties. In this paper, we propose a new knowledge graph embedding model called ParamE which can utilize the two advantages together. In ParamE, head entity embeddings, relation embeddings and tail entity embeddings are regarded as the input, parameters and output of a neural network respectively. Since parameters in networks are effective in converting input to output, taking neural network parameters as relation embeddings makes ParamE much more expressive and translational. In addition, the entity and relation embeddings in ParamE are from feature space and parameter space respectively, which is in line with the essence that entities and relations are supposed to be mapped into two different spaces. We evaluate the performances of ParamE on standard FB15k-237 and WN18RR datasets, and experiments show ParamE can significantly outperform existing state-of-the-art models, such as ConvE, SACN, RotatE and D4-STE/Gumbel.

【Keywords】:

339. Answering Conjunctive Queries with Inequalities in DL-Liteℛ.

【Paper Link】 【Pages】:2782-2789

【Authors】: Gianluca Cima ; Maurizio Lenzerini ; Antonella Poggi

【Abstract】: In the context of the Description Logic DL-Liteℛ≠, i.e., DL-Liteℛ without UNA and with inequality axioms, we address the problem of adding to unions of conjunctive queries (UCQs) one of the simplest forms of negation, namely, inequality. It is well known that answering conjunctive queries with unrestricted inequalities over DL-Liteℛ ontologies is in general undecidable. Therefore, we explore two strategies for recovering decidability, and, hopefully, tractability. Firstly, we weaken the ontology language, and consider the variant of DL-Liteℛ≠ corresponding to RDFS enriched with both inequality and disjointness axioms. Secondly, we weaken the query language, by preventing inequalities from being applied to existentially quantified variables, thus obtaining the class of queries named UCQ≠,bs. We prove that in the two cases, query answering is decidable, and we provide tight complexity bounds for the problem, both for data and combined complexity. Notably, the results show that answering UCQ≠,bs over DL-Liteℛ≠ ontologies is still in AC⁰ in data complexity.

【Keywords】:

340. Epistemic Integrity Constraints for Ontology-Based Data Management.

【Paper Link】 【Pages】:2790-2797

【Authors】: Marco Console ; Maurizio Lenzerini

【Abstract】: Ontology-based data management (OBDM) is a powerful knowledge-oriented paradigm for managing data spread over multiple heterogeneous sources. In OBDM, the data sources of an information system are handled through the reconciled view provided by an ontology, i.e., the conceptualization of the underlying domain of interest expressed in some formal language. In any information system where the basic knowledge resides in data sources, it is of paramount importance to specify the acceptable states of such information. Usually, this is done via integrity constraints, i.e., requirements that the data must satisfy, formally expressed in some specific language. However, while the semantics of integrity constraints are clear in the context of databases, the presence of inferred information, typical of OBDM systems, considerably complicates the matter. In this paper, we establish a novel framework for integrity constraints in OBDM scenarios, based on the notion of knowledge state of the information system. For integrity constraints in this framework, we define a language based on epistemic logic, and study decidability and complexity of both checking satisfaction and performing different forms of static analysis on them.

【Keywords】:

341. Hypothetical Answers to Continuous Queries over Data Streams.

【Paper Link】 【Pages】:2798-2805

【Authors】: Luís Cruz-Filipe ; Isabel Nunes ; Graça Gaspar

【Abstract】: Continuous queries over data streams often delay answers until some relevant input arrives through the data stream. These delays may render answers obsolete by the time they arrive, leaving users to make decisions with no help whatsoever. Therefore, it can be useful to provide hypothetical answers – “given the current information, it is possible that X will become true at time t” – instead of no information at all. In this paper we present a semantics for queries and corresponding answers that covers such hypothetical answers, together with an online algorithm for updating the set of facts that are consistent with the currently available information.

【Keywords】:

342. ElGolog: A High-Level Programming Language with Memory of the Execution History.

【Paper Link】 【Pages】:2806-2813

【Authors】: Giuseppe De Giacomo ; Yves Lespérance ; Eugenia Ternovska

【Abstract】: Most programming languages only support tests that refer exclusively to the current state. This applies even to high-level programming languages based on the situation calculus such as Golog. The result is that additional variables/fluents/data structures must be introduced to track conditions that the program uses in tests to make decisions. In this paper, drawing inspiration from McCarthy's Elephant 2000, we propose an extended version of Golog, called ElGolog, that supports rich tests about the execution history, where tests are expressed in a first-order variant of two-way linear dynamic logic that uses ElGolog programs with converse. We show that in spite of rich tests, ElGolog shares key features with Golog, including a semantics based on macroexpansion into situation calculus formulas, upon which regression can still be applied. We also show that like Golog, our extended language can easily be implemented in Prolog.

【Keywords】:

343. Efficient Model-Based Diagnosis of Sequential Circuits.

【Paper Link】 【Pages】:2814-2821

【Authors】: Alexander Feldman ; Ingo Pill ; Franz Wotawa ; Ion Matei ; Johan de Kleer

【Abstract】: In Model-Based Diagnosis (MBD), we concern ourselves with the health and safety of physical and software systems. Although we often use different knowledge representations and algorithms, some tools, like satisfiability (SAT) solvers and temporal logics, are used in both domains. In this paper we introduce Finite Trace Next Logic (FTNL) models of sequential circuits and propose an enhanced algorithm for computing minimal-cardinality diagnoses. Existing state-of-the-art satisfiability algorithms for minimal diagnosis use Sorting Networks (SNs) for constraining the cardinality of the diagnostic candidates. In our approach we exploit Multi-Operand Adders (MOAs). Based on extensive tests with ISCAS-89 circuits, we found that MOAs enable Conjunctive Normal Form (CNF) encodings that are significantly more compact. These encodings lead to 19.7 to 67.6 times fewer variables and 18.4 to 62 times fewer clauses. For converting an FTNL model to CNF, we could achieve a speed-up ranging from 6.2 to 22.2. Using SNs, however, yields 3.4 to 5.5 times faster on-line satisfiability checking. This makes MOAs preferable for applications where RAM and off-line time are more limited than on-line CPU time.

【Keywords】:

344. Proportional Belief Merging.

【Paper Link】 【Pages】:2822-2829

【Authors】: Adrian Haret ; Martin Lackner ; Andreas Pfandler ; Johannes Peter Wallner

【Abstract】: In this paper we introduce proportionality to belief merging. Belief merging is a framework for aggregating information presented in the form of propositional formulas, and it generalizes many aggregation models in social choice. In our analysis, two incompatible notions of proportionality emerge: one similar to standard notions of proportionality in social choice, the other more in tune with the logic-based merging setting. Since established merging operators meet neither of these proportionality requirements, we design new proportional belief merging operators. We analyze the proposed operators against established rationality postulates, finding that current approaches to proportionality from the field of social choice are, at their core, incompatible with standard rationality postulates in belief merging. We provide characterization results that explain the underlying conflict, and provide a complexity analysis of our novel operators.

【Keywords】:

345. Structural Decompositions of Epistemic Logic Programs.

【Paper Link】 【Pages】:2830-2837

【Authors】: Markus Hecher ; Michael Morak ; Stefan Woltran

【Abstract】: Epistemic logic programs (ELPs) are a popular generalization of standard Answer Set Programming (ASP) providing means for reasoning over answer sets within the language. This richer formalism comes at the price of higher computational complexity reaching up to the fourth level of the polynomial hierarchy. However, in contrast to standard ASP, dedicated investigations towards tractability have not been undertaken yet. In this paper, we give first results in this direction and show that central ELP problems can be solved in linear time for ELPs exhibiting structural properties in terms of bounded treewidth. We also provide a full dynamic programming algorithm that adheres to these bounds. Finally, we show that applying treewidth to a novel dependency structure, given in terms of epistemic literals, allows one to bound the number of ASP solver calls in typical ELP solving procedures.

【Keywords】:

346. Going Deep: Graph Convolutional Ladder-Shape Networks.

【Paper Link】 【Pages】:2838-2845

【Authors】: Ruiqi Hu ; Shirui Pan ; Guodong Long ; Qinghua Lu ; Liming Zhu ; Jing Jiang

【Abstract】: Neighborhood aggregation algorithms like spectral graph convolutional networks (GCNs) formulate graph convolutions as a symmetric Laplacian smoothing operation to aggregate the feature information of one node with that of its neighbors. While they have achieved great success in semi-supervised node classification on graphs, current approaches suffer from the over-smoothing problem when the depth of the neural networks increases, which always leads to a noticeable degradation of performance. To solve this problem, we present graph convolutional ladder-shape networks (GCLN), a novel graph neural network architecture that transmits messages from shallow layers to deeper layers to overcome the over-smoothing problem and dramatically extend the scale of the neural networks with improved performance. We have validated the effectiveness of the proposed GCLN at a node-wise level with a semi-supervised task (node classification) and an unsupervised task (node clustering), and at a graph-wise level with graph classification by applying a differentiable pooling operation. The proposed GCLN outperforms original GCNs, deep GCNs and other state-of-the-art GCN-based models for all three tasks, which were designed from various perspectives on six real-world benchmark data sets.

【Keywords】:

347. Aggregation of Perspectives Using the Constellations Approach to Probabilistic Argumentation.

【Paper Link】 【Pages】:2846-2853

【Authors】: Anthony Hunter ; Kawsar Noor

【Abstract】: In the constellations approach to probabilistic argumentation, there is a probability distribution over the subgraphs of an argument graph, and this can be used to represent the uncertainty in the structure of the argument graph. In this paper, we consider how we can construct this probability distribution from data. We provide a language for data based on perspectives (opinions) on the structure of the graph, and we introduce a framework (based on general properties and some specific proposals) for aggregating these perspectives, and as a result obtaining a probability distribution that best reflects these perspectives. This can be used in applications such as summarizing collections of online reviews and combining conflicting reports.

【Keywords】:

348. Least General Generalizations in Description Logic: Verification and Existence.

【Paper Link】 【Pages】:2854-2861

【Authors】: Jean Christoph Jung ; Carsten Lutz ; Frank Wolter

【Abstract】: We study two forms of least general generalizations in description logic, the least common subsumer (LCS) and most specific concept (MSC). While the LCS generalizes from examples that take the form of concepts, the MSC generalizes from individuals in data. Our focus is on the complexity of existence and verification, the latter meaning to decide whether a candidate concept is the LCS or MSC. We consider cases with and without a background TBox and a target signature. Our results range from coNP-complete for LCS and MSC verification in the description logic ℰℒ without TBoxes to undecidability of LCS and MSC verification and existence in ℰℒℐ with TBoxes. To obtain results in the presence of a TBox, we establish a close link between the problems studied in this paper and concept learning from positive and negative examples. We also give a way to regain decidability in ℰℒℐ with TBoxes and study single example MSC as a special case.

【Keywords】:

349. Complexity and Expressive Power of Disjunction and Negation in Limit Datalog.

【Paper Link】 【Pages】:2862-2869

【Authors】: Mark Kaminski ; Bernardo Cuenca Grau ; Egor V. Kostylev ; Ian Horrocks

【Abstract】: Limit Datalog is a fragment of Datalogℤ—the extension of Datalog with arithmetic functions over the integers—which has been proposed as a declarative language suitable for capturing data analysis tasks. In limit Datalog programs, all intensional predicates with a numeric argument are limit predicates that keep maximal (or minimal) bounds on numeric values. Furthermore, to ensure decidability of reasoning, limit Datalog imposes a linearity condition restricting the use of multiplication in rules. In this paper, we study the complexity and expressive power of limit Datalog programs extended with disjunction in the heads of rules and non-monotonic negation under the stable model semantics. We show that allowing for unrestricted use of negation leads to undecidability of reasoning. Decidability can be restored by stratifying the use of negation over predicates carrying numeric values. We show that the resulting language is Π₂^EXP-complete in combined complexity and that it captures Π₂^P over ordered structures in the sense of descriptive complexity. We also provide a study of several fragments of this language: we show that the complexity and expressive power of the full language are already reached for disjunction-free programs; furthermore, we show that semi-positive disjunctive programs are coNEXP-complete and that they capture coNP.

【Keywords】:

350. Logics for Sizes with Union or Intersection.

【Paper Link】 【Pages】:2870-2876

【Authors】: Caleb Kisby ; Saúl Blanco ; Alex Kruckman ; Lawrence S. Moss

【Abstract】: This paper presents the most basic logics for reasoning about the sizes of sets that admit either the union of terms or the intersection of terms. That is, our logics handle assertions All x y and AtLeast x y, where x and y are built up from basic terms by either unions or intersections. We present a sound, complete, and polynomial-time decidable proof system for these logics. An immediate consequence of our work is the completeness of the logic additionally permitting More x y. The logics considered here may be viewed as efficient fragments of two logics which appear in the literature: Boolean Algebra with Presburger Arithmetic and the Logic of Comparative Cardinality.

【Keywords】:

351. FastLAS: Scalable Inductive Logic Programming Incorporating Domain-Specific Optimisation Criteria.

【Paper Link】 【Pages】:2877-2885

【Authors】: Mark Law ; Alessandra Russo ; Elisa Bertino ; Krysia Broda ; Jorge Lobo

【Abstract】: Inductive Logic Programming (ILP) systems aim to find a set of logical rules, called a hypothesis, that explains a set of examples. In cases where many such hypotheses exist, ILP systems often bias towards shorter solutions, leading to highly general rules being learned. In some application domains like security and access control policies, this bias may not be desirable because, when data is sparse, more specific rules that guarantee tighter security should be preferred. This paper presents a new general notion of a scoring function over hypotheses that allows a user to express domain-specific optimisation criteria. This is incorporated into a new ILP system, called FastLAS, that takes as input a learning task and a customised scoring function, and computes an optimal solution with respect to the given scoring function. We evaluate the accuracy of FastLAS over real-world datasets for access control policies and show that varying the scoring function allows a user to target domain-specific performance metrics. We also compare FastLAS to state-of-the-art ILP systems, using the standard ILP bias for shorter solutions, and demonstrate that FastLAS is significantly faster and more scalable.

【Keywords】:

352. Automatic Verification of Liveness Properties in the Situation Calculus.

【Paper Link】 【Pages】:2886-2892

【Authors】: Jian Li ; Yongmei Liu

【Abstract】: In dynamic systems, liveness properties concern whether something good will eventually happen. Examples of liveness properties are termination of programs and goal achievability. In this paper, we consider the following theorem-proving problem: given an action theory and a goal, check whether the goal is achievable in every model of the action theory. We make the assumption that there are finitely many non-number objects. We propose to use mathematical induction to address this problem: we identify a natural number feature and prove by mathematical induction that for any values of the feature, the goal is achievable. Both the basis and induction steps are verified using first-order theorem provers. We propose a simple method to identify potential features which are the number of objects satisfying a certain formula by generating small models of the action theory and calling a classical planner to achieve the goal. We also propose to regress the goal via different actions and then verify whether the resulting goals are achievable. We implemented the proposed method and experimented with the blocks world domain and a number of other domains from the literature. Experimental results showed that most goals can be verified within a reasonable amount of time.

【Keywords】:

353. Path Ranking with Attention to Type Hierarchies.

【Paper Link】 【Pages】:2893-2900

【Authors】: Weiyu Liu ; Angel Andres Daruna ; Zsolt Kira ; Sonia Chernova

【Abstract】: The objective of the knowledge base completion problem is to infer missing information from existing facts in a knowledge base. Prior work has demonstrated the effectiveness of path-ranking based methods, which solve the problem by discovering observable patterns in knowledge graphs, consisting of nodes representing entities and edges representing relations. However, these patterns either lack accuracy because they rely solely on relations or cannot easily generalize due to the direct use of specific entity information. We introduce Attentive Path Ranking, a novel path pattern representation that leverages type hierarchies of entities to both avoid ambiguity and maintain generalization. Then, we present an end-to-end trained attention-based RNN model to discover the new path patterns from data. Experiments conducted on benchmark knowledge base completion datasets WN18RR and FB15k-237 demonstrate that the proposed model outperforms existing methods on the fact prediction task by statistically significant margins of 26% and 10%, respectively. Furthermore, quantitative and qualitative analyses show that the path patterns balance between generalization and discrimination.

【Keywords】:

354. K-BERT: Enabling Language Representation with Knowledge Graph.

【Paper Link】 【Pages】:2901-2908

【Authors】: Weijie Liu ; Peng Zhou ; Zhe Zhao ; Zhiruo Wang ; Qi Ju ; Haotang Deng ; Ping Wang

【Abstract】: Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called the knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and visible matrix to limit the impact of knowledge. Because it can load model parameters from the pre-trained BERT, K-BERT can easily inject domain knowledge into the models by being equipped with a KG, without pre-training by itself. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts.

【Keywords】:

355. Explanations for Inconsistency-Tolerant Query Answering under Existential Rules.

【Paper Link】 【Pages】:2909-2916

【Authors】: Thomas Lukasiewicz ; Enrico Malizia ; Cristian Molinaro

【Abstract】: Querying inconsistent knowledge bases is a problem that has attracted a great deal of interest over the last decades. While several semantics of query answering have been proposed, and their complexity is rather well-understood, little attention has been paid to the problem of explaining query answers. Explainability has recently become a prominent problem in different areas of AI. In particular, explaining query answers allows users to understand not only what is entailed by an inconsistent knowledge base, but also why. In this paper, we address the problem of explaining query answers for existential rules under three popular inconsistency-tolerant semantics, namely, the ABox repair, the intersection of repairs, and the intersection of closed repairs semantics. We provide a thorough complexity analysis for a wide range of existential rule languages and for different complexity measures.

【Keywords】:

356. Resilient Logic Programs: Answer Set Programs Challenged by Ontologies.

【Paper Link】 【Pages】:2917-2924

【Authors】: Sanja Lukumbuzya ; Magdalena Ortiz ; Mantas Simkus

【Abstract】: We introduce resilient logic programs (RLPs) that couple a non-monotonic logic program and a first-order (FO) theory or description logic (DL) ontology. Unlike previous hybrid languages, where the interaction between the program and the theory is limited to consistency or query entailment tests, in RLPs answer sets must be ‘resilient’ to the models of the theory, allowing non-output predicates of the program to respond differently to different models. RLPs can elegantly express ∃∀∃-QBFs, disjunctive ASP, and configuration problems under incompleteness of information. RLPs are decidable when a couple of natural assumptions are made: (i) satisfiability of FO theories in the presence of closed predicates is decidable, and (ii) rules are safe in the style of the well-known DL-safeness. We further show that a large fragment of such RLPs can be translated into standard (disjunctive) ASP, for which efficient implementations exist. For RLPs with theories expressed in DLs, we use a novel relaxation of safeness that safeguards rules via predicates whose extensions can be inferred to have a finite bound. We present several complexity results for the case where ontologies are written in some standard DLs.

【Keywords】:

357. Commonsense Knowledge Base Completion with Structural and Semantic Context.

【Paper Link】 【Pages】:2925-2933

【Authors】: Chaitanya Malaviya ; Chandra Bhagavatula ; Antoine Bosselut ; Yejin Choi

【Abstract】: Automatic KB completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared to the much studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes compared to conventional KBs (∼18x more nodes in ATOMIC compared to Freebase (FB15K-237)). Importantly, this implies significantly sparser graph structures — a major challenge for existing KB completion methods that assume densely connected graphs over a relatively smaller set of nodes. In this paper, we present novel KB completion models that can address these challenges by exploiting the structural and semantic context of nodes. Specifically, we investigate two key ideas: (1) learning from local graph structure, using graph convolutional networks and automatic graph densification and (2) transfer learning from pre-trained language models to knowledge graphs for enhanced contextual representation of knowledge. We describe our method to incorporate information from both these sources in a joint model and provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis on model predictions sheds light on the types of commonsense knowledge that language models capture well.

【Keywords】:

358. Blameworthiness in Security Games.

【Paper Link】 【Pages】:2934-2941

【Authors】: Pavel Naumov ; Jia Tao

【Abstract】: Security games are an example of a successful real-world application of game theory. The paper defines blameworthiness of the defender and the attacker in security games using the principle of alternative possibilities and provides a sound and complete logical system for reasoning about blameworthiness in such games. Two of the axioms of this system capture the asymmetry of information in security games.

【Keywords】:

359. Deciding Acceptance in Incomplete Argumentation Frameworks.

【Paper Link】 【Pages】:2942-2949

【Authors】: Andreas Niskanen ; Daniel Neugebauer ; Matti Järvisalo ; Jörg Rothe

【Abstract】: Expressing incomplete knowledge in abstract argumentation frameworks (AFs) through incomplete AFs has recently received noticeable attention. However, algorithmic aspects of deciding acceptance in incomplete AFs are still under-developed. We address this current shortcoming by developing algorithms for NP-hard and coNP-hard variants of acceptance problems over incomplete AFs via harnessing Boolean satisfiability (SAT) solvers. Focusing on nonempty conflict-free or admissible sets and on stable extensions, we also provide new complexity results for a refined variant of skeptical acceptance in incomplete AFs, ranging from polynomial-time computability to hardness for the second level of the polynomial hierarchy. Furthermore, central to the proposed SAT-based counterexample-guided abstraction refinement approach for the second-level problem variants, we establish conditions for redundant atomic changes to incomplete AFs from the perspective of preserving extensions. We show empirically that the resulting SAT-based approach for incomplete AFs scales at least as well as existing SAT-based approaches to deciding acceptance in AFs.

【Keywords】:

360. Rule-Guided Compositional Representation Learning on Knowledge Graphs.

【Paper Link】 【Pages】:2950-2958

【Authors】: Guanglin Niu ; Yongfei Zhang ; Bo Li ; Peng Cui ; Si Liu ; Jingyang Li ; Xiaowei Zhang

【Abstract】: Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which limits performance due to the structure sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding as well as the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, the rules of length 2 are applied to compose paths accurately while the rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.

【Keywords】:

361. Learning Query Inseparable ℰℒℋ Ontologies.

【Paper Link】 【Pages】:2959-2966

【Authors】: Ana Ozaki ; Cosimo Persia ; Andrea Mazzullo

【Abstract】: We investigate the complexity of learning query inseparable ℰℒℋ ontologies in a variant of Angluin's exact learning model. Given a fixed data instance A and a query language 𝒬, we are interested in computing an ontology ℋ that entails the same queries as a target ontology 𝒯 on A, that is, ℋ and 𝒯 are inseparable w.r.t. A and 𝒬. The learner is allowed to pose two kinds of questions. The first is ‘Does (𝒯,A) ⊨ q?’, with A an arbitrary data instance and q a query in 𝒬. An oracle replies to this question with ‘yes’ or ‘no’. In the second, the learner asks ‘Are ℋ and 𝒯 inseparable w.r.t. A and 𝒬?’. If so, the learning process finishes; otherwise, the learner receives (A,q) with q ∈ 𝒬, (𝒯,A) ⊨ q and (ℋ,A) ⊭ q (or vice-versa). Then, we analyse conditions in which query inseparability is preserved if A changes. Finally, we consider the PAC learning model and a setting where the algorithms learn from a batch of classified data, limiting interactions with the oracles.

【Keywords】:

362. Graph Representations for Higher-Order Logic and Theorem Proving.

【Paper Link】 【Pages】:2967-2974

【Authors】: Aditya Paliwal ; Sarah M. Loos ; Markus N. Rabe ; Kshitij Bansal ; Christian Szegedy

【Abstract】: This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning. Higher-order logic is highly expressive and, even though it is well-structured with a clearly defined grammar and semantics, there still remains no well-established method to convert formulas into graph-based representations. In this paper, we consider several graphical representations of higher-order logic and evaluate them against the HOList benchmark for higher-order theorem proving.

【Keywords】:

363. Relatedness and TBox-Driven Rule Learning in Large Knowledge Bases.

【Paper Link】 【Pages】:2975-2982

【Authors】: Giuseppe Pirrò

【Abstract】: We present RARL, an approach to discover rules of the form body ⇒ head in large knowledge bases (KBs) that typically include a set of terminological facts (TBox) and a set of TBox-compliant assertional facts (ABox). RARL's main intuition is to learn rules by leveraging TBox-information and the semantic relatedness between the predicate(s) in the atoms of the body and the predicate in the head. RARL uses an efficient relatedness-driven TBox traversal algorithm, which, given an input rule head, generates the set of most semantically related candidate rule bodies. Then, rule confidence is computed in the ABox based on a set of positive and negative examples. Decoupling candidate generation and rule quality assessment offers greater flexibility than previous work.

【Keywords】:

364. A Framework for Measuring Information Asymmetry.

Paper Link】 【Pages】:2983-2990

【Authors】: Yakoub Salhi

【Abstract】: Information asymmetry occurs when an imbalance of knowledge exists between two parties, such as a buyer and a seller, a regulator and an operator, and an employer and an employee. It is a key concept in several domains, in particular, in economics. We propose in this work a general logic-based framework for measuring the information asymmetry between two parties. A situation of information asymmetry is represented by a knowledge base and a set of questions. We define the notion of information asymmetry measure through rationality postulates. We further introduce a syntactic concept, called minimal question subset (MQS), to take into consideration the fact that answering some questions allows avoiding others. This concept is used for defining rationality postulates and measures. Finally, we propose a method for computing the MQSes of a given situation of information asymmetry.

【Keywords】:

365. Adversarial Deep Network Embedding for Cross-Network Node Classification.

Paper Link】 【Pages】:2991-2999

【Authors】: Xiao Shen ; Quanyu Dai ; Fu-Lai Chung ; Wei Lu ; Kup-Sze Choi

【Abstract】: In this paper, the task of cross-network node classification, which leverages the abundant labeled nodes from a source network to help classify unlabeled nodes in a target network, is studied. The existing domain adaptation algorithms generally fail to model the network structural information, and the current network embedding models mainly focus on single-network applications. Thus, both of them cannot be directly applied to solve the cross-network node classification problem. This motivates us to propose an adversarial cross-network deep network embedding (ACDNE) model to integrate adversarial domain adaptation with deep network embedding so as to learn network-invariant node representations that can also well preserve the network structural information. In ACDNE, the deep network embedding module utilizes two feature extractors to jointly preserve attributed affinity and topological proximities between nodes. In addition, a node classifier is incorporated to make node representations label-discriminative. Moreover, an adversarial domain adaptation technique is employed to make node representations network-invariant. Extensive experimental results demonstrate that the proposed ACDNE model achieves the state-of-the-art performance in cross-network node classification.

【Keywords】:

366. Contextual Parameter Generation for Knowledge Graph Link Prediction.

Paper Link】 【Pages】:3000-3008

【Authors】: George Stoica ; Otilia Stretcu ; Emmanouil Antonios Platanios ; Tom M. Mitchell ; Barnabás Póczos

【Abstract】: We consider the task of knowledge graph link prediction. Given a question consisting of a source entity and a relation (e.g., Shakespeare and BornIn), the objective is to predict the most likely answer entity (e.g., England). Recent approaches tackle this problem by learning entity and relation embeddings. However, they often constrain the relationship between these embeddings to be additive (i.e., the embeddings are concatenated and then processed by a sequence of linear functions and element-wise non-linearities). We show that this type of interaction significantly limits representational power. For example, such models cannot handle cases where a different projection of the source entity is used for each relation. We propose to use contextual parameter generation to address this limitation. More specifically, we treat relations as the context in which source entities are processed to produce predictions, by using relation embeddings to generate the parameters of a model operating over source entity embeddings. This allows models to represent more complex interactions between entities and relations. We apply our method on two existing link prediction methods, including the current state-of-the-art, resulting in significant performance gains and establishing a new state-of-the-art for this task. These gains are achieved while also reducing convergence time by up to 28 times.

【Keywords】:
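
The idea in the abstract above, using a relation embedding to generate the parameters of a model that processes the source entity, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `generator` matrix, the single linear projection, and the dot-product scoring are hypothetical and are not taken from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def generate_projection(relation_emb, generator, out_dim, in_dim):
    """Turn a relation embedding into the weights of an out_dim x in_dim projection."""
    flat = matvec(generator, relation_emb)  # length out_dim * in_dim
    return [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]


def score_answers(source_emb, relation_emb, generator, entity_embs):
    """Score every candidate answer entity for the question (source, relation)."""
    in_dim = len(source_emb)
    out_dim = len(entity_embs[0])
    # The relation is the context: it decides which projection the source sees.
    projection = generate_projection(relation_emb, generator, out_dim, in_dim)
    projected = matvec(projection, source_emb)
    return [sum(p * e for p, e in zip(projected, ent)) for ent in entity_embs]


# Hypothetical toy setup: relation [1, 0] generates the identity projection,
# relation [0, 1] generates a projection that swaps the two coordinates.
generator = [[1, 0], [0, 1], [0, 1], [1, 0]]
entities = [[1, 0], [0, 1]]
source = [1, 0]
scores_rel_a = score_answers(source, [1, 0], generator, entities)
scores_rel_b = score_answers(source, [0, 1], generator, entities)
```

In this toy setup the same source entity prefers a different answer under each relation, which is the kind of relation-specific projection the abstract says purely additive interactions cannot represent.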

367. InteractE: Improving Convolution-Based Knowledge Graph Embeddings by Increasing Feature Interactions.

Paper Link】 【Pages】:3009-3016

【Authors】: Shikhar Vashishth ; Soumya Sanyal ; Vikram Nitin ; Nilesh Agrawal ; Partha P. Talukdar

【Abstract】: Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas – feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR and YAGO3-10 datasets respectively. The results validate our central hypothesis – that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.

【Keywords】:
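
Of the three ideas named in the abstract above, circular convolution is the easiest to show in isolation: the filter wraps around the borders of the 2D input instead of relying on zero padding. The sketch below is a plain-Python illustration under assumed conventions (cross-correlation orientation, filter centred on each cell); it is not the InteractE implementation.

```python
def circular_conv2d(image, kernel):
    """2D convolution with wrap-around borders (cross-correlation orientation)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    off_r, off_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for a in range(k_rows):
                for b in range(k_cols):
                    # Modulo indexing makes the input behave like a torus.
                    r = (i + a - off_r) % rows
                    c = (j + b - off_c) % cols
                    total += image[r][c] * kernel[a][b]
            out[i][j] = total
    return out


reshaped = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # stand-in for a reshaped embedding
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
wrapped = circular_conv2d(reshaped, ones)
```

Because of the wrap-around, every output cell of this 3x3 example sees all nine inputs, whereas a zero-padded convolution would let corner cells see only four.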

368. Query Answering with Guarded Existential Rules under Stable Model Semantics.

Paper Link】 【Pages】:3017-3024

【Authors】: Hai Wan ; Guohui Xiao ; Chenglin Wang ; Xianqiao Liu ; Junhong Chen ; Zhe Wang

【Abstract】: In this paper, we study the problem of query answering with guarded existential rules (also called GNTGDs) under stable model semantics. Our goal is to use existing answer set programming (ASP) solvers. However, ASP solvers handle only finitely-ground logic programs while the program translated from GNTGDs by Skolemization is not in general. To address this challenge, we introduce two novel notions of (1) guarded instantiation forest to describe the instantiation of GNTGDs and (2) prime block to characterize the repeated infinitely-ground program translated from GNTGDs. Using these notions, we prove that the ground termination problem for GNTGDs is decidable. We also devise an algorithm for query answering with GNTGDs using ASP solvers. We have implemented our approach in a prototype system. The evaluation over a set of benchmarks shows encouraging results.

【Keywords】:

369. COTSAE: CO-Training of Structure and Attribute Embeddings for Entity Alignment.

Paper Link】 【Pages】:3025-3032

【Authors】: Kai Yang ; Shaoqin Liu ; Junfeng Zhao ; Yasha Wang ; Bing Xie

【Abstract】: Entity alignment is a fundamental and vital task in Knowledge Graph (KG) construction and fusion. Previous works mainly focus on capturing the structural semantics of entities by learning the entity embeddings on the relational triples and pre-aligned "seed entities". Some works also seek to incorporate the attribute information to assist refining the entity embeddings. However, there are still many problems not considered, which dramatically limits the utilization of attribute information in the entity alignment. Different KGs may have lots of different attribute types, and even the same attribute may have diverse data structures and value granularities. Most importantly, attributes may have various "contributions" to the entity alignment. To solve these problems, we propose COTSAE that combines the structure and attribute information of entities by co-training two embedding learning components, respectively. We also propose a joint attention method in our model to learn the attentions of attribute types and values cooperatively. We verified our COTSAE on several datasets from real-world KGs, and the results showed that it is significantly better than the latest entity alignment methods. The structure and attribute information can complement each other and both contribute to performance improvement.

【Keywords】:

370. Ranking-Based Semantics for Sets of Attacking Arguments.

Paper Link】 【Pages】:3033-3040

【Authors】: Bruno Yun ; Srdjan Vesic ; Madalina Croitoru

【Abstract】: Argumentation is a process of evaluating and comparing sets of arguments. Ranking-based semantics have received a lot of attention recently. All of the semantics introduced so far are applicable to binary attack relations. In this paper, we study a more general case in which sets of arguments can jointly attack an argument. We generalise existing postulates for ranking-based semantics to fit this framework, define a general variant of h-categoriser, prove that it converges for every argumentation framework and study the postulates it satisfies. We also study the link between the binary and hypergraph versions of h-categoriser.

【Keywords】:
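
For orientation, the classical binary h-categoriser that the abstract above generalises assigns each argument the value 1 / (1 + sum of the values of its attackers), and can be computed by fixed-point iteration. The sketch below covers only that binary case; the set-attack (hypergraph) variant defined in the paper is not reproduced here, and the function name and iteration count are assumptions.

```python
def h_categoriser(arguments, attacks, iterations=100):
    """Binary h-categoriser by fixed-point iteration.

    attacks is a set of (attacker, target) pairs.
    """
    attackers = {a: [b for (b, t) in attacks if t == a] for a in arguments}
    strength = {a: 1.0 for a in arguments}
    for _ in range(iterations):
        # Each value depends only on the previous round's attacker values.
        strength = {
            a: 1.0 / (1.0 + sum(strength[b] for b in attackers[a]))
            for a in arguments
        }
    return strength


# c attacks b, b attacks a: c is unattacked, so b is weakened and a is partly defended.
chain = h_categoriser(["a", "b", "c"], {("c", "b"), ("b", "a")})
```

Sorting arguments by the returned values gives the ranking; in the chain example the order is c, then a, then b.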

371. Few-Shot Knowledge Graph Completion.

Paper Link】 【Pages】:3041-3048

【Authors】: Chuxu Zhang ; Huaxiu Yao ; Chao Huang ; Meng Jiang ; Zhenhui Li ; Nitesh V. Chawla

【Abstract】: Knowledge graphs (KGs) serve as useful resources for various natural language processing applications. Previous KG completion approaches require a large number of training instances (i.e., head-tail entity pairs) for every relation. The real case is that for most of the relations, very few entity pairs are available. Existing work of one-shot learning limits method generalizability for few-shot scenarios and does not fully use the supervisory information; however, few-shot KG completion has not been well studied yet. In this work, we propose a novel few-shot relation learning model (FSRL) that aims at discovering facts of new relations with few-shot references. FSRL can effectively capture knowledge from heterogeneous graph structure, aggregate representations of few-shot references, and match similar entity pairs of reference set for every relation. Extensive experiments on two public datasets demonstrate that FSRL outperforms the state-of-the-art.

【Keywords】:

372. Towards Universal Languages for Tractable Ontology Mediated Query Answering.

Paper Link】 【Pages】:3049-3056

【Authors】: Heng Zhang ; Yan Zhang ; Jia-Huai You ; Zhiyong Feng ; Guifei Jiang

【Abstract】: An ontology language for ontology mediated query answering (OMQA-language) is universal for a family of OMQA-languages if it is the most expressive one among this family. In this paper, we focus on three families of tractable OMQA-languages, including first-order rewritable languages and languages whose data complexity of the query answering is in AC0 or PTIME. On the negative side, we prove that there is, in general, no universal language for each of these families of languages. On the positive side, we propose a novel property, the locality, to approximate the first-order rewritability, and show that there exists a language of disjunctive embedded dependencies that is universal for the family of OMQA-languages with locality. All of these results apply to OMQA with query languages such as conjunctive queries, unions of conjunctive queries and acyclic conjunctive queries.

【Keywords】:

373. On the Expressivity of ASK Queries in SPARQL.

Paper Link】 【Pages】:3057-3064

【Authors】: Xiaowang Zhang ; Jan Van den Bussche ; Kewen Wang ; Heng Zhang ; Xuanxing Yang ; Zhiyong Feng

【Abstract】: As a major query type in SPARQL, ASK queries are boolean queries and have found applications in several domains such as semantic SPARQL optimization. This paper is a first systematic study of the relative expressive power of various fragments of ASK queries in SPARQL. Among many new results, a surprising one is that the operator UNION is redundant for ASK queries. The results in this paper as a whole paint a rich picture for the expressivity of fragments of ASK queries with the four basic operators of SPARQL 1.0 possibly together with a negation. The work in this paper provides a guideline for future SPARQL query optimization and implementation.

【Keywords】:

374. Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction.

Paper Link】 【Pages】:3065-3072

【Authors】: Zhanqiu Zhang ; Jianyu Cai ; Yongdong Zhang ; Jie Wang

【Abstract】: Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model—namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)—which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.

【Keywords】:
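
The polar-coordinate intuition in the abstract above can be made concrete with a small sketch. The distance below combines a radial (modulus) term and an angular (phase) term; this particular form, and the names `hake_style_distance` and `phase_weight`, are assumptions chosen for illustration rather than quoted from the abstract.

```python
import math


def hake_style_distance(head, relation, tail, phase_weight=1.0):
    """Distance between relation-transformed head and tail in polar form.

    head and tail are (moduli, phases); relation is (modulus scalings, phase shifts).
    Smaller distance means a more plausible triple.
    """
    h_mod, h_phase = head
    r_mod, r_phase = relation
    t_mod, t_phase = tail
    # Radial part: the relation rescales radii, moving between hierarchy levels.
    modulus_dist = math.sqrt(
        sum((h * r - t) ** 2 for h, r, t in zip(h_mod, r_mod, t_mod))
    )
    # Angular part: separates entities that share a radius (same level).
    phase_dist = sum(
        abs(math.sin((h + r - t) / 2.0))
        for h, r, t in zip(h_phase, r_phase, t_phase)
    )
    return modulus_dist + phase_weight * phase_dist


# A relation that doubles the radius links radius 1 to radius 2 at the same angle.
level_change = hake_style_distance(([1.0], [0.5]), ([2.0], [0.0]), ([2.0], [0.5]))
# Two entities at the same radius but opposite angles are told apart by phase alone.
same_level = hake_style_distance(([2.0], [0.0]), ([1.0], [0.0]), ([2.0], [math.pi]))
```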

375. A Practical Approach to Forgetting in Description Logics with Nominals.

Paper Link】 【Pages】:3073-3079

【Authors】: Yizheng Zhao ; Renate A. Schmidt ; Yuejie Wang ; Xuanming Zhang ; Hao Feng

【Abstract】: This paper investigates the problem of forgetting in description logics with nominals. In particular, we develop a practical method for forgetting concept and role names from ontologies specified in the description logic ALCO, extending the basic ALC with nominals. The method always terminates, and is sound in the sense that the forgetting solution computed by the method has the same logical consequences as the original ontology. The method is so far the only approach to deductive forgetting in description logics with nominals. An evaluation of a prototype implementation shows that the method achieves a significant speed-up and notably better success rates than the Lethe tool, which performs deductive forgetting for ALC-ontologies. Compared to Fame, a semantic forgetting tool for ALCOIH-ontologies, better success rates are attained. From the perspective of ontology engineering this is very useful, as it provides ontology curators with a powerful tool to produce views of ontologies.

【Keywords】:

376. Deciding the Loosely Guarded Fragment and Querying Its Horn Fragment Using Resolution.

Paper Link】 【Pages】:3080-3087

【Authors】: Sen Zheng ; Renate A. Schmidt

【Abstract】: We consider the following query answering problem: Given a Boolean conjunctive query and a theory in the Horn loosely guarded fragment, the aim is to determine whether the query is entailed by the theory. In this paper, we present a resolution decision procedure for the loosely guarded fragment, and use such a procedure to answer Boolean conjunctive queries against the Horn loosely guarded fragment. The Horn loosely guarded fragment subsumes classes of rules that are prevalent in ontology-based query answering, such as Horn ALCHOI and guarded existential rules. Additionally, we identify star queries and cloud queries, which using our procedure, can be answered against the loosely guarded fragment.

【Keywords】:

377. LTLƒ Synthesis with Fairness and Stability Assumptions.

Paper Link】 【Pages】:3088-3095

【Authors】: Shufang Zhu ; Giuseppe De Giacomo ; Geguang Pu ; Moshe Y. Vardi

【Abstract】: In synthesis, assumptions are constraints on the environment that rule out certain environment behaviors. A key observation here is that even if we consider systems with LTLƒ goals on finite traces, environment assumptions need to be expressed over infinite traces, since accomplishing the agent goals may require an unbounded number of environment actions. To solve synthesis with respect to finite-trace LTLƒ goals under infinite-trace assumptions, we could reduce the problem to LTL synthesis. Unfortunately, while synthesis in LTLƒ and in LTL have the same worst-case complexity (both 2EXPTIME-complete), the algorithms available for LTL synthesis are much more difficult in practice than those for LTLƒ synthesis. In this work we show that in interesting cases we can avoid such a detour to LTL synthesis and keep the simplicity of LTLƒ synthesis. Specifically, we develop a BDD-based fixpoint-based technique for handling basic forms of fairness and of stability assumptions. We show, empirically, that this technique performs much better than standard LTL synthesis.

【Keywords】:

AAAI Technical Track: Machine Learning 483

378. Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting.

Paper Link】 【Pages】:3097-3104

【Authors】: Ralph Abboud ; Ismail Ilkan Ceylan ; Thomas Lukasiewicz

【Abstract】: Weighted model counting (WMC) has emerged as a prevalent approach for probabilistic inference. In its most general form, WMC is #P-hard. Weighted DNF counting (weighted #DNF) is a special case, where approximations with probabilistic guarantees are obtained in O(nm), where n denotes the number of variables, and m the number of clauses of the input DNF, but this is not scalable in practice. In this paper, we propose a neural model counting approach for weighted #DNF that combines approximate model counting with deep learning, and accurately approximates model counts in linear time when width is bounded. We conduct experiments to validate our method, and show that our model learns and generalizes very well to large-scale #DNF instances.

【Keywords】:
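
As background for the abstract above, approximate #DNF counting with probabilistic guarantees is classically done by Monte Carlo sampling over clause coverage (the Karp-Luby family of estimators). The sketch below shows that classical idea for the unweighted case only; it is not the neural approach of the paper. The clause encoding, function name, and sample count are assumptions, and the clause list is assumed to be non-empty.

```python
import random


def karp_luby_count(clauses, num_vars, samples=20000, seed=0):
    """Estimate the number of satisfying assignments of a DNF formula.

    clauses is a list of dicts {variable_index: required_truth_value}.
    """
    rng = random.Random(seed)
    # A clause with k literals is satisfied by 2^(n - k) assignments.
    weights = [2 ** (num_vars - len(c)) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick a clause proportionally to its number of satisfying assignments.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Draw a uniform assignment among those satisfying clause i.
        assignment = [rng.random() < 0.5 for _ in range(num_vars)]
        for var, value in clauses[i].items():
            assignment[var] = value
        # Count the sample only if i is the first clause it satisfies,
        # so each model of the formula is counted exactly once.
        first = next(
            j for j, c in enumerate(clauses)
            if all(assignment[v] == b for v, b in c.items())
        )
        if first == i:
            hits += 1
    return total * hits / samples


# (x0) OR (x1) over two variables has exactly 3 models.
estimate = karp_luby_count([{0: True}, {1: True}], num_vars=2)
```

Each sample here scans all clauses, which is the per-sample cost that makes such estimators slow on large formulas.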

379. Quantized Compressive Sampling of Stochastic Gradients for Efficient Communication in Distributed Deep Learning.

Paper Link】 【Pages】:3105-3112

【Authors】: Afshin Abdi ; Faramarz Fekri

【Abstract】: In distributed training of deep models, the transmission volume of stochastic gradients (SG) imposes a bottleneck in scaling up the number of processing nodes. On the other hand, the existing methods for compression of SGs have two major drawbacks. First, due to the increase in the overall variance of the compressed SG, the hyperparameters of the learning algorithm must be readjusted to ensure the convergence of the training. Further, the convergence rate of the resulting algorithm still would be adversely affected. Second, for those approaches for which the compressed SG values are biased, there is no guarantee for the learning convergence and thus an error feedback is often required. We propose Quantized Compressive Sampling (QCS) of SG that addresses the above two issues while achieving an arbitrarily large compression gain. We introduce two variants of the algorithm: Unbiased-QCS and MMSE-QCS and show their superior performance w.r.t. other approaches. Specifically, we show that for the same number of communication bits, the convergence rate is improved by a factor of 2 relative to state of the art. Next, we propose to improve the convergence rate of the distributed training algorithm via a weighted error feedback. Specifically, we develop and analyze a method to both control the overall variance of the compressed SG and prevent the staleness of the updates. Finally, through simulations, we validate our theoretical results and establish the superior performance of the proposed SG compression in the distributed training of deep models. Our simulations also demonstrate that our proposed compression method expands substantially the region of step-size values for which the learning algorithm converges.

【Keywords】:

380. Indirect Stochastic Gradient Quantization and Its Application in Distributed Deep Learning.

Paper Link】 【Pages】:3113-3120

【Authors】: Afshin Abdi ; Faramarz Fekri

【Abstract】: Transmitting the gradients or model parameters is a critical bottleneck in distributed training of large models. To mitigate this issue, we propose an indirect quantization and compression of stochastic gradients (SG) via factorization. The gist of the idea is that, in contrast to the direct compression methods, we focus on the factors in SGs, i.e., the forward and backward signals in the backpropagation algorithm. We observe that these factors are correlated and generally sparse in most deep models. This gives rise to rethinking of the approaches for quantization and compression of gradients with the ultimate goal of minimizing the error in the final computed gradients subject to the desired communication constraints. We have proposed and theoretically analyzed different indirect SG quantization (ISGQ) methods. The proposed ISGQ reduces the reconstruction error in SGs compared to the direct quantization methods with the same number of quantization bits. Moreover, it can achieve compression gains of more than 100, while the existing traditional quantization schemes can achieve compression ratio of at most 32 (quantizing to 1 bit). Further, for a fixed total batch-size, the required transmission bit-rate per worker decreases in ISGQ as the number of workers increases.

【Keywords】:
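
The factorization mentioned in the abstract above rests on a standard backpropagation fact: for a dense layer, the weight gradient is a sum over the batch of outer products between the backward signal and the forward activation. The sketch below only illustrates that fact and the resulting count of transmitted values; it contains no quantization, and the function names are hypothetical.

```python
def dense_layer_gradient(deltas, activations):
    """Weight gradient of a dense layer: sum over the batch of outer(delta, activation)."""
    out_dim, in_dim = len(deltas[0]), len(activations[0])
    grad = [[0.0] * in_dim for _ in range(out_dim)]
    for delta, act in zip(deltas, activations):
        for i, d in enumerate(delta):
            for j, a in enumerate(act):
                grad[i][j] += d * a
    return grad


def transmitted_values(batch_size, out_dim, in_dim):
    """Number of scalars sent when shipping the gradient directly vs. its factors."""
    return {
        "direct": out_dim * in_dim,
        "factors": batch_size * (out_dim + in_dim),
    }


grad = dense_layer_gradient([[1.0, 2.0]], [[3.0, 4.0, 5.0]])
counts = transmitted_values(batch_size=8, out_dim=512, in_dim=1024)
```

With a fixed total batch size, adding workers shrinks each worker's local batch and hence its `factors` count, which is consistent with the abstract's remark that the per-worker bit-rate decreases as the number of workers grows.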

381. Image-Adaptive GAN Based Reconstruction.

Paper Link】 【Pages】:3121-3129

【Authors】: Shady Abu Hussein ; Tom Tirer ; Raja Giryes

【Abstract】: In recent years, there has been a significant improvement in the quality of samples produced by (deep) generative models such as variational auto-encoders and generative adversarial networks. However, the representation capabilities of these methods still do not capture the full distribution for complex classes of images, such as human faces. This deficiency has been clearly observed in previous works that use pre-trained generative models to solve imaging inverse problems. In this paper, we suggest mitigating the limited representation capabilities of generators by making them image-adaptive and enforcing compliance of the restoration with the observations via back-projections. We empirically demonstrate the advantages of our proposed approach for image super-resolution and compressed sensing.

【Keywords】:

382. DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier.

Paper Link】 【Pages】:3130-3137

【Authors】: Sravanti Addepalli ; Gaurav Kumar Nayak ; Anirban Chakraborty ; Venkatesh Babu Radhakrishnan

【Abstract】: In this era of digital information explosion, an abundance of data from numerous modalities is being generated as well as archived every day. However, most problems associated with training Deep Neural Networks still revolve around lack of data that is rich enough for a given task. Data is required not only for training an initial model, but also for future learning tasks such as Model Compression and Incremental Learning. A diverse dataset may be used for training an initial model, but it may not be feasible to store it throughout the product life cycle due to data privacy issues or memory constraints. We propose to bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a given trained network. We use the available data, which may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples from a trained classifier, using a novel Data-enriching GAN (DeGAN) framework. We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance for the tasks of Data-free Knowledge Distillation and Incremental Learning on benchmark datasets. We further demonstrate that our proposed framework can enrich any data, even from unrelated domains, to make it more useful for the future learning tasks of a given network.

【Keywords】:

383. Bounds and Complexity Results for Learning Coalition-Based Interaction Functions in Networked Social Systems.

Paper Link】 【Pages】:3138-3145

【Authors】: Abhijin Adiga ; Chris J. Kuhlman ; Madhav Marathe ; S. S. Ravi ; Daniel J. Rosenkrantz ; Richard Edwin Stearns ; Anil Vullikanti

【Abstract】: Using a discrete dynamical system model for a networked social system, we consider the problem of learning a class of local interaction functions in such networks. Our focus is on learning local functions which are based on pairwise disjoint coalitions formed from the neighborhood of each node. Our work considers both active query and PAC learning models. We establish bounds on the number of queries needed to learn the local functions under both models. We also establish a complexity result regarding efficient consistent learners for such functions. Our experimental results on synthetic and real social networks demonstrate how the number of queries depends on the structure of the underlying network and number of coalitions.

【Keywords】:

384. Learning Optimal Decision Trees Using Caching Branch-and-Bound Search.

Paper Link】 【Pages】:3146-3153

【Authors】: Gaël Aglin ; Siegfried Nijssen ; Pierre Schaus

【Abstract】: Several recent publications have studied the use of Mixed Integer Programming (MIP) for finding an optimal decision tree, that is, the best decision tree under formal requirements on accuracy, fairness or interpretability of the predictive model. These publications used MIP to deal with the hard computational challenge of finding such trees. In this paper, we introduce a new efficient algorithm, DL8.5, for finding optimal decision trees, based on the use of itemset mining techniques. We show that this new approach outperforms earlier approaches by several orders of magnitude, for both numerical and discrete data, and is generic as well. The key idea underlying this new approach is the use of a cache of itemsets in combination with branch-and-bound search; this new type of cache also stores results for parts of the search space that have been traversed partially.

【Keywords】:
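
To make the combination of an itemset cache with bounding more tangible, here is a deliberately simplified sketch for binary features and binary labels. It caches the best error per itemset (the set of feature tests on a path) and skips a split as soon as one branch alone already costs at least as much as the best solution found so far. It is an assumption-laden toy, not DL8.5: in particular, it does not store partially explored results in the cache, and all names are hypothetical.

```python
def optimal_tree_error(rows, labels, max_depth):
    """Minimum number of misclassifications of any decision tree of depth <= max_depth."""
    n_features = len(rows[0])
    cache = {}

    def leaf_error(indices):
        ones = sum(labels[i] for i in indices)
        return min(ones, len(indices) - ones)  # predict the majority class

    def solve(itemset, depth):
        # The itemset is order-independent, so every path that applies the
        # same tests in a different order reuses the same cache entry.
        key = (itemset, depth)
        if key in cache:
            return cache[key]
        indices = [
            i for i, row in enumerate(rows)
            if all(row[f] == v for f, v in itemset)
        ]
        best = leaf_error(indices)
        used = {f for f, _ in itemset}
        if depth > 0 and best > 0:
            for f in range(n_features):
                if f in used:
                    continue
                left = solve(itemset | {(f, 0)}, depth - 1)
                if left >= best:
                    continue  # bound: the other branch cannot make this split better
                right = solve(itemset | {(f, 1)}, depth - 1)
                best = min(best, left + right)
        cache[key] = best
        return best

    return solve(frozenset(), max_depth)


# XOR of two features: no single split helps, but depth 2 separates the classes.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
errors_by_depth = [optimal_tree_error(xor_rows, xor_labels, d) for d in range(3)]
```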

385. Detecting Semantic Anomalies.

Paper Link】 【Pages】:3154-3162

【Authors】: Faruk Ahmed ; Aaron C. Courville

【Abstract】: We critically appraise the recent interest in out-of-distribution (OOD) detection and question the practical relevance of existing benchmarks. While the currently prevalent trend is to consider different datasets as OOD, we argue that out-distributions of practical interest are ones where the distinction is semantic in nature for a specified context, and that evaluative tasks should reflect this more closely. Assuming a context of object recognition, we recommend a set of benchmarks, motivated by practical applications. We make progress on these benchmarks by exploring a multi-task learning based approach, showing that auxiliary objectives for improved semantic awareness result in improved semantic anomaly detection, with accompanying generalization benefits.

【Keywords】:

386. Exact and Efficient Inference for Collective Flow Diffusion Model via Minimum Convex Cost Flow Algorithm.

Paper Link】 【Pages】:3163-3170

【Authors】: Yasunori Akagi ; Takuya Nishimura ; Yusuke Tanaka ; Takeshi Kurashima ; Hiroyuki Toda

【Abstract】: Collective Flow Diffusion Model (CFDM) is a general framework to find the hidden movements underlying aggregated population data. The key procedure in CFDM analysis is MAP inference of hidden variables. Unfortunately, existing approaches fail to offer exact MAP inferences, only approximate versions, and take a lot of computation time when applied to large scale problems. In this paper, we propose an exact and efficient method for MAP inference in CFDM. Our key idea is formulating the MAP inference problem as a combinatorial optimization problem called Minimum Convex Cost Flow Problem (C-MCFP) with no approximation or continuous relaxation. On the basis of this formulation, we propose an efficient inference method that employs the C-MCFP algorithm as a subroutine. Our experiments on synthetic and real datasets show that the proposed method is effective both in single MAP inference and people flow estimation with EM algorithm.

【Keywords】:

387. Pursuit of Low-Rank Models of Time-Varying Matrices Robust to Sparse and Measurement Noise.

Paper Link】 【Pages】:3171-3178

【Authors】: Albert Akhriev ; Jakub Marecek ; Andrea Simonetto

【Abstract】: In tracking of time-varying low-rank models of time-varying matrices, we present a method robust to both uniformly-distributed measurement noise and arbitrarily-distributed “sparse” noise. In theory, we bound the tracking error. In practice, our use of randomised coordinate descent is scalable and allows for encouraging results on changedetection.net, a benchmark.

【Keywords】:

388. An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint.

Paper Link】 【Pages】:3179-3186

【Authors】: Ehsan Amid ; Manfred K. Warmuth

【Abstract】: We offer new insights into the two commonly used updates for the online k-PCA problem, namely, Krasulina's and Oja's updates. We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of orthonormal k-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient. Following these observations, we derive a more implicit form of Krasulina's k-PCA update, i.e. a version that uses the information of the future gradient as much as possible. Most interestingly, our implicit Krasulina update avoids the costly QR-decomposition step by bypassing the orthonormality constraint. A related update, called Sanger's rule, can be seen as an explicit approximation of our implicit update. We show that the new update in fact corresponds to an online EM step applied to a probabilistic k-PCA model. The probabilistic view of the update allows us to combine multiple models in a distributed setting. We show experimentally that the implicit Krasulina update yields superior convergence while being significantly faster. We also give strong evidence that the new update can benefit from parallelism and is more stable w.r.t. tuning of the learning rate.

【Keywords】:
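
For readers unfamiliar with the baseline, the sketch below shows Oja's online k-PCA update in its commonly stated form: a gradient step W + lr * x xᵀ W followed by re-orthonormalization, which is the QR-style step the abstract above describes as costly. Gram-Schmidt stands in for the QR decomposition here, the matrix is stored as a list of columns, and the step size and data are assumptions; the implicit Krasulina update proposed in the paper is not reproduced.

```python
import random


def orthonormalize(columns):
    """Gram-Schmidt on a list of column vectors (yields the Q factor of a QR)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return basis


def oja_step(columns, x, lr):
    """One Oja update: W <- orth(W + lr * x x^T W), W given as a list of columns."""
    updated = []
    for w in columns:
        y = sum(a * b for a, b in zip(x, w))  # x^T w
        updated.append([a + lr * y * b for a, b in zip(w, x)])
    return orthonormalize(updated)


# Synthetic stream whose variance is concentrated in the first two coordinates.
rng = random.Random(0)
W = orthonormalize([[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)])
for _ in range(3000):
    x = [rng.gauss(0, s) for s in (3.0, 2.0, 0.3, 0.3)]
    W = oja_step(W, x, lr=0.01)
```

After the loop, the two columns of `W` should lie close to the span of the first two coordinate axes, while staying orthonormal by construction.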

389. Kriging Convolutional Networks.

Paper Link】 【Pages】:3187-3194

【Authors】: Gabriel Appleby ; Linfeng Liu ; Liping Liu

【Abstract】: Spatial interpolation is a class of estimation problems where locations with known values are used to estimate values at other locations, with an emphasis on harnessing spatial locality and trends. Traditional kriging methods have strong Gaussian assumptions, and as a result, often fail to capture complexities within the data. Inspired by the recent progress of graph neural networks, we introduce Kriging Convolutional Networks (KCN), a method of combining advantages of Graph Neural Networks (GNN) and kriging. Compared to standard GNNs, KCNs make direct use of neighboring observations when generating predictions. KCNs also contain the kriging method as a specific configuration. Empirically, we show that this model outperforms GNNs and kriging in several applications.

【Keywords】:

390. Efficient Inference of Optimal Decision Trees.

Paper Link】 【Pages】:3195-3202

【Authors】: Florent Avellaneda

【Abstract】: Inferring a decision tree from a given dataset is a classic problem in machine learning. This problem consists of building, from a labelled dataset, a tree where each node corresponds to a class and a path between the tree root and a leaf corresponds to a conjunction of features to be satisfied in this class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is NP-complete for several definitions of optimality. For this reason, the majority of existing approaches rely on heuristics, and the few existing exact approaches do not work on large datasets. In this paper, we propose a novel approach for inferring an optimal decision tree with a minimum depth based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and the time it takes to run grows slowly with the size of datasets.

【Keywords】:
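The notions of a tree that is "consistent with the dataset" and of tree depth in the abstract above can be made concrete on a toy example. The following is a minimal sketch, not the paper's Boolean-formula method; the tuple encoding of trees and the helper names `classify`, `depth` and `is_consistent` are assumptions made for illustration.

```python
# A tree is either a class label (a leaf) or a tuple
# (feature_index, subtree_if_false, subtree_if_true).
# This encoding is hypothetical and chosen only for brevity.

def classify(tree, x):
    """Follow the root-to-leaf path selected by the Boolean features of x."""
    while isinstance(tree, tuple):
        feature, if_false, if_true = tree
        tree = if_true if x[feature] else if_false
    return tree

def depth(tree):
    """Number of feature tests on the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))

def is_consistent(tree, dataset):
    """True if the tree predicts the given label for every example."""
    return all(classify(tree, x) == y for x, y in dataset)

# XOR-style toy dataset: no depth-1 tree can be consistent with it.
xor_data = [((0, 0), "neg"), ((0, 1), "pos"), ((1, 0), "pos"), ((1, 1), "neg")]
xor_tree = (0, (1, "neg", "pos"), (1, "pos", "neg"))
stump = (0, "neg", "pos")
```

Here `xor_tree` is consistent with `xor_data` at depth 2, while `stump` is not consistent, which is the kind of check an exact minimum-depth search has to answer for each candidate depth.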

391. Few Shot Network Compression via Cross Distillation.

Paper Link】 【Pages】:3203-3210

【Authors】: Haoli Bai ; Jiaxiang Wu ; Irwin King ; Michael R. Lyu

【Abstract】: Model compression has been widely adopted to obtain light-weighted deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few shot network compression lies in high estimation errors from the original network during inference, since the compressed network can easily over-fit on the few training instances. The estimation errors could propagate and accumulate layer-wisely and finally deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving hidden layers of teacher and student network, layer-wisely accumulated estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation can significantly improve the student network's accuracy when only a few training instances are available.

【Keywords】:

392. A Three-Level Optimization Model for Nonlinearly Separable Clustering.

Paper Link】 【Pages】:3211-3218

【Authors】: Liang Bai ; Jiye Liang

【Abstract】: Due to the complex structure of real-world data, nonlinearly separable clustering is a popular and widely studied clustering problem. Currently, various types of algorithms, such as kernel k-means, spectral clustering and density clustering, have been developed to solve this problem. However, it is difficult for them to balance the efficiency and effectiveness of clustering, which limits their practical application. To overcome this deficiency, we propose a three-level optimization model for nonlinearly separable clustering which divides the clustering problem into three sub-problems: a linearly separable clustering on the object set, a nonlinearly separable clustering on the cluster set and an ensemble clustering on the partition set. An iterative algorithm is proposed to solve the optimization problem. The proposed algorithm can recognize nonlinearly separable clusters effectively at low computational cost. The performance of this algorithm has been studied on synthetic and real data sets. Comparisons with other nonlinearly separable clustering algorithms illustrate the efficiency and effectiveness of the proposed algorithm.

【Keywords】:

393. Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching.

Paper Link】 【Pages】:3219-3226

【Authors】: Yunsheng Bai ; Hao Ding ; Ken Gu ; Yizhou Sun ; Wei Wang

【Abstract】: Graph similarity computation is one of the core operations in many graph-based applications, such as graph similarity search, graph database analysis, graph clustering, etc. Since computing the exact distance/similarity between two graphs is typically NP-hard, a series of approximate methods have been proposed with a trade-off between accuracy and speed. Recently, several data-driven approaches based on neural networks have been proposed, most of which model the graph-graph similarity as the inner product of their graph-level representations, with different techniques proposed for generating one embedding per graph. However, using one fixed-dimensional embedding per graph may fail to fully capture graphs in varying sizes and link structures—a limitation that is especially problematic for the task of graph similarity computation, where the goal is to find the fine-grained difference between two graphs. In this paper, we address the problem of graph similarity computation from another perspective, by directly matching two sets of node embeddings without the need to use fixed-dimensional vectors to represent whole graphs for their similarity computation. The model, Graph-Sim, achieves the state-of-the-art performance on four real-world graph datasets under six out of eight settings (here we count a specific dataset and metric combination as one setting), compared to existing popular methods for approximate Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) computation.

【Keywords】:

394. Learning to Optimize Computational Resources: Frugal Training with Generalization Guarantees.

Paper Link】 【Pages】:3227-3234

【Authors】: Maria-Florina Balcan ; Tuomas Sandholm ; Ellen Vitercik

【Abstract】: Algorithms typically come with tunable parameters that have a considerable impact on the computational resources they consume. Too often, practitioners must hand-tune the parameters, a tedious and error-prone task. A recent line of research provides algorithms that return nearly-optimal parameters from within a finite set. These algorithms can be used when the parameter space is infinite by providing as input a random sample of parameters. This data-independent discretization, however, might miss pockets of nearly-optimal parameters: prior research has presented scenarios where the only viable parameters lie within an arbitrarily small region. We provide an algorithm that learns a finite set of promising parameters from within an infinite set. Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces. Our approach applies to any configuration problem that satisfies a simple yet ubiquitous structure: the algorithm's performance is a piecewise constant function of its parameters. Prior research has exhibited this structure in domains from integer programming to clustering.

【Keywords】:

395. Scalable Attentive Sentence Pair Modeling via Distilled Sentence Embedding.

Paper Link】 【Pages】:3235-3242

【Authors】: Oren Barkan ; Noam Razin ; Itzik Malkiel ; Ori Katz ; Avi Caciularu ; Noam Koenigstein

【Abstract】: Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations – a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires the propagation of all query-candidate sentence-pairs throughout a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE) – a model that is based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: Given a cross-attentive teacher model (e.g. a fine-tuned BERT), we train a sentence embedding based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMO variants and other sentence embedding methods, while accelerating computation of the query-candidate sentence-pairs similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.

【Keywords】:

396. Midas: Microcluster-Based Detector of Anomalies in Edge Streams.

Paper Link】 【Pages】:3242-3249

【Authors】: Siddharth Bhatia ; Bryan Hooi ; Minji Yoon ; Kijung Shin ; Christos Faloutsos

【Abstract】: Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. Midas has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 108–505 times faster than state-of-the-art approaches; (c) it provides 46%-52% higher accuracy (in terms of AUC) than state-of-the-art approaches.

【Keywords】:

397. Exploratory Combinatorial Optimization with Reinforcement Learning.

Paper Link】 【Pages】:3243-3250

【Authors】: Thomas D. Barrett ; William R. Clements ; Jakob N. Foerster ; Alex Lvovsky

【Abstract】: Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximize some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework with which efficient heuristic methods to tackle these problems can be learned. Previous works construct the solution subset incrementally, adding one element at a time; however, the irreversible nature of this approach prevents the agent from revising its earlier decisions, which may be necessary given the complexity of the optimization task. We instead propose that the agent should seek to continuously improve the solution by learning to explore at test time. Our approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph. Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. Moreover, because ECO-DQN can start from any arbitrary configuration, it can be combined with other search methods to further improve performance, which we demonstrate using a simple random search.

【Keywords】:
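The contrast drawn in the abstract above, between building a solution irreversibly and revising it from any starting configuration, can be illustrated on Maximum Cut with a hand-written flip rule. This is a minimal sketch and not ECO-DQN: there is no learned agent, and the names `cut_value` and `flip_local_search` are assumptions made for illustration.

```python
def cut_value(edges, side):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

def flip_local_search(edges, side):
    """Repeatedly flip single vertices, keeping a flip only if the cut grows.

    Any vertex may be flipped again later, so earlier decisions are
    reversible, and the search can start from an arbitrary assignment.
    """
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for v in sorted(side):
            before = cut_value(edges, side)
            side[v] = not side[v]
            if cut_value(edges, side) > before:
                improved = True
            else:
                side[v] = not side[v]  # undo a flip that did not help
    return side

# 4-cycle, starting from the trivial assignment with an empty cut.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
start = {0: False, 1: False, 2: False, 3: False}
final = flip_local_search(cycle_edges, start)
```

Because every accepted flip strictly increases the cut, the loop terminates; on the 4-cycle it ends with all four edges cut.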

398. Event-Driven Continuous Time Bayesian Networks.

Paper Link】 【Pages】:3259-3266

【Authors】: Debarun Bhattacharjya ; Karthikeyan Shanmugam ; Tian Gao ; Nicholas Mattei ; Kush R. Varshney ; Dharmashankar Subramanian

【Abstract】: We introduce a novel event-driven continuous time Bayesian network (ECTBN) representation to model situations where a system's state variables could be influenced by occurrences of events of various types. In this way, the model parameters and graphical structure capture not only potential “causal” dynamics of system evolution but also the influence of event occurrences that may be interventions. We propose a greedy search procedure for structure learning based on the BIC score for a special class of ECTBNs, showing that it is asymptotically consistent and also effective for limited data. We demonstrate the power of the representation by applying it to model paths out of poverty for clients of CityLink Center, an integrated social service provider in Cincinnati, USA. Here the ECTBN formulation captures the effect of classes/counseling sessions on an individual's life outcome areas such as education, transportation, employment and financial education.

【Keywords】:

399. An Efficient Evolutionary Algorithm for Subset Selection with General Cost Constraints.

Paper Link】 【Pages】:3267-3274

【Authors】: Chao Bian ; Chao Feng ; Chao Qian ; Yang Yu

【Abstract】: In this paper, we study the problem of selecting a subset from a ground set to maximize a monotone objective function f such that a monotone cost function c is bounded by an upper limit. State-of-the-art algorithms include the generalized greedy algorithm and POMC. The former is an efficient fixed time algorithm, but the performance is limited by the greedy nature. The latter is an anytime algorithm that can find better subsets using more time, but without any polynomial-time approximation guarantee. In this paper, we propose a new anytime algorithm EAMC, which employs a simple evolutionary algorithm to optimize a surrogate objective integrating f and c. We prove that EAMC achieves the best known approximation guarantee in polynomial expected running time. Experimental results on the applications of maximum coverage, influence maximization and sensor placement show the excellent performance of EAMC.

【Keywords】:
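The abstract above mentions the generalized greedy algorithm as a fixed-time baseline whose quality is limited by its greedy nature. A minimal sketch of one common form of that baseline follows, assuming a gain-per-cost selection rule plus a comparison with the best single item; it is not EAMC, and the toy coverage instance and all names are hypothetical.

```python
def generalized_greedy(items, f, c, budget):
    """Greedy by marginal gain per marginal cost, then compare with the best single item."""
    chosen = set()
    remaining = set(items)
    while remaining:
        def ratio(v):
            gain = f(chosen | {v}) - f(chosen)
            extra = c(chosen | {v}) - c(chosen)
            return gain / extra if extra > 0 else float("inf")
        best = max(sorted(remaining), key=ratio)
        if c(chosen | {best}) <= budget:
            chosen.add(best)
        remaining.remove(best)  # each item is considered exactly once
    singles = [v for v in sorted(items) if c({v}) <= budget]
    if singles:
        top = max(singles, key=lambda v: f({v}))
        if f({top}) > f(chosen):
            return {top}
    return chosen

# Toy maximum-coverage instance (hypothetical data).
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5, 6}}
price = {"a": 3, "b": 1, "c": 1, "d": 5}

def coverage(subset):
    """Monotone objective: number of distinct elements covered."""
    return len(set().union(*(covers[v] for v in subset))) if subset else 0

def total_price(subset):
    """Monotone cost: sum of item prices."""
    return sum(price[v] for v in subset)

picked = generalized_greedy(covers, coverage, total_price, budget=4)
```

On this instance the sketch returns `{"b", "c"}`, which covers 3 elements, while `{"a", "b"}` fits the same budget and covers 4. That gap is the kind of limitation an anytime method can try to close with more running time.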

400. A Stochastic Derivative-Free Optimization Method with Importance Sampling: Theory and Learning to Control.

Paper Link】 【Pages】:3275-3282

【Authors】: Adel Bibi ; El Houcine Bergou ; Ozan Sener ; Bernard Ghanem ; Peter Richtárik

【Abstract】: We consider the problem of unconstrained minimization of a smooth objective function in ℝ^n in a setting where only function evaluations are possible. While importance sampling is one of the most popular techniques used by machine learning practitioners to accelerate the convergence of their models when applicable, there is not much existing theory for this acceleration in the derivative-free setting. In this paper, we propose the first derivative-free optimization method with importance sampling and derive new improved complexity results on non-convex, convex and strongly convex functions. We conduct extensive experiments on various synthetic and real LIBSVM datasets confirming our theoretical results. We test our method on a collection of continuous control tasks on MuJoCo environments with varying difficulty. Experiments show that our algorithm is practical for high dimensional continuous control problems where importance sampling results in a significant sample complexity improvement.

【Keywords】:
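To make the setting in the abstract above concrete, here is a minimal sketch of a derivative-free step that uses only function evaluations and picks coordinate directions with non-uniform probabilities. It is an illustrative assumption rather than the paper's algorithm: the three-point comparison, the fixed step size, the choice of sampling weights, and the name `three_point_search` are all hypothetical.

```python
import random

def three_point_search(f, x, probs, step, iters, rng):
    """Minimize f using function values only.

    Each iteration samples a coordinate i with probability proportional
    to probs[i], evaluates f at x, x + step*e_i and x - step*e_i, and
    keeps the best of the three points.
    """
    x = list(x)
    for _ in range(iters):
        i = rng.choices(range(len(x)), weights=probs)[0]
        candidates = [x]
        for sign in (1.0, -1.0):
            y = list(x)
            y[i] += sign * step
            candidates.append(y)
        x = min(candidates, key=f)
    return x

def quadratic(x):
    """Toy objective whose first coordinate is ten times steeper."""
    return 10.0 * x[0] ** 2 + x[1] ** 2

x0 = [3.0, -2.0]
# Assumption: sample the steeper coordinate more often.
x_final = three_point_search(quadratic, x0, probs=[10.0, 1.0],
                             step=0.5, iters=50, rng=random.Random(0))
```

Since the current point is always one of the candidates, the objective value never increases from one iteration to the next.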

401. Proximal Distilled Evolutionary Reinforcement Learning.

Paper Link】 【Pages】:3283-3290

【Authors】: Cristian Bodnar ; Ben Day ; Pietro Liò

【Abstract】: Reinforcement Learning (RL) has achieved impressive performance in many complex environments due to the integration with Deep Neural Networks (DNNs). At the same time, Genetic Algorithms (GAs), often seen as a competing approach to RL, had limited success in scaling up to the DNNs required to solve challenging tasks. Contrary to this dichotomic view, in the physical world, evolution and learning are complementary processes that continuously interact. The recently proposed Evolutionary Reinforcement Learning (ERL) framework has demonstrated mutual benefits to performance when combining the two methods. However, ERL has not fully addressed the scalability problem of GAs. In this paper, we show that this problem is rooted in an unfortunate combination of a simple genetic encoding for DNNs and the use of traditional biologically-inspired variation operators. When applied to these encodings, the standard operators are destructive and cause catastrophic forgetting of the traits the networks acquired. We propose a novel algorithm called Proximal Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by a hierarchical integration between evolution and learning. The main innovation of PDERL is the use of learning-based variation operators that compensate for the simplicity of the genetic representation. Unlike traditional operators, our proposals meet the functional requirements of variation operators when applied on directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings from the OpenAI gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments.

【Keywords】:

402. Efficient Verification of ReLU-Based Neural Networks via Dependency Analysis.

Paper Link】 【Pages】:3291-3299

【Authors】: Elena Botoeva ; Panagiotis Kouvaros ; Jan Kronqvist ; Alessio Lomuscio ; Ruth Misener

【Abstract】: We introduce an efficient method for the verification of ReLU-based feed-forward neural networks. We derive an automated procedure that exploits dependency relations between the ReLU nodes, thereby pruning the search tree that needs to be considered by MILP-based formulations of the verification problem. We augment the resulting algorithm with methods for input domain splitting and symbolic interval propagation. We present Venus, the resulting verification toolkit, and evaluate it on the ACAS collision avoidance networks and models trained on the MNIST and CIFAR-10 datasets. The experimental results obtained indicate considerable gains over the present state-of-the-art tools.

【Keywords】:

403. Information-Theoretic Understanding of Population Risk Improvement with Model Compression.

Paper Link】 【Pages】:3300-3307

【Authors】: Yuheng Bu ; Weihao Gao ; Shaofeng Zou ; Venugopal V. Veeravalli

【Abstract】: We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We first prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions.

【Keywords】:
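The trade-off described in the abstract above can be written out as a short decomposition. This is a sketch under assumed notation, not taken from the paper: $L_\mu$ denotes population risk, $L_S$ empirical risk on the training sample $S$, $w$ the original model, $\hat{w}$ the compressed model, and $\operatorname{gen}(\cdot) = L_\mu(\cdot) - L_S(\cdot)$ the generalization error.

```latex
\begin{align*}
L_\mu(w) &= L_S(w) + \operatorname{gen}(w), \\
L_\mu(\hat{w}) - L_\mu(w)
  &= \underbrace{\bigl(L_S(\hat{w}) - L_S(w)\bigr)}_{\text{increase in empirical risk}}
   - \underbrace{\bigl(\operatorname{gen}(w) - \operatorname{gen}(\hat{w})\bigr)}_{\text{decrease in generalization error}}, \\
L_\mu(\hat{w}) < L_\mu(w)
  &\iff \operatorname{gen}(w) - \operatorname{gen}(\hat{w}) > L_S(\hat{w}) - L_S(w).
\end{align*}
```

The last line restates the abstract's condition: compression improves population risk exactly when the drop in generalization error outweighs the rise in empirical risk.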

404. A Multi-Scale Approach for Graph Link Prediction.

Paper Link】 【Pages】:3308-3315

【Authors】: Lei Cai ; Shuiwang Ji

【Abstract】: Deep models can be made scale-invariant when trained with multi-scale information. Images can be easily made multi-scale, given their grid-like structures. Extending this to generic graphs poses major challenges. For example, in link prediction tasks, inputs are represented as graphs consisting of nodes and edges. Currently, the state-of-the-art model for link prediction uses supervised heuristic learning, which learns graph structure features centered on two target nodes. It then learns graph neural networks to predict the existence of links based on graph structure features. Thus, the performance of link prediction models highly depends on graph structure features. In this work, we propose a novel node aggregation method that can transform the enclosing subgraph into different scales and preserve the relationship between two target nodes for link prediction. A theory for analyzing the information loss during the re-scaling procedure is also provided. Graphs in different scales can provide scale-invariant information, which enables graph neural networks to learn invariant features and improve link prediction performance. Our experimental results on 14 datasets from different areas demonstrate that our proposed method outperforms the state-of-the-art methods by employing multi-scale graphs without additional parameters.

【Keywords】:

405. Deterministic Value-Policy Gradients.

Paper Link】 【Pages】:3316-3323

【Authors】: Qingpeng Cai ; Ling Pan ; Pingzhong Tang

【Abstract】: Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with the finite horizon, but this is too myopic compared with the infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients by the learned model trade off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

【Keywords】:

406. Predicting Propositional Satisfiability via End-to-End Learning.

Paper Link】 【Pages】:3324-3331

【Authors】: Chris Cameron ; Rex Chen ; Jason S. Hartford ; Kevin Leyton-Brown

【Abstract】: Strangely enough, it is possible to use machine learning models to predict the satisfiability status of hard SAT problems with accuracy considerably higher than random guessing. Existing methods have relied on extensive, manual feature engineering and computationally complex features (e.g., based on linear programming relaxations). We show for the first time that even better performance can be achieved by end-to-end learning methods — i.e., models that map directly from raw problem inputs to predictions and take only linear time to evaluate. Our work leverages deep network models which capture a key invariance exhibited by SAT problems: satisfiability status is unaffected by reordering variables and clauses. We showed that end-to-end learning with deep networks can outperform previous work on random 3-SAT problems at the solubility phase transition, where: (1) exactly 50% of problems are satisfiable; and (2) empirical runtimes of known solution methods scale exponentially with problem size (e.g., we achieved 84% prediction accuracy on 600-variable problems, which take hours to solve with state-of-the-art methods). We also showed that deep networks can generalize across problem sizes (e.g., a network trained only on 100-variable problems, which typically take about 10 ms to solve, achieved 81% accuracy on 600-variable problems).

【Keywords】:
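The invariance mentioned in the abstract above, that satisfiability is unaffected by reordering variables and clauses, can be checked by brute force on small formulas. This is a minimal sketch for illustration only and says nothing about the paper's network architecture; the signed-integer literal encoding and the helper names are assumptions.

```python
from itertools import product
import random

def is_satisfiable(clauses, num_vars):
    """Brute-force check. A clause is a list of nonzero ints:
    k means variable k is true, -k means variable k is false."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False

def permute(clauses, num_vars, rng):
    """Rename variables by a random permutation and shuffle the order
    of literals within clauses and of the clauses themselves."""
    names = list(range(1, num_vars + 1))
    rng.shuffle(names)
    renamed = [[(1 if l > 0 else -1) * names[abs(l) - 1] for l in clause]
               for clause in clauses]
    for clause in renamed:
        rng.shuffle(clause)
    rng.shuffle(renamed)
    return renamed

sat_formula = [[1, -2, 3], [-1, 2], [2, 3]]   # satisfiable, 3 variables
unsat_formula = [[1], [-1]]                   # unsatisfiable, 1 variable
```

A model that takes the raw clause list as input would ideally return the same prediction for `sat_formula` and for every output of `permute(sat_formula, 3, rng)`.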

407. Active Ordinal Querying for Tuplewise Similarity Learning.

Paper Link】 【Pages】:3332-3340

【Authors】: Gregory Canal ; Stefano Fenu ; Christopher Rozell

【Abstract】: Many machine learning tasks such as clustering, classification, and dataset search benefit from embedding data points in a space where distances reflect notions of relative similarity as perceived by humans. A common way to construct such an embedding is to request triplet similarity queries to an oracle, comparing two objects with respect to a reference. This work generalizes triplet queries to tuple queries of arbitrary size that ask an oracle to rank multiple objects against a reference, and introduces an efficient and robust adaptive selection method called InfoTuple that uses a novel approach to mutual information maximization. We show that the performance of InfoTuple at various tuple sizes exceeds that of the state-of-the-art adaptive triplet selection method on synthetic tests and new human response datasets, and empirically demonstrate the significant gains in efficiency and query consistency achieved by querying larger tuples instead of triplets.

【Keywords】:

408. Fatigue-Aware Bandits for Dependent Click Models.

Paper Link】 【Pages】:3341-3348

【Authors】: Junyu Cao ; Wei Sun ; Zuo-Jun Max Shen ; Markus Ettl

【Abstract】: As recommender systems send a massive amount of content to keep users engaged, users may experience fatigue, which is caused by 1) overexposure to irrelevant content and 2) boredom from seeing too many similar recommendations. To address this problem, we consider an online learning setting where a platform learns a policy to recommend content that takes user fatigue into account. We propose an extension of the Dependent Click Model (DCM) to describe users' behavior. We stipulate that for each piece of content, its attractiveness to a user depends on its intrinsic relevance and a discount factor which measures how many similar pieces of content have been shown. Users view the recommended content sequentially and click on the ones that they find attractive. Users may leave the platform at any time, and the probability of exiting is higher when they do not like the content. Based on users' feedback, the platform learns the relevance of the underlying content as well as the discounting effect due to content fatigue. We refer to this learning task as the “fatigue-aware DCM Bandit” problem. We consider two learning scenarios depending on whether the discounting effect is known. For each scenario, we propose a learning algorithm which simultaneously explores and exploits, and characterize its regret bound.

【Keywords】:
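The user behavior described in the abstract above (sequential viewing, attractiveness discounted by similar content already shown, and a higher chance of leaving after disliked content) can be simulated in a few lines. This is a minimal sketch under stated assumptions: the geometric discount `discount ** k`, the two exit probabilities, and every name below are hypothetical and not taken from the paper.

```python
import random

def simulate_session(items, relevance, category, discount,
                     exit_after_skip, exit_after_click, rng):
    """Simulate one user browsing a ranked list and return the clicked items.

    Attractiveness is modeled as relevance * discount ** k, where k counts
    items of the same category shown earlier (an assumed functional form).
    """
    shown = {}
    clicks = []
    for item in items:
        k = shown.get(category[item], 0)
        attractiveness = relevance[item] * discount ** k
        shown[category[item]] = k + 1
        clicked = rng.random() < attractiveness
        if clicked:
            clicks.append(item)
        # Users are more likely to leave after content they did not click.
        leave = exit_after_click if clicked else exit_after_skip
        if rng.random() < leave:
            break
    return clicks

ranked = ["n1", "n2", "s1"]
cat = {"n1": "news", "n2": "news", "s1": "sports"}
rel = {"n1": 1.0, "n2": 1.0, "s1": 1.0}

no_fatigue = simulate_session(ranked, rel, cat, 1.0, 0.0, 0.0, random.Random(0))
full_fatigue = simulate_session(ranked, rel, cat, 0.0, 0.0, 0.0, random.Random(0))
```

With `discount=1.0` every fully relevant item is clicked, while with `discount=0.0` only the first item of each category is clicked, which isolates the fatigue effect that the platform has to learn.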

409. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks.

Paper Link】 【Pages】:3349-3356

【Authors】: Yuan Cao ; Quanquan Gu

【Abstract】: Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very recently, a line of work explains in theory that with over-parameterization and proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs. However, existing generalization error bounds are unable to explain the good generalization performance of over-parameterized DNNs. The major limitation of most existing generalization bounds is that they are based on uniform convergence and are independent of the training algorithm. In this work, we derive an algorithm-dependent generalization error bound for deep ReLU networks, and show that under certain assumptions on the data distribution, gradient descent (GD) with proper random initialization is able to train a sufficiently over-parameterized DNN to achieve arbitrarily small generalization error. Our work sheds light on explaining the good generalization performance of over-parameterized deep neural networks.

【Keywords】:

410. Exponential Family Graph Embeddings.

Paper Link】 【Pages】:3357-3364

【Authors】: Abdulkadir Çelikkanat ; Fragkiskos D. Malliaros

【Abstract】: Representing networks in a low dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional Skip-Gram model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.

【Keywords】:

411. Asking the Right Questions to the Right Users: Active Learning with Imperfect Oracles.

Paper Link】 【Pages】:3365-3372

【Authors】: Shayok Chakraborty

【Abstract】: Active learning algorithms automatically identify the salient and exemplar samples from large amounts of unlabeled data and tremendously reduce human annotation effort in inducing a machine learning model. In a traditional active learning setup, the labeling oracles are assumed to be infallible, that is, they always provide correct answers (in terms of class labels) to the queried unlabeled instances. However, in real-world applications, oracles are often imperfect and provide incorrect label annotations. Oracles also have diverse expertise and while they may be noisy, certain oracles may provide accurate annotations to certain specific instances. In this paper, we propose a novel framework to address the challenging problem of active learning in the presence of multiple imperfect oracles. We pose the optimal sample and oracle selection as a constrained optimization problem and derive a linear programming relaxation to select a batch of (sample-oracle) pairs, which can potentially augment maximal information to the underlying classification model. Our extensive empirical studies on 9 challenging datasets (from a variety of application domains) corroborate the usefulness of our framework over competing baselines.

【Keywords】:

412. Lifelong Learning with a Changing Action Set.

Paper Link】 【Pages】:3373-3380

【Authors】: Yash Chandak ; Georgios Theocharous ; Chris Nota ; Philip S. Thomas

【Abstract】: In many real-world sequential decision making problems, the number of available actions (decisions) can vary over time. While problems like catastrophic forgetting, changing transition dynamics, changing rewards functions, etc. have been well-studied in the lifelong learning literature, the setting where the size of the action set changes remains unaddressed. In this paper, we present first steps towards developing an algorithm that autonomously adapts to an action set whose size changes over time. To tackle this open problem, we break it into two problems that can be solved iteratively: inferring the underlying, unknown, structure in the space of actions and optimizing a policy that leverages this structure. We demonstrate the efficiency of this approach on large-scale real-world lifelong learning problems.

【Keywords】:

413. Reinforcement Learning When All Actions Are Not Always Available.

Paper Link】 【Pages】:3381-3388

【Authors】: Yash Chandak ; Georgios Theocharous ; Blossom Metevier ; Philip S. Thomas

【Abstract】: The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which better captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs can suffer from potential divergence issues, and present new policy gradient algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches on tasks inspired by real-life use cases wherein the action set is stochastic.

【Keywords】:
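The stochastic action set in the abstract above means a policy can only choose among the actions that happen to be available at the current step. One simple way to express this is a softmax restricted to the available actions; the sketch below assumes that parameterization, and the names `masked_softmax` and `sample_action` are hypothetical rather than the paper's.

```python
import math
import random

def masked_softmax(preferences, available):
    """Softmax over action preferences, restricted to available actions."""
    top = max(preferences[a] for a in available)  # subtract max for stability
    weights = {a: math.exp(preferences[a] - top) for a in available}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def sample_action(preferences, available, rng):
    """Draw one action from the restricted distribution."""
    probs = masked_softmax(preferences, available)
    actions = sorted(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]

prefs = {"left": 0.0, "right": 1.0, "jump": 2.0}
available_now = {"left", "right"}          # "jump" is unavailable this step
probs_now = masked_softmax(prefs, available_now)
action = sample_action(prefs, available_now, random.Random(0))
```

Unavailable actions receive no probability mass, so the same preferences induce a different distribution whenever the available set changes.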

414. A Restricted Black-Box Adversarial Framework Towards Attacking Graph Embedding Models.

Paper Link】 【Pages】:3389-3396

【Authors】: Heng Chang ; Yu Rong ; Tingyang Xu ; Wenbing Huang ; Honglei Zhang ; Peng Cui ; Wenwu Zhu ; Junzhou Huang

【Abstract】: With the great success of graph embedding models in both academia and industry, the robustness of graph embedding against adversarial attacks inevitably becomes a central problem in the graph learning domain. Despite fruitful progress, most current works perform the attack in a white-box fashion: they need to access the model predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes the white-box attack impractical for real graph learning systems. This paper extends current frameworks in a more general and flexible sense: we aim to attack various kinds of graph embedding models in a black-box setting. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. As such, a generalized adversarial attacker, GF-Attack, is constructed from the graph filter and feature matrix. Instead of accessing any knowledge of the target classifiers used in graph embedding, GF-Attack performs the attack only on the graph filter in a black-box fashion. To validate the generalization of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. In particular, with our attack, even a small graph perturbation such as a one-edge flip consistently degrades the performance of different graph embedding models.

【Keywords】:

415. Robust Data Programming with Precision-guided Labeling Functions.

Paper Link】 【Pages】:3397-3404

【Authors】: Oishik Chatterjee ; Ganesh Ramakrishnan ; Sunita Sarawagi

【Abstract】: Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set of discrete labeling functions (LF) that output possibly noisy labels to input instances and a generative model for consolidating the weak labels. We enhance and generalize this paradigm by supporting functions that output a continuous score (instead of a hard label) that noisily correlates with labels. We show across five applications that continuous LFs are more natural to program and lead to improved recall. We also show that accuracy of existing generative models is unstable with respect to initialization, training epochs, and learning rates. We give control to the data programmer to guide the training process by providing intuitive quality guides with each LF. We propose an elegant method of incorporating these guides into the generative model. Our overall method, called CAGE, makes the data programming paradigm more reliable than other tricks based on initialization, sign-penalties, or soft-accuracy constraints.

【Keywords】:

416. A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories.

Paper Link】 【Pages】:3405-3413

【Authors】: Zhaohui Che ; Ali Borji ; Guangtao Zhai ; Suiyi Ling ; Jing Li ; Patrick Le Callet

【Abstract】: Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of pre-trained source models can transfer to other new target models, thus posing a security threat to black-box applications (when the attackers have no access to the target models). Despite adopting diverse architectures and parameters, source and target models often share similar decision boundaries. Therefore, if an adversary is capable of fooling several source models concurrently, it can potentially capture intrinsic transferable adversarial information that may allow it to fool a broad class of other black-box target models. Current ensemble attacks, however, only consider a limited number of source models to craft an adversary, and obtain poor transferability. In this paper, we propose a novel black-box attack, dubbed Serial-Mini-Batch-Ensemble-Attack (SMBEA). SMBEA divides a large number of pre-trained source models into several mini-batches. For each single batch, we design 3 new ensemble strategies to improve the intra-batch transferability. Besides, we propose a new algorithm that recursively accumulates the “long-term” gradient memories of the previous batch to the following batch. This way, the learned adversarial information can be preserved and the inter-batch transferability can be improved. Experiments indicate that our method outperforms state-of-the-art ensemble attacks over multiple pixel-to-pixel vision tasks including image translation and salient region prediction. Our method successfully fools two online black-box saliency prediction systems including DeepGaze-II (Kummerer 2017) and SALICON (Huang et al. 2017). Finally, we also contribute a new repository to promote the research on adversarial attack and defense over pixel-to-pixel tasks: https://github.com/CZHQuality/AAA-Pix2pix.

【Keywords】:

417. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control.

Paper Link】 【Pages】:3414-3421

【Authors】: Chacha Chen ; Hua Wei ; Nan Xu ; Guanjie Zheng ; Ming Yang ; Yuanhao Xiong ; Kai Xu ; Zhenhui Li

【Abstract】: Traffic congestion plagues cities around the world. Recent years have witnessed an unprecedented trend in applying reinforcement learning for traffic signal control. However, the primary challenge is to control and coordinate traffic lights in large-scale urban networks. No one has ever tested RL models on a network of more than a thousand traffic lights. In this paper, we tackle the problem of multi-intersection traffic signal control, especially for large-scale networks, based on RL techniques and transportation theories. This problem is quite difficult because there are challenges such as scalability, signal coordination, data feasibility, etc. To address these challenges, we (1) design our RL agents using the ‘pressure’ concept to achieve signal coordination at the region level; (2) show that implicit coordination could be achieved by individual control agents with well-crafted reward design, thus reducing the dimensionality; and (3) conduct extensive experiments on multiple scenarios, including a real-world scenario with 2510 traffic lights in Manhattan, New York City.

【Keywords】:

418. HoMM: Higher-Order Moment Matching for Unsupervised Domain Adaptation.

Paper Link】 【Pages】:3422-3429

【Authors】: Chao Chen ; Zhihang Fu ; Zhihong Chen ; Sheng Jin ; Zhaowei Cheng ; Xinyu Jin ; Xian-Sheng Hua

【Abstract】: Minimizing the discrepancy of feature distributions between different domains is one of the most promising directions in unsupervised domain adaptation. From the perspective of moment matching, most existing discrepancy-based methods are designed to match the second-order or lower moments, which, however, have limited ability to express the statistical characteristics of non-Gaussian distributions. In this work, we propose a Higher-order Moment Matching (HoMM) method, and further extend the HoMM into reproducing kernel Hilbert spaces (RKHS). In particular, our proposed HoMM can perform arbitrary-order moment matching; we show that the first-order HoMM is equivalent to Maximum Mean Discrepancy (MMD) and the second-order HoMM is equivalent to Correlation Alignment (CORAL). Moreover, HoMM (order ≥ 3) is expected to perform fine-grained domain alignment as higher-order statistics can approximate more complex, non-Gaussian distributions. Besides, we also exploit the pseudo-labeled target samples to learn discriminative representations in the target domain, which further improves the transfer performance. Extensive experiments are conducted, showing that our proposed HoMM consistently outperforms the existing moment matching methods by a large margin. Codes are available at https://github.com/chenchao666/HoMM-Master
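
The idea of matching moments of arbitrary order can be illustrated with a small sketch. It is not the authors' implementation: it assumes uncentered moment tensors and a squared Frobenius distance, and the function names `moment_tensor` and `moment_matching_loss` are hypothetical. It uses plain Python lists, so it is only practical for tiny feature dimensions.

```python
from itertools import product


def moment_tensor(features, order):
    """Uncentered order-p moment tensor of a batch (assumption: no centering).

    Stored as a dict mapping index tuples (i1, ..., ip) to the batch mean
    of x[i1] * ... * x[ip].
    """
    n = len(features)
    dim = len(features[0])
    tensor = {}
    for idx in product(range(dim), repeat=order):
        total = 0.0
        for x in features:
            term = 1.0
            for i in idx:
                term *= x[i]
            total += term
        tensor[idx] = total / n
    return tensor


def moment_matching_loss(source, target, order):
    """Squared Frobenius distance between source and target moment tensors."""
    ms = moment_tensor(source, order)
    mt = moment_tensor(target, order)
    return sum((ms[k] - mt[k]) ** 2 for k in ms)


source = [[0.0, 0.0], [2.0, 2.0]]
target = [[1.0, 1.0]]
print(moment_matching_loss(source, target, 1))  # means coincide
print(moment_matching_loss(source, target, 2))  # second moments differ
```

In this toy case the two batches have the same mean, so the first-order loss vanishes, while the second-order loss does not. This is the kind of mismatch that higher orders are meant to expose.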

【Keywords】:

419. Online Knowledge Distillation with Diverse Peers.

Paper Link】 【Pages】:3430-3437

【Authors】: Defang Chen ; Jian-Ping Mei ; Can Wang ; Yan Feng ; Chun Chen

【Abstract】: Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are homogenized quickly with simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights generated with an attention-based mechanism to derive its own targets from predictions of other auxiliary peers. Learning from distinct target distributions helps to boost peer diversity for effectiveness of group-based distillation. The second-level distillation is performed to transfer the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
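
The soft-target mechanism described above can be sketched as follows. This is a generic illustration, not the OKDDip code: the temperature value, the conventional temperature-squared scaling, and the names `softmax`, `aggregate_targets`, and `distillation_loss` are assumptions. The aggregation weights are passed in by hand here, whereas the abstract says they are generated by an attention-based mechanism.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def aggregate_targets(peer_probs, weights):
    """Weighted average of peer predictions; weights are assumed to sum to 1."""
    num_classes = len(peer_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, peer_probs))
        for c in range(num_classes)
    ]


def distillation_loss(target_probs, student_logits, temperature=3.0):
    """KL between soft targets and the student's softened prediction."""
    q = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target_probs, q)


# Two hypothetical peers provide targets for a third model.
peers = [softmax([2.0, 1.0, 0.0], 3.0), softmax([1.5, 1.5, 0.0], 3.0)]
targets = aggregate_targets(peers, [0.7, 0.3])
print(distillation_loss(targets, [0.0, 0.0, 0.0]))
```

Giving each peer its own weight vector means each peer sees a different target distribution, which is how the abstract motivates the gain in peer diversity.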

【Keywords】:

420. Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View.

Paper Link】 【Pages】:3438-3445

【Authors】: Deli Chen ; Yankai Lin ; Wei Li ; Peng Li ; Jie Zhou ; Xu Sun

【Abstract】: Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks. Despite their success, one severe limitation of GNNs is the over-smoothing issue (indistinguishable representations of nodes in different classes). In this work, we present a systematic and quantitative study on the over-smoothing issue of GNNs. First, we introduce two quantitative metrics, MAD and MADGap, to measure the smoothness and over-smoothness of the graph nodes representations, respectively. Then, we verify that smoothing is the nature of GNNs and the critical factor leading to over-smoothness is the low information-to-noise ratio of the message received by the nodes, which is partially determined by the graph topology. Finally, we propose two methods to alleviate the over-smoothing issue from the topological view: (1) MADReg which adds a MADGap-based regularizer to the training objective; (2) AdaEdge which optimizes the graph topology based on the model predictions. Extensive experiments on 7 widely-used graph datasets with 10 typical GNN models show that the two proposed methods are effective for relieving the over-smoothing issue, thus improving the performance of various GNN models.
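
The abstract does not define the two metrics, so the following is only a sketch under stated assumptions: MAD is taken to be the mean, over nodes, of the average cosine distance to a chosen set of other nodes, and MADGap is taken to be the difference between MAD over remote pairs and MAD over neighbouring pairs. The masks and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return 1.0 - dot / (nu * nv)


def mad(reps, mask):
    """Mean over nodes of the average cosine distance to masked partners.

    mask[i][j] is truthy when the pair (i, j) should be counted; nodes with
    no counted partner are skipped.
    """
    per_node = []
    for i, row in enumerate(mask):
        dists = [
            cosine_distance(reps[i], reps[j])
            for j, keep in enumerate(row)
            if keep and j != i
        ]
        if dists:
            per_node.append(sum(dists) / len(dists))
    return sum(per_node) / len(per_node) if per_node else 0.0


def mad_gap(reps, neighbor_mask, remote_mask):
    """Assumed definition: MAD over remote pairs minus MAD over neighbours."""
    return mad(reps, remote_mask) - mad(reps, neighbor_mask)


reps = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbor_mask = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
remote_mask = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
print(mad_gap(reps, neighbor_mask, remote_mask))
```

Under these assumptions, a gap near zero means remote nodes look as similar as neighbours, which is the over-smoothed regime the abstract describes.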

【Keywords】:

421. ECGadv: Generating Adversarial Electrocardiogram to Misguide Arrhythmia Classification System.

Paper Link】 【Pages】:3446-3453

【Authors】: Huangxun Chen ; Chenyu Huang ; Qianyi Huang ; Qian Zhang ; Wei Wang

【Abstract】: Deep neural network (DNN)-powered electrocardiogram (ECG) diagnosis systems have recently achieved promising progress toward taking over tedious examinations by cardiologists. However, their vulnerability to adversarial attacks still lacks comprehensive investigation. Existing attacks in the image domain are not directly applicable because of the distinct visualization and dynamic properties of ECGs. Thus, this paper takes a step to thoroughly explore adversarial attacks on DNN-powered ECG diagnosis systems. We analyze the properties of ECGs to design effective attack schemes under two attack models. Our results demonstrate the blind spots of DNN-powered diagnosis systems under adversarial attacks, which calls attention to adequate countermeasures.

【Keywords】:

422. LS-Tree: Model Interpretation When the Data Are Linguistic.

Paper Link】 【Pages】:3454-3461

【Authors】: Jianbo Chen ; Michael I. Jordan

【Abstract】: We study the problem of interpreting trained classification models in the setting of linguistic data sets. Leveraging a parse tree, we propose to assign least-squares-based importance scores to each word of an instance by exploiting syntactic constituency structure. We establish an axiomatic characterization of these importance scores by relating them to the Banzhaf value in coalitional game theory. Based on these importance scores, we develop a principled method for detecting and quantifying interactions between words in a sentence. We demonstrate that the proposed method can aid in interpretability and diagnostics for several widely-used language models.

【Keywords】:

423. Generative Adversarial Networks for Video-to-Video Domain Adaptation.

Paper Link】 【Pages】:3462-3469

【Authors】: Jiawei Chen ; Yuexiang Li ; Kai Ma ; Yefeng Zheng

【Abstract】: Endoscopic videos from multiple centres often have different imaging conditions, e.g., color and illumination, so models trained on one domain usually fail to generalize well to another. Domain adaptation is one of the potential solutions to address the problem. However, few existing works have focused on the translation of video-based data. In this work, we propose a novel generative adversarial network (GAN), namely VideoGAN, to transfer the video-based data across different domains. As the frames of a video may have similar content and imaging conditions, the proposed VideoGAN has an X-shape generator to preserve the intra-video consistency during translation. Furthermore, a loss function, namely color histogram loss, is proposed to tune the color distribution of each translated frame. Two colonoscopic datasets from different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the performance of domain adaptation of our VideoGAN. Experimental results demonstrate that the adapted colonoscopic video generated by our VideoGAN can significantly boost the segmentation accuracy, i.e., an improvement of 5%, of colorectal polyps on multicentre datasets. As our VideoGAN is a general network architecture, we also evaluate its performance with the CamVid driving video dataset on the cloudy-to-sunny translation task. Comprehensive experiments show that the domain gap could be substantially narrowed down by our VideoGAN.

【Keywords】:

424. Fast Adaptively Weighted Matrix Factorization for Recommendation with Implicit Feedback.

Paper Link】 【Pages】:3470-3477

【Authors】: Jiawei Chen ; Can Wang ; Sheng Zhou ; Qihao Shi ; Jingbang Chen ; Yan Feng ; Chun Chen

【Abstract】: Recommendation from implicit feedback is a highly challenging task due to the lack of reliable observed negative data. A popular and effective approach for implicit recommendation is to treat unobserved data as negative but downweight their confidence. Naturally, how to assign confidence weights and how to handle the large number of unobserved data are two key problems for implicit recommendation models. However, existing methods either pursue fast learning by manually assigning simple confidence weights, which lacks flexibility and may create empirical bias in evaluating users' preferences, or adaptively infer personalized confidence weights but suffer from low efficiency. To achieve both adaptive weight assignment and efficient model learning, we propose a fast adaptively weighted matrix factorization (FAWMF) based on variational auto-encoder. The personalized data confidence weights are adaptively assigned with a parameterized neural network (function) and the network can be inferred from the data. Further, to support fast and stable learning of FAWMF, a new specific batch-based learning algorithm, fBGD, has been developed, which trains on all feedback data but whose complexity is linear in the number of observed data. Extensive experiments on real-world datasets demonstrate the superiority of the proposed FAWMF and its learning algorithm fBGD.

【Keywords】:

425. Variational Metric Scaling for Metric-Based Meta-Learning.

Paper Link】 【Pages】:3478-3485

【Authors】: Jiaxin Chen ; Li-Ming Zhan ; Xiao-Ming Wu ; Fu-Lai Chung

【Abstract】: Metric-based meta-learning has attracted a lot of attention due to its effectiveness and efficiency in few-shot learning. Recent studies show that metric scaling plays a crucial role in the performance of metric-based meta-learning algorithms. However, a principled method for learning the metric scaling parameter automatically is still lacking. In this paper, we recast metric-based meta-learning from a Bayesian perspective and develop a variational metric scaling framework for learning a proper metric scaling parameter. Firstly, we propose a stochastic variational method to learn a single global scaling parameter. To better fit the embedding space to a given data distribution, we extend our method to learn a dimensional scaling vector to transform the embedding space. Furthermore, to learn task-specific embeddings, we generate task-dependent dimensional scaling vectors with amortized variational inference. Our method is end-to-end without any pre-training and can be used as a simple plug-and-play module for existing metric-based meta-algorithms. Experiments on miniImageNet show that our methods can be used to consistently improve the performance of existing metric-based meta-algorithms including prototypical networks and TADAM.

【Keywords】:

426. A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks.

Paper Link】 【Pages】:3486-3494

【Authors】: Jinghui Chen ; Dongruo Zhou ; Jinfeng Yi ; Quanquan Gu

【Abstract】: Depending on how much information an adversary can access, adversarial attacks can be classified as white-box attacks and black-box attacks. For white-box attacks, optimization-based attack algorithms such as projected gradient descent (PGD) can achieve relatively high attack success rates within a moderate number of iterations. However, they tend to generate adversarial examples near or upon the boundary of the perturbation set, resulting in large distortion. Furthermore, their corresponding black-box attack algorithms also suffer from high query complexities, thereby limiting their practical usefulness. In this paper, we focus on the problem of developing efficient and effective optimization-based adversarial attack algorithms. In particular, we propose a novel adversarial attack framework for both white-box and black-box settings based on a variant of the Frank-Wolfe algorithm. We show in theory that the proposed attack algorithms are efficient with an O(1/√T) convergence rate. The empirical results of attacking the ImageNet and MNIST datasets also verify the efficiency and effectiveness of the proposed algorithms. More specifically, our proposed algorithms attain the best attack performances in both white-box and black-box attacks among all baselines, and are more time and query efficient than the state-of-the-art.
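
For readers unfamiliar with Frank-Wolfe, the sketch below shows the textbook, projection-free iteration over an L-infinity ball. It is not the variant proposed in the paper. The toy quadratic objective, the step-size schedule 2/(t+2), and the names `frank_wolfe_linf` and `toy_grad` are assumptions made for illustration.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def frank_wolfe_linf(grad_fn, x_orig, eps, steps):
    """Minimise an objective over the L-infinity ball of radius eps around x_orig.

    Each step solves a linear minimisation problem over the ball (its
    solution is a corner of the ball) and moves towards that corner, so
    the iterates stay feasible without any projection.
    """
    x = list(x_orig)
    for t in range(steps):
        g = grad_fn(x)
        # Linear minimisation oracle: argmin over the ball of <g, v>.
        v = [xo - eps * sign(gi) for xo, gi in zip(x_orig, g)]
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]
    return x


# Toy stand-in for an attack loss: squared distance to a point outside the ball.
goal = [1.0, -0.05]


def toy_loss(x):
    return 0.5 * sum((xi - gi) ** 2 for xi, gi in zip(x, goal))


def toy_grad(x):
    return [xi - gi for xi, gi in zip(x, goal)]


x_start = [0.0, 0.0]
x_adv = frank_wolfe_linf(toy_grad, x_start, eps=0.1, steps=200)
print(x_adv, toy_loss(x_adv))
```

Because each iterate is a convex combination of feasible points, it can end up strictly inside the ball. This contrasts with the boundary-hugging behaviour the abstract attributes to PGD.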

【Keywords】:

427. Weakly Supervised Disentanglement by Pairwise Similarities.

Paper Link】 【Pages】:3495-3502

【Authors】: Junxiang Chen ; Kayhan Batmanghelich

【Abstract】: Recently, research on unsupervised disentanglement learning with deep generative models has gained substantial popularity. However, without introducing supervision, there is no guarantee that the factors of interest can be successfully recovered (Locatello et al. 2018). Motivated by a real-world problem, we propose a setting where the user introduces weak supervision by providing similarities between instances based on a factor to be disentangled. The similarity is provided as either a binary (yes/no) or real-valued label describing whether a pair of instances are similar or not. We propose a new method for weakly supervised disentanglement of latent variables within the framework of Variational Autoencoder. Experimental results demonstrate that utilizing weak supervision improves the performance of the disentanglement method substantially.

【Keywords】:

428. Outlier Detection Ensemble with Embedded Feature Selection.

Paper Link】 【Pages】:3503-3512

【Authors】: Li Cheng ; Yijie Wang ; Xinwang Liu ; Bin Li

【Abstract】: Feature selection plays an important role in improving the performance of outlier detection, especially for noisy data. Existing methods usually perform feature selection and outlier scoring separately, which may select feature subsets that do not optimally serve outlier detection, leading to unsatisfactory performance. In this paper, we propose an outlier detection ensemble framework with embedded feature selection (ODEFS) to address this issue. Specifically, for each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation to learn feature subsets that are tailored for the outlier detection method. Moreover, we adopt thresholded self-paced learning to simultaneously optimize feature selection and example selection, which helps improve the reliability of the training set. After that, we design an alternating algorithm with proven convergence to solve the resultant optimization problem. In addition, we analyze the generalization error bound of the proposed framework, which provides a theoretical guarantee on the method and insightful practical guidance. Comprehensive experimental results on 12 real-world datasets from diverse domains validate the superiority of the proposed ODEFS.

【Keywords】:

429. Multi-View Clustering in Latent Embedding Space.

Paper Link】 【Pages】:3513-3520

【Authors】: Mansheng Chen ; Ling Huang ; Chang-Dong Wang ; Dong Huang

【Abstract】: Previous multi-view clustering algorithms mostly partition the multi-view data in their original feature space, the efficacy of which heavily and implicitly relies on the quality of the original feature presentation. In light of this, this paper proposes a novel approach termed Multi-view Clustering in Latent Embedding Space (MCLES), which is able to cluster the multi-view data in a learned latent embedding space while simultaneously learning the global structure and the cluster indicator matrix in a unified optimization framework. Specifically, in our framework, a latent embedding representation is firstly discovered which can effectively exploit the complementary information from different views. The global structure learning is then performed based on the learned latent embedding representation. Further, the cluster indicator matrix can be acquired directly with the learned global structure. An alternating optimization scheme is introduced to solve the optimization problem. Extensive experiments conducted on several real-world multi-view datasets have demonstrated the superiority of our approach.

【Keywords】:

430. Adversarial-Learned Loss for Domain Adaptation.

Paper Link】 【Pages】:3521-3528

【Authors】: Minghao Chen ; Shuai Zhao ; Haifeng Liu ; Deng Cai

【Abstract】: Recently, remarkable progress has been made in learning transferable representation across domains. Previous works in domain adaptation are mainly based on two techniques: domain-adversarial learning and self-training. However, domain-adversarial learning only aligns feature distributions between domains but does not consider whether the target features are discriminative. On the other hand, self-training utilizes the model predictions to enhance the discrimination of target features, but it is unable to explicitly align domain distributions. In order to combine the strengths of these two methods, we propose a novel method called Adversarial-Learned Loss for Domain Adaptation (ALDA). We first analyze the pseudo-label method, a typical self-training method. Nevertheless, there is a gap between pseudo-labels and the ground truth, which can cause incorrect training. Thus we introduce the confusion matrix, which is learned in an adversarial manner in ALDA, to reduce the gap and align the feature distributions. Finally, a new loss function is auto-constructed from the learned confusion matrix, which serves as the loss for unlabeled target samples. Our ALDA outperforms state-of-the-art approaches on four standard domain adaptation datasets. Our code is available at https://github.com/ZJULearning/ALDA.

【Keywords】:

431. Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting.

Paper Link】 【Pages】:3529-3536

【Authors】: Weiqi Chen ; Ling Chen ; Yu Xie ; Wei Cao ; Yusong Gao ; Xiaojie Feng

【Abstract】: Traffic forecasting is of great importance to transportation management and public safety, and very challenging due to the complicated spatial-temporal dependency and essential uncertainty brought about by the road network and traffic conditions. Latest studies mainly focus on modeling the spatial dependency by utilizing graph convolutional networks (GCNs) throughout a fixed weighted graph. However, edges, i.e., the correlations between pair-wise nodes, are much more complicated and interact with each other. In this paper, we propose the Multi-Range Attentive Bicomponent GCN (MRA-BGCN), a novel deep learning model for traffic forecasting. We first build the node-wise graph according to the road network distance and the edge-wise graph according to various edge interaction patterns. Then, we implement the interactions of both nodes and edges using bicomponent graph convolution. The multi-range attention mechanism is introduced to aggregate information in different neighborhood ranges and automatically learn the importance of different ranges. Extensive experiments on two real-world road network traffic datasets, METR-LA and PEMS-BAY, show that our MRA-BGCN achieves the state-of-the-art results.

【Keywords】:

432. AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection.

Paper Link】 【Pages】:3537-3544

【Authors】: Xu Chen ; Brett Wujek

【Abstract】: Automated machine learning (AutoML) strives to establish an appropriate machine learning model for any dataset automatically with minimal human intervention. Although extensive research has been conducted on AutoML, most of it has focused on supervised learning. Research of automated semi-supervised learning and active learning algorithms is still limited. Implementation becomes more challenging when the algorithm is designed for a distributed computing environment. With this as motivation, we propose a novel automated learning system for distributed active learning (AutoDAL) to address these challenges. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes in a distributed manner. Subsequently, automated active learning is addressed by jointly optimizing hyperparameters in both the classification and query selection stages leveraging the graph loss minimization and entropy regularization. Moreover, we propose an efficient distributed active learning algorithm which is scalable for big data by first partitioning the unlabeled data and replicating the labeled data to different worker nodes in the classification stage, and then aggregating the data in the controller in the query selection stage. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms.

【Keywords】:

433. Optimal Attack against Autoregressive Models by Manipulating the Environment.

Paper Link】 【Pages】:3545-3552

【Authors】: Yiding Chen ; Xiaojin Zhu

【Abstract】: We describe an optimal adversarial attack formulation against autoregressive time series forecast using Linear Quadratic Regulator (LQR). In this threat model, the environment evolves according to a dynamical system; an autoregressive model observes the current environment state and predicts its future values; an attacker has the ability to modify the environment state in order to manipulate future autoregressive forecasts. The attacker's goal is to force autoregressive forecasts into tracking a target trajectory while minimizing its attack expenditure. In the white-box setting where the attacker knows the environment and forecast models, we present the optimal attack using LQR for linear models, and Model Predictive Control (MPC) for nonlinear models. In the black-box setting, we combine system identification and MPC. Experiments demonstrate the effectiveness of our attacks.
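
As background for the LQR component, here is a minimal finite-horizon LQR sketch for a scalar linear system x' = a·x + b·u with stage cost q·x² + r·u². It is a plain regulator that drives the state to zero, not the paper's attack formulation, which tracks a target trajectory of forecasts. The system parameters and the names `lqr_gains` and `rollout_cost` are assumptions.

```python
def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for a scalar system; returns gains k_0..k_{T-1}."""
    p = q  # terminal cost weight
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # the recursion runs backwards in time
    return gains


def rollout_cost(a, b, q, r, x0, gains):
    """Total quadratic cost of applying u_t = -k_t * x_t from state x0."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x  # terminal state cost


a, b, q, r, horizon = 1.2, 1.0, 1.0, 0.1, 10
optimal = rollout_cost(a, b, q, r, 1.0, lqr_gains(a, b, q, r, horizon))
passive = rollout_cost(a, b, q, r, 1.0, [0.0] * horizon)
print(optimal, passive)
```

In an attack reading, x would roughly correspond to the deviation of the forecast from the attacker's target and u to the attacker's modification of the environment. That mapping is only an informal analogy to the setting described in the abstract.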

【Keywords】:

434. Multi-View Partial Multi-Label Learning with Graph-Based Disambiguation.

Paper Link】 【Pages】:3553-3560

【Authors】: Ze-Sen Chen ; Xuan Wu ; Qing-Guo Chen ; Yao Hu ; Min-Ling Zhang

【Abstract】: In multi-view multi-label learning (MVML), each training example is represented by different feature vectors and associated with multiple labels simultaneously. Nonetheless, the labeling quality of training examples tends to be affected by annotation noise. In this paper, the problem of multi-view partial multi-label learning (MVPML) is studied, where the set of associated labels is assumed to consist of candidate labels that are only partially valid. To solve the MVPML problem, a two-stage graph-based disambiguation approach is proposed. Firstly, the ground-truth labels of each training example are estimated by disambiguating the candidate labels with a fused similarity graph. After that, the predictive model for each label is learned from embedding features generated from disambiguation-guided clustering analysis. Extensive experimental studies clearly validate the effectiveness of the proposed approach in solving the MVPML problem.

【Keywords】:

435. Compressed Self-Attention for Deep Metric Learning.

Paper Link】 【Pages】:3561-3568

【Authors】: Ziye Chen ; Mingming Gong ; Yanwu Xu ; Chaohui Wang ; Kun Zhang ; Bo Du

【Abstract】: In this paper, we aim to enhance the self-attention (SA) mechanism for deep metric learning in visual perception by capturing richer contextual dependencies in visual data. To this end, we propose a novel module, named compressed self-attention (CSA), which significantly reduces the computation and memory cost with a negligible decrease in accuracy with respect to the original SA mechanism, thanks to the following two characteristics: i) it only needs to compute a small number of base attention maps for a small number of base feature vectors; and ii) the output at each spatial location can be simply obtained by an adaptive weighted average of the outputs calculated from the base attention maps. The high computational efficiency of CSA enables the application to high-resolution shallow layers in convolutional neural networks with little additional cost. In addition, CSA makes it practical to further partition the feature maps into groups along the channel dimension and compute attention maps for features in each group separately, thus increasing the diversity of long-range dependencies and accordingly boosting the accuracy. We evaluate the performance of CSA via extensive experiments on two metric learning tasks: person re-identification and local descriptor learning. Qualitative and quantitative comparisons with the latest methods demonstrate the significance of CSA in this topic.

【Keywords】:

436. Semi-Supervised Learning under Class Distribution Mismatch.

Paper Link】 【Pages】:3569-3576

【Authors】: Yanbei Chen ; Xiatian Zhu ; Wei Li ; Shaogang Gong

【Abstract】: Semi-supervised learning (SSL) aims to avoid the need for collecting prohibitively expensive labelled training data. Whilst demonstrating impressive performance boost, existing SSL methods artificially assume that small labelled data and large unlabelled data are drawn from the same class distribution. In a more realistic scenario with class distribution mismatch between the two sets, they often suffer severe performance degradation due to error propagation introduced by irrelevant unlabelled samples. Our work addresses this under-studied and realistic SSL problem by a novel algorithm named Uncertainty-Aware Self-Distillation (UASD). Specifically, UASD produces soft targets that avoid catastrophic error propagation, and empower learning effectively from unconstrained unlabelled data with out-of-distribution (OOD) samples. This is based on joint Self-Distillation and OOD filtering in a unified formulation. Without bells and whistles, UASD significantly outperforms six state-of-the-art methods in more realistic SSL under class distribution mismatch on three popular image classification datasets: CIFAR10, CIFAR100, and TinyImageNet.

【Keywords】:

437. InstaNAS: Instance-Aware Neural Architecture Search.

Paper Link】 【Pages】:3577-3584

【Authors】: An-Chieh Cheng ; Chieh Hubert Lin ; Da-Cheng Juan ; Wei Wei ; Min Sun

【Abstract】: Conventional Neural Architecture Search (NAS) aims at finding a single architecture that achieves the best performance, which usually optimizes task related learning objectives such as accuracy. However, a single architecture may not be representative enough for the whole dataset with high diversity and variety. Intuitively, electing domain-expert architectures that are proficient in domain-specific features can further benefit architecture related objectives such as latency. In this paper, we propose InstaNAS, an instance-aware NAS framework that employs a controller trained to search for a “distribution of architectures” instead of a single architecture. This allows the model to use sophisticated architectures, which usually come with a large architecture related cost, for the difficult samples, and shallow architectures for the easy samples. During the inference phase, the controller assigns each of the unseen input samples a domain expert architecture that can achieve high accuracy with customized inference costs. Experiments within a search space inspired by MobileNetV2 show InstaNAS can achieve up to 48.8% latency reduction without compromising accuracy on a series of datasets against MobileNetV2.

【Keywords】:

438. Distilling Portable Generative Adversarial Networks for Image Translation.

Paper Link】 【Pages】:3585-3592

【Authors】: Hanting Chen ; Yunhe Wang ; Han Shu ; Changyuan Wen ; Chunjing Xu ; Boxin Shi ; Chao Xu ; Chang Xu

【Abstract】: Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks, but never deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator to measure the distances between real images and images generated by the student and teacher generators. An adversarial learning process is therefore established to optimize the student generator and the student discriminator. Qualitative and quantitative analyses based on experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.

【Keywords】:

439. Towards Better Forecasting by Fusing Near and Distant Future Visions.

Paper Link】 【Pages】:3593-3600

【Authors】: Jiezhu Cheng ; Kaizhu Huang ; Zibin Zheng

【Abstract】: Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments with different temporal distance. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting accuracy. To address this problem, we propose Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve the predictive performance by fusing forecasting information (i.e., future visions) from different future times. We first use a Convolutional Neural Network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified Encoder-Decoder architecture. Finally, we combine a traditional autoregressive model with the neural network to solve the scale insensitivity problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements compared to state-of-the-art baseline methods, with an average 4.59% reduction on the RMSE metric and an average 6.87% reduction on the MAE metric.

【Keywords】:

440. Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples.

Paper Link】 【Pages】:3601-3608

【Authors】: Minhao Cheng ; Jinfeng Yi ; Pin-Yu Chen ; Huan Zhang ; Cho-Jui Hsieh

【Abstract】: Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design some novel loss functions to conduct non-overlapping attacks and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make a seq2seq model produce desired outputs with high success rates. We also use an external sentiment classifier to verify the property of preserving semantic meanings for our generated adversarial examples. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.

【Keywords】:

441. Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions.

Paper Link】 【Pages】:3609-3616

【Authors】: Weiyu Cheng ; Yanyan Shen ; Linpeng Huang

【Abstract】: Various factorization-based methods have been proposed to leverage second-order, or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order, and then identify useful feature interactions through model training, which suffer from two drawbacks. First, they have to make a trade-off between the expressiveness of higher-order cross features and the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer that converts the power of each feature in a feature combination into the coefficient to be learned. The experimental results on four real datasets demonstrate the superior predictive performance of AFN against the state-of-the-arts.

【Keywords】:
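The logarithmic transformation mentioned in the abstract above rests on a standard identity: a cross feature x_1^{w_1} · ... · x_m^{w_m} equals exp(w_1 ln x_1 + ... + w_m ln x_m), so the feature powers become ordinary coefficients of a weighted sum. The sketch below only illustrates that identity; it is not the authors' AFN layer. The function name `log_cross_feature` is hypothetical, and the code assumes strictly positive feature values so that the logarithm is defined.

```python
import math


def log_cross_feature(values, exponents):
    """Return prod_i values[i] ** exponents[i], computed in log space.

    Assumption: every value is strictly positive, so math.log is defined.
    The exponents play the role of the coefficients that would be learned.
    """
    if len(values) != len(exponents):
        raise ValueError("values and exponents must have the same length")
    # The weighted sum of logs turns each feature's power into a coefficient.
    log_sum = sum(w * math.log(x) for x, w in zip(values, exponents))
    return math.exp(log_sum)


# Exponents (1, 2, 0) select the cross feature x1 * x2**2 and ignore x3.
features = [2.0, 3.0, 5.0]
exponents = [1.0, 2.0, 0.0]
cross = log_cross_feature(features, exponents)
direct = features[0] * features[1] ** 2
print(math.isclose(cross, direct))
```

An exponent of zero removes a feature from the product, which is one way to see how a single weighted sum in log space can represent cross features of different orders.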

442. Time2Graph: Revisiting Time Series Modeling with Dynamic Shapelets.

Paper Link】 【Pages】:3617-3624

【Authors】: Ziqiang Cheng ; Yang Yang ; Wei Wang ; Wenjie Hu ; Yueting Zhuang ; Guojie Song

【Abstract】: Time series modeling has attracted extensive research efforts; however, achieving both reliable efficiency and interpretability from a unified model still remains a challenging problem. Among the literature, shapelets offer interpretable and explanatory insights in the classification tasks, while most existing works ignore the differing representative power at different time slices, as well as (more importantly) the evolution pattern of shapelets. In this paper, we propose to extract time-aware shapelets by designing a two-level timing factor. Moreover, we define and construct the shapelet evolution graph, which captures how shapelets evolve over time and can be incorporated into the time series embeddings by graph embedding algorithms. To validate whether the representations obtained in this way can be applied effectively in various scenarios, we conduct experiments based on three public time series datasets, and two real-world datasets from different domains. Experimental results clearly show the improvements achieved by our approach compared with 16 state-of-the-art baselines.

【Keywords】:

443. Suspicion-Free Adversarial Attacks on Clustering Algorithms.

Paper Link】 【Pages】:3625-3632

【Authors】: Anshuman Chhabra ; Abhishek Roy ; Prasant Mohapatra

【Abstract】: Clustering algorithms are used in a large number of applications and play an important role in modern machine learning, yet adversarial attacks on clustering algorithms seem to be broadly overlooked, unlike those on supervised learning. In this paper, we seek to bridge this gap by proposing a black-box adversarial attack for clustering models for linearly separable clusters. Our attack works by perturbing a single sample close to the decision boundary, which leads to the misclustering of multiple unperturbed samples, named spill-over adversarial samples. We theoretically show the existence of such adversarial samples for K-Means clustering. Our attack is especially strong as (1) we ensure the perturbed sample is not an outlier, hence not detectable, and (2) the exact metric used for clustering is not known to the attacker. We theoretically justify that the attack can indeed be successful without the knowledge of the true metric. We conclude by providing empirical results on a number of datasets and clustering algorithms. To the best of our knowledge, this is the first work that generates spill-over adversarial samples without the knowledge of the true metric while ensuring that the perturbed sample is not an outlier, and theoretically proves the above.

【Keywords】:

444. A General Approach to Fairness with Optimal Transport.

Paper Link】 【Pages】:3633-3640

【Authors】: Silvia Chiappa ; Ray Jiang ; Tom Stepleton ; Aldo Pacchiano ; Heinrich Jiang ; John Aslanides

【Abstract】: We propose a general approach to fairness based on transporting distributions corresponding to different sensitive attributes to a common distribution. We use optimal transport theory to derive target distributions and methods that allow us to achieve fairness with minimal changes to the unfair model. Our approach is applicable to both classification and regression problems, can enforce different notions of fairness, and enables us to achieve a Pareto-optimal trade-off between accuracy and fairness. We demonstrate that it outperforms previous approaches on several benchmark fairness datasets.

【Keywords】:

445. Active Learning in the Geometric Block Model.

Paper Link】 【Pages】:3641-3648

【Authors】: Eli Chien ; Antonia Maria Tulino ; Jaime Llorca

【Abstract】: The geometric block model is a recently proposed generative model for random graphs that is able to capture the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures compared with the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is proved to be near-optimal. They also characterized the regimes of the model parameters for which the proposed algorithm can achieve exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in the problem of exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, by possibly querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine the use of motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes under which the state-of-the-art unsupervised method fails. We validate the superior performance of our algorithms via numerical simulations on both real and synthetic datasets.

【Keywords】:

446. Deep Mixed Effect Model Using Gaussian Processes: A Personalized and Reliable Prediction for Healthcare.

Paper Link】 【Pages】:3649-3657

【Authors】: Ingyo Chung ; Saehoon Kim ; Juho Lee ; Kwang Joon Kim ; Sung Ju Hwang ; Eunho Yang

【Abstract】: We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention. Our proposed framework aims to make personalized and reliable predictions from time-series data, such as Electronic Health Records (EHR), by modeling two complementary components: i) a shared component that captures the global trend across diverse patients and ii) a patient-specific component that models idiosyncratic variability for each patient. To this end, we propose a composite model of a deep neural network to learn complex global trends from a large number of patients, and Gaussian Processes (GP) to probabilistically model individual time-series given a relatively small number of visits per patient. We evaluate our model on diverse and heterogeneous tasks from EHR datasets and show practical advantages over standard time-series deep models such as a pure Recurrent Neural Network (RNN).

【Keywords】:

447. A Constraint-Based Approach to Learning and Explanation.

Paper Link】 【Pages】:3658-3665

【Authors】: Gabriele Ciravegna ; Francesco Giannini ; Stefano Melacci ; Marco Maggini ; Marco Gori

【Abstract】: In the last few years we have seen remarkable progress from the cultivation of the idea of expressing domain knowledge by the mathematical notion of constraint. However, the progress has mostly involved the process of providing solutions consistent with a given set of constraints, whereas learning “new” constraints that express new knowledge is still an open challenge. In this paper we propose a novel approach to learning constraints which is based on information-theoretic principles. The basic idea consists in maximizing the transfer of information between task functions and a set of learnable constraints, implemented using neural networks subject to L1 regularization. This process leads to the unsupervised development of new constraints that are fulfilled in different sub-portions of the input domain. In addition, we define a simple procedure that can explain the behaviour of the newly devised constraints in terms of First-Order Logic formulas, thus extracting novel knowledge on the relationships between the original tasks. An experimental evaluation is provided to support the proposed approach, in which we also explore the regularization effects introduced by the proposed Information-Based Learning of Constraint (IBLC) algorithm.

【Keywords】:

448. Representing Closed Transformation Paths in Encoded Network Latent Space.

Paper Link】 【Pages】:3666-3675

【Authors】: Marissa Connor ; Christopher Rozell

【Abstract】: Deep generative networks have been widely used for learning mappings from a low-dimensional latent space to a high-dimensional data space. In many cases, data transformations are defined by linear paths in this latent space. However, the Euclidean structure of the latent space may be a poor match for the underlying latent structure in the data. In this work, we incorporate a generative manifold model into the latent space of an autoencoder in order to learn the low-dimensional manifold structure from the data and adapt the latent space to accommodate this structure. In particular, we focus on applications in which the data has closed transformation paths which extend from a starting point and return to nearly the same point. Through experiments on data with natural closed transformation paths, we show that this model introduces the ability to learn the latent dynamics of complex systems, generate transformation paths, and classify samples that belong on the same transformation path.

【Keywords】:

449. Forgetting to Learn Logic Programs.

Paper Link】 【Pages】:3676-3683

【Authors】: Andrew Cropper

【Abstract】: Most program induction approaches require predefined, often hand-engineered, background knowledge (BK). To overcome this limitation, we explore methods to automatically acquire BK through multi-task learning. In this approach, a learner adds learned programs to its BK so that they can be reused to help learn other programs. To improve learning performance, we explore the idea of forgetting, where a learner can additionally remove programs from its BK. We consider forgetting in an inductive logic programming (ILP) setting. We show that forgetting can significantly reduce both the size of the hypothesis space and the sample complexity of an ILP learner. We introduce Forgetgol, a multi-task ILP learner which supports forgetting. We experimentally compare Forgetgol against approaches that either remember or forget everything. Our experimental results show that Forgetgol outperforms the alternative approaches when learning from over 10,000 tasks.

【Keywords】:

450. Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking.

Paper Link】 【Pages】:3684-3692

【Authors】: Eric Crawford ; Joelle Pineau

【Abstract】: The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e. without access to annotated training videos) since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call unsupervised object tracking, has grown in prominence in recent years; however, most architectures that address it still struggle to deal with large scenes containing many objects. In the current work, we propose an architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme). In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it can generalize well to videos that are larger and/or contain more objects than videos encountered during training.

【Keywords】:

451. Label Error Correction and Generation through Label Relationships.

Paper Link】 【Pages】:3693-3700

【Authors】: Zijun Cui ; Yong Zhang ; Qiang Ji

【Abstract】: For multi-label supervised learning, the quality of the label annotation is important. However, for many real-world multi-label classification applications, label annotations often lack quality, in particular when label annotation requires special expertise, such as annotating fine-grained labels. The relationships among labels, on the other hand, are usually stable and robust to errors. For this reason, we propose to capture and leverage label relationships at different levels to improve fine-grained label annotation quality and to generate labels. Two levels of labels, including object-level labels and property-level labels, are considered. The object-level labels characterize object category based on its overall appearance, while the property-level labels describe specific local object properties. A Bayesian network (BN) is learned to capture the relationships among the multiple labels at the two levels. MAP inference is then performed to identify the most stable and consistent label relationships, which are then used to improve data annotations for the same dataset and to generate labels for a new dataset. Experimental evaluations on six benchmark databases for two different tasks (facial action unit and object attribute classification) demonstrate the effectiveness of the proposed method in improving data annotation and in generating effective new labels.

【Keywords】:

452. A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound.

Paper Link】 【Pages】:3701-3708

【Authors】: Gal Dalal ; Balázs Szörényi ; Gugan Thoppe

【Abstract】: Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, θ_n and w_n, which are updated using two distinct stepsize sequences, α_n and β_n, respectively. Assuming α_n = n^{-α} and β_n = n^{-β} with 1 > α > β > 0, we show that, with high probability, the two iterates converge to their respective solutions θ and w at rates given by ∥θ_n − θ∥ = Õ(n^{-α/2}) and ∥w_n − w∥ = Õ(n^{-β/2}); here, Õ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that it in fact happens from some finite time onwards. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square summable ones.

【Keywords】:
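To make the stepsize assumption in the abstract above concrete, the following sketch evaluates the polynomial stepsizes n^{-α} and n^{-β} and the polynomial part of the stated rates, n^{-α/2} and n^{-β/2}. It is only a numerical illustration: the function names are hypothetical, and the rate envelopes ignore the constants and logarithmic factors that Õ hides.

```python
def stepsizes(n, alpha, beta):
    """Return (n**-alpha, n**-beta) for iteration n >= 1.

    Enforces the abstract's condition 1 > alpha > beta > 0, so the first
    stepsize decays faster than the second.
    """
    if not 1 > alpha > beta > 0:
        raise ValueError("need 1 > alpha > beta > 0")
    if n < 1:
        raise ValueError("n must be at least 1")
    return n ** -alpha, n ** -beta


def rate_envelopes(n, alpha, beta):
    """Polynomial part of the stated rates: (n**(-alpha/2), n**(-beta/2)).

    Assumption: constants and logarithmic factors are dropped.
    """
    return n ** (-alpha / 2), n ** (-beta / 2)


for n in (10, 100, 1000):
    a_n, b_n = stepsizes(n, alpha=0.75, beta=0.5)
    theta_rate, w_rate = rate_envelopes(n, alpha=0.75, beta=0.5)
    print(n, a_n, b_n, theta_rate, w_rate)
```

Because α > β, the iterate driven by α_n has the smaller stepsize and, per the stated bounds, the faster-shrinking error envelope.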

453. Explainable Data Decompositions.

Paper Link】 【Pages】:3709-3716

【Authors】: Sebastian Dalleiger ; Jilles Vreeken

【Abstract】: Our goal is to discover the components of a dataset, characterize why we deem these components, explain how these components are different from each other, as well as identify what properties they share among each other. As is usual, we consider regions in the data to be components if they show significantly different distributions. What is not usual, however, is that we parameterize these distributions with patterns that are informative for one or more components. We do so because these patterns allow us to characterize what is going on in our data as well as explain our decomposition. We define the problem in terms of a regularized maximum likelihood, in which we use the Maximum Entropy principle to model each data component with a set of patterns. As the search space is large and unstructured, we propose the deterministic DISC algorithm to efficiently discover high-quality decompositions via an alternating optimization approach. Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms.

【Keywords】:

454. A Skip-Connected Evolving Recurrent Neural Network for Data Stream Classification under Label Latency Scenario.

Paper Link】 【Pages】:3717-3724

【Authors】: Monidipa Das ; Mahardhika Pratama ; Jie Zhang ; Yew-Soon Ong

【Abstract】: Stream classification models for non-stationary environments often assume the immediate availability of data labels. However, in a practical scenario, it is quite natural that the data labels are available only after some temporal lag. This paper explores how a stream classifier model can be made adaptive to such a label latency scenario. We propose SkipE-RNN, a self-evolutionary recurrent neural network with a dynamically evolving skipped-recurrent-connection for the best utilization of previously observed label information while classifying the current data. When the data label is unavailable, SkipE-RNN uses an auto-learned mapping function to find the best match from the already known data labels and updates the network parameters accordingly. Later, upon availability of the true data label, if the previously mapped label is found to be incorrect, SkipE-RNN employs a regularization technique along with the parameter updating process, so as to penalize the model. In addition, SkipE-RNN has the inborn power of self-adjusting the network capacity by growing/pruning hidden nodes to cope with the evolving nature of the data stream. Rigorous empirical evaluations using synthetic as well as real-world datasets reveal the effectiveness of SkipE-RNN in both finitely delayed and infinitely delayed data label scenarios.

【Keywords】:

455. DNNs as Layers of Cooperating Classifiers.

Paper Link】 【Pages】:3725-3732

【Authors】: Marelie H. Davel ; Marthinus W. Theunissen ; Arnold M. Pretorius ; Etienne Barnard

【Abstract】: A robust theoretical framework that can describe and predict the generalization ability of DNNs in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.

【Keywords】:

456. Making Existing Clusterings Fairer: Algorithms, Complexity Results and Insights.

Paper Link】 【Pages】:3733-3740

【Authors】: Ian Davidson ; S. S. Ravi

【Abstract】: We explore the area of fairness in clustering from the different perspective of modifying clusterings from existing algorithms to make them fairer whilst retaining their quality. We formulate the minimal cluster modification for fairness (MCMF) problem where the input is a given partitional clustering and the goal is to minimally change it so that the clustering is still of good quality and fairer. We show using an intricate case analysis that for a single protected variable, the problem is efficiently solvable (i.e., in the class P) by proving that the constraint matrix for an integer linear programming (ILP) formulation is totally unimodular (TU). Interestingly, we show that even for a single protected variable, the addition of simple pairwise guidance (to say ensure individual level fairness) makes the MCMF problem computationally intractable (i.e., NP-hard). Experimental results on Twitter, Census and NYT data sets show that our methods can modify existing clusterings for data sets in excess of 100,000 instances within minutes on laptops and find as fair but higher quality clusterings than fair by design clustering algorithms.

【Keywords】:

457. Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning.

Paper Link】 【Pages】:3741-3748

【Authors】: Kristopher De Asis ; Alan Chan ; Silviu Pitis ; Richard S. Sutton ; Daniel Graves

【Abstract】: We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps. To learn the value function for horizon h, these algorithms bootstrap from the value function for horizon h−1, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as “the deadly triad”). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and n-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad.

【Keywords】:
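The bootstrapping rule described in the abstract above, where the horizon-h value function is learned from the horizon-(h−1) value function, can be sketched in tabular form. This is a minimal illustration under stated assumptions, not the authors' implementation: rewards are undiscounted, the horizon-0 values are fixed at zero, and the three-state cyclic chain, the function names, and the step size are all hypothetical choices made for the example.

```python
def fixed_horizon_td(transitions, num_states, max_horizon, step_size=0.5):
    """Tabular fixed-horizon TD(0) over (state, reward, next_state) tuples.

    values[h][s] estimates the undiscounted sum of rewards over the next
    h steps from state s. values[0] stays at zero because no reward is
    collected over zero steps.
    """
    values = [[0.0] * num_states for _ in range(max_horizon + 1)]
    for state, reward, next_state in transitions:
        for h in range(1, max_horizon + 1):
            # Horizon h bootstraps from horizon h - 1, never from itself.
            target = reward + values[h - 1][next_state]
            values[h][state] += step_size * (target - values[h][state])
    return values


def cyclic_chain(num_steps):
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> 0; leaving state 2 pays 1."""
    state = 0
    for _ in range(num_steps):
        next_state = (state + 1) % 3
        reward = 1.0 if state == 2 else 0.0
        yield state, reward, next_state
        state = next_state


values = fixed_horizon_td(cyclic_chain(600), num_states=3, max_horizon=3)
for h, row in enumerate(values):
    print(h, [round(v, 3) for v in row])
```

Each horizon needs its own value table, which is the extra storage the abstract mentions. In exchange, every update target depends only on a shorter horizon.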

458. Capsule Routing via Variational Bayes.

Paper Link】 【Pages】:3749-3756

【Authors】: Fabio De Sousa Ribeiro ; Georgios Leontidis ; Stefanos D. Kollias

【Abstract】: Capsule networks are a recently proposed type of neural network shown to outperform alternatives in challenging shape recognition tasks. In capsule networks, scalar neurons are replaced with capsule vectors or matrices, whose entries represent different properties of objects. The relationships between objects and their parts are learned via trainable viewpoint-invariant transformation matrices, and the presence of a given object is decided by the level of agreement among votes from its parts. This interaction occurs between capsule layers and is a process called routing-by-agreement. In this paper, we propose a new capsule routing algorithm derived from Variational Bayes for fitting a mixture of transforming Gaussians, and show it is possible to transform our capsule network into a Capsule-VAE. Our Bayesian approach addresses some of the inherent weaknesses of MLE-based models, such as variance collapse, by modelling uncertainty over capsule pose parameters. We outperform the state-of-the-art on smallNORB using ≃50% fewer capsules than previously reported, achieve competitive performances on CIFAR-10, Fashion-MNIST, SVHN, and demonstrate significant improvement in MNIST to affNIST generalisation over previous works.

【Keywords】:

459. System Identification with Time-Aware Neural Sequence Models.

Paper Link】 【Pages】:3757-3764

【Authors】: Thomas Demeester

【Abstract】: Established recurrent neural networks are well-suited to solve a wide variety of prediction tasks involving discrete sequences. However, they do not perform as well in the task of dynamical system identification, when dealing with observations from continuous variables that are unevenly sampled in time, for example due to missing observations. We show how such neural sequence models can be adapted to deal with variable step sizes in a natural way. In particular, we introduce a ‘time-aware’ and stationary extension of existing models (including the Gated Recurrent Unit) that allows them to deal with unevenly sampled system observations by adapting to the observation times, while facilitating higher-order temporal behavior. We discuss the properties and demonstrate the validity of the proposed approach, based on samples from two industrial input/output processes.

【Keywords】:

460. Reinforcing Neural Network Stability with Attractor Dynamics.

Paper Link】 【Pages】:3765-3772

【Authors】: Hanming Deng ; Yang Hua ; Tao Song ; Zhengui Xue ; Ruhui Ma ; Neil Robertson ; Haibing Guan

【Abstract】: Recent approaches interpret deep neural networks (DNNs) as dynamical systems, drawing the connection between stability in forward propagation and generalization of DNNs. In this paper, we take a step further to be the first to reinforce this stability of DNNs without changing their original structure and verify the impact of the reinforced stability on the network representation from various aspects. More specifically, we reinforce stability by modeling attractor dynamics of a DNN and propose relu-max attractor network (RMAN), a light-weight module ready to be deployed on state-of-the-art ResNet-like networks. RMAN is only needed during training so as to modify a ResNet's attractor dynamics by minimizing an energy function together with the loss of the original learning task. Through intensive experiments, we show that RMAN-modified attractor dynamics bring a more structured representation space to ResNet and its variants, and more importantly improve the generalization ability of ResNet-like networks in supervised tasks due to reinforced stability.

【Keywords】:

Paper Link】 【Pages】:3773-3780

【Authors】: Aryan Deshwal ; Syrine Belakaria ; Janardhan Rao Doppa ; Alan Fern

【Abstract】: We consider the problem of optimizing expensive black-box functions over discrete spaces (e.g., sets, sequences, graphs). The key challenge is to select a sequence of combinatorial structures to evaluate, in order to identify high-performing structures as quickly as possible. Our main contribution is to introduce and evaluate a new learning-to-search framework for this problem called L2S-DISCO. The key insight is to employ search procedures guided by control knowledge at each step to select the next structure and to improve the control knowledge as new function evaluations are observed. We provide a concrete instantiation of L2S-DISCO for local search procedure and empirically evaluate it on diverse real-world benchmarks. Results show the efficacy of L2S-DISCO over state-of-the-art algorithms in solving complex optimization problems.

【Keywords】:

462. Integrating Overlapping Datasets Using Bivariate Causal Discovery.

Paper Link】 【Pages】:3781-3790

【Authors】: Anish Dhir ; Ciarán M. Lee

【Abstract】: Causal knowledge is vital for effective reasoning in science, as causal relations, unlike correlations, allow one to reason about the outcomes of interventions. Algorithms that can discover causal relations from observational data are based on the assumption that all variables have been jointly measured in a single dataset. In many cases this assumption fails. Previous approaches to overcoming this shortcoming devised algorithms that returned all joint causal structures consistent with the conditional independence information contained in each individual dataset. But, as conditional independence tests only determine causal structure up to Markov equivalence, the number of consistent joint structures returned by these approaches can be quite large. The last decade has seen the development of elegant algorithms for discovering causal relations beyond conditional independence, which can distinguish among Markov equivalent structures. In this work we adapt and extend these so-called bivariate causal discovery algorithms to the problem of learning consistent causal structures from multiple datasets with overlapping variables belonging to the same generating process, providing a sound and complete algorithm that outperforms previous approaches on synthetic and real data.

【Keywords】:

463. Improving the Robustness of Wasserstein Embedding by Adversarial PAC-Bayesian Learning.

Paper Link】 【Pages】:3791-3800

【Authors】: Daizong Ding ; Mi Zhang ; Xudong Pan ; Min Yang ; Xiangnan He

【Abstract】: Node embedding is a crucial task in graph analysis. Recently, several methods have been proposed to embed a node as a distribution rather than a vector in order to capture more information. Although these methods achieved noticeable improvements, their extra complexity brings new challenges. For example, the learned representations of nodes could be sensitive to external noise on the graph and vulnerable to adversarial behaviors. In this paper, we first derive an upper bound on the generalization error for Wasserstein embedding via PAC-Bayesian theory. Based on this, we propose an algorithm called Adversarial PAC-Bayesian Learning (APBL) in order to minimize the generalization error bound. Furthermore, we provide a model called Regularized Adversarial Wasserstein Embedding Network (RAWEN) as an implementation of APBL. Besides our comprehensive analysis of the robustness of RAWEN, our work is the first to explore more kinds of embedded distributions. For evaluation, we conduct extensive experiments to demonstrate the effectiveness and robustness of our proposed embedding model compared with state-of-the-art methods.

【Keywords】:

464. Gradient-Aware Model-Based Policy Search.

Paper Link】 【Pages】:3801-3808

【Authors】: Pierluca D'Oro ; Alberto Maria Metelli ; Andrea Tirinzoni ; Matteo Papini ; Marcello Restelli

【Abstract】: Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties.

【Keywords】:

465. Fairness in Network Representation by Latent Structural Heterogeneity in Observational Data.

Paper Link】 【Pages】:3809-3816

【Authors】: Xin Du ; Yulong Pei ; Wouter Duivesteijn ; Mykola Pechenizkiy

【Abstract】: While recent advances in machine learning focus heavily on the fairness of algorithmic decision making, topics concerning fairness of representation, especially fairness of network representation, are still underexplored. Network representation learning learns a function mapping nodes to low-dimensional vectors. Structural properties, e.g. communities and roles, are preserved in the latent embedding space. In this paper, we argue that latent structural heterogeneity in the observational data could bias the classical network representation model. The unknown heterogeneous distribution across subgroups raises new challenges for fairness in machine learning. Pre-defined groups with sensitive attributes cannot properly tackle the potential unfairness of network representation. We propose a method which can automatically discover subgroups which are unfairly treated by the network representation model. The fairness measure we propose can evaluate complex targets with multi-degree interactions. We conduct randomly controlled experiments on synthetic datasets and verify our methods on real-world datasets. Both quantitative and qualitative results show that our method is effective in recovering the fairness of network representations. Our research offers insight into how structural heterogeneity across subgroups restricted by attributes affects the fairness of network representation learning.

【Keywords】:

466. On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning.

Paper Link】 【Pages】:3817-3824

【Authors】: Aritra Dutta ; El Houcine Bergou ; Ahmed M. Abdelmoniem ; Chen-Yu Ho ; Atal Narayan Sahu ; Marco Canini ; Panos Kalnis

【Abstract】: Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because the convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise and entire-model compression.
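
The difference between the two regimes can be illustrated with a minimal sketch. It assumes top-k sparsification as the compression operator and uses a hypothetical two-layer gradient dictionary; the function names are illustrative and are not taken from the paper or from any framework.

```python
from typing import Dict, List


def sparsify_top_k(values: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def layerwise_top_k(grads: Dict[str, List[float]], ratio: float) -> Dict[str, List[float]]:
    """Apply top-k separately to each layer's gradient."""
    return {
        name: sparsify_top_k(g, max(1, round(ratio * len(g))))
        for name, g in grads.items()
    }


def entire_model_top_k(grads: Dict[str, List[float]], ratio: float) -> Dict[str, List[float]]:
    """Apply top-k once to the concatenation of all layer gradients."""
    names = list(grads)
    flat = [v for name in names for v in grads[name]]
    sparse = sparsify_top_k(flat, max(1, round(ratio * len(flat))))
    out, pos = {}, 0
    for name in names:
        size = len(grads[name])
        out[name] = sparse[pos:pos + size]  # split back into per-layer pieces
        pos += size
    return out


# Hypothetical gradients: one layer with large entries, one with small entries.
grads = {"conv1": [0.9, -0.8, 0.7, 0.6], "fc": [0.05, -0.01, 0.02, 0.03]}
layerwise = layerwise_top_k(grads, 0.5)
entire = entire_model_top_k(grads, 0.5)
```

With these toy values, entire-model compression spends the whole budget on `conv1` and zeroes `fc`, while layer-wise compression keeps two entries in each layer. This is one way the two regimes can diverge in practice.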

【Keywords】:

467. An Information-Theoretic Quantification of Discrimination with Exempt Features.

Paper Link】 【Pages】:3825-3833

【Authors】: Sanghamitra Dutta ; Praveen Venkatesh ; Piotr Mardziel ; Anupam Datta ; Pulkit Grover

【Abstract】: The needs of a business (e.g., hiring) may require the use of certain features that are critical in a way that any discrimination arising due to them should be exempted. In this work, we propose a novel information-theoretic decomposition of the total discrimination (in a counterfactual sense) into a non-exempt component, which quantifies the part of the discrimination that cannot be accounted for by the critical features, and an exempt component, which quantifies the remaining discrimination. Our decomposition enables selective removal of the non-exempt component if desired. We arrive at this decomposition through examples and counterexamples that enable us to first obtain a set of desirable properties that any measure of non-exempt discrimination should satisfy. We then demonstrate that our proposed quantification of non-exempt discrimination satisfies all of them. This decomposition leverages a body of work from information theory called Partial Information Decomposition (PID). We also obtain an impossibility result showing that no observational measure of non-exempt discrimination can satisfy all of the desired properties, which leads us to relax our goals and examine alternative observational measures that satisfy only some of these properties. We then perform a case study using one observational measure to show how one might train a model allowing for exemption of discrimination due to critical features.

【Keywords】:

468. Unsupervised Metric Learning with Synthetic Examples.

Paper Link】 【Pages】:3834-3841

【Authors】: Ujjal Kr Dutta ; Mehrtash Harandi ; C. Chandra Sekhar

【Abstract】: Distance Metric Learning (DML) involves learning an embedding that brings similar examples closer together while moving dissimilar ones apart. Existing DML approaches make use of class labels to generate constraints for metric learning. In this paper, we address the less-studied problem of learning a metric in an unsupervised manner. We do not make use of class labels, but use unlabeled data to generate adversarial, synthetic constraints for learning a metric-inducing embedding. Since entropy is a measure of uncertainty, we minimize the entropy of a conditional probability to learn the metric. Our stochastic formulation scales well to large datasets and performs competitively with existing metric learning methods.

【Keywords】:

469. Polynomial Matrix Completion for Missing Data Imputation and Transductive Learning.

Paper Link】 【Pages】:3842-3849

【Authors】: Jicong Fan ; Yuqian Zhang ; Madeleine Udell

【Abstract】: This paper develops new methods to recover the missing entries of a high-rank or even full-rank matrix when the intrinsic dimension of the data is low compared to the ambient dimension. Specifically, we assume that the columns of a matrix are generated by polynomials acting on a low-dimensional intrinsic variable, and wish to recover the missing entries under this assumption. We show that we can identify the complete matrix of minimum intrinsic dimension by minimizing the rank of the matrix in a high dimensional feature space. We develop a new formulation of the resulting problem using the kernel trick together with a new relaxation of the rank objective, and propose an efficient optimization method. We also show how to use our methods to complete data drawn from multiple nonlinear manifolds. Comparative studies on synthetic data, subspace clustering with missing data, motion capture data recovery, and transductive learning verify the superiority of our methods over the state-of-the-art.

【Keywords】:

470. Distributionally Robust Counterfactual Risk Minimization.

Paper Link】 【Pages】:3850-3857

【Authors】: Louis Faury ; Ugo Tanielian ; Elvis Dohmatob ; Elena Smirnova ; Flavian Vasile

【Abstract】: This manuscript introduces the idea of using Distributionally Robust Optimization (DRO) for the Counterfactual Risk Minimization (CRM) problem. Tapping into a rich existing literature, we show that DRO is a principled tool for counterfactual decision making. We also show that well-established solutions to the CRM problem like sample variance penalization schemes are special instances of a more general DRO problem. In this unifying framework, a variety of distributionally robust counterfactual risk estimators can be constructed using various probability distances and divergences as uncertainty measures. We propose the use of Kullback-Leibler divergence as an alternative way to model uncertainty in CRM and derive a new robust counterfactual objective. In our experiments, we show that this approach outperforms the state-of-the-art on four benchmark datasets, validating the relevance of using other uncertainty measures in practical applications.

【Keywords】:

471. Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness.

Paper Link】 【Pages】:3858-3865

【Authors】: Huijie Feng ; Chunpeng Wu ; Guoyang Chen ; Weifeng Zhang ; Yang Ning

【Abstract】: Recently, smoothing deep neural network based classifiers via isotropic Gaussian perturbation has been shown to be an effective and scalable way to provide state-of-the-art probabilistic robustness guarantees against ℓ2 norm bounded adversarial perturbations. However, how to train a good base classifier that is accurate and robust when smoothed has not been fully investigated. In this work, we derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart when training the base classifier. It is computationally efficient and can be implemented in parallel with other empirical defense methods. We discuss how to implement it under both standard (non-adversarial) and adversarial training schemes. At the same time, we also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability. Our extensive experimentation demonstrates the effectiveness of the proposed training and certification approaches on the CIFAR-10 and ImageNet datasets.
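
As background for the smoothing step described above, here is a minimal sketch of Gaussian smoothing with the widely used ℓ2 certificate from the randomized smoothing literature, radius = σ · Φ⁻¹(p_A). This is not the tighter certification algorithm proposed in the paper. The base classifier is a hypothetical toy, and `p_a_lower` is assumed to be a high-probability lower bound on the top-class probability, which the sketch does not compute.

```python
import random
from collections import Counter
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0


def smoothed_votes(x, sigma, n_samples, rng):
    """Count base-classifier predictions under isotropic Gaussian noise."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes


def certified_radius(p_a_lower, sigma):
    """Return sigma * Phi^{-1}(p_a_lower); 0.0 means no certificate (abstain)."""
    if p_a_lower <= 0.5:
        return 0.0
    p = min(p_a_lower, 1.0 - 1e-12)  # keep inv_cdf inside its open domain (0, 1)
    return sigma * NormalDist().inv_cdf(p)


rng = random.Random(0)
votes = smoothed_votes([2.0, 0.0], sigma=0.5, n_samples=1000, rng=rng)
top_class = votes.most_common(1)[0][0]

# If the lower bound on the top-class probability equals Phi(1), the radius is sigma.
radius = certified_radius(NormalDist().cdf(1.0), sigma=0.5)
```

A real certification procedure would replace the raw vote frequency with a statistical lower confidence bound before calling `certified_radius`; passing the empirical frequency directly does not give a valid guarantee.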

【Keywords】:

472. Privacy-Preserving Gaussian Process Regression - A Modular Approach to the Application of Homomorphic Encryption.

Paper Link】 【Pages】:3866-3873

【Authors】: Peter Fenner ; Edward Pyzer-Knapp

【Abstract】: Much of machine learning relies on the use of large amounts of data to train models to make predictions. When this data comes from multiple sources, for example when evaluation of data against a machine learning model is offered as a service, there can be privacy issues and legal concerns over the sharing of data. Fully homomorphic encryption (FHE) allows data to be computed on whilst encrypted, which can provide a solution to the problem of data privacy. However, FHE is both slow and restrictive, so existing algorithms must be manipulated to make them work efficiently under the FHE paradigm. Some commonly used machine learning algorithms, such as Gaussian process regression, are poorly suited to FHE and cannot be manipulated to work both efficiently and accurately. In this paper, we show that a modular approach, which applies FHE to only the sensitive steps of a workflow that need protection, allows one party to make predictions on their data using a Gaussian process regression model built from another party's data, without either party gaining access to the other's data, in a way which is both accurate and efficient. This construction is, to our knowledge, the first example of an effectively encrypted Gaussian process.

【Keywords】:

473. Learning Triple Embeddings from Knowledge Graphs.

Paper Link】 【Pages】:3874-3881

【Authors】: Valeria Fionda ; Giuseppe Pirrò

【Abstract】: Graph embedding techniques make it possible to learn high-quality feature vectors from graph structures and are useful in a variety of tasks, from node classification to clustering. Existing approaches have only focused on learning feature vectors for the nodes and predicates in a knowledge graph. To the best of our knowledge, none of them has tackled the problem of directly learning triple embeddings. The approaches that are closest to this task have focused on homogeneous graphs involving only one type of edge and obtain edge embeddings by applying some operation (e.g., average) on the embeddings of the endpoint nodes. The goal of this paper is to introduce Triple2Vec, a new technique to directly embed knowledge graph triples. We leverage the idea of the line graph of a graph and extend it to the context of knowledge graphs. We introduce an edge weighting mechanism for the line graph based on semantic proximity. Embeddings are finally generated by adopting the SkipGram model, where sentences are replaced with graph walks. We evaluate our approach on different real-world knowledge graphs and compare it with related work. We also show an application of triple embeddings in the context of user-item recommendations.
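
The line-graph idea mentioned above can be sketched as follows. This is a simplified, undirected and unweighted version in which two triples are adjacent when they share a subject or object entity. It omits the semantic-proximity edge weights and the walk-based SkipGram step, and the example triples are hypothetical.

```python
from itertools import combinations


def triple_line_graph(triples):
    """Build a line graph whose nodes are (subject, predicate, object) triples."""
    unique = list(dict.fromkeys(triples))  # drop duplicate triples, keep order
    adjacency = {t: set() for t in unique}
    for t1, t2 in combinations(unique, 2):
        # Two triples are linked if they have at least one entity in common.
        if {t1[0], t1[2]} & {t2[0], t2[2]}:
            adjacency[t1].add(t2)
            adjacency[t2].add(t1)
    return adjacency


# Hypothetical toy knowledge graph.
triples = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("dave", "livesIn", "paris"),
]
graph = triple_line_graph(triples)
```

Random walks over `graph` would then yield sequences of triples that play the role of sentences for a SkipGram-style model.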

【Keywords】:

474. Training Decision Trees as Replacement for Convolution Layers.

Paper Link】 【Pages】:3882-3889

【Authors】: Wolfgang Fuhl ; Gjergji Kasneci ; Wolfgang Rosenstiel ; Enkelejda Kasneci

【Abstract】: We present an alternative layer to convolution layers in convolutional neural networks (CNNs). Our approach reduces the complexity of convolutions by replacing them with binary decisions. Those binary decisions are used as indexes to conditional distributions where each weight represents a leaf in a decision tree. This means that the indices to the weights need to be determined only once, thus reducing the complexity of convolutions by the depth of the output tensor. Index computation is performed by simple binary decisions that require fewer cycles compared to conventionally used multiplications. In addition, we show how convolutions can be replaced by binary decisions. These binary decisions form indices in the conditional distributions and we show how they are used to replace 2D weight matrices as well as 3D weight tensors. These new layers can be trained like convolution layers in CNNs based on the backpropagation algorithm, for which we provide a formalization. Our results on multiple publicly available data sets show that our approach performs similarly to conventional neural networks. Beyond the formalized reduction of complexity and the improved qualitative performance, we show the runtime improvement empirically compared to convolution layers.

【Keywords】:

475. Induction of Subgoal Automata for Reinforcement Learning.

Paper Link】 【Pages】:3890-3897

【Authors】: Daniel Furelos-Blanco ; Mark Law ; Alessandra Russo ; Krysia Broda ; Anders Jonsson

【Abstract】: In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.

【Keywords】:

476. Fast and Deep Graph Neural Networks.

Paper Link】 【Pages】:3898-3905

【Authors】: Claudio Gallicchio ; Alessio Micheli

【Abstract】: We address the efficiency issue for the construction of a deep graph neural network (GNN). The approach exploits the idea of representing each input graph as a fixed point of a dynamical system (implemented through a recurrent neural network), and leverages a deep architectural organization of the recurrent units. Efficiency is gained from many aspects, including the use of small and very sparse networks, where the weights of the recurrent units are left untrained under the stability condition introduced in this work. This can be viewed as a way to study the intrinsic power of the architecture of a deep GNN, and also to provide insights for the set-up of more complex fully-trained models. Through experimental results, we show that even without training of the recurrent connections, the architecture of a small deep GNN is surprisingly able to match or improve on state-of-the-art performance on a significant set of graph classification tasks.

【Keywords】:

477. On the Parameterized Complexity of Clustering Incomplete Data into Subspaces of Small Rank.

Paper Link】 【Pages】:3906-3913

【Authors】: Robert Ganian ; Iyad Kanj ; Sebastian Ordyniak ; Stefan Szeider

【Abstract】: We consider a fundamental matrix completion problem where we are given an incomplete matrix and a set of constraints modeled as a CSP instance. The goal is to complete the matrix subject to the input constraints and in such a way that the complete matrix can be clustered into few subspaces with low rank. This problem generalizes several problems in data mining and machine learning, including the problem of completing a matrix into one with minimum rank. In addition to its ubiquitous applications in machine learning, the problem has strong connections to information theory, related to binary linear codes, and variants of it have been extensively studied from that perspective. We formalize the problem mentioned above and study its classical and parameterized complexity. We draw a detailed landscape of the complexity and parameterized complexity of the problem with respect to several natural parameters that are desirably small and with respect to several well-studied CSP fragments.

【Keywords】:

478. Adaptive Convolutional ReLUs.

Paper Link】 【Pages】:3914-3921

【Authors】: Hongyang Gao ; Lei Cai ; Shuiwang Ji

【Abstract】: Rectified linear units (ReLUs) are currently the most popular activation function used in neural networks. Although ReLUs can solve the gradient vanishing problem and accelerate training convergence, they suffer from the dying ReLU problem in which some neurons are never activated if the weights are not updated properly. In this work, we propose a novel activation function, known as the adaptive convolutional ReLU (ConvReLU), that can better mimic brain neuron activation behaviors and overcome the dying ReLU problem. With our novel parameter sharing scheme, ConvReLUs can be applied to convolution layers, allowing each input neuron to be activated by different trainable thresholds without involving a large number of extra parameters. We employ the zero initialization scheme in ConvReLU to encourage trainable thresholds to be close to zero. Finally, we develop a partial replacement strategy that only replaces the ReLUs in the early layers of the network. This resolves the dying ReLU problem and retains sparse representations for linear classifiers. Experimental results demonstrate that our proposed ConvReLU has consistently better performance compared to ReLU, LeakyReLU, and PReLU. In addition, the partial replacement strategy is shown to be effective not only for our ConvReLU but also for LeakyReLU and PReLU.

【Keywords】:

479. Infinity Learning: Learning Markov Chains from Aggregate Steady-State Observations.

Paper Link】 【Pages】:3922-3929

【Authors】: Jianfei Gao ; Mohamed A. Zahran ; Amit Sheoran ; Sonia Fahmy ; Bruno Ribeiro

【Abstract】: We consider the task of learning a parametric Continuous Time Markov Chain (CTMC) sequence model without examples of sequences, where the training data consists entirely of aggregate steady-state statistics. Making the problem harder, we assume that the states we wish to predict are unobserved in the training data. Specifically, given a parametric model over the transition rates of a CTMC and some known transition rates, we wish to extrapolate its steady state distribution to states that are unobserved. A technical roadblock to learning a CTMC from its steady state has been that the chain rule to compute gradients will not work over the arbitrarily long sequences necessary to reach the steady state, from which the aggregate statistics are sampled. To overcome this optimization challenge, we propose ∞-SGD, a principled stochastic gradient descent method that uses randomly-stopped estimators to avoid the infinite sums required by the steady state computation, while learning even when only a subset of the CTMC states can be observed. We apply ∞-SGD to a real-world testbed and synthetic experiments, showcasing its accuracy, its ability to extrapolate the steady state distribution to unobserved states under unobserved conditions (heavy loads, when training under light loads), and its success in difficult scenarios where even a tailor-made extension of existing methods fails.
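
To make the steady-state quantity concrete, the sketch below computes the stationary distribution of a small CTMC by uniformization followed by a fixed number of power-iteration steps. It only illustrates the forward computation that the abstract refers to; it is not ∞-SGD, and the two-state chain and its rates are hypothetical.

```python
def generator_matrix(rates, n):
    """Build the CTMC generator Q from a dict {(i, j): rate} over n states."""
    Q = [[0.0] * n for _ in range(n)]
    for (i, j), r in rates.items():
        Q[i][j] = r
    for i in range(n):
        # Each diagonal entry makes its row sum to zero.
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def steady_state(Q, iters=1000):
    """Approximate pi with pi Q = 0 via the uniformized chain P = I + Q / lam."""
    n = len(Q)
    lam = 1.1 * max(-Q[i][i] for i in range(n))  # assumes at least one nonzero rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Two-state chain: rate 2.0 from state 0 to 1, rate 1.0 from state 1 to 0.
Q = generator_matrix({(0, 1): 2.0, (1, 0): 1.0}, n=2)
pi = steady_state(Q)
```

For this chain the exact answer is π = (1/3, 2/3). Differentiating through many such iterations is the long chain-rule computation that the abstract identifies as the obstacle.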

【Keywords】:

480. Tensor-SVD Based Graph Learning for Multi-View Subspace Clustering.

Paper Link】 【Pages】:3930-3937

【Authors】: Quanxue Gao ; Wei Xia ; Zhizhen Wan ; De-Yan Xie ; Pu Zhang

【Abstract】: Low-rank representation based on tensor-Singular Value Decomposition (t-SVD) has achieved impressive results for multi-view subspace clustering, but it does not deal well with noise and illumination changes embedded in multi-view data. The major reason is that all the singular values contribute equally to the tensor-nuclear norm based on t-SVD, which does not make sense in the presence of noise and illumination changes. To improve the robustness and clustering performance, we study the weighted tensor-nuclear norm based on t-SVD and develop an efficient algorithm to optimize the weighted tensor-nuclear norm minimization (WTNNM) problem. We further apply the WTNNM algorithm to multi-view subspace clustering by exploiting the high order correlations embedded in different views. Extensive experimental results reveal that our WTNNM method is superior to several state-of-the-art multi-view subspace clustering methods in terms of performance.

【Keywords】:

481. Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis.

Paper Link】 【Pages】:3938-3945

【Authors】: Quanxue Gao ; Huanhuan Lian ; Qianqian Wang ; Gan Sun

【Abstract】: For cross-modal subspace clustering, the key point is how to exploit the correlation information between cross-modal data. However, most hierarchical and structural correlation information among cross-modal data cannot be well exploited due to its high-dimensional non-linear property. To tackle this problem, in this paper, we propose an unsupervised framework named Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis (CMSC-DCCA), which incorporates the correlation constraint with a self-expressive layer to make full use of information among the inter-modal data and the intra-modal data. More specifically, the proposed model consists of three components: 1) a deep canonical correlation analysis (Deep CCA) model; 2) a self-expressive layer; 3) Deep CCA decoders. The Deep CCA model consists of convolutional encoders and a correlation constraint. Convolutional encoders are used to obtain the latent representations of cross-modal data, while adding the correlation constraint on the latent representations makes full use of the information of the inter-modal data. Furthermore, the self-expressive layer works on the latent representations and constrains them to exhibit self-expression properties, so that the shared coefficient matrix can capture the hierarchical intra-modal correlations of each modality. The Deep CCA decoders then reconstruct the data to ensure that the encoded features preserve the structure of the original data. Experimental results on several real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.

【Keywords】:

482. A Multi-Channel Neural Graphical Event Model with Negative Evidence.

Paper Link】 【Pages】:3946-3953

【Authors】: Tian Gao ; Dharmashankar Subramanian ; Karthikeyan Shanmugam ; Debarun Bhattacharjya ; Nicholas Mattei

【Abstract】: Event datasets are sequences of events of various types occurring irregularly over the time-line, and they are increasingly prevalent in numerous domains. Existing work on modeling events using conditional intensities relies either on some underlying parametric form to capture historical dependencies, or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events with the introduction of fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied.

【Keywords】:

483. Revisiting Bilinear Pooling: A Coding Perspective.

Paper Link】 【Pages】:3954-3961

【Authors】: Zhi Gao ; Yuwei Wu ; Xiaoxun Zhang ; Jindou Dai ; Yunde Jia ; Mehrtash Harandi

【Abstract】: Bilinear pooling has achieved state-of-the-art performance on fusing features in various machine learning tasks, owing to its ability to capture complex associations between features. Despite this success, bilinear pooling suffers from redundancy and burstiness issues, mainly due to the rank-one property of the resulting representation. In this paper, we prove that bilinear pooling is indeed a similarity-based coding-pooling formulation. This result then enables us to devise a new feature fusion algorithm, the factorized bilinear coding (FBC) method, to overcome the drawbacks of bilinear pooling. We show that FBC can generate compact and discriminative representations with substantially fewer parameters. Experiments on two challenging tasks, namely image classification and visual question answering, demonstrate that our method surpasses the bilinear pooling technique by a large margin.
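
The rank-one property mentioned above can be seen in a minimal sketch of bilinear fusion of two feature vectors as an outer product. The vectors are hypothetical, and the sketch covers only plain bilinear pooling of a single feature pair, not the FBC method.

```python
def outer(x, y):
    """Bilinear fusion of two feature vectors: z[i][j] = x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]


def is_rank_at_most_one(m, tol=1e-12):
    """A matrix has rank <= 1 exactly when every 2x2 minor vanishes."""
    rows, cols = len(m), len(m[0])
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    if abs(m[i][j] * m[k][l] - m[i][l] * m[k][j]) > tol:
                        return False
    return True


x = [1.0, 2.0, 3.0]  # hypothetical feature vector from one source
y = [4.0, 5.0]       # hypothetical feature vector from another source
z = outer(x, y)
flat = [v for row in z for v in row]  # the fused representation has len(x) * len(y) entries
```

The fused vector grows as the product of the two input dimensions while the underlying matrix stays rank one, which is one way to see the redundancy the abstract points to.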

【Keywords】:

484. Improved Algorithms for Conservative Exploration in Bandits.

Paper Link】 【Pages】:3962-3969

【Authors】: Evrard Garcelon ; Mohammad Ghavamzadeh ; Alessandro Lazaric ; Matteo Pirotta

【Abstract】: In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.

【Keywords】:

485. Modeling Dialogues with Hashcode Representations: A Nonparametric Approach.

Paper Link】 【Pages】:3970-3979

【Authors】: Sahil Garg ; Irina Rish ; Guillermo A. Cecchi ; Palash Goyal ; Sarik Ghazarian ; Shuyang Gao ; Greg Ver Steeg ; Aram Galstyan

【Abstract】: We propose a novel dialogue modeling framework, the first-ever approach to dialogue modeling based on nonparametric kernel functions, which learns hashcodes as text representations; unlike traditional deep learning models, it handles relatively small datasets well, while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion favoring representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, prominently including psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural network based dialogue systems, both in terms of computational efficiency, reducing training time from days or weeks to hours, and in terms of response quality, achieving an order of magnitude improvement over competitors in the frequency of being chosen as the best model by human evaluators.

【Keywords】:

486. Reinforcement Learning with Non-Markovian Rewards.

Paper Link】 【Pages】:3980-3987

【Authors】: Maor Gaon ; Ronen I. Brafman

【Abstract】: The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if requested earlier and not yet served is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and evaluate empirically four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
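The coffee example in this abstract can be made concrete with a tiny reward automaton. This is a minimal sketch under assumptions: the state names ("idle", "pending") and event names ("request", "deliver") are invented for illustration, and the automaton is hand-written here, whereas the paper learns such automata from experience.

```python
def step_reward_automaton(q, event):
    """Advance a hand-written reward automaton; return (next_state, reward).

    "idle"    : no outstanding coffee request
    "pending" : coffee was requested and has not been served yet
    """
    if q == "idle" and event == "request":
        return "pending", 0.0
    if q == "pending" and event == "deliver":
        return "idle", 1.0  # reward only if requested earlier and not yet served
    return q, 0.0


def total_reward(events):
    """Sum the rewards produced by a sequence of events."""
    q, total = "idle", 0.0
    for event in events:
        q, reward = step_reward_automaton(q, event)
        total += reward
    return total
```

Pairing the environment state with the automaton state q makes the reward Markovian again, which is why a standard learner such as Q-learning can then be run on the product state.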

【Keywords】:

487. Diachronic Embedding for Temporal Knowledge Graph Completion.

Paper Link】 【Pages】:3988-3995

【Authors】: Rishab Goel ; Seyed Mehran Kazemi ; Marcus Brubaker ; Pascal Poupart

【Abstract】: Knowledge graphs (KGs) typically contain temporal facts indicating relationships among entities at different times. Due to their incompleteness, several approaches have been proposed to infer new facts for a KG based on the existing ones, a problem known as KG completion. KG embedding approaches have proved effective for KG completion; however, they have been developed mostly for static KGs. Developing temporal KG embedding models is an increasingly important problem. In this paper, we build novel models for temporal KG completion through equipping static models with a diachronic entity embedding function which provides the characteristics of entities at any point in time. This is in contrast to the existing temporal KG embedding approaches where only static entity features are provided. The proposed embedding function is model-agnostic and can be potentially combined with any static model. We prove that combining it with SimplE, a recent model for static KG embedding, results in a fully expressive model for temporal KG completion. Our experiments indicate the superiority of our proposal compared to existing baselines.
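A diachronic entity embedding is a function from (entity, time) to a vector. The sketch below shows one possible form, in which a fraction gamma of the coordinates varies sinusoidally with time and the rest stay static. The sinusoidal form, the parameter names a, w, b, and the fraction gamma are assumptions made for illustration; the abstract does not specify the function.

```python
import math


def diachronic_embedding(a, w, b, t, gamma):
    """Return an entity embedding at time t.

    a, w, b : per-coordinate amplitude, frequency, and phase (lists of floats)
    gamma   : fraction of coordinates that are time-dependent (assumption)
    """
    d = len(a)
    k = int(gamma * d)  # number of temporal coordinates
    return [
        a[n] * math.sin(w[n] * t + b[n]) if n < k else a[n]
        for n in range(d)
    ]
```

The resulting vector can be passed to any static scoring function in place of a fixed entity embedding, which is what makes the construction model-agnostic.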

【Keywords】:

488. Adversarially Robust Distillation.

Paper Link】 【Pages】:3996-4003

【Authors】: Micah Goldblum ; Liam Fowl ; Soheil Feizi ; Tom Goldstein

【Abstract】: Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.

【Keywords】:

489. Robust Gradient-Based Markov Subsampling.

Paper Link】 【Pages】:4004-4011

【Authors】: Tieliang Gong ; Quanhan Xi ; Chen Xu

【Abstract】: Subsampling is a widely used and effective method to deal with the challenges brought by big data. Most subsampling procedures are designed based on the importance sampling framework, where samples with high importance measures are given corresponding sampling probabilities. However, in the highly noisy case, these samples may cause an unstable estimator which could lead to a misleading result. To tackle this issue, we propose a gradient-based Markov subsampling (GMS) algorithm to achieve robust estimation. The core idea is to construct a subset which allows us to conservatively correct a crude initial estimate towards the true signal. Specifically, GMS selects samples with small gradients via a probabilistic procedure, constructing a subset that is likely to exclude noisy samples and provide a safe improvement over the initial estimate. We show that the GMS estimator is statistically consistent at a rate which matches the optimal in the minimax sense. The promising performance of GMS is supported by simulation studies and real data examples.

【Keywords】:

490. Online Metric Learning for Multi-Label Classification.

Paper Link】 【Pages】:4012-4019

【Authors】: Xiuwen Gong ; Dong Yuan ; Wei Bao

【Abstract】: Existing research into online multi-label classification, such as online sequential multi-label extreme learning machine (OSML-ELM) and stochastic gradient descent (SGD), has achieved promising performance. However, these works lack an analysis of loss function and do not consider label dependency. Accordingly, to fill the current research gap, we propose a novel online metric learning paradigm for multi-label classification. More specifically, we first project instances and labels into a lower dimension for comparison, then leverage the large margin principle to learn a metric with an efficient optimization algorithm. Moreover, we provide theoretical analysis on the upper bound of the cumulative loss for our method. Comprehensive experiments on a number of benchmark multi-label datasets validate our theoretical approach and illustrate that our proposed online metric learning (OML) algorithm outperforms state-of-the-art methods.

【Keywords】:

491. Potential Passenger Flow Prediction: A Novel Study for Urban Transportation Development.

Paper Link】 【Pages】:4020-4027

【Authors】: Yongshun Gong ; Zhibin Li ; Jian Zhang ; Wei Liu ; Jinfeng Yi

【Abstract】: Recently, practical applications of passenger flow prediction have brought many benefits to urban transportation development. With the development of urbanization, a real-world demand from transportation managers is to construct a new metro station in a city area where none was previously planned. Authorities are interested in picturing the future volume of commuters before constructing a new station and in estimating how it would affect other areas. In this paper, this specific problem is termed potential passenger flow (PPF) prediction, which is a novel and important study connected with urban computing and intelligent transportation systems. For example, an accurate PPF predictor can provide invaluable knowledge to designers, such as advice on station scales and influences on other areas. To address this problem, we propose a multi-view localized correlation learning method. The core idea of our strategy is to learn the passenger flow correlations between the target areas and their localized areas with adaptive weights. To improve the prediction accuracy, other domain knowledge is incorporated via a multi-view learning process. We conduct intensive experiments to evaluate the effectiveness of our method with real-world official transportation datasets. The results demonstrate that our method can achieve excellent performance compared with other available baselines. In addition, our method can provide an effective solution to the cold-start problem in recommender systems, as shown by its superior experimental results.

【Keywords】:

492. AlignFlow: Cycle Consistent Learning from Multiple Domains via Normalizing Flows.

Paper Link】 【Pages】:4028-4035

【Authors】: Aditya Grover ; Christopher Chute ; Rui Shu ; Zhangjie Cao ; Stefano Ermon

【Abstract】: Given datasets from multiple domains, a key challenge is to efficiently exploit these data sources for modeling a target domain. Variants of this problem have been studied in many contexts, such as cross-domain translation and domain adaptation. We propose AlignFlow, a generative modeling framework that models each domain via a normalizing flow. The use of normalizing flows allows for a) flexibility in specifying learning objectives via adversarial training, maximum likelihood estimation, or a hybrid of the two methods; and b) learning and exact inference of a shared representation in the latent space of the generative model. We derive a uniform set of conditions under which AlignFlow is marginally-consistent for the different learning objectives. Furthermore, we show that AlignFlow guarantees exact cycle consistency in mapping datapoints from a source domain to target and back to the source domain. Empirically, AlignFlow outperforms relevant baselines on image-to-image translation and unsupervised domain adaptation and can be used to simultaneously interpolate across the various domains using the learned representation.
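The exact cycle-consistency claim follows from invertibility: if each domain is mapped to a shared latent space by an invertible function, translating to another domain and back returns the original point. The sketch below uses one-dimensional affine maps as stand-ins for normalizing flows; the class and function names are hypothetical and the affine form is an assumption chosen only to keep the example short.

```python
class AffineFlow:
    """A toy invertible map between a data domain and a shared latent space."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.scale = scale
        self.shift = shift

    def to_latent(self, x):
        return (x - self.shift) / self.scale

    def from_latent(self, z):
        return z * self.scale + self.shift


def translate(src, dst, x):
    """Map x from the source domain to the target domain via the latent space."""
    return dst.from_latent(src.to_latent(x))
```

For example, with flow_a = AffineFlow(2.0, 1.0) and flow_b = AffineFlow(0.5, -3.0), translate(flow_b, flow_a, translate(flow_a, flow_b, x)) recovers x up to floating-point error, with no cycle-consistency loss term required.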

【Keywords】:

493. Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack.

Paper Link】 【Pages】:4036-4043

【Authors】: Ziwei Guan ; Kaiyi Ji ; Donald J. Bucci Jr. ; Timothy Y. Hu ; Joseph Palombo ; Michael Liston ; Yingbin Liang

【Abstract】: The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player. Previous studies focused on scenarios where the attack value either is bounded at each round or has a vanishing probability of occurrence. These models do not capture powerful adversaries that can catastrophically perturb the revealed reward. This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks. Furthermore, the attack value does not necessarily follow a statistical distribution. We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based ϵ-greedy algorithm (called med-ϵ-greedy). Both of these algorithms are provably robust to the aforementioned attack model. More specifically we show that both algorithms achieve O(log T) pseudo-regret (i.e., the optimal regret without attacks). We also provide a high probability guarantee of O(log T) regret with respect to random rewards and random occurrence of attacks. These bounds are achieved under arbitrary and unbounded reward perturbation as long as the attack probability does not exceed a certain constant threshold. We provide multiple synthetic simulations of the proposed algorithms to verify these claims and showcase the inability of existing techniques to achieve sublinear regret. We also provide experimental results of the algorithm operating in a cognitive radio setting using multiple software-defined radios.
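The reason a median-based estimate helps against unbounded attacks can be seen on a toy reward history. The sketch below is not the med-E-UCB algorithm itself: the index function combines the sample median with a generic UCB-style bonus, and that bonus is an assumption for illustration rather than the paper's definition.

```python
import math
import statistics


def mean_estimate(rewards):
    return statistics.mean(rewards)


def median_estimate(rewards):
    return statistics.median(rewards)


def median_ucb_index(rewards, t):
    """Hypothetical arm index: sample median plus a generic exploration bonus."""
    n = len(rewards)
    return median_estimate(rewards) + math.sqrt(2.0 * math.log(t) / n)


# Ten observed rewards for one arm; two of them are replaced by an
# arbitrarily large attack value.
clean = [0.4, 0.5, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5, 0.5, 0.5]
attacked = list(clean)
attacked[2] = 1e9
attacked[7] = 1e9
```

On the attacked history the sample mean is dominated by the attack value, while the sample median stays at the level of the clean rewards as long as fewer than half of the samples are corrupted.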

【Keywords】:

494. Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification.

Paper Link】 【Pages】:4044-4051

【Authors】: Hongyu Guo

【Abstract】: Data augmentation with Mixup (Zhang et al. 2018) has been shown to be an effective model regularizer for current state-of-the-art deep classification networks. It generates out-of-manifold samples by linearly interpolating the inputs and corresponding labels of random sample pairs. Despite its great successes, Mixup requires a convex combination of the inputs as well as the modeling targets of a sample pair, which significantly limits the space of its synthetic samples and consequently its regularization effect. To cope with this limitation, we propose “nonlinear Mixup”. Unlike Mixup, where the input and label pairs share the same linear, scalar mixing policy, our approach embraces a nonlinear interpolation policy for both the input and label pairs, where the mixing policy for the labels is adaptively learned based on the mixed input. Experiments on benchmark sentence classification datasets indicate that our approach significantly improves upon Mixup. Our empirical studies also show that the out-of-manifold samples generated by our strategy encourage training samples in each class to form a tight representation cluster that is far from others.
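For reference, the standard (linear) Mixup that this abstract contrasts against can be sketched in a few lines. This shows only the baseline, where one scalar mixing coefficient is shared by inputs and labels; the nonlinear variant proposed in the paper is not implemented here. The function names and the Beta(alpha, alpha) default are illustrative choices.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    """Linearly interpolate two inputs and their one-hot labels with one scalar."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def sample_mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha), then mix."""
    lam = rng.betavariate(alpha, alpha)
    return mixup(x_i, y_i, x_j, y_j, lam)
```

Because the same scalar lam scales every input coordinate and every label coordinate, all synthetic samples lie on the line segment between the two originals, which is the restriction the abstract refers to.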

【Keywords】:

495. IWE-Net: Instance Weight Network for Locating Negative Comments and its application to improve Traffic User Experience.

Paper Link】 【Pages】:4052-4059

【Authors】: Lan-Zhe Guo ; Feng Kuang ; Zhang-Xun Liu ; Yu-Feng Li ; Nan Ma ; Xiao-Hu Qie

【Abstract】: Weakly supervised learning aims at coping with scarce labeled data. Previous weakly supervised studies typically assume that there is only one kind of weak supervision in data. In many applications, however, raw data usually contains more than one kind of weak supervision at the same time. For example, in user experience enhancement from Didi, one of the largest online ride-sharing platforms, the ride comment data contains severe label noise (due to the subjective factors of passengers) and severe label distribution bias (due to the sampling bias). We call such a problem ‘compound weakly supervised learning’. In this paper, we propose the CWSL method to address this problem based on Didi ride-sharing comment data. Specifically, an instance reweighting strategy is employed to cope with severe label noise in comment data, where the weights for harmful noisy instances are small. Robust criteria like AUC rather than accuracy and the validation performance are optimized for the correction of biased data labels. Alternating optimization and stochastic gradient methods accelerate the optimization on large-scale data. Experiments on Didi ride-sharing comment data clearly validate the effectiveness of the method. We hope this work may shed some light on applying weakly supervised learning to complex real situations.

【Keywords】:

496. AdaFilter: Adaptive Filter Fine-Tuning for Deep Transfer Learning.

Paper Link】 【Pages】:4060-4066

【Authors】: Yunhui Guo ; Yandong Li ; Liqiang Wang ; Tajana Rosing

【Abstract】: There is an increasing number of pre-trained deep neural network models. However, it is still unclear how to effectively use these models for a new task. Transfer learning, which aims to transfer knowledge from source tasks to a target task, is an effective solution to this problem. Fine-tuning is a popular transfer learning technique for deep neural networks where a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task. Despite its popularity, in this paper we show that fine-tuning suffers from several drawbacks. We propose an adaptive fine-tuning approach, called AdaFilter, which selects only a part of the convolutional filters in the pre-trained model to optimize on a per-example basis. We use a recurrent gated network to selectively fine-tune convolutional filters based on the activations of the previous layer. We experiment with 7 public image classification datasets and the results show that AdaFilter can reduce the average classification error of the standard fine-tuning by 2.54%.

【Keywords】:

497. High Tissue Contrast MRI Synthesis Using Multi-Stage Attention-GAN for Segmentation.

Paper Link】 【Pages】:4067-4074

【Authors】: Mohammad Hamghalam ; Baiying Lei ; Tianfu Wang

【Abstract】: Magnetic resonance imaging (MRI) provides varying tissue contrast images of internal organs based on a strong magnetic field. Despite the non-invasive advantage of MRI in frequent imaging, the low contrast MR images in the target area make tissue segmentation a challenging problem. This paper demonstrates the potential benefits of image-to-image translation techniques to generate synthetic high tissue contrast (HTC) images. Notably, we adopt a new cycle generative adversarial network (CycleGAN) with an attention mechanism to increase the contrast within underlying tissues. The attention block, as well as training on HTC images, guides our model to converge on certain tissues. To increase the resolution of HTC images, we employ a multi-stage architecture to focus on one particular tissue as a foreground and filter out the irrelevant background in each stage. This multi-stage structure also alleviates the common artifacts of the synthetic images by decreasing the gap between source and target domains. We show the application of our method for synthesizing HTC images on brain MR scans, including glioma tumor. We also employ HTC MR images in both the end-to-end and two-stage segmentation structures to confirm the effectiveness of these images. The experiments over three competitive segmentation baselines on the BraTS 2018 dataset indicate that incorporating the synthetic HTC images in the multi-modal segmentation framework improves the average Dice scores by 0.8%, 0.6%, and 0.5% on the whole tumor, tumor core, and enhancing tumor, respectively, while eliminating one real MRI sequence from the segmentation procedure.

【Keywords】:

498. Robust Federated Learning via Collaborative Machine Teaching.

Paper Link】 【Pages】:4075-4082

【Authors】: Yufei Han ; Xiangliang Zhang

【Abstract】: For federated learning systems deployed in the wild, data flaws hosted on local agents are widely witnessed. On one hand, when a large amount (e.g. over 60%) of training data is corrupted by systematic sensor noise and environmental perturbations, the performance of federated model training can be degraded significantly. On the other hand, it is prohibitively expensive for either clients or service providers to set up manual sanitary checks to verify the quality of data instances. In our study, we address this challenge by proposing a collaborative and privacy-preserving machine teaching method. Specifically, we use a few trusted instances provided by teachers as benign examples in the teaching process. Our collaborative teaching approach jointly seeks the optimal tuning on the distributed training set, such that the model learned from the tuned training set predicts labels of the trusted items correctly. The proposed method couples the process of teaching and learning and thus directly produces a robust prediction model despite the extremely pervasive systematic data corruption. The experimental study on real benchmark data sets demonstrates the validity of our method.

【Keywords】:

499. Interpretable and Differentially Private Predictions.

Paper Link】 【Pages】:4083-4090

【Authors】: Frederik Harder ; Matthias Bauer ; Mijung Park

【Abstract】: Interpretable predictions, which clarify why a machine learning model makes a particular decision, can compromise privacy by revealing the characteristics of individual data points. This raises the central question addressed in this paper: Can models be interpretable without compromising privacy? For complex “big” data fit by correspondingly rich models, balancing privacy and explainability is particularly challenging, such that this question has remained largely unexplored. In this paper, we propose a family of simple models with the aim of approximating complex models using several locally linear maps per class to provide high classification accuracy, as well as differentially private explanations on the classification. We illustrate the usefulness of our approach on several image benchmark datasets as well as a medical dataset.

【Keywords】:

501. SNEQ: Semi-Supervised Attributed Network Embedding with Attention-Based Quantisation.

Paper Link】 【Pages】:4091-4098

【Authors】: Tao He ; Lianli Gao ; Jingkuan Song ; Xin Wang ; Kejie Huang ; Yuanfang Li

【Abstract】: Learning accurate low-dimensional embeddings for a network is a crucial task as it facilitates many network analytics tasks. Moreover, the trained embeddings often require a significant amount of space to store, making storage and processing a challenge, especially as large-scale networks become more prevalent. In this paper, we present a novel semi-supervised network embedding and compression method, SNEQ, that is competitive with state-of-the-art embedding methods while being far more space- and time-efficient. SNEQ incorporates a novel quantisation method based on a self-attention layer that is trained in an end-to-end fashion, which is able to dramatically compress the size of the trained embeddings, thus reducing the storage footprint and accelerating retrieval. Our evaluation on four real-world networks of diverse characteristics shows that SNEQ outperforms a number of state-of-the-art embedding methods in link prediction, node classification and node recommendation. Moreover, the quantised embedding shows a great advantage in terms of storage and time compared with continuous embeddings as well as hashing methods.

【Keywords】:

502. Heterogeneous Transfer Learning with Weighted Instance-Correspondence Data.

Paper Link】 【Pages】:4099-4106

【Authors】: Yuwei He ; Xiaoming Jin ; Guiguang Ding ; Yuchen Guo ; Jungong Han ; Jiyong Zhang ; Sicheng Zhao

【Abstract】: Instance-correspondence (IC) data are potent resources for heterogeneous transfer learning (HeTL) due to the capability of bridging the source and the target domains at the instance-level. To this end, people tend to use machine-generated IC data, because manually establishing IC data is expensive and primitive. However, existing IC data machine generators are not perfect and always produce data that are not of high quality, thus hampering the performance of domain adaptation. In this paper, instead of improving the IC data generator, which might not be an optimal way, we accept the fact that data quality variation does exist but find a better way to use the data. Specifically, we propose a novel heterogeneous transfer learning method named Transfer Learning with Weighted Correspondence (TLWC), which utilizes IC data to adapt the source domain to the target domain. Rather than treating IC data equally, TLWC can assign solid weights to each IC data pair depending on the quality of the data. We conduct extensive experiments on HeTL datasets and the state-of-the-art results verify the effectiveness of TLWC.

【Keywords】:

503. EPOC: Efficient Perception via Optimal Communication.

Paper Link】 【Pages】:4107-4114

【Authors】: Masoumeh Heidari Kapourchali ; Bonny Banerjee

【Abstract】: We propose an agent model capable of actively and selectively communicating with other agents to predict its environmental state efficiently. Selecting whom to communicate with is a challenge when the internal model of other agents is unobservable. Our agent learns a communication policy as a mapping from its belief state to with whom to communicate in an online and unsupervised manner, without any reinforcement. Human activity recognition from multimodal, multisource and heterogeneous sensor data is used as a testbed to evaluate the proposed model where each sensor is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state-of-the-art even though our model uses significantly fewer parameters and infers the state in a localized manner. The learned policy reduces the number of communications. The agent is tolerant to communication failures and can recognize unreliable agents through their communication messages. To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting its environmental state.

【Keywords】:

504. Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory.

Paper Link】 【Pages】:4115-4122

【Authors】: Kyle Helfrich ; Qiang Ye

【Abstract】: Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
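The contrast between eigenvalues on the unit circle and eigenvalues inside the unit disc can be seen in a scalar toy recurrence. This is a deliberately simplified sketch, not the ENRNN architecture: a single linear state h_t = lam * h_{t-1} + x_t stands in for a recurrent matrix with one eigenvalue lam, and the function name is hypothetical.

```python
def run_scalar_rnn(inputs, lam):
    """Run h_t = lam * h_{t-1} + x_t from h_0 = 0 and return the final state."""
    h = 0.0
    for x in inputs:
        h = lam * h + x
    return h


# An impulse followed by three steps of silence.
impulse = [1.0, 0.0, 0.0, 0.0]
```

With lam = 1.0 the impulse is retained in full after three silent steps, while with lam = 0.5 its contribution has decayed to 0.5 ** 3 = 0.125, which is the dissipating, short-term-memory behaviour the abstract describes.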

【Keywords】:

505. Reasoning on Knowledge Graphs with Debate Dynamics.

Paper Link】 【Pages】:4123-4131

【Authors】: Marcel Hildebrandt ; Jorge Andres Quintero Serna ; Yunpu Ma ; Martin Ringsquandl ; Mitchell Joblin ; Volker Tresp

【Abstract】: We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments – paths in the knowledge graph – with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.

【Keywords】:

506. An Attention-Based Graph Neural Network for Heterogeneous Structural Learning.

Paper Link】 【Pages】:4132-4139

【Authors】: Huiting Hong ; Hantao Guo ; Yucheng Lin ; Xiaoqing Yang ; Zang Li ; Jieping Ye

【Abstract】: In this paper, we focus on graph representation learning of heterogeneous information network (HIN), in which various types of vertices are connected by various types of relations. Most of the existing methods conducted on HIN revise homogeneous graph embedding models via meta-paths to learn low-dimensional vector space of HIN. In this paper, we propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) to directly encode structural information of HIN without meta-path and achieve more informative representations. With this method, domain experts will not be needed to design meta-path schemes and the heterogeneous information can be processed automatically by our proposed model. Specifically, we implicitly represent heterogeneous information using the following two methods: 1) we model the transformation between heterogeneous vertices through a projection in low-dimensional entity spaces; 2) afterwards, we apply the graph neural network to aggregate multi-relational information of projected neighborhood by means of attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in HIN, cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning with full use of information. The experiments conducted on three public datasets demonstrate that our proposed models achieve significant and consistent improvements compared to state-of-the-art solutions.

【Keywords】:

507. End-to-End Unpaired Image Denoising with Conditional Adversarial Networks.

Paper Link】 【Pages】:4140-4149

【Authors】: Zhiwei Hong ; Xiaocheng Fan ; Tao Jiang ; Jianxing Feng

【Abstract】: Image denoising is a classic low level vision problem that attempts to recover a noise-free image from a noisy observation. Recent advances in deep neural networks have outperformed traditional prior based methods for image denoising. However, the existing methods either require paired noisy and clean images for training or impose certain assumptions on the noise distribution and data types. In this paper, we present an end-to-end unpaired image denoising framework (UIDNet) that denoises images with only unpaired clean and noisy training images. The critical component of our model is a noise learning module based on a conditional Generative Adversarial Network (cGAN). The model learns the noise distribution from the input noisy images and uses it to transform the input clean images to noisy ones without any assumption on the noise distribution and data types. This process results in pairs of clean and pseudo-noisy images. Such pairs are then used to train another denoising network similar to the existing denoising methods based on paired images. The noise learning and denoising components are integrated together so that they can be trained end-to-end. Extensive experimental evaluation has been performed on both synthetic and real data including real photographs and computer tomography (CT) images. The results demonstrate that our model outperforms the previous models trained on unpaired images as well as the state-of-the-art methods based on paired training data when proper training pairs are unavailable.

【Keywords】:

508. TellTail: Fast Scoring and Detection of Dense Subgraphs.

Paper Link】 【Pages】:4150-4157

【Authors】: Bryan Hooi ; Kijung Shin ; Hemank Lamba ; Christos Faloutsos

【Abstract】: Suppose you visit an e-commerce site, and see that 50 users each reviewed almost all of the same 500 products several times each: would you get suspicious? Similarly, given a Twitter follow graph, how can we design principled measures for identifying surprisingly dense subgraphs? Dense subgraphs often indicate interesting structure, such as network attacks in network traffic graphs. However, most existing dense subgraph measures either do not model normal variation, or model it using an Erdős-Rényi assumption, but this assumption was discredited decades ago. What is the right assumption then? We propose a novel application of extreme value theory to the dense subgraph problem, which allows us to propose measures and algorithms which evaluate the surprisingness of a subgraph probabilistically, without requiring restrictive assumptions (e.g. Erdős-Rényi). We then improve the practicality of our approach by incorporating empirical observations about dense subgraph patterns in real graphs, and by proposing a fast pruning-based search algorithm. Our approach (a) provides theoretical guarantees of consistency, (b) scales quasi-linearly, and (c) outperforms baselines in synthetic and ground truth settings.

【Keywords】:

509. Query-Driven Multi-Instance Learning.

Paper Link】 【Pages】:4158-4165

【Authors】: Yen-Chi Hsu ; Cheng-Yao Hong ; Ming-Sui Lee ; Tyng-Luh Liu

【Abstract】: We introduce a query-driven approach (qMIL) to multi-instance learning where the queries aim to uncover the class labels embodied in a given bag of instances. Specifically, it solves a multi-instance multi-label learning (MIML) problem with a more challenging setting than the conventional one. Each MIML bag in our formulation is annotated only with a binary label indicating whether the bag contains the instance of a certain class and the query is specified by the word2vec of a class label/name. To learn a deep-net model for qMIL, we construct a network component that achieves a generalized compatibility measure for query-visual co-embedding and yields proper instance attentions to the given query. The bag representation is then formed as the attention-weighted sum of the instances' weights, and passed to the classification layer at the end of the network. In addition, the qMIL formulation is flexible for extending the network to classify unseen class labels, leading to a new technique to solve the zero-shot MIML task through an iterative querying process. Experimental results on action classification over video clips and three MIML datasets from MNIST, CIFAR10 and Scene are provided to demonstrate the effectiveness of our method.
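The query-conditioned attention pooling described in this abstract can be sketched as follows. This is a minimal illustration under assumptions: a plain dot product replaces the paper's generalized compatibility measure, the bag is pooled over instance feature vectors, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(query, instances):
    """Pool a bag of instance vectors with attention weights given by the query.

    query     : query embedding (e.g. a word vector for a class name)
    instances : list of instance feature vectors with the same dimension
    """
    scores = [sum(q * v for q, v in zip(query, inst)) for inst in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [
        sum(w * inst[d] for w, inst in zip(weights, instances))
        for d in range(dim)
    ]
    return bag, weights
```

The pooled vector would then be passed to a classifier that predicts whether the bag contains the queried class; changing the query changes which instances dominate the pooled representation.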

【Keywords】:

510. Towards Interpretation of Pairwise Learning.

Paper Link】 【Pages】:4166-4173

【Authors】: Mengdi Huai ; Di Wang ; Chenglin Miao ; Aidong Zhang

【Abstract】: Recently, increasing attention has been paid to an important family of learning problems called pairwise learning, in which the associated loss functions depend on pairs of instances. Despite the tremendous success of pairwise learning in many real-world applications, the lack of transparency behind the learned pairwise models makes it difficult for users to understand how particular decisions are made by these models, which further impedes users from trusting the predicted results. To tackle this problem, in this paper, we study feature importance scoring as a specific approach to the problem of interpreting the predictions of black-box pairwise models. Specifically, we first propose a novel adaptive Shapley-value-based interpretation method, based on which a vector of importance scores associated with the underlying features of a testing instance pair can be adaptively calculated with the consideration of feature correlations, and these scores can be used to indicate which features make key contributions to the final prediction. Considering that Shapley-value-based methods are usually computationally challenging, we further propose a novel robust approximation interpretation method for pairwise models. This method is not only much more efficient but also robust to data noise. To the best of our knowledge, we are the first to investigate how to enable interpretation in pairwise learning. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods.
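For readers unfamiliar with Shapley-value feature scoring, the sketch below computes exact Shapley values for a tiny model by averaging marginal contributions over all feature orderings. This is the classical definition, not the adaptive or approximate methods proposed in the paper; the value function `toy_value` and its weights are hypothetical. The factorial number of orderings also shows why the abstract calls such methods computationally challenging.

```python
from itertools import permutations
from math import factorial

def shapley_values(n_features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        coalition = set()
        for i in order:
            before = value(frozenset(coalition))
            coalition.add(i)
            after = value(frozenset(coalition))
            phi[i] += after - before       # marginal contribution of feature i
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]

def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are present."""
    weights = {0: 1.0, 1: 2.0, 2: 0.5}
    score = sum(weights[i] for i in subset)
    if 0 in subset and 1 in subset:
        score += 1.0                       # interaction between features 0 and 1
    return score

phi = shapley_values(3, toy_value)
```

The interaction bonus is split evenly between features 0 and 1, and the scores sum to the value of the full feature set.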

【Keywords】:

511. DWM: A Decomposable Winograd Method for Convolution Acceleration.

Paper Link】 【Pages】:4174-4181

【Authors】: Di Huang ; Xishan Zhang ; Rui Zhang ; Tian Zhi ; Deyuan He ; Jiaming Guo ; Chang Liu ; Qi Guo ; Zidong Du ; Shaoli Liu ; Tianshi Chen ; Yunji Chen

【Abstract】: Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with a kernel size of 3x3 and a stride of 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3x3 and fails on convolutions with stride larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to a wide range of general convolutions. DWM decomposes kernels with large size or large stride into several small kernels with stride 1 so that the Winograd method can be applied, reducing the number of multiplications while preserving numerical accuracy. It enables fast exploration of larger kernel sizes and larger stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared with the original Winograd, the proposed DWM is able to support all kinds of convolutions with a speedup of ∼2, without affecting the numerical accuracy.
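To make the idea concrete, here is a one-dimensional sketch of the standard Winograd F(2,3) transform, which produces two outputs of a 3-tap filter with four multiplications instead of six. The helper `decomposed_5tap` then shows the general decomposition idea on a 5-tap filter by summing two 3-tap calls. The split used here (taps 0-2, then taps 3-4 zero-padded) and all function names are illustrative assumptions, not the authors' DWM implementation.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, using 4 multiplications."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g, n_out):
    """Reference sliding-window filter (correlation form, stride 1)."""
    return [sum(d[i + k] * g[k] for k in range(len(g))) for i in range(n_out)]

def decomposed_5tap(d, g):
    """Two outputs of a 5-tap filter as the sum of two 3-tap Winograd calls."""
    padded = list(d) + [0.0]               # one zero so the second tile has 4 inputs
    first = winograd_f23(padded[0:4], g[0:3])
    second = winograd_f23(padded[3:7], [g[3], g[4], 0.0])
    return [a + b for a, b in zip(first, second)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel3 = [1.0, 0.0, -1.0]
kernel5 = [1.0, 0.0, -1.0, 2.0, 1.0]

small = winograd_f23(signal[0:4], kernel3)
large = decomposed_5tap(signal, kernel5)
```

In practice the filter-side sums such as `g[0] + g[1] + g[2]` are precomputed once per kernel, so only the four products count toward the per-tile cost.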

【Keywords】:

512. Unsupervised Nonlinear Feature Selection from High-Dimensional Signed Networks.

Paper Link】 【Pages】:4182-4189

【Authors】: Qiang Huang ; Tingyu Xia ; Huiyan Sun ; Makoto Yamada ; Yi Chang

【Abstract】: With the rapid development of social media services in recent years, relational data are explosively growing. The signed network, which consists of a mixture of positive and negative links, is an effective way to represent the friendly and hostile relations among nodes, which can represent users or items. Because the features associated with a node of a signed network are usually incomplete, noisy, unlabeled, and high-dimensional, feature selection is an important procedure to eliminate irrelevant features. However, existing network-based feature selection methods are linear methods, which means they can only select features that have a linear dependency on the output values. Moreover, in many social data, most nodes are unlabeled; therefore, selecting features in an unsupervised manner is generally preferred. To this end, in this paper, we propose a nonlinear unsupervised feature selection method for signed networks, called SignedLasso. This method can select a small number of important features with nonlinear associations between inputs and output from high-dimensional data. More specifically, we formulate unsupervised feature selection as a nonlinear feature selection problem with the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso), which can find a small number of features in a nonlinear manner. Then, we propose the use of a deep learning-based node embedding to represent node similarity without label information and incorporate the node embedding into the HSIC Lasso. Through experiments on two real-world datasets, we show that the proposed algorithm is superior to existing linear unsupervised feature selection methods.

【Keywords】:

513. Feature Variance Regularization: A Simple Way to Improve the Generalizability of Neural Networks.

Paper Link】 【Pages】:4190-4197

【Authors】: Ranran Huang ; Hanbo Sun ; Ji Liu ; Lu Tian ; Li Wang ; Yi Shan ; Yu Wang

【Abstract】: To improve the generalization ability of neural networks, we propose a novel regularization method that regularizes the empirical risk using a penalty on the empirical variance of the features. Intuitively, our approach introduces confusion into feature extraction and prevents the models from learning features that may relate to specific training samples. According to our theoretical analysis, our method encourages models to generate closer feature distributions for the training set and unobservable true data and minimize the expected risk as well, which allows the model to adapt to new samples better. We provide a thorough empirical justification of our approach, which achieves a greater improvement than other regularization methods. The experimental results show the effectiveness of our method on multiple visual tasks, including classification (CIFAR100, ImageNet, fine-grained datasets) and semantic segmentation (Cityscapes).

【Keywords】:

514. Meta-Learning PAC-Bayes Priors in Model Averaging.

Paper Link】 【Pages】:4198-4205

【Authors】: Yimin Huang ; Weiran Huang ; Liang Li ; Zhenguo Li

【Abstract】: Model uncertainty has become one of the most important problems in both academia and industry. In this paper, we mainly consider the scenario in which a common model set is used for model averaging, instead of selecting a single final model via a model selection procedure, in order to account for model uncertainty and improve the reliability and accuracy of inferences. Here one main challenge is to learn the prior over the model set. To tackle this problem, we propose two data-based algorithms to get proper priors for model averaging. One is for the meta-learner: analysts use historical similar tasks to extract information about the prior. The other is for the base-learner: a subsampling method is used to process the data step by step. Theoretically, an upper bound of risk for our algorithm is presented to guarantee worst-case performance. In practice, both methods perform well in simulations and real data studies, especially with poor quality data.

【Keywords】:

515. DIANet: Dense-and-Implicit Attention Network.

Paper Link】 【Pages】:4206-4214

【Authors】: Zhongzhan Huang ; Senwei Liang ; Mingfu Liang ; Haizhao Yang

【Abstract】: Attention networks have successfully boosted the performance in various vision problems. Previous works emphasize designing new attention modules and plugging them individually into the networks. Our paper proposes a novel-and-simple framework that shares an attention module throughout different network layers to encourage the integration of layer-wise information and this parameter-sharing module is referred to as Dense-and-Implicit-Attention (DIA) unit. Many choices of modules can be used in the DIA unit. Since Long Short Term Memory (LSTM) has a capacity of capturing long-distance dependency, we focus on the case when the DIA unit is the modified LSTM (called DIA-LSTM). Experiments on benchmark datasets show that the DIA-LSTM unit is capable of emphasizing layer-wise feature interrelation and leads to significant improvement of image classification accuracy. We further empirically show that the DIA-LSTM has a strong regularization ability on stabilizing the training of deep networks by the experiments with the removal of skip connections (He et al. 2016a) or Batch Normalization (Ioffe and Szegedy 2015) in the whole residual network.

【Keywords】:

516. Collaborative Graph Convolutional Networks: Unsupervised Learning Meets Semi-Supervised Learning.

Paper Link】 【Pages】:4215-4222

【Authors】: Binyuan Hui ; Pengfei Zhu ; Qinghua Hu

【Abstract】: Graph convolutional networks (GCN) have achieved promising performance in attributed graph clustering and semi-supervised node classification because they are capable of modeling complex graphical structure, and jointly learning both features and relations of nodes. Inspired by the success of unsupervised learning in the training of deep models, we wonder whether graph-based unsupervised learning can collaboratively boost the performance of semi-supervised learning. In this paper, we propose a multi-task graph learning model, called collaborative graph convolutional networks (CGCN). CGCN is composed of an attributed graph clustering network and a semi-supervised node classification network. As Gaussian mixture models can effectively discover the inherent complex data distributions, a new end-to-end attributed graph clustering network is designed by combining variational graph auto-encoder with Gaussian mixture models (GMM-VGAE) rather than the classic k-means. If the pseudo-label of an unlabeled sample assigned by GMM-VGAE is consistent with the prediction of the semi-supervised GCN, it is selected to further boost the performance of semi-supervised learning with the help of the pseudo-labels. Extensive experiments on benchmark graph datasets validate the superiority of our proposed GMM-VGAE compared with the state-of-the-art attributed graph clustering networks. The performance of node classification is greatly improved by our proposed CGCN, which verifies graph-based unsupervised learning can be well exploited to enhance the performance of semi-supervised learning.

【Keywords】:

517. Control Flow Graph Embedding Based on Multi-Instance Decomposition for Bug Localization.

Paper Link】 【Pages】:4223-4230

【Authors】: Xuan Huo ; Ming Li ; Zhi-Hua Zhou

【Abstract】: During software maintenance, bug reports are an effective way to identify potential bugs hidden in a software system. It is a great challenge to automatically locate the potential buggy source code according to a bug report. Traditional approaches usually represent bug reports and source code from a lexical perspective to measure their similarities. Recently, some deep learning models have been proposed to learn the unified features by exploiting the local and sequential nature, which overcomes the difficulty in modeling the difference between natural and programming languages. However, only considering local and sequential information from one dimension is not enough to represent the semantics; some multi-dimension information, such as structural and functional nature that carries additional semantics, has not been well-captured. Such information beyond the lexical and structural terms is extremely vital in modeling program functionalities and behaviors, leading to a better representation for identifying buggy source code. In this paper, we propose a novel model named CG-CNN, which is a multi-instance learning framework that enhances the unified features for bug localization by exploiting structural and sequential nature from the control flow graph. Experimental results on widely-used software projects demonstrate the effectiveness of our proposed CG-CNN model.

【Keywords】:

518. Word-Level Contextual Sentiment Analysis with Interpretability.

Paper Link】 【Pages】:4231-4238

【Authors】: Tomoki Ito ; Kota Tsubouchi ; Hiroki Sakaji ; Tatsuo Yamashita ; Kiyoshi Izumi

【Abstract】: Word-level contextual sentiment analysis (WCSA) is an important task for mining reviews or opinions. When analyzing this type of sentiment in the industry, both the interpretability and practicality are often required. However, such a WCSA method has not been established. This study aims to develop a WCSA method with interpretability and practicality. To achieve this aim, we propose a novel neural network architecture called Sentiment Interpretable Neural Network (SINN). To realize this SINN practically, we propose a novel learning strategy called Lexical Initialization Learning (LEXIL). SINN is interpretable because it can extract word-level contextual sentiment through extracting word-level original sentiment and its local and global word-level contexts. Moreover, LEXIL can develop the SINN without any specific knowledge for context; therefore, this strategy is practical. Using real textual datasets, we experimentally demonstrate that the proposed LEXIL is effective for improving the interpretability of SINN and that the SINN features both the high WCSA ability and high interpretability.

【Keywords】:

519. Semi-Supervised Learning for Maximizing the Partial AUC.

Paper Link】 【Pages】:4239-4246

【Authors】: Tomoharu Iwata ; Akinori Fujino ; Naonori Ueda

【Abstract】: The partial area under a receiver operating characteristic curve (pAUC) is a performance measurement for binary classification problems that summarizes the true positive rate with the specific range of the false positive rate. Obtaining classifiers that achieve high pAUC is important in a wide variety of applications, such as cancer screening and spam filtering. Although many methods have been proposed for maximizing the pAUC, existing methods require many labeled data for training. In this paper, we propose a semi-supervised learning method for maximizing the pAUC, which trains a classifier with a small amount of labeled data and a large amount of unlabeled data. To exploit the unlabeled data, we derive two approximations of the pAUC: the first is calculated from positive and unlabeled data, and the second is calculated from negative and unlabeled data. A classifier is trained by maximizing the weighted sum of the two approximations of the pAUC and the pAUC that is calculated from positive and negative data. With experiments using various datasets, we demonstrate that the proposed method achieves higher test pAUCs than existing methods.
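As a concrete reference point for the metric, the sketch below computes an empirical, normalized pAUC from fully labeled scores. It assumes the false-positive-rate range is [0, β] and that β times the number of negatives rounds to at least one; under that assumption the pAUC reduces to the fraction of correctly ordered pairs between positives and the highest-scoring negatives. The function name and toy scores are illustrative, and the semi-supervised approximations from the paper are not reproduced here.

```python
def partial_auc(pos_scores, neg_scores, beta):
    """Normalized empirical pAUC over the false positive rate range [0, beta].

    Only the top-scoring (hardest) fraction `beta` of negatives is compared
    against the positives; ties count as incorrect.
    """
    k = int(round(beta * len(neg_scores)))
    if k < 1:
        raise ValueError("beta is too small for the number of negatives")
    hardest = sorted(neg_scores, reverse=True)[:k]
    correct = sum(1 for p in pos_scores for n in hardest if p > n)
    return correct / (len(pos_scores) * k)

toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.5, 0.3, 0.1]
pauc_half = partial_auc(toy_pos, toy_neg, 0.5)   # FPR range [0, 0.5]
auc_full = partial_auc(toy_pos, toy_neg, 1.0)    # beta = 1 recovers the ordinary AUC
```

Because the restricted range keeps only the hardest negatives, the pAUC here is lower than the full AUC on the same scores.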

【Keywords】:

520. Co-Occurrence Estimation from Aggregated Data with Auxiliary Information.

Paper Link】 【Pages】:4247-4254

【Authors】: Tomoharu Iwata ; Naoki Marumo

【Abstract】: Complete co-occurrence data are unavailable in many applications, including purchase records and medical histories, because of their high cost or privacy protection. Even with such applications, aggregated data would be available, such as the number of purchasers for each item and the number of patients with each disease. We propose a method for estimating the co-occurrence of items from aggregated data with auxiliary information. For auxiliary information, we use item features that describe the characteristics of each item. Although many methods have been proposed for estimating the co-occurrence given aggregated data, no existing method can use auxiliary information. We also use records of a small number of users. With our proposed method, we introduce latent co-occurrence variables that represent the amount of co-occurrence for each pair of items. We model a probabilistic generative process of the latent co-occurrence variables by a multinomial distribution with Dirichlet priors. The parameters of the Dirichlet priors are parameterized with neural networks that take the auxiliary information as input, where neural networks are shared across different item pairs. The shared neural networks enable us to learn unknown relationships between auxiliary information and co-occurrence using the data of multiple items. The latent co-occurrence variables and the neural network parameters are estimated by maximizing the sum of the likelihood of the latent co-occurrence variables and the likelihood of the small records. We demonstrate the effectiveness of our proposed method using user-item rating datasets.

【Keywords】:

521. Class Prior Estimation with Biased Positives and Unlabeled Examples.

Paper Link】 【Pages】:4255-4263

【Authors】: Shantanu Jain ; Justin Delano ; Himanshu Sharma ; Predrag Radivojac

【Abstract】: Positive-unlabeled learning is often studied under the assumption that the labeled positive sample is drawn randomly from the true distribution of positives. In many application domains, however, certain regions in the support of the positive class-conditional distribution are over-represented while others are under-represented in the positive sample. Although this introduces problems in all aspects of positive-unlabeled learning, we begin to address this challenge by focusing on the estimation of class priors, quantities central to the estimation of posterior probabilities and the recovery of true classification performance. We start by making a set of assumptions to model the sampling bias. We then extend the identifiability theory of class priors from the unbiased to the biased setting. Finally, we derive an algorithm for estimating the class priors that relies on clustering to decompose the original problem into subproblems of unbiased positive-unlabeled learning. Our empirical investigation suggests feasibility of the correction strategy and overall good performance.

【Keywords】:

522. Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles.

Paper Link】 【Pages】:4264-4271

【Authors】: Siddhartha Jain ; Ge Liu ; Jonas Mueller ; David Gifford

【Abstract】: The inaccuracy of neural network models on inputs that do not stem from the distribution underlying the training data is problematic and at times unrecognized. Uncertainty estimates of model predictions are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs. We apply MOD to regression tasks including 38 Protein-DNA binding datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. We also explore variants that utilize adversarial training techniques and data density estimation. For out-of-distribution test examples, MOD significantly improves predictive performance and uncertainty calibration without sacrificing performance on test data drawn from the same distribution as the training data. We also find that in Bayesian optimization tasks, the performance of UCB acquisition is improved via MOD uncertainty estimates.

【Keywords】:

523. Invariant Representations through Adversarial Forgetting.

Paper Link】 【Pages】:4272-4279

【Authors】: Ayush Jaiswal ; Daniel Moyer ; Greg Ver Steeg ; Wael AbdAlmageed ; Premkumar Natarajan

【Abstract】: We propose a novel approach to achieving invariance for deep neural networks in the form of inducing amnesia to unwanted factors of data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information-bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework achieves state-of-the-art performance at learning invariance in both nuisance and bias settings on a diverse collection of datasets and tasks.

【Keywords】:

524. Bounding Regret in Empirical Games.

Paper Link】 【Pages】:4280-4287

【Authors】: Steven Jecmen ; Arunesh Sinha ; Zun Li ; Long Tran-Thanh

【Abstract】: Empirical game-theoretic analysis refers to a set of models and techniques for solving large-scale games. However, there is a lack of a quantitative guarantee about the quality of output approximate Nash equilibria (NE). A natural quantitative guarantee for such an approximate NE is the regret in the game (i.e. the best deviation gain). We formulate this deviation gain computation as a multi-armed bandit problem, with a new optimization goal unlike those studied in prior work. We propose an efficient algorithm Super-Arm UCB (SAUCB) for the problem and a number of variants. We present sample complexity results as well as extensive experiments that show the better performance of SAUCB compared to several baselines.
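The regret notion used here (the best unilateral deviation gain) can be illustrated on a small two-player game with known payoffs. The sketch below handles pure strategy profiles only and assumes exact payoff matrices, whereas the paper addresses the harder setting where payoffs must be estimated from samples; the function name and the prisoner's dilemma payoffs are illustrative assumptions.

```python
def profile_regret(payoffs_row, payoffs_col, i, j):
    """Regret of the pure profile (i, j): the largest gain any single player
    can obtain by deviating unilaterally. Zero means a pure Nash equilibrium."""
    n_rows = len(payoffs_row)
    n_cols = len(payoffs_row[0])
    row_gain = max(payoffs_row[a][j] for a in range(n_rows)) - payoffs_row[i][j]
    col_gain = max(payoffs_col[i][b] for b in range(n_cols)) - payoffs_col[i][j]
    return max(row_gain, col_gain)

# Prisoner's dilemma; action 0 = cooperate, action 1 = defect
pd_row = [[3, 0], [5, 1]]
pd_col = [[3, 5], [0, 1]]

regret_cooperate = profile_regret(pd_row, pd_col, 0, 0)
regret_defect = profile_regret(pd_row, pd_col, 1, 1)
```

Mutual defection has zero regret (it is the equilibrium), while mutual cooperation leaves each player a deviation gain of 2.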

【Keywords】:

525. An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks.

Paper Link】 【Pages】:4288-4295

【Authors】: Giyoung Jeon ; Haedong Jeong ; Jaesik Choi

【Abstract】: Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. In particular, the adversarial training scheme has been applied to many DGNNs and has exhibited powerful performance. Despite recent advances in generative networks, identifying the image generation mechanism still remains challenging. In this paper, we present an explorative sampling algorithm to analyze the generation mechanism of DGNNs. Our method efficiently obtains samples with identical attributes from a query image in a perspective of the trained model. We define generative boundaries which determine the activation of nodes in the internal layer and probe inside the model with this information. To handle a large number of boundaries, we obtain the essential set of boundaries using optimization. By gathering samples within the region surrounded by generative boundaries, we can empirically reveal the characteristics of the internal layers of DGNNs. We also demonstrate that our algorithm can find more homogeneous, model-specific samples compared to the variations of the ϵ-based sampling method.

【Keywords】:

526. DefogGAN: Predicting Hidden Information in the StarCraft Fog of War with Generative Adversarial Nets.

Paper Link】 【Pages】:4296-4303

【Authors】: Yonghyun Jeong ; Hyunjin Choi ; Byoungjip Kim ; Youngjune Gwon

【Abstract】: We propose DefogGAN, a generative approach to the problem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as predictive information. Such information can help create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring pyramidal reconstruction loss to optimize on multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict the enemy buildings and combat units as accurately as professional players do and achieves a superior performance among state-of-the-art defoggers.

【Keywords】:

527. Sequential Recommendation with Relation-Aware Kernelized Self-Attention.

Paper Link】 【Pages】:4304-4311

【Authors】: Mingi Ji ; Weonyoung Joo ; Kyungwoo Song ; Yoon-Yeong Kim ; Il-Chul Moon

【Abstract】: Recent studies identified that sequential recommendation is improved by the attention mechanism. Following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA), adopting a self-attention mechanism of the Transformer with augmentation of a probabilistic model. The original self-attention of Transformer is a deterministic measure without relation-awareness. Therefore, we introduce a latent space to the self-attention, and the latent space models the recommendation context from relation as a multivariate skew-normal distribution with a kernelized covariance matrix from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer and the sequential recommendation by adding a probabilistic model of the recommendation task specifics. We evaluated RKSA on the benchmark datasets, and RKSA shows significant improvements compared to the recent baseline models. Also, RKSA was able to produce a latent space model that answers the reasons for recommendation.

【Keywords】:

528. Maximum Margin Multi-Dimensional Classification.

Paper Link】 【Pages】:4312-4319

【Authors】: Bin-Bin Jia ; Min-Ling Zhang

【Abstract】: Multi-dimensional classification (MDC) assumes heterogeneous class spaces for each example, where class variables from different class spaces characterize semantics of the example along different dimensions. Due to the heterogeneity of class spaces, the major difficulty in designing margin-based MDC techniques lies in that the modeling outputs from different class spaces are not comparable to each other. In this paper, a first attempt towards maximum margin multi-dimensional classification is investigated. Following the one-vs-one decomposition within each class space, the resulting models are optimized by leveraging classification margin maximization on individual class variables and model relationship regularization across class variables. We derive a convex formulation for the maximum margin MDC problem, which can be tackled with alternating optimization admitting QP or closed-form solution in either alternating step. Experimental studies over real-world MDC data sets clearly validate the effectiveness of the proposed maximum margin MDC techniques.

【Keywords】:

529. Representation Learning with Multiple Lipschitz-Constrained Alignments on Partially-Labeled Cross-Domain Data.

Paper Link】 【Pages】:4320-4327

【Authors】: Songlei Jian ; Liang Hu ; Longbing Cao ; Kai Lu

【Abstract】: The cross-domain representation learning plays an important role in tasks including domain adaptation and transfer learning. However, existing cross-domain representation learning focuses on building one shared space and ignores the unlabeled data in the source domain, which cannot effectively capture the distribution and structure heterogeneities in cross-domain data. To address this challenge, we propose a new cross-domain representation learning approach: MUltiple Lipschitz-constrained AligNments (MULAN) on partially-labeled cross-domain data. MULAN produces two representation spaces: a common representation space to incorporate knowledge from the source domain and a complementary representation space to complement the common representation with target local topological information by Lipschitz-constrained representation transformation. MULAN utilizes both unlabeled and labeled data in the source and target domains to address distribution heterogeneity by Lipschitz-constrained adversarial distribution alignment and structure heterogeneity by cluster assumption-based class alignment while keeping the target local topological information in complementary representation by self alignment. Moreover, MULAN is effectively equipped with a customized learning process and an iterative parameter updating process. MULAN shows its superior performance on partially-labeled semi-supervised domain adaptation and few-shot domain adaptation and outperforms the state-of-the-art visual domain adaptation models by up to 12.1%.

【Keywords】:

530. Algorithmic Improvements for Deep Reinforcement Learning Applied to Interactive Fiction.

Paper Link】 【Pages】:4328-4336

【Authors】: Vishal Jain ; William Fedus ; Hugo Larochelle ; Doina Precup ; Marc G. Bellemare

【Abstract】: Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.

【Keywords】:

531. Generative Exploration and Exploitation.

Paper Link】 【Pages】:4337-4344

【Authors】: Jiechuan Jiang ; Zongqing Lu

【Abstract】: Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively tradeoff between exploration and exploitation according to the varying distributions of states experienced by the agent as the learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, no matter on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.

【Keywords】:

532. Long Short-Term Sample Distillation.

Paper Link】 【Pages】:4345-4352

【Authors】: Liang Jiang ; Zujie Wen ; Zhongping Liang ; Yafang Wang ; Gerard de Melo ; Zhe Li ; Liangzhuang Ma ; Jiaxing Zhang ; Xiaolong Li ; Yuan Qi

【Abstract】: In the past decade, there has been substantial progress at training increasingly deep neural networks. Recent advances within the teacher–student training paradigm have established that information about past training updates show promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just one single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher–student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.

【Keywords】:

533. Rank Aggregation via Heterogeneous Thurstone Preference Models.

Paper Link】 【Pages】:4353-4360

【Authors】: Tao Jin ; Pan Xu ; Quanquan Gu ; Farzad Farnoud

【Abstract】: We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on alternating gradient descent to estimate the underlying item scores and accuracy levels of different users simultaneously from noisy pairwise comparisons. We theoretically prove that the proposed algorithm converges linearly up to a statistical error which matches that of the state-of-the-art method for the single-user BTL model. We evaluate the proposed HTM model and algorithm on both synthetic and real data, demonstrating that it outperforms existing methods.
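For orientation, the sketch below evaluates the standard Bradley-Terry-Luce probability that one item beats another given latent scores. The optional `accuracy` multiplier is a hypothetical illustration of how a per-user accuracy level could sharpen or flatten the comparison; it is an assumption made for this example and is not taken from the paper's HTM definition, which also allows other noise distributions.

```python
import math

def btl_win_probability(score_i, score_j, accuracy=1.0):
    """Probability that item i is ranked above item j under a BTL-style model.

    With accuracy=1.0 this is the standard BTL form
    exp(s_i) / (exp(s_i) + exp(s_j)), written as a logistic function.
    `accuracy` is a hypothetical per-user scale: larger values mean less
    noisy comparisons (assumption for illustration only).
    """
    return 1.0 / (1.0 + math.exp(-accuracy * (score_i - score_j)))

p_equal = btl_win_probability(0.0, 0.0)
p_careful_user = btl_win_probability(1.0, 0.0, accuracy=3.0)
p_noisy_user = btl_win_probability(1.0, 0.0, accuracy=0.2)
```

Under this parameterization a more accurate user agrees with the underlying score ordering more often, while a very noisy user's comparisons approach a coin flip.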

【Keywords】:
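
As a rough illustration of the heterogeneous pairwise-comparison idea in the abstract above, the sketch below uses a logistic (BTL-style) comparison model in which each user has an accuracy level that scales the score difference. The function names, the multiplicative form of the accuracy parameter, and the data layout are assumptions made for illustration; they are not taken from the paper.

```python
import math

def win_probability(score_i, score_j, accuracy):
    """Probability that a user with the given accuracy ranks item i above item j.

    Assumed form: logistic noise, with the user's accuracy scaling the
    score difference. accuracy = 0 means the user answers at random.
    """
    return 1.0 / (1.0 + math.exp(-accuracy * (score_i - score_j)))

def negative_log_likelihood(scores, accuracies, comparisons):
    """Loss over observed comparisons given as (user, winner, loser) triples."""
    loss = 0.0
    for user, winner, loser in comparisons:
        p = win_probability(scores[winner], scores[loser], accuracies[user])
        loss -= math.log(p)
    return loss

# Hypothetical data: two items, one careful user and one noisy user.
scores = {"a": 1.0, "b": 0.0}
accuracies = {"careful": 5.0, "noisy": 0.5}
comparisons = [("careful", "a", "b"), ("noisy", "b", "a")]
loss = negative_log_likelihood(scores, accuracies, comparisons)
```

An alternating scheme of the kind the abstract describes would update `scores` with `accuracies` held fixed and then the reverse; the gradient steps themselves are omitted here.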

534. GraLSP: Graph Neural Networks with Local Structural Patterns.

Paper Link】 【Pages】:4361-4368

【Authors】: Yilun Jin ; Guojie Song ; Chuan Shi

【Abstract】: Only recently have graph neural networks (GNNs) been adopted to perform graph representation learning; among them, those based on the aggregation of features within the neighborhood of a node have achieved great success. However, despite such achievements, GNNs exhibit defects in identifying some common structural patterns which, unfortunately, play significant roles in various network phenomena. In this paper, we propose GraLSP, a GNN framework which explicitly incorporates local structural patterns into the neighborhood aggregation through random anonymous walks. Specifically, we capture local graph structures via random anonymous walks, powerful and flexible tools that represent structural patterns. The walks are then fed into the feature aggregation, where we design various mechanisms to address the impact of structural features, including adaptive receptive radius, attention and amplification. In addition, we design objectives that capture similarities between structures and are optimized jointly with node proximity objectives. By adequately leveraging structural patterns, our model is able to outperform competitive counterparts in various prediction tasks on multiple datasets.

【Keywords】:

535. Dynamic Instance Normalization for Arbitrary Style Transfer.

Paper Link】 【Pages】:4369-4376

【Authors】: Yongcheng Jing ; Xiao Liu ; Yukang Ding ; Xinchao Wang ; Errui Ding ; Mingli Song ; Shilei Wen

【Abstract】: Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, whose parameters are computed in a pre-defined way. This manually defined nature results in high-cost, shared encoders for both style and content encoding, making style transfer systems cumbersome to deploy in resource-constrained environments such as mobile terminals. In this paper, we propose a new and generalized normalization module, termed Dynamic Instance Normalization (DIN), that allows for flexible and more efficient arbitrary style transfers. Comprising an instance normalization and a dynamic convolution, DIN encodes a style image into learnable convolution parameters, upon which the content image is stylized. Unlike conventional methods that use shared complex encoders to encode content and style, the proposed DIN introduces a sophisticated style encoder, yet comes with a compact and lightweight content encoder for fast inference. Experimental results demonstrate that the proposed approach yields very encouraging results on challenging style patterns and, to the best of our knowledge, for the first time enables arbitrary style transfer using a MobileNet-based lightweight architecture, leading to a reduction factor of more than twenty in computational cost as compared to existing approaches. Furthermore, the proposed DIN provides flexible support for state-of-the-art convolutional operations, and thus triggers novel functionalities, such as uniform-stroke placement for non-natural images and automatic spatial-stroke control.

【Keywords】:

536. InvNet: Encoding Geometric and Statistical Invariances in Deep Generative Models.

Paper Link】 【Pages】:4377-4384

【Authors】: Ameya Joshi ; Minsu Cho ; Viraj Shah ; Balaji Sesha Sarath Pokuri ; Soumik Sarkar ; Baskar Ganapathysubramanian ; Chinmay Hegde

【Abstract】: Generative Adversarial Networks (GANs), while widely successful in modeling complex data distributions, have not yet been sufficiently leveraged in scientific computing and design. Reasons for this include the lack of flexibility of GANs to represent discrete-valued image data, as well as the lack of control over physical properties of generated samples. We propose a new conditional generative modeling approach (InvNet) that efficiently enables modeling discrete-valued images, while allowing control over their parameterized geometric and statistical properties. We evaluate our approach on several synthetic and real world problems: navigating manifolds of geometric shapes with desired sizes; generation of binary two-phase materials; and the (challenging) problem of generating multi-orientation polycrystalline microstructures.

【Keywords】:

537. More Accurate Learning of k-DNF Reference Classes.

Paper Link】 【Pages】:4385-4393

【Authors】: Brendan Juba ; Hengxuan Li

【Abstract】: In machine learning, predictors trained on a given data distribution are usually guaranteed to perform well for further examples from the same distribution on average. This often may involve disregarding or diminishing the predictive power on atypical examples; or, in more extreme cases, a data distribution may be composed of a mixture of individually “atypical” heterogeneous populations, and the kind of simple predictors we can train may find it difficult to fit all of these populations simultaneously. In such cases, we may wish to make predictions for an atypical point by selecting a suitable reference class for that point: a subset of the data that is “more similar” to the given query point in an appropriate sense. Closely related tasks also arise in applications such as diagnosis or explaining the output of classifiers. We present new algorithms for computing k-DNF reference classes and establish much stronger approximation guarantees for their error rates.

【Keywords】:

538. Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks.

Paper Link】 【Pages】:4394-4403

【Authors】: Sekitoshi Kanai ; Yasutoshi Ida ; Yasuhiro Fujiwara ; Masanori Yamada ; Shuichi Adachi

【Abstract】: We propose Absum, which is a regularization method for improving adversarial robustness of convolutional neural networks (CNNs). Although CNNs can accurately recognize images, recent studies have shown that the convolution operations in CNNs commonly have structural sensitivity to specific noise composed of Fourier basis functions. By exploiting this sensitivity, they proposed a simple black-box adversarial attack: Single Fourier attack. To reduce structural sensitivity, we can use regularization of convolution filter weights since the sensitivity of linear transform can be assessed by the norm of the weights. However, standard regularization methods can prevent minimization of the loss function because they impose a tight constraint for obtaining high robustness. To solve this problem, Absum imposes a loose constraint; it penalizes the absolute values of the summation of the parameters in the convolution layers. Absum can improve robustness against single Fourier attack while being as simple and efficient as standard regularization methods (e.g., weight decay and L1 regularization). Our experiments demonstrate that Absum improves robustness against single Fourier attack more than standard regularization methods. Furthermore, we reveal that robust CNNs with Absum are more robust against transferred attacks due to decreasing the common sensitivity and against high-frequency noise than standard regularization methods. We also reveal that Absum can improve robustness against gradient-based attacks (projected gradient descent) when used with adversarial training.

【Keywords】:
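
To make the contrast between Absum's loose constraint and a standard L1 penalty concrete, here is a minimal sketch in plain Python. It assumes that the "summation of the parameters" is taken per 2-D kernel (one sum for each output/input channel pair); that grouping, the nested-list weight layout, and the function names are illustrative assumptions, not details confirmed by the abstract.

```python
def absum_penalty(conv_weight):
    """Sum over kernels of |sum of the kernel's entries|.

    conv_weight[out][in] is assumed to be a 2-D kernel given as a list of rows.
    """
    total = 0.0
    for filters in conv_weight:
        for kernel in filters:
            total += abs(sum(sum(row) for row in kernel))
    return total

def l1_penalty(conv_weight):
    """Standard L1 regularizer: sum of |w| over every individual weight."""
    return sum(
        abs(value)
        for filters in conv_weight
        for kernel in filters
        for row in kernel
        for value in row
    )

# A kernel whose entries cancel out: large individual weights, zero sum.
weight = [[[[1.0, -1.0],
            [2.0, -2.0]]]]
```

For this kernel `l1_penalty(weight)` is 6.0 while `absum_penalty(weight)` is 0.0, which shows why the sum-based penalty is the looser constraint: it leaves individual weights free as long as each kernel's entries sum to a small value.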

539. Towards Oracle Knowledge Distillation with Neural Architecture Search.

Paper Link】 【Pages】:4404-4411

【Authors】: Minsoo Kang ; Jonghwan Mun ; Bohyung Han

【Abstract】: We present a novel framework of knowledge distillation that is capable of learning powerful and efficient student models from ensemble teacher networks. Our approach addresses the inherent model capacity issue between teacher and student and aims to maximize benefit from teacher models during distillation by reducing their capacity gap. Specifically, we employ a neural architecture search technique to augment useful structures and operations, where the searched network is appropriate for knowledge distillation towards student models and free from sacrificing its performance by fixing the network capacity. We also introduce an oracle knowledge distillation loss to facilitate model search and distillation using an ensemble-based teacher model, where a student network is learned to imitate oracle performance of the teacher. We perform extensive experiments on the image classification datasets—CIFAR-100 and TinyImageNet—using various networks. We also show that searching for a new student model is effective in both accuracy and memory size and that the searched models often outperform their teacher models thanks to neural architecture search with oracle knowledge distillation.

【Keywords】:

540. Large-Scale Multi-View Subspace Clustering in Linear Time.

Paper Link】 【Pages】:4412-4419

【Authors】: Zhao Kang ; Wangtao Zhou ; Zhitong Zhao ; Junming Shao ; Meng Han ; Zenglin Xu

【Abstract】: A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.

【Keywords】:

541. Nonlinear System Identification via Tensor Completion.

Paper Link】 【Pages】:4420-4427

【Authors】: Nikos Kargas ; Nicholas D. Sidiropoulos

【Abstract】: Function approximation from input and output data pairs constitutes a fundamental problem in supervised learning. Deep neural networks are currently the most popular method for learning to mimic the input-output relationship of a general nonlinear system, as they have proven to be very effective in approximating complex, highly nonlinear functions. In this work, we show that identifying a general nonlinear function y = ƒ(x1,…,xN) from input-output examples can be formulated as a tensor completion problem, and that under certain conditions provably correct nonlinear system identification is possible. Specifically, we model the interactions between the N input variables and the scalar output of a system by a single N-way tensor, and set up a weighted low-rank tensor completion problem with smoothness regularization, which we tackle using a block coordinate descent algorithm. We extend our method to the multi-output setting and the case of partially observed data, which cannot be readily handled by neural networks. Finally, we demonstrate the effectiveness of the approach using several regression tasks including some standard benchmarks and a challenging student grade prediction task.

【Keywords】:

542. Gradient Boosts the Approximate Vanishing Ideal.

Paper Link】 【Pages】:4428-4435

【Authors】: Hiroshi Kera ; Yoshihiko Hasegawa

【Abstract】: In the last decade, the approximate vanishing ideal and its basis construction algorithms have been extensively studied in computer algebra and machine learning as a general model to reconstruct the algebraic variety on which noisy data approximately lie. In particular, the basis construction algorithms developed in machine learning are widely used in applications across many fields because of their monomial-order-free property; however, they lose many of the theoretical properties of computer-algebraic algorithms. In this paper, we propose general methods that equip monomial-order-free algorithms with several advantageous theoretical properties. Specifically, we exploit the gradient to (i) sidestep the spurious vanishing problem in polynomial time to remove symbolically trivial redundant bases, (ii) achieve consistent output with respect to the translation and scaling of input, and (iii) remove nontrivially redundant bases. The proposed methods work in a fully numerical manner, whereas existing algorithms require the awkward monomial order or exponentially costly (and mostly symbolic) computation to realize properties (i) and (iii). To our knowledge, property (ii) has not been achieved by any existing basis construction algorithm of the approximate vanishing ideal.

【Keywords】:

543. Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy.

Paper Link】 【Pages】:4436-4443

【Authors】: Ramtin Keramati ; Christoph Dann ; Alex Tamkin ; Emma Brunskill

【Abstract】: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.

【Keywords】:
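
For readers unfamiliar with the CVaR objective mentioned in the abstract above, the following sketch computes a simple empirical estimate from sampled returns: the mean of the worst α-fraction of outcomes. This is a generic estimator shown only to illustrate the objective, not the paper's algorithm; the function name and the rounding rule for the tail size are assumptions.

```python
import math

def empirical_cvar(returns, alpha):
    """Mean of the lowest alpha-fraction of sampled returns.

    Assumes higher return is better, so the risk lies in the lower tail.
    The tail size is rounded up so that at least one sample is used.
    """
    if not returns or not 0.0 < alpha <= 1.0:
        raise ValueError("need at least one return and alpha in (0, 1]")
    tail_size = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:tail_size]
    return sum(worst) / tail_size

# Hypothetical returns from four episodes.
episode_returns = [10.0, -2.0, 4.0, 0.0]
risk_neutral = empirical_cvar(episode_returns, 1.0)   # plain mean
risk_averse = empirical_cvar(episode_returns, 0.5)    # mean of the two worst
```

With α = 1 the estimate reduces to the ordinary expected return, while smaller α focuses on the worst outcomes, which is why a CVaR-optimal policy can differ from an expectation-optimal one.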

544. Options of Interest: Temporal Abstraction with Interest Functions.

Paper Link】 【Pages】:4444-4451

【Authors】: Khimya Khetarpal ; Martin Klissarov ; Maxime Chevalier-Boisvert ; Pierre-Luc Bacon ; Doina Precup

【Abstract】: Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.

【Keywords】:

545. Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks.

Paper Link】 【Pages】:4452-4459

【Authors】: Jaedeok Kim ; Chiyoun Park ; Hyun-Joo Jung ; Yoonsuck Choe

【Abstract】: Architecture optimization, which is a technique for finding an efficient neural network that meets certain requirements, generally reduces to a set of multiple-choice selection problems among alternative sub-structures or parameters. The discrete nature of the selection problem, however, makes this optimization difficult. To tackle this problem we introduce a novel concept of a trainable gate function. The trainable gate function, which confers a differentiable property to discrete-valued variables, allows us to directly optimize loss functions that include non-differentiable discrete values such as 0-1 selection. The proposed trainable gate can be applied to pruning. Pruning can be carried out simply by appending the proposed trainable gate functions to each intermediate output tensor followed by fine-tuning the overall model, using any gradient-based training methods. So the proposed method can jointly optimize the selection of the pruned channels while fine-tuning the weights of the pruned model at the same time. Our experimental results demonstrate that the proposed method efficiently optimizes arbitrary neural networks in various tasks such as image classification, style transfer, optical flow estimation, and neural machine translation.

【Keywords】:

546. A Unified Framework for Knowledge Intensive Gradient Boosting: Leveraging Human Experts for Noisy Sparse Domains.

Paper Link】 【Pages】:4460-4468

【Authors】: Harsha Kokel ; Phillip Odom ; Shuo Yang ; Sriraam Natarajan

【Abstract】: Incorporating richer human inputs including qualitative constraints such as monotonic and synergistic influences has long been adapted inside AI. Inspired by this, we consider the problem of using such influence statements in the successful gradient-boosting framework. We develop a unified framework for both classification and regression settings that can both effectively and efficiently incorporate such constraints to accelerate learning to a better model. Our results in a large number of standard domains and two particularly novel real-world domains demonstrate the superiority of using domain knowledge rather than treating the human as a mere labeler.

【Keywords】:

547. Learning Student Networks with Few Data.

Paper Link】 【Pages】:4469-4476

【Authors】: Shumin Kong ; Tianyu Guo ; Shan You ; Chang Xu

【Abstract】: Recently, the teacher-student learning paradigm has drawn much attention in compressing neural networks on low-end edge devices, such as mobile phones and wearable watches. Current algorithms mainly assume the complete dataset for the teacher network is also available for the training of the student network. However, for real-world scenarios, users may only have access to part of training examples due to commercial profits or data privacy, and severe over-fitting issues would happen as a result. In this paper, we tackle the challenge of learning student networks with few data by investigating the ground-truth data-generating distribution underlying these few data. Taking Wasserstein distance as the measurement, we assume this ideal data distribution lies in a neighborhood of the discrete empirical distribution induced by the training examples. Thus we propose to safely optimize the worst-case cost within this neighborhood to boost the generalization. Furthermore, with theoretical analysis, we derive a novel and easy-to-implement loss for training the student network in an end-to-end fashion. Experimental results on benchmark datasets validate the effectiveness of our proposed method.

【Keywords】:

548. Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical Bayes.

Paper Link】 【Pages】:4477-4484

【Authors】: Ranganath Krishnan ; Mahesh Subedar ; Omesh Tickoo

【Abstract】: Stochastic variational inference for Bayesian deep neural networks (DNNs) requires specifying priors and approximate posterior distributions over neural network weights. Specifying meaningful weight priors is a challenging problem, particularly for scaling variational inference to deeper architectures involving high-dimensional weight spaces. We propose the MOdel Priors with Empirical Bayes using DNN (MOPED) method to choose informed weight priors in Bayesian neural networks. We formulate a two-stage hierarchical modeling approach: we first find the maximum likelihood estimates of the weights with a DNN, and then set the weight priors using an empirical Bayes approach to infer the posterior with variational inference. We empirically evaluate the proposed approach on real-world tasks including image classification, video activity recognition and audio classification with neural network architectures of varying complexity. We also evaluate our proposed approach on a diabetic retinopathy diagnosis task and benchmark it against state-of-the-art Bayesian deep learning techniques. We demonstrate that the MOPED method enables scalable variational inference and provides reliable uncertainty quantification.

【Keywords】:

549. Stable Prediction with Model Misspecification and Agnostic Distribution Shift.

Paper Link】 【Pages】:4485-4492

【Authors】: Kun Kuang ; Ruoxuan Xiong ; Peng Cui ; Susan Athey ; Bo Li

【Abstract】: For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data and of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccurate parameter estimation and unstable prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. Then, these weights are used in the weighted regression to improve the accuracy of estimating the effect of each variable, thereby helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and the stability of prediction under model misspecification and agnostic distribution shift.

【Keywords】:

550. Learning MAX-SAT from Contextual Examples for Combinatorial Optimisation.

Paper Link】 【Pages】:4493-4500

【Authors】: Mohit Kumar ; Samuel Kolb ; Stefano Teso ; Luc De Raedt

【Abstract】: Combinatorial optimization problems are ubiquitous in artificial intelligence. Designing the underlying models, however, requires substantial expertise, which is a limiting factor in practice. The models typically consist of hard and soft constraints, or combine hard constraints with a preference function. We introduce a novel setting for learning combinatorial optimisation problems from contextual examples. These positive and negative examples show – in a particular context – whether the solutions are good enough or not. We develop our framework using the MAX-SAT formalism. We provide learnability results within the realizable and agnostic settings, as well as hassle, an implementation based on syntax-guided synthesis, and we showcase its promise on recovering synthetic and benchmark instances from examples.

【Keywords】:
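
The kind of model the abstract above refers to, hard constraints plus weighted soft constraints, can be sketched as follows. The clause encoding (signed integers, DIMACS-like), the function names, and the brute-force solver are assumptions chosen for illustration; the sketch shows only how such a model scores assignments, not how it is learned from examples.

```python
from itertools import product

def clause_satisfied(clause, assignment):
    """A clause is a list of literals: k means variable k is True, -k means False."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def maxsat_value(hard, soft, assignment):
    """Total weight of satisfied soft clauses, or None if a hard clause fails."""
    if not all(clause_satisfied(clause, assignment) for clause in hard):
        return None
    return sum(weight for weight, clause in soft if clause_satisfied(clause, assignment))

def solve_by_enumeration(hard, soft, n_vars):
    """Exhaustive search; only sensible for tiny illustrative instances."""
    best = None
    for bits in product([False, True], repeat=n_vars):
        assignment = {index + 1: bit for index, bit in enumerate(bits)}
        value = maxsat_value(hard, soft, assignment)
        if value is not None and (best is None or value > best[0]):
            best = (value, assignment)
    return best

# Hypothetical model: x1 or x2 must hold; prefer not-x1 (weight 2) and x2 (weight 1).
hard = [[1, 2]]
soft = [(2.0, [-1]), (1.0, [2])]
best_value, best_assignment = solve_by_enumeration(hard, soft, 2)
```

In a contextual setting, a context could be modeled as extra hard clauses that fix some variables before the search, although that encoding is again an assumption here.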

551. Google Research Football: A Novel Reinforcement Learning Environment.

Paper Link】 【Pages】:4501-4510

【Authors】: Karol Kurach ; Anton Raichuk ; Piotr Stanczyk ; Michal Zajac ; Olivier Bachem ; Lasse Espeholt ; Carlos Riquelme ; Damien Vincent ; Marcin Michalski ; Olivier Bousquet ; Sylvain Gelly

【Abstract】: Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.

【Keywords】:

552. Correcting Predictions for Approximate Bayesian Inference.

Paper Link】 【Pages】:4511-4518

【Authors】: Tomasz Kusmierczyk ; Joseph Sakaya ; Arto Klami

【Abstract】: Bayesian models quantify uncertainty and facilitate optimal decision-making in downstream applications. For most models, however, practitioners are forced to use approximate inference techniques that lead to sub-optimal decisions due to incorrect posterior predictive distributions. We present a novel approach that corrects for inaccuracies in posterior inference by altering the decision-making process. We train a separate model to make optimal decisions under the approximate posterior, combining interpretable Bayesian modeling with optimization of direct predictive accuracy in a principled fashion. The solution is generally applicable as a plug-in module for predictive decision-making for arbitrary probabilistic programs, irrespective of the posterior inference strategy. We demonstrate the approach empirically in several problems, confirming its potential.

【Keywords】:

553. Improved Subsampled Randomized Hadamard Transform for Linear SVM.

Paper Link】 【Pages】:4519-4526

【Authors】: Zijian Lei ; Liang Lan

【Abstract】: Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project d-dimensional data into an r-dimensional space (r ≪ d) in O(d log d) time, has been widely used to address the challenge of high dimensionality in machine learning. SRHT works by rotating the input data matrix X ∈ ℝ^{n × d} by a Randomized Walsh-Hadamard Transform, followed by uniform column sampling on the rotated matrix. Despite its advantages, one limitation of SRHT is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-r sampling to produce effective low-dimensional embeddings, instead of the uniform sampling used in SRHT. In addition, we propose a new supervised non-uniform sampling method. Our experimental results demonstrate that our proposed methods can achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets.

【Keywords】:
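
The baseline SRHT pipeline described in the abstract above (random sign flips, a Walsh-Hadamard rotation, then uniform coordinate sampling) can be sketched for a single data vector as below. This shows only the standard uniform-sampling variant, not the importance or top-r sampling the paper proposes; the function names are illustrative, and the input length is assumed to be a power of two.

```python
import math
import random

def fwht(vector):
    """Unnormalized fast Walsh-Hadamard transform; len(vector) must be a power of two."""
    out = list(vector)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out

def srht(vector, r, rng):
    """Project a d-dimensional vector to r dimensions with uniform sampling.

    Computes sqrt(d / r) * R * (H / sqrt(d)) * D * x, where D holds random
    signs, H is the Hadamard matrix, and R keeps r coordinates.
    """
    d = len(vector)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    rotated = fwht([s * v for s, v in zip(signs, vector)])
    kept = rng.sample(range(d), r)
    scale = 1.0 / math.sqrt(r)  # sqrt(d / r) combined with the 1 / sqrt(d) normalization
    return [rotated[i] * scale for i in kept]

x = [1.0, 2.0, 3.0, 4.0]
embedded = srht(x, 2, random.Random(0))
```

A data-dependent variant would replace the `rng.sample` step with a non-uniform choice of coordinates, which is the part of the pipeline the abstract identifies as the limitation of plain SRHT.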

554. A Simple and Efficient Tensor Calculus.

Paper Link】 【Pages】:4527-4534

【Authors】: Sören Laue ; Matthias Mitterreiter ; Joachim Giesen

【Abstract】: Computing derivatives of tensor expressions, also known as tensor calculus, is a fundamental task in machine learning. A key concern is the efficiency of evaluating the expressions and their derivatives that hinges on the representation of these expressions. Recently, an algorithm for computing higher order derivatives of tensor expressions like Jacobians or Hessians has been introduced that is a few orders of magnitude faster than previous state-of-the-art approaches. Unfortunately, the approach is based on Ricci notation and hence cannot be incorporated into automatic differentiation frameworks like TensorFlow, PyTorch, autograd, or JAX that use the simpler Einstein notation. This leaves two options, to either change the underlying tensor representation in these frameworks or to develop a new, provably correct algorithm based on Einstein notation. Obviously, the first option is impractical. Hence, we pursue the second option. Here, we show that using Ricci notation is not necessary for an efficient tensor calculus and develop an equally efficient method for the simpler Einstein notation. It turns out that turning to Einstein notation enables further improvements that lead to even better efficiency.

【Keywords】:

555. Proximity Preserving Binary Code Using Signed Graph-Cut.

Paper Link】 【Pages】:4535-4544

【Authors】: Inbal Lavi ; Shai Avidan ; Yoram Singer ; Yacov Hel-Or

【Abstract】: We introduce a binary embedding framework, called Proximity Preserving Code (PPC), which learns similarity and dissimilarity between data points to create a compact and affinity-preserving binary code. This code can be used to apply fast and memory-efficient approximation to nearest-neighbor searches. Our framework is flexible, enabling different proximity definitions between data points. In contrast to previous methods that extract binary codes based on unsigned graph partitioning, our system models the attractive and repulsive forces in the data by incorporating positive and negative graph weights. The proposed framework is shown to boil down to finding the minimal cut of a signed graph, a problem known to be NP-hard. We offer an efficient approximation and achieve superior results by constructing the code bit after bit. We show that the proposed approximation is superior to the commonly used spectral methods with respect to both accuracy and complexity. Thus, it is useful for many other problems that can be translated into signed graph cut.

【Keywords】:

556. Residual Neural Processes.

Paper Link】 【Pages】:4545-4552

【Authors】: Byung-Jun Lee ; Seunghoon Hong ; Kee-Eung Kim

【Abstract】: A Neural Process (NP) is a map from a set of observed input-output pairs to a predictive distribution over functions, which is designed to mimic other stochastic processes' inference mechanisms. NPs are shown to work effectively in tasks that require complex distributions, where traditional stochastic processes struggle, e.g. image completion tasks. This paper concerns the practical capacity of set function approximators despite their universality. By delving deeper into the relationship between an NP and a Bayesian last layer (BLL), it is possible to see that NPs may struggle in simple examples, which other stochastic processes can easily solve. In this paper, we propose a simple yet effective remedy; the Residual Neural Process (RNP) that leverages traditional BLL for faster training and better prediction. We demonstrate that the RNP shows faster convergence and better performance, both qualitatively and quantitatively.

【Keywords】:

557. Residual Continual Learning.

Paper Link】 【Pages】:4553-4560

【Authors】: Janghyeon Lee ; Donggyu Joo ; Hyeong Gwon Hong ; Junmo Kim

【Abstract】: We propose a novel continual learning method called Residual Continual Learning (ResCL). Our method can prevent the catastrophic forgetting phenomenon in sequential learning of multiple tasks, without any source task information except the original network. ResCL reparameterizes network parameters by linearly combining each layer of the original network and a fine-tuned network; therefore, the size of the network does not increase at all. To apply the proposed method to general convolutional neural networks, the effects of batch normalization layers are also considered. By utilizing residual-learning-like reparameterization and a special weight decay loss, the trade-off between source and target performance is effectively controlled. The proposed method exhibits state-of-the-art performance in various continual learning scenarios.

【Keywords】:

558. Monte-Carlo Tree Search in Continuous Action Spaces with Value Gradients.

Paper Link】 【Pages】:4561-4568

【Authors】: Jongmin Lee ; Wonseok Jeon ; Geon-Hyeong Kim ; Kee-Eung Kim

【Abstract】: Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for large problems with discrete action spaces. However, many real-world problems involve continuous action spaces, where MCTS is not as effective as in discrete action spaces. This is mainly due to common practices such as coarse discretization of the entire action space and failure to exploit local smoothness. In this paper, we introduce Value-Gradient UCT (VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles. VG-UCT simultaneously performs a global search via UCT with respect to the finitely sampled set of actions and performs a local improvement via action value gradients. In the experiments, we demonstrate that our approach outperforms existing MCTS methods and other strong baseline algorithms for continuous action spaces.

【Keywords】:
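
The two ingredients named in the abstract above, a global UCT-style choice among finitely many sampled actions and a local gradient step on the chosen action, might be sketched as below. This is a toy illustration, not the authors' implementation: the one-dimensional value function Q(a) = -(a - 0.3)^2, the helper names, and the step size are all assumptions.

```python
import math
from typing import Callable, List


def ucb_score(mean_value: float, visits: int, total_visits: int, c: float = 1.0) -> float:
    """Standard UCB1 score: empirical mean plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited actions are tried first
    return mean_value + c * math.sqrt(math.log(total_visits) / visits)


def select_action(means: List[float], visits: List[int], c: float = 1.0) -> int:
    """Global step: index of the action particle with the highest UCB1 score."""
    total = max(1, sum(visits))
    scores = [ucb_score(m, n, total, c) for m, n in zip(means, visits)]
    return scores.index(max(scores))


def improve_action(action: float, q_grad: Callable[[float], float], step: float = 0.1) -> float:
    """Local step: move one action particle along the action-value gradient."""
    return action + step * q_grad(action)


def toy_q_grad(a: float) -> float:
    """Gradient of the assumed toy value function Q(a) = -(a - 0.3)^2."""
    return -2.0 * (a - 0.3)


particles = [-1.0, 0.0, 1.0]   # finitely sampled actions
means = [-1.69, -0.09, -0.49]  # Q evaluated at each particle
visits = [3, 3, 3]

best = select_action(means, visits)
particles[best] = improve_action(particles[best], toy_q_grad)
```

With equal visit counts the exploration bonus is identical for all particles, so the particle at 0.0 is selected and then nudged toward the maximizer at 0.3.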

559. URNet: User-Resizable Residual Networks with Conditional Gating Module.

Paper Link】 【Pages】:4569-4576

【Authors】: Sang-Ho Lee ; Simyung Chang ; Nojun Kwak

【Abstract】: Convolutional Neural Networks are widely used to process spatial scenes, but their computational cost is fixed and depends on the structure of the network used. There are methods to reduce the cost by compressing networks or by varying their computational paths dynamically according to the input image. However, since a user cannot control the size of the learned model, it is difficult to respond dynamically if the number of service requests suddenly increases. We propose User-Resizable Residual Networks (URNet), which allow users to adjust the computational cost of the network as needed during evaluation. URNet includes a Conditional Gating Module (CGM) that determines the use of each residual block according to the input image and the desired cost. The CGM is trained in a supervised manner using the newly proposed scale (cost) loss and its corresponding training methods. URNet can control the amount of computation and its inference path according to the user's demand without degrading the accuracy significantly. In the experiments on ImageNet, URNet based on ResNet-101 maintains the accuracy of the baseline even when resizing it to approximately 80% of the original network, and demonstrates only about 1% accuracy degradation when using about 65% of the computation.

【Keywords】:

560. Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents.

Paper Link】 【Pages】:4577-4584

【Authors】: Xian Yeow Lee ; Sambit Ghadai ; Kai Liang Tan ; Chinmay Hegde ; Soumik Sarkar

【Abstract】: The robustness of Deep Reinforcement Learning (DRL) algorithms to adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results show that, using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that, with limited resources, an adversary can utilize the agent's dynamics to malevolently craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.

【Keywords】:

561. Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation.

Paper Link】 【Pages】:4585-4593

【Authors】: Alexander Levine ; Soheil Feizi

【Abstract】: Recently, techniques have been developed to provably guarantee the robustness of a classifier to adversarial perturbations of bounded L1 and L2 magnitudes by using randomized smoothing: the robust classification is a consensus of base classifications on randomly noised samples where the noise is additive. In this paper, we extend this technique to the L0 threat model. We propose an efficient and certifiably robust defense against sparse adversarial attacks by randomly ablating input features, rather than using additive noise. Experimentally, on MNIST, we can certify the classifications of over 50% of images to be robust to any distortion of at most 8 pixels. This is comparable to the observed empirical robustness of unprotected classifiers on MNIST to modern L0 attacks, demonstrating the tightness of the proposed robustness certificate. We also evaluate our certificate on ImageNet and CIFAR-10. Our certificates represent an improvement on those provided in a concurrent work (Lee et al. 2019) which uses random noise rather than ablation (median certificates of 8 pixels versus 4 pixels on MNIST; 16 pixels versus 1 pixel on ImageNet). Additionally, we empirically demonstrate that our classifier is highly robust to modern sparse adversarial attacks on MNIST. Our classifications are robust, in median, to adversarial perturbations of up to 31 pixels, compared to 22 pixels reported as the state-of-the-art defense, at the cost of a slight decrease (around 2.3%) in the classification accuracy. Code and supplementary material are available at https://github.com/alevine0/randomizedAblation/.

【Keywords】:
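
The prediction side of the defense described in the abstract above (randomly ablating input features and taking a consensus of base classifications) could look roughly like the sketch below. It omits the certification step entirely. The `ABLATED` marker, the helper names, and the toy base classifier are assumptions made for illustration and are not taken from the linked repository.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence

ABLATED = None  # special value marking a removed feature, distinct from any pixel value


def ablate(x: Sequence[int], keep: int, rng: random.Random) -> List[Optional[int]]:
    """Keep `keep` randomly chosen features and replace all others with ABLATED."""
    kept = set(rng.sample(range(len(x)), keep))
    return [v if i in kept else ABLATED for i, v in enumerate(x)]


def smoothed_predict(x: Sequence[int],
                     base_classifier: Callable[[List[Optional[int]]], int],
                     keep: int, n_samples: int = 100, seed: int = 0) -> int:
    """Majority vote of the base classifier over randomly ablated copies of x."""
    rng = random.Random(seed)
    votes = Counter(base_classifier(ablate(x, keep, rng)) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def toy_classifier(x_ablated: List[Optional[int]]) -> int:
    """Hypothetical base classifier: class 1 if any retained feature is nonzero."""
    return int(any(v not in (ABLATED, 0) for v in x_ablated))


image = [1] * 16
ablated = ablate(image, keep=4, rng=random.Random(0))
prediction = smoothed_predict(image, toy_classifier, keep=4)
blank_prediction = smoothed_predict([0] * 16, toy_classifier, keep=4)
```

Because each ablated copy retains only a few features, an adversary who changes a small number of pixels affects only the copies that happen to retain one of those pixels, which is the intuition behind the L0 certificate.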

562. Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval.

Paper Link】 【Pages】:4594-4601

【Authors】: Dung D. Le ; Hady W. Lauw

【Abstract】: Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques to avoid the prohibitive cost of scanning through all data points. For recommender systems, LSH achieves efficient recommendation retrieval by encoding user and item vectors into binary hash codes, reducing the cost of exhaustively examining all the item vectors to identify the top-k items. However, conventional matrix factorization models may suffer from performance degeneration caused by randomly-drawn LSH hash functions, directly affecting the ultimate quality of the recommendations. In this paper, we propose a Stochastically Robust Personalized Ranking framework, which factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, eventually improving the recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only effectively learns users' preferences for prediction, but also achieves high compatibility with LSH stochasticity, producing superior post-LSH indexing performance as compared to state-of-the-art baselines.

【Keywords】:
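
To make the encoding step mentioned in the abstract above concrete, the sketch below hashes real-valued vectors into binary codes with randomly drawn hyperplanes (sign random projections, one common LSH family). The abstract does not say which hash family the paper uses, so this choice, along with all function names, is an assumption.

```python
import random
from typing import List, Sequence


def make_hyperplanes(n_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw one random Gaussian hyperplane normal per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec: Sequence[float], hyperplanes: List[List[float]]) -> List[int]:
    """Bit b is 1 when the vector lies on the non-negative side of hyperplane b."""
    return [int(sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0.0)
            for h in hyperplanes]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(n_bits=16, dim=3)
user = [0.5, -1.0, 2.0]
code_user = hash_code(user, planes)
code_same_direction = hash_code([2 * v for v in user], planes)
code_opposite = hash_code([-v for v in user], planes)
```

Vectors pointing in the same direction share a code, while a vector pointing the opposite way flips every bit. Since the hyperplanes are random, vectors at intermediate angles may or may not collide, which is the stochasticity the abstract refers to.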

563. Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition Under Reshuffling.

Paper Link】 【Pages】:4602-4609

【Authors】: Chao Li ; Mohammad Emtiyaz Khan ; Zhun Sun ; Gang Niu ; Bo Han ; Shengli Xie ; Qibin Zhao

【Abstract】: Exact recovery of tensor decomposition (TD) methods is a desirable property in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical applications on real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property has not been properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a practically widely-used CTD model, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that this property can also be achieved by a more general model than LCTD. In the new model, we generalize the classic tensor (un-)folding into a reshuffling operation, a more flexible mapping to relocate the entries of the matrix into a tensor. Armed with the reshuffling operations and exact-recovery property, we explore a totally novel application for (generalized) LCTD, i.e., image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms the state-of-the-art methods.

【Keywords】:

564. Infrared-Visible Cross-Modal Person Re-Identification with an X Modality.

Paper Link】 【Pages】:4610-4617

【Authors】: Diangang Li ; Xing Wei ; Xiaopeng Hong ; Yihong Gong

【Abstract】: This paper focuses on the emerging Infrared-Visible cross-modal person re-identification task (IV-ReID), which takes infrared images as input and matches with visible color images. IV-ReID is important yet challenging, as there is a significant gap between the visible and infrared images. To reduce this ‘gap’, we introduce an auxiliary X modality as an assistant and reformulate infrared-visible dual-mode cross-modal learning as an X-Infrared-Visible three-mode learning problem. The X modality restates the RGB channels in a format with which cross-modal learning can be easily performed. With this idea, we propose an X-Infrared-Visible (XIV) ReID cross-modal learning framework. Firstly, the X modality is generated by a lightweight network, which is learnt in a self-supervised manner with the labels inherited from visible images. Secondly, under the XIV framework, cross-modal learning is guided by a carefully designed modality gap constraint, with information exchanged across the visible, X, and infrared modalities. Extensive experiments are performed on two challenging datasets, SYSU-MM01 and RegDB, to evaluate the proposed XIV-ReID approach. Experimental results show that our method achieves a considerable absolute gain of over 7% in terms of rank 1 and mAP, even compared with the latest state-of-the-art methods.

【Keywords】:

565. Automated Spectral Kernel Learning.

Paper Link】 【Pages】:4618-4625

【Authors】: Jian Li ; Yong Liu ; Weiping Wang

【Abstract】: The generalization performance of kernel methods is largely determined by the kernel, but spectral representations of stationary kernels are both input-independent and output-independent, which limits their applications on complicated tasks. In this paper, we propose an efficient learning framework that incorporates the process of finding suitable kernels and model training. Using non-stationary spectral kernels and backpropagation w.r.t. the objective, we obtain favorable spectral representations that depend on both inputs and outputs. Further, based on Rademacher complexity, we derive data-dependent generalization error bounds, where we investigate the effect of those factors and introduce regularization terms to improve the performance. Extensive experimental results validate the effectiveness of the proposed algorithm and coincide with our theoretical findings.

【Keywords】:

566. Graph Attention Based Proposal 3D ConvNets for Action Detection.

Paper Link】 【Pages】:4626-4633

【Authors】: Jin Li ; Xianglong Liu ; Zhuofan Zong ; Wanru Zhao ; Mingyuan Zhang ; Jingkuan Song

【Abstract】: The recent advances in 3D Convolutional Neural Networks (3D CNNs) have shown promising performance for untrimmed video action detection, employing the popular detection framework that heavily relies on the temporal action proposal generations as the input of the action detector and localization regressor. In practice the proposals usually contain strong intra and inter relations among them, mainly stemming from the temporal and spatial variations in the video actions. However, most existing 3D CNNs ignore the relations and thus suffer from redundant proposals that degrade detection performance and efficiency. To address this problem, we propose graph attention based proposal 3D ConvNets (AGCN-P-3DCNNs) for video action detection. Specifically, our proposed graph attention is composed of intra attention based GCN and inter attention based GCN. We use intra attention to learn the intra long-range dependencies inside each action proposal and update the node matrix of the Intra Attention based GCN, and use inter attention to learn the inter dependencies between different action proposals as the adjacency matrix of the Inter Attention based GCN. Afterwards, we fuse intra and inter attention to model intra long-range dependencies and inter dependencies simultaneously. Another contribution is that we propose a simple and effective framewise classifier, which enhances the feature representation capabilities of the backbone model. Experiments on two proposal 3D ConvNets based models (P-C3D and P-ResNet) and two popular action detection benchmarks (THUMOS 2014, ActivityNet v1.3) demonstrate the state-of-the-art performance achieved by our method. In particular, P-C3D embedded with our module achieves a 3.7% improvement in average mAP on the THUMOS 2014 dataset compared to the original model.

【Keywords】:

567. Symmetric Metric Learning with Adaptive Margin for Recommendation.

Paper Link】 【Pages】:4634-4641

【Authors】: Mingming Li ; Shuai Zhang ; Fuqing Zhu ; Wanhui Qian ; Liangjun Zang ; Jizhong Han ; Songlin Hu

【Abstract】: Metric learning based methods have attracted extensive interest in recommender systems. Current methods take a user-centric approach in metric space, ensuring that the distance between a user and a negative item is larger than that between the user and a positive item by a fixed margin. However, they ignore the relation between the positive item and the negative item. As a result, these two items might be positioned closely, leading to incorrect results. Meanwhile, different users usually have different preferences; the fixed margin used in those methods cannot adapt to various user biases, and thus decreases the performance as well. To address these two problems, a novel Symmetric Metric Learning with adaptive margin (SML) is proposed. In addition to the current user-centric metric, it symmetrically introduces a positive item-centric metric which maintains a closer distance from positive items to the user, and pushes the negative items away from the positive items at the same time. Moreover, the dynamically adaptive margins are well trained to mitigate the impact of bias. Experimental results on three public recommendation datasets demonstrate that SML produces a competitive performance compared with several state-of-the-art methods.

【Keywords】:
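
The two margin constraints contrasted in the abstract above can be written as hinge losses, as in the sketch below. The exact loss used by SML, its adaptive margins, and its weighting are not given in the abstract, so the formulas, the fixed margins, and the function names here are assumptions chosen only to show why a user-centric term alone can leave a positive and a negative item close together.

```python
import math
from typing import Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def user_centric_loss(user, pos, neg, margin: float) -> float:
    """Penalize unless d(user, neg) exceeds d(user, pos) by at least `margin`."""
    return max(0.0, dist(user, pos) - dist(user, neg) + margin)


def item_centric_loss(user, pos, neg, margin: float) -> float:
    """Penalize unless d(pos, neg) exceeds d(pos, user) by at least `margin`."""
    return max(0.0, dist(pos, user) - dist(pos, neg) + margin)


def symmetric_loss(user, pos, neg, margin_user: float, margin_item: float,
                   weight: float = 1.0) -> float:
    """Sum of both terms; `weight` is a hypothetical trade-off coefficient."""
    return (user_centric_loss(user, pos, neg, margin_user)
            + weight * item_centric_loss(user, pos, neg, margin_item))


user, pos, neg = (0.0, 0.0), (1.0, 0.0), (1.5, 0.0)
loss_user = user_centric_loss(user, pos, neg, margin=0.5)
loss_item = item_centric_loss(user, pos, neg, margin=0.5)
loss_total = symmetric_loss(user, pos, neg, margin_user=0.5, margin_item=0.5)
```

In this configuration the user-centric constraint is already satisfied, yet the negative item sits only 0.5 away from the positive item; the item-centric term is the one that still produces a nonzero loss.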

568. Practical Federated Gradient Boosting Decision Trees.

Paper Link】 【Pages】:4642-4649

【Authors】: Qinbin Li ; Zeyi Wen ; Bingsheng He

【Abstract】: Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from the inefficiency due to the usage of costly data transformations such as secure sharing and homomorphic encryption, or from the low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original record to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training with the local data of each party, our approach can significantly improve the predictive accuracy, and achieve comparable accuracy to the original GBDT with the data from all parties.

【Keywords】:

569. New Efficient Multi-Spike Learning for Fast Processing and Robust Learning.

Paper Link】 【Pages】:4650-4657

【Authors】: Shenglan Li ; Qiang Yu

【Abstract】: Spiking neural networks (SNNs) are considered to be more biologically plausible and lower power consuming than traditional artificial neural networks (ANNs). SNNs use discrete spikes as input and output, but how to process and learn these discrete spikes efficiently and accurately still remains a challenging task. Moreover, most existing learning methods are inefficient with complicated neuron dynamics and learning procedures being involved. In this paper, we propose efficient alternatives by firstly introducing a simplified and efficient neuron model. Based on it, we develop two new multi-spike learning rules together with an event-driven scheme being presented to improve the processing efficiency. We show that, with the as-proposed rules, a single neuron can be trained to successfully perform challenging tasks such as multi-category classification and feature extraction. Our learning methods demonstrate a significant robustness against various strong noises. Moreover, experimental results on some real-world classification tasks show that our approaches yield higher efficiency with less requirement on computation resource, highlighting the advantages and potential of spike-based processing and driving more efforts towards neuromorphic computing.

【Keywords】:

570. Solving General Elliptical Mixture Models through an Approximate Wasserstein Manifold.

Paper Link】 【Pages】:4658-4666

【Authors】: Shengxi Li ; Zeyang Yu ; Min Xiang ; Danilo P. Mandic

【Abstract】: We address the estimation problem for general finite mixture models, with a particular focus on the elliptical mixture models (EMMs). Compared to the widely adopted Kullback–Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. To this end, we first provide a unifying account of computable and identifiable EMMs, which serves as a basis to rigorously address the underpinning optimisation problem. Due to a probability constraint, solving this problem is extremely cumbersome and unstable, especially under the Wasserstein distance. To relieve this issue, we introduce an efficient optimisation method on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and computable operations, thus significantly stabilising and improving the EMM estimation. We further propose an adaptive method to accelerate the convergence. Experimental results demonstrate the excellent performance of the proposed EMM solver.

【Keywords】:

571. Coupled-View Deep Classifier Learning from Multiple Noisy Annotators.

Paper Link】 【Pages】:4667-4674

【Authors】: Shikun Li ; Shiming Ge ; Yingying Hua ; Chunhui Zhang ; Hao Wen ; Tengfei Liu ; Weiqiang Wang

【Abstract】: Typically, learning a deep classifier from massive cleanly annotated instances is effective but impractical in many real-world scenarios. An alternative is collecting and aggregating multiple noisy annotations for each instance to train the classifier. Inspired by that, this paper proposes to learn a deep classifier from multiple noisy annotators via a coupled-view learning approach, where the learning view from data is represented by deep neural networks for data classification and the learning view from labels is described by a Naive Bayes classifier for label aggregation. Such coupled-view learning is converted to a supervised learning problem under the mutual supervision of the aggregated and predicted labels, and can be solved via alternate optimization to update labels and refine the classifiers. To alleviate the propagation of incorrect labels, a small-loss metric is proposed to select reliable instances in both views. A co-teaching strategy with class-weighted loss is further leveraged in the deep classifier learning, which uses two networks with different learning abilities to teach each other, and the diverse errors introduced by noisy labels can be filtered out by peer networks. With these strategies, our approach can finally learn a robust data classifier that overfits less to label noise. Experimental results on synthetic and real data demonstrate the effectiveness and robustness of the proposed approach.

【Keywords】:

572. Stochastic Online Learning with Probabilistic Graph Feedback.

Paper Link】 【Pages】:4675-4682

【Authors】: Shuai Li ; Wei Chen ; Zheng Wen ; Kwong-Sak Leung

【Abstract】: We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge in the feedback graph has probability p_ij. Two cases are covered. (a) The one-step case, where after playing arm i the learner observes a sample reward feedback of arm j with independent probability p_ij. (b) The cascade case, where after playing arm i the learner observes feedback of all arms j in a probabilistic cascade starting from i: for each (i,j) with probability p_ij, if arm i is played or observed, then a reward sample of arm j would be observed with independent probability p_ij. Previous works mainly focus on deterministic graphs, which correspond to the one-step case with p_ij ∈ {0,1}, an adversarial sequence of graphs with certain topology guarantees, or a specific type of random graphs. We analyze the asymptotic lower bounds and design algorithms in both cases. The regret upper bounds of the algorithms match the lower bounds with high probability.

【Keywords】:
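
The one-step case (a) in the abstract above can be simulated in a few lines: after arm i is played, each arm j reveals a reward sample independently with the probability on edge (i, j). The sketch below covers only this feedback model, not the paper's algorithms or bounds; the Bernoulli reward model, the probability matrix, and all names are assumptions.

```python
import random
from typing import Callable, Dict, List


def one_step_feedback(played: int, probs: List[List[float]],
                      sample_reward: Callable[[int, random.Random], float],
                      rng: random.Random) -> Dict[int, float]:
    """Return {arm j: reward sample} for the arms observed after playing `played`.

    probs[i][j] is the probability that playing arm i reveals a sample of arm j.
    """
    observed = {}
    for j, p_ij in enumerate(probs[played]):
        if rng.random() < p_ij:  # independent coin flip per edge (i, j)
            observed[j] = sample_reward(j, rng)
    return observed


means = [0.2, 0.5, 0.9]  # hypothetical Bernoulli reward means


def bernoulli_reward(arm: int, rng: random.Random) -> float:
    return 1.0 if rng.random() < means[arm] else 0.0


probs = [
    [1.0, 0.0, 1.0],  # playing arm 0 always reveals arms 0 and 2, never arm 1
    [0.5, 1.0, 0.5],
    [0.0, 0.0, 1.0],
]
feedback = one_step_feedback(0, probs, bernoulli_reward, random.Random(0))
```

Setting every entry of `probs` to 0 or 1 recovers the deterministic feedback graphs that the abstract cites as the setting of earlier work.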

573. Relation Inference among Sensor Time Series in Smart Buildings with Metric Learning.

Paper Link】 【Pages】:4683-4690

【Authors】: Shuheng Li ; Dezhi Hong ; Hongning Wang

【Abstract】: Smart Building Technologies hold promise for better livability for residents and lower energy footprints. Yet, the rollout of these technologies, from demand response controls to fault detection and diagnosis, significantly lags behind and is impeded by the current practice of manual identification of sensing point relationships, e.g., how equipment is connected or which sensors are co-located in the same space. This manual process is still error-prone, albeit costly and laborious. We study relation inference among sensor time series. Our key insight is that, as equipment is connected or sensors co-locate in the same physical environment, they are affected by the same real-world events, e.g., a fan turning on or a person entering the room, thus exhibiting correlated changes in their time series data. To this end, we develop a deep metric learning solution that first converts the primitive sensor time series to the frequency domain, and then optimizes a representation of sensors that encodes their relations. Built upon the learned representation, our solution pinpoints the relationships among sensors via solving a combinatorial optimization problem. Extensive experiments on real-world buildings demonstrate the effectiveness of our solution.

【Keywords】:

574. Co-GCN for Multi-View Semi-Supervised Learning.

Paper Link】 【Pages】:4691-4698

【Authors】: Shu Li ; Wen-Tao Li ; Wei Wang

【Abstract】: In many real-world applications, the data have several disjoint sets of features and each set is called a view. Researchers have developed many multi-view learning methods in the past decade. In this paper, we bring Graph Convolutional Network (GCN) into multi-view learning and propose a novel multi-view semi-supervised learning method Co-GCN by adaptively exploiting the graph information from the multiple views with combined Laplacians. Experimental results on real-world data sets verify that Co-GCN can achieve better performance compared with state-of-the-art multi-view semi-supervised methods.

【Keywords】:

575. Tweedie-Hawkes Processes: Interpreting the Phenomena of Outbreaks.

Paper Link】 【Pages】:4699-4706

【Authors】: Tianbo Li ; Yiping Ke

【Abstract】: Self-exciting event sequences, in which the occurrence of an event increases the probability of triggering subsequent ones, are common in many disciplines. In this paper, we propose a Bayesian model called Tweedie-Hawkes Processes (THP), which is able to model the outbreaks of events and find the dominant factors behind them. THP leverages the Tweedie distribution in capturing various excitation effects. A variational EM algorithm is developed for model inference. Some theoretical properties of THP, including the sub-criticality, convergence of the learning algorithm and kernel selection method, are discussed. Applications to epidemiology and information diffusion analysis demonstrate the versatility of our model in various disciplines. Evaluations on real-world datasets show that THP outperforms the rival state-of-the-art baselines in the task of forecasting future events.

【Keywords】:
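
For readers unfamiliar with self-exciting sequences, the sketch below evaluates the conditional intensity of a classical Hawkes process with an exponential kernel, λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)). This is the textbook model, not THP itself: the Tweedie component and the variational EM inference from the abstract above are not represented, and the parameter values are arbitrary assumptions.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, history: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t given past event times in `history`.

    mu is the baseline rate; each past event adds alpha * exp(-beta * elapsed).
    """
    return mu + sum(alpha * math.exp(-beta * (t - t_i))
                    for t_i in history if t_i < t)


def branching_ratio(alpha: float, beta: float) -> float:
    """Expected number of events directly triggered by one event (alpha / beta)."""
    return alpha / beta


events = [1.0, 1.5]
baseline = hawkes_intensity(0.5, events, mu=0.5, alpha=0.8, beta=1.0)
excited = hawkes_intensity(1.6, events, mu=0.5, alpha=0.8, beta=1.0)
decayed = hawkes_intensity(10.0, events, mu=0.5, alpha=0.8, beta=1.0)
ratio = branching_ratio(alpha=0.8, beta=1.0)
```

The intensity jumps right after events and decays back toward the baseline. For this exponential kernel, the process is sub-critical (it does not explode) when the branching ratio α/β is below 1.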

576. Neural Graph Embedding for Neural Architecture Search.

Paper Link】 【Pages】:4707-4714

【Authors】: Wei Li ; Shaogang Gong ; Xiatian Zhu

【Abstract】: Existing neural architecture search (NAS) methods often operate in discrete or continuous spaces directly, which ignores the graphical topology knowledge of neural networks. This leads to suboptimal search performance and efficiency, given the fact that neural networks are essentially directed acyclic graphs (DAG). In this work, we address this limitation by introducing a novel idea of neural graph embedding (NGE). Specifically, we represent the building block (i.e. the cell) of neural networks with a neural DAG, and learn it by leveraging a Graph Convolutional Network to propagate and model the intrinsic topology information of network architectures. This results in a generic neural network representation integrable with different existing NAS frameworks. Extensive experiments show the superiority of NGE over the state-of-the-art methods on image classification and semantic segmentation.

【Keywords】:

577. Understanding the Disharmony between Weight Normalization Family and Weight Decay.

Paper Link】 【Pages】:4715-4722

【Authors】: Xiang Li ; Shuo Chen ; Jian Yang

【Abstract】: The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight W to W′, which makes W′ independent of the magnitude of W. Surprisingly, W must be decayed during gradient descent, otherwise we will observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. Moreover, if we substitute (e.g., weight normalization) W′ = W/∥W∥ in the original loss function ∑_i L(ƒ(x_i; W′), y_i) + ½λ∥W′∥², it is observed that the regularization term ½λ∥W′∥² will be canceled as a constant ½λ in the optimization objective. Therefore, to decay W, we need to explicitly append: ½λ∥W∥². In this paper, we theoretically prove that ½λ∥W∥² improves optimization only by modulating the effective learning rate and has almost no influence on generalization when the weight normalization family is compositely employed. Furthermore, we also expose several serious problems when introducing a weight decay term to the weight normalization family, including the absence of a global minimum, training instability and sensitivity to initialization. To address these problems, we propose an Adaptive Weight Shrink (AWS) scheme, which gradually shrinks the weights during optimization by a dynamic coefficient proportional to the magnitude of the parameter. This simple yet effective method appropriately controls the effective learning rate, which significantly improves the training stability and makes optimization more robust to initialization.

【Keywords】:
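
The cancellation noted in the abstract above is easy to check numerically: once W is replaced by W′ = W/∥W∥, the penalty ½λ∥W′∥² equals ½λ whatever the scale of W, so it cannot decay W. The sketch below is only this sanity check, with arbitrary example values; it does not implement AWS.

```python
import math
from typing import List


def l2_norm(w: List[float]) -> float:
    return math.sqrt(sum(v * v for v in w))


def weight_normalize(w: List[float]) -> List[float]:
    """W' = W / ||W||, so W' always has unit norm."""
    n = l2_norm(w)
    return [v / n for v in w]


def l2_penalty(w: List[float], lam: float) -> float:
    """Regularization term 0.5 * lam * ||w||^2."""
    return 0.5 * lam * sum(v * v for v in w)


lam = 0.1
w = [3.0, 4.0]
normalized_penalties = []  # penalty applied to W' (scale-invariant)
raw_penalties = []         # penalty applied to W itself (grows with the scale)
for scale in (0.1, 1.0, 10.0):
    scaled = [scale * v for v in w]
    normalized_penalties.append(l2_penalty(weight_normalize(scaled), lam))
    raw_penalties.append(l2_penalty(scaled, lam))
```

Every entry of `normalized_penalties` is ½λ up to rounding, while `raw_penalties` grows with the square of the scale, which is why the decay term has to be applied to W explicitly.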

578. Do Subsampled Newton Methods Work for High-Dimensional Data?

Paper Link】 【Pages】:4723-4730

【Authors】: Xiang Li ; Shusen Wang ; Zhihua Zhang

【Abstract】: Subsampled Newton methods approximate Hessian matrices through subsampling techniques to alleviate the per-iteration cost. Previous results require Ω(d) samples to approximate Hessians, where d is the dimension of data points, making them less practical for high-dimensional data. The situation deteriorates when d is comparable to the number of data points n, which requires taking the whole dataset into account, making subsampling not useful. This paper theoretically justifies the effectiveness of subsampled Newton methods on strongly convex empirical risk minimization with high dimensional data. Specifically, we provably require only Θ̃(d_eff^γ) samples for approximating the Hessian matrices, where d_eff^γ is the γ-ridge leverage and can be much smaller than d as long as nγ ≫ 1. Our theories work for three types of Newton methods: subsampled Newton, distributed Newton, and proximal Newton.

【Keywords】:

579. FlowScope: Spotting Money Laundering Based on Graphs.

Paper Link】 【Pages】:4731-4738

【Authors】: Xiangfeng Li ; Shenghua Liu ; Zifeng Li ; Xiaotian Han ; Chuan Shi ; Bryan Hooi ; He Huang ; Xueqi Cheng

【Abstract】: Given a graph of the money transfers between accounts of a bank, how can we detect money laundering? Money laundering refers to criminals using the bank's services to move massive amounts of illegal money to untraceable destination accounts, in order to inject their illegal money into the legitimate financial system. Existing graph fraud detection approaches focus on dense subgraph detection, without considering the fact that money laundering involves high-volume flows of funds through chains of bank accounts, thereby decreasing their detection accuracy. Instead, we propose to model the transactions using a multipartite graph, and detect the complete flow of money from source to destination using a scalable algorithm, FlowScope. Theoretical analysis shows that FlowScope provides guarantees in terms of the amount of money that fraudsters can transfer without being detected. FlowScope outperforms state-of-the-art baselines in accurately detecting the accounts involved in money laundering, in both injected and real-world data settings.

【Keywords】:

580. On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks.

Paper Link】 【Pages】:4739-4746

【Authors】: Xiangrui Li ; Xin Li ; Deng Pan ; Dongxiao Zhu

【Abstract】: Deep convolutional neural networks (CNNs) trained with logistic and softmax losses have made significant advancement in visual recognition tasks in computer vision. When training data exhibit class imbalances, the class-wise reweighted versions of logistic and softmax losses are often used to boost performance of the unweighted version. In this paper, motivated to explain the reweighting mechanism, we explicate the learning property of those two loss functions by analyzing the necessary condition (e.g., the gradient equals zero) after training CNNs to converge to a local minimum. The analysis immediately provides us explanations for understanding (1) quantitative effects of the class-wise reweighting mechanism: deterministic effectiveness for binary classification using logistic loss yet indeterministic for multi-class classification using softmax loss; (2) disadvantage of logistic loss for single-label multi-class classification via one-vs.-all approach, which is due to the averaging effect on predicted probabilities for the negative class (e.g., non-target classes) in the learning process. With the disadvantage and advantage of logistic loss disentangled, we thereafter propose a novel reweighted logistic loss for multi-class classification. Our simple yet effective formulation improves ordinary logistic loss by focusing on learning hard non-target classes (target vs. non-target class in one-vs.-all) and turns out to be competitive with softmax loss. We evaluate our method on several benchmark datasets to demonstrate its effectiveness.

【Keywords】:
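
As background for the reweighting mechanism discussed in the abstract above, the sketch below implements the ordinary class-wise reweighted logistic (binary cross-entropy) loss. It is not the novel loss the paper proposes, whose form the abstract does not give; the weights `w_pos` and `w_neg` and the example inputs are assumptions, and numerical safeguards for extreme logits are left out for brevity.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logistic_loss(logits: Sequence[float], labels: Sequence[int],
                           w_pos: float = 1.0, w_neg: float = 1.0) -> float:
    """Mean binary cross-entropy with separate weights for the two classes."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)  # predicted probability of the positive class
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(logits)


logits = [0.0, 0.0]  # an undecided model: p = 0.5 for both examples
labels = [1, 0]
unweighted = weighted_logistic_loss(logits, labels)
upweighted = weighted_logistic_loss(logits, labels, w_pos=3.0)
```

Raising `w_pos` makes mistakes on the positive class cost more, which shifts the point where the gradient vanishes; this is the kind of effect the abstract analyzes at convergence.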

581. IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation.

Paper Link】 【Pages】:4747-4754

【Authors】: Xiaoyun Li ; Chengxi Wu ; Ping Li

【Abstract】: Feature selection is an important tool to deal with high dimensional data. In the unsupervised case, many popular algorithms aim at maintaining the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm to enhance sample similarity preservation through a new perspective, topology preservation, which is represented by persistent diagrams from the context of computational topology. This method is designed upon a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm is able to preserve the pairwise distances, as well as topological patterns, of the full data. We demonstrate that our algorithm can provide satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of our proposed method on large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.

【Keywords】:

582. A Forest from the Trees: Generation through Neighborhoods.

Paper Link】 【Pages】:4755-4762

【Authors】: Yang Li ; Tianxiang Gao ; Junier Oliva

【Abstract】: In this work, we propose to learn a generative model using both learned features (through a latent space) and memories (through neighbors). Although human learning makes seamless use of both learned perceptual features and instance recall, current generative learning paradigms only make use of one of these two components. Take, for instance, flow models, which learn a latent space that follows a simple distribution. Conversely, kernel density techniques use instances to shift a simple distribution into an aggregate mixture model. Here we propose multiple methods to enhance the latent space of a flow model with neighborhood information. Not only does our proposed framework represent a more human-like approach by leveraging both learned features and memories, but it may also be viewed as a step forward in non-parametric methods. In addition, our proposed framework allows the user to easily control the properties of generated samples by targeting samples based on neighbors. The efficacy of our model is shown empirically with standard image datasets. We observe compelling results and a significant improvement over baselines. Combined further with a contrastive training mechanism, our proposed methods can effectively perform non-parametric novelty detection.

【Keywords】:

583. Efficient Automatic CASH via Rising Bandits.

Paper Link】 【Pages】:4763-4771

【Authors】: Yang Li ; Jiawei Jiang ; Jinyang Gao ; Yingxia Shao ; Ce Zhang ; Bin Cui

【Abstract】: The Combined Algorithm Selection and Hyperparameter optimization (CASH) is one of the most fundamental problems in Automatic Machine Learning (AutoML). The existing Bayesian optimization (BO) based solutions turn the CASH problem into a Hyperparameter Optimization (HPO) problem by combining the hyperparameters of all machine learning (ML) algorithms, and use BO methods to solve it. As a result, these methods suffer from low efficiency due to the huge hyperparameter space in CASH. To alleviate this issue, we propose the alternating optimization framework, where the HPO problem for each ML algorithm and the algorithm selection problem are optimized alternately. In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods. Furthermore, we introduce Rising Bandits, a CASH-oriented Multi-Armed Bandits (MAB) variant, to model the algorithm selection in CASH. This framework can take advantage of both BO in solving the HPO problem with a relatively small hyperparameter space and the MABs in accelerating the algorithm selection. Moreover, we further develop an efficient online algorithm to solve the Rising Bandits with provable theoretical guarantees. The extensive experiments on 30 OpenML datasets demonstrate the superiority of the proposed approach over the competitive baselines.

【Keywords】:

584. Learning Signed Network Embedding via Graph Attention.

Paper Link】 【Pages】:4772-4779

【Authors】: Yu Li ; Yuan Tian ; Jiawei Zhang ; Yi Chang

【Abstract】: Learning the low-dimensional representations of graphs (i.e., network embedding) plays a critical role in network analysis and facilitates many downstream tasks. Recently graph convolutional networks (GCNs) have revolutionized the field of network embedding, and led to state-of-the-art performance in network analysis tasks such as link prediction and node classification. Nevertheless, most of the existing GCN-based network embedding methods are proposed for unsigned networks. In the real world, some of the networks are signed, where the links are annotated with different polarities, e.g., positive vs. negative. Negative links may have different properties from the positive ones and can also significantly affect the quality of network embedding. Thus, in this paper, we propose a novel network embedding framework SNEA to learn Signed Network Embedding via graph Attention. In particular, we propose a masked self-attentional layer, which leverages the self-attention mechanism to estimate the importance coefficient for pairs of nodes connected by different types of links during the embedding aggregation process. Then SNEA utilizes the masked self-attentional layers to aggregate more important information from neighboring nodes to generate the node embeddings based on balance theory. Experimental results demonstrate the effectiveness of the proposed framework through the signed link prediction task on several real-world signed network datasets.

【Keywords】:

585. RTN: Reparameterized Ternary Network.

Paper Link】 【Pages】:4780-4787

【Authors】: Yuhang Li ; Xin Dong ; Sai Qian Zhang ; Haoli Bai ; Yuanpeng Chen ; Wei Wang

【Abstract】: To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings with quantized activations and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; the gradient vanishing during backpropagation; and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full precision scale and offset for a fixed ternary vector, we decouple the range and magnitude from the direction to mitigate the above problems. Learnable scale and offset can automatically adjust the range of quantized values and sparsity without gradient vanishing. A novel encoding and computation pattern are designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better efficiency between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), and it brings 46.46× and 89.17× savings on power and area compared with the full precision convolution.

【Keywords】:

586. Learning to Auto Weight: Entirely Data-Driven and Highly Efficient Weighting Framework.

Paper Link】 【Pages】:4788-4795

【Authors】: Zhenmao Li ; Yichao Wu ; Ken Chen ; Yudong Wu ; Shunfeng Zhou ; Jiaheng Liu ; Junjie Yan

【Abstract】: Example weighting is an effective solution to the training bias problem. However, most previous methods are limited to human knowledge and require laborious tuning of hyperparameters. In this paper, we propose a novel example weighting framework called Learning to Auto Weight (LAW). The proposed framework finds step-dependent weighting policies adaptively, and can be jointly trained with target networks without any assumptions or prior knowledge about the dataset. It consists of three key components: Stage-based Searching Strategy (3SM) is adopted to shrink the huge searching space in a complete training process; Duplicate Network Reward (DNR) gives more accurate supervision by removing randomness during the searching process; Full Data Update (FDU) further improves the updating efficiency. Experimental results demonstrate the superiority of the weighting policy explored by LAW over the standard training pipeline. Compared with baselines, LAW can find a better weighting schedule which achieves superior accuracy on both biased CIFAR and ImageNet.

【Keywords】:

587. Adaptive Two-Dimensional Embedded Image Clustering.

Paper Link】 【Pages】:4796-4803

【Authors】: Zhihui Li ; Lina Yao ; Sen Wang ; Salil S. Kanhere ; Xue Li ; Huaxiang Zhang

【Abstract】: With the rapid development of mobile devices, people are generating huge volumes of image data every day for sharing on social media, which draws much research attention to understanding the contents of images. Image clustering plays an important role in image understanding systems. Most of the existing image clustering algorithms flatten digital images that are originally represented by matrices into 1D vectors as the image representation for the subsequent learning. The drawbacks of vector-based algorithms include limited consideration of the spatial relationship between pixels and high computational complexity, both of which stem from the simple vectorized representation. To overcome the drawbacks, we propose a novel image clustering framework that can work directly on matrices of images instead of flattened vectors. Specifically, the proposed algorithm simultaneously learns the clustering results and preserves the original correlation information within the image matrix. To solve the challenging objective function, we propose a fast iterative solution. Extensive experiments have been conducted on various benchmark datasets. The experimental results confirm the superiority of the proposed algorithm.

【Keywords】:

588. Tensor Completion for Weakly-Dependent Data on Graph for Metro Passenger Flow Prediction.

Paper Link】 【Pages】:4804-4810

【Authors】: Ziyue Li ; Nurettin Dorukhan Sergin ; Hao Yan ; Chen Zhang ; Fugee Tsung

【Abstract】: Low-rank tensor decomposition and completion have attracted significant interest from academia given the ubiquity of tensor data. However, low-rank structure is a global property, which will not be fulfilled when the data presents complex and weak dependencies given specific graph structures. One particular application that motivates this study is spatiotemporal data analysis. As shown in the preliminary study, weak dependencies can worsen the low-rank tensor completion performance. In this paper, we propose a novel low-rank CANDECOMP / PARAFAC (CP) tensor decomposition and completion framework by introducing the L1-norm penalty and Graph Laplacian penalty to model the weak dependency on the graph. We further propose an optimization algorithm based on Block Coordinate Descent for efficient estimation. A case study based on the metro passenger flow data in Hong Kong is conducted to demonstrate an improved performance over the regular tensor completion methods.

【Keywords】:

589. LMLFM: Longitudinal Multi-Level Factorization Machine.

Paper Link】 【Pages】:4811-4818

【Authors】: Junjie Liang ; Dongkuan Xu ; Yiwei Sun ; Vasant G. Honavar

【Abstract】: We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters that are shared by all individuals, whereas random effects capture those parameters that vary across individuals. However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge, the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties, and analyze the computational complexity, of LMLFM. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with a large number of variables. The code and supplemental material are available at https://github.com/junjieliang672/LMLFM.

【Keywords】:

590. Instance Enhancement Batch Normalization: An Adaptive Regulator of Batch Noise.

Paper Link】 【Pages】:4819-4827

【Authors】: Senwei Liang ; Zhongzhan Huang ; Mingfu Liang ; Haizhao Yang

【Abstract】: Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image via statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous works indicate that the noise is important for the optimization and generalization of deep neural networks, but too much noise will harm the performance of networks. In our paper, we offer a new point of view that the self-attention mechanism can help to regulate the noise by enhancing instance-specific information to obtain a better regularization effect. Therefore, we propose an attention-based BN called Instance Enhancement Batch Normalization (IEBN) that recalibrates the information of each channel by a simple linear transformation. IEBN has a good capacity for regulating the batch noise and stabilizing network training to improve generalization even in the presence of two kinds of noise attacks during training. Finally, IEBN outperforms BN with only a light parameter increment in image classification tasks under different network structures and benchmark datasets.

【Keywords】:

591. Differentiable Algorithm for Marginalising Changepoints.

Paper Link】 【Pages】:4828-4835

【Authors】: Hyoungjin Lim ; Gwonsoo Che ; Wonyeol Lee ; Hongseok Yang

【Abstract】: We present an algorithm for marginalising changepoints in time-series models that assume a fixed number of unknown changepoints. Our algorithm is differentiable with respect to its inputs, which are the values of latent random variables other than changepoints. Also, it runs in time O(mn) where n is the number of time steps and m the number of changepoints, an improvement over a naive marginalisation method with O(n^m) time complexity. We derive the algorithm by identifying quantities related to this marginalisation problem, showing that these quantities satisfy recursive relationships, and transforming the relationships to an algorithm via dynamic programming. Since our algorithm is differentiable, it can be applied to convert a model non-differentiable due to changepoints to a differentiable one, so that the resulting models can be analysed using gradient-based inference or learning techniques. We empirically show the effectiveness of our algorithm in this application by tackling the posterior inference problem on synthetic and real-world data.
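
The gap between enumerating changepoint configurations and an O(mn) recursion can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a uniform (unnormalised) prior over strictly increasing changepoint positions, takes a hypothetical table `lik[k][t]` holding the likelihood of observation t under segment k, and leaves out differentiability. The names `marginal_dp` and `marginal_naive` are illustrative only.

```python
from itertools import combinations


def marginal_dp(lik):
    """Sum over all placements of m changepoints in O(mn) time.

    lik[k][t] is the likelihood of observation t under segment k,
    for k = 0..m and t = 0..n-1 (an assumed input format).
    """
    m = len(lik) - 1
    n = len(lik[0])
    # prev[t]: total weight of prefixes 0..t whose last step lies in segment k-1.
    prev = [0.0] * n
    acc = 1.0
    for t in range(n):
        acc *= lik[0][t]
        prev[t] = acc
    for k in range(1, m + 1):
        cur = [0.0] * n
        for t in range(k, n):
            # Step t either continues segment k or is its first step.
            cur[t] = lik[k][t] * (cur[t - 1] + prev[t - 1])
        prev = cur
    return prev[n - 1]


def marginal_naive(lik):
    """Reference implementation that enumerates every configuration."""
    m = len(lik) - 1
    n = len(lik[0])
    total = 0.0
    for cps in combinations(range(1, n), m):
        bounds = (0,) + cps + (n,)
        weight = 1.0
        for k in range(m + 1):
            for t in range(bounds[k], bounds[k + 1]):
                weight *= lik[k][t]
        total += weight
    return total


example = [
    [0.5, 0.2, 0.9, 0.4, 0.7],
    [0.3, 0.8, 0.1, 0.6, 0.5],
    [0.9, 0.4, 0.3, 0.2, 0.8],
]
print(marginal_dp(example), marginal_naive(example))
```

Both functions compute the same sum; the recursion reuses partial sums across configurations instead of revisiting each of the C(n-1, m) placements.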

【Keywords】:

592. OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization.

Paper Link】 【Pages】:4836-4843

【Authors】: Bingchen Liu ; Yizhe Zhu ; Zuohui Fu ; Gerard de Melo ; Ahmed Elgammal

【Abstract】: Exploring the potential of GANs for unsupervised disentanglement learning, this paper proposes a novel GAN-based disentanglement framework with One-Hot Sampling and Orthogonal Regularization (OOGAN). While previous works mostly attempt to tackle disentanglement learning through VAE and seek to implicitly minimize the Total Correlation (TC) objective with various sorts of approximation methods, we show that GANs have a natural advantage in disentangling with an alternating latent variable (noise) sampling method that is straightforward and robust. Furthermore, we provide a brand-new perspective on designing the structure of the generator and discriminator, demonstrating that a minor structural change and an orthogonal regularization on model weights entail an improved disentanglement. Instead of experimenting on simple toy datasets, we conduct experiments on higher-resolution images and show that OOGAN greatly pushes the boundary of unsupervised disentanglement.

【Keywords】:

593. Random Fourier Features via Fast Surrogate Leverage Weighted Sampling.

Paper Link】 【Pages】:4844-4851

【Authors】: Fanghui Liu ; Xiaolin Huang ; Yudong Chen ; Jie Yang ; Johan A. K. Suykens

【Abstract】: In this paper, we propose a fast surrogate leverage weighted sampling strategy to generate refined random Fourier features for kernel approximation. Compared to the current state-of-the-art method that uses the leverage weighted scheme (Li et al. 2019), our new strategy is simpler and more effective. It uses kernel alignment to guide the sampling process and it can avoid the matrix inversion operator when we compute the leverage function. Given n observations and s random features, our strategy can reduce the time complexity for sampling from O(ns^2 + s^3) to O(ns^2), while achieving comparable (or even slightly better) prediction performance when applied to kernel ridge regression (KRR). In addition, we provide theoretical guarantees on the generalization performance of our approach, and in particular characterize the number of random features required to achieve statistical guarantees in KRR. Experiments on several benchmark datasets demonstrate that our algorithm achieves comparable prediction performance and takes less time when compared to (Li et al. 2019).
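
For orientation, the plain random Fourier feature construction that such sampling strategies refine can be sketched as follows. This is the standard unweighted scheme for a Gaussian kernel, not the surrogate leverage weighted sampling proposed in the paper; the function names, the bandwidth `sigma`, and the seed are assumptions made for illustration.

```python
import math
import random


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Return a feature map z with z(x) . z(y) approximating a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density N(0, sigma^-2 I).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def gaussian_kernel(x, y, sigma=1.0):
    """Exact kernel value exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


z = make_rff(dim=3, num_features=5000)
x, y = [0.1, 0.2, 0.3], [0.4, -0.1, 0.0]
approx = dot(z(x), z(y))
exact = gaussian_kernel(x, y)
print(approx, exact)
```

The approximation error shrinks as the number of features grows, which is why the choice of sampling distribution for the frequencies matters for a fixed feature budget.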

【Keywords】:

594. EC-GAN: Inferring Brain Effective Connectivity via Generative Adversarial Networks.

Paper Link】 【Pages】:4852-4859

【Authors】: Jinduo Liu ; Junzhong Ji ; Guangxu Xun ; Liuyi Yao ; Mengdi Huai ; Aidong Zhang

【Abstract】: Inferring effective connectivity between different brain regions from functional magnetic resonance imaging (fMRI) data has been an important line of advanced study in neuroinformatics in recent years. However, current methods have limited usage in effective connectivity studies due to the high noise and small sample size of fMRI data. In this paper, we propose a novel framework for inferring effective connectivity based on generative adversarial networks (GAN), named EC-GAN. The proposed framework EC-GAN infers effective connectivity via an adversarial process, in which we simultaneously train two models: a generator and a discriminator. The generator consists of a set of effective connectivity generators based on structural equation models which can generate the fMRI time series of each brain region via effective connectivity. Meanwhile, the discriminator is employed to distinguish between the joint distributions of the real and generated fMRI time series. Experimental results on simulated data show that EC-GAN can better infer effective connectivity compared to other state-of-the-art methods. The real-world experiments indicate that EC-GAN can provide a new and reliable perspective for analyzing the effective connectivity of fMRI data.

【Keywords】:

595. A Cluster-Weighted Kernel K-Means Method for Multi-View Clustering.

Paper Link】 【Pages】:4860-4867

【Authors】: Jing Liu ; Fuyuan Cao ; Xiao-Zhi Gao ; Liqin Yu ; Jiye Liang

【Abstract】: Clustering by jointly exploiting information from multiple views can yield better performance than clustering on one single view. Some existing multi-view clustering methods aim at learning a weight for each view to determine its contribution to the final solution. However, the view-weighted scheme can only indicate the overall importance of a view, which fails to recognize the importance of each inner cluster of a view. A view with higher weight cannot guarantee that all clusters in this view have higher importance than those in other views. In this paper, we propose a cluster-weighted kernel k-means method for multi-view clustering. Each inner cluster of each view is assigned a weight, which is learned based on the intra-cluster similarity of the cluster compared with all its corresponding clusters in different views, to make the cluster with higher intra-cluster similarity have a higher weight among the corresponding clusters. The cluster labels are learned simultaneously with the cluster weights in an alternating updating way, by minimizing the weighted sum-of-squared errors of the kernel k-means. Compared with the view-weighted scheme, the cluster-weighted scheme enhances the interpretability of the clustering results. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed method.

【Keywords】:

596. Attribute Propagation Network for Graph Zero-Shot Learning.

Paper Link】 【Pages】:4868-4875

【Authors】: Lu Liu ; Tianyi Zhou ; Guodong Long ; Jing Jiang ; Chengqi Zhang

【Abstract】: The goal of zero-shot learning (ZSL) is to train a model to classify samples of classes that were not seen during training. To address this challenging task, most ZSL methods relate unseen test classes to seen (training) classes via a pre-defined set of attributes that can describe all classes in the same semantic space, so the knowledge learned on the training classes can be adapted to unseen classes. In this paper, we aim to optimize the attribute space for ZSL by training a propagation mechanism to refine the semantic attributes of each class based on its neighbors and related classes on a graph of classes. We show that the propagated attributes can produce classifiers for zero-shot classes with significantly improved performance in different ZSL settings. The graph of classes is usually free or very cheap to acquire, such as WordNet or ImageNet classes. When the graph is not provided, given pre-defined semantic embeddings of the classes, we can learn a mechanism to generate the graph in an end-to-end manner along with the propagation mechanism. However, this graph-aided technique has not been well-explored in the literature. In this paper, we introduce the “attribute propagation network (APNet)”, which is composed of 1) a graph propagation model generating an attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier categorizing an image to the class with the nearest attribute vector to the image's embedding. For better generalization over unseen classes, different from previous methods, we adopt a meta-learning strategy to train the propagation mechanism and the similarity metric for the NN classifier on multiple sub-graphs, each associated with a classification task over a subset of training classes. In experiments with two zero-shot learning settings and five benchmark datasets, APNet achieves either compelling performance or new state-of-the-art results.

【Keywords】:

597. AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates.

Paper Link】 【Pages】:4876-4883

【Authors】: Ning Liu ; Xiaolong Ma ; Zhiyuan Xu ; Yanzhi Wang ; Jian Tang ; Jieping Ye

【Abstract】: Structured weight pruning is a representative model compression technique for DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-the-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique which has underlying incompatibility with the target pruning problem. Extensive experiments on CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieving ultra-high pruning rates on the number of weights and FLOPs that could not be achieved before. As an example, AutoCompress outperforms the prior work on automatic model compression by up to 33× in pruning rate (120× reduction in the actual parameter count) under the same accuracy. Significant inference speedup has been observed from the AutoCompress framework in actual measurements on a smartphone. We release models of this work at anonymous link: http://bit.ly/2VZ63dS.

【Keywords】:

598. Stochastic Loss Function.

Paper Link】 【Pages】:4884-4891

【Authors】: Qingliang Liu ; Jinmei Lai

【Abstract】: Training deep neural networks is inherently subject to the predefined and fixed loss functions during optimization. To improve learning efficiency, we develop Stochastic Loss Function (SLF) to dynamically and automatically generate appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem of network weights and loss parameters. In order to guarantee the requisite efficiency, gradients with respect to the generic differentiable loss are leveraged for selecting the loss function and optimizing network weights. Extensive experiments on a variety of popular datasets strongly demonstrate that SLF is capable of obtaining appropriate gradients at different stages during training, and can significantly improve the performance of various deep models on real world tasks including classification, clustering, regression, neural machine translation, and object detection.

【Keywords】:

599. An ADMM Based Framework for AutoML Pipeline Configuration.

Paper Link】 【Pages】:4892-4899

【Authors】: Sijia Liu ; Parikshit Ram ; Deepak Vijaykeerthy ; Djallel Bouneffouf ; Gregory Bramble ; Horst Samulowitz ; Dakuo Wang ; Andrew Conn ; Alexander G. Gray

【Abstract】: We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer & continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from UCI ML & OpenML repositories. We observe that on an average our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical advantages of this framework.

【Keywords】:

600. Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio.

Paper Link】 【Pages】:4900-4907

【Authors】: Xiao Liu ; Wenbin Li ; Jing Huo ; Lili Yao ; Yang Gao

【Abstract】: Deep neural network compression is important and increasingly developed, especially in resource-constrained environments such as autonomous drones and wearable devices. We can greatly reduce the number of weights of a trained deep model by adopting a widely used model compression technique, e.g., pruning. In this way, two kinds of data are usually preserved for this compressed model, i.e., non-zero weights and meta-data, where meta-data is employed to help encode and decode these non-zero weights. Although we can obtain an ideally small number of non-zero weights through pruning, existing sparse matrix coding methods still need a much larger amount of meta-data (possibly several times larger than the non-zero weights), which will be a severe bottleneck for deploying very deep models. To tackle this issue, we propose a layerwise sparse coding (LSC) method to maximize the compression ratio by drastically reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove zero blocks, and then propose a novel signed relative index (SRI) algorithm to encode the remaining non-zero blocks (with much less meta-data). In addition, the proposed LSC performs parallel matrix multiplication without full decoding, while traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains in pruned DNN compression (e.g., 51.03x compression ratio on ADMM-Lenet) and inference computation (i.e., time reduction and far less memory bandwidth), over state-of-the-art baselines.

【Keywords】:

601. Weighted-Sampling Audio Adversarial Example Attack.

Paper Link】 【Pages】:4908-4915

【Authors】: Xiaolei Liu ; Kun Wan ; Yufei Ding ; Xiaosong Zhang ; Qingxin Zhu

【Abstract】: Recent studies have highlighted audio adversarial examples as a ubiquitous threat to state-of-the-art automatic speech recognition systems. Thorough studies on how to effectively generate adversarial examples are essential to prevent potential attacks. Despite much research on this, the efficiency and the robustness of existing works are not yet satisfactory. In this paper, we propose weighted-sampling audio adversarial examples, focusing on the numbers and the weights of distortion to reinforce the attack. Further, we apply a denoising method in the loss function to make the adversarial attack more imperceptible. Experiments show that our method is the first in the field to generate audio adversarial examples with low noise and high audio robustness at the minute time-consuming level.

【Keywords】:

602. Independence Promoted Graph Disentangled Networks.

Paper Link】 【Pages】:4916-4923

【Authors】: Yanbei Liu ; Xiao Wang ; Shu Wu ; Zhitao Xiao

【Abstract】: We address the problem of disentangled representation learning with independent latent factors in graph convolutional networks (GCNs). The current methods usually learn node representation by describing its neighborhood as a perceptual whole in a holistic manner while ignoring the entanglement of the latent factors. However, a real-world graph is formed by the complex interaction of many latent factors (e.g., the same hobby, education or work in a social network), and little effort has been made toward exploring the disentangled representation in GCNs. In this paper, we propose a novel Independence Promoted Graph Disentangled Networks (IPGDN) to learn disentangled node representation while enhancing the independence among node representations. In particular, we first present disentangled representation learning by a neighborhood routing mechanism, and then employ the Hilbert-Schmidt Independence Criterion (HSIC) to enforce independence between the latent representations, which is effectively integrated into a graph convolutional framework as a regularizer at the output layer. Experimental studies on real-world graphs validate our model and demonstrate that our algorithms outperform the state of the art by a wide margin in different network applications, including semi-supervised graph classification, graph clustering and graph visualization.

【Keywords】:

603. Adaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning.

Paper Link】 【Pages】:4924-4931

【Authors】: Yingru Liu ; Xuewen Yang ; Dongliang Xie ; Xin Wang ; Li Shen ; Haozhi Huang ; Niranjan Balasubramanian

【Abstract】: Multi-task learning (MTL) is a common paradigm that seeks to improve the generalization performance of task learning by training related tasks simultaneously. However, it is still a challenging problem to search for a flexible and accurate architecture that can be shared among multiple tasks. In this paper, we propose a novel deep learning model called Task Adaptive Activation Network (TAAN) that can automatically learn the optimal network architecture for MTL. The main principle of TAAN is to derive flexible activation functions for different tasks from the data with other parameters of the network fully shared. We further propose two functional regularization methods that improve the MTL performance of TAAN. The improved performance of both TAAN and the regularization methods is demonstrated by comprehensive experiments.

【Keywords】:

604. Diversified Interactive Recommendation with Implicit Feedback.

Paper Link】 【Pages】:4932-4939

【Authors】: Yong Liu ; Yingtai Xiao ; Qiong Wu ; Chunyan Miao ; Juyong Zhang ; Binqiang Zhao ; Haihong Tang

【Abstract】: Interactive recommender systems that enable the interactions between users and the recommender system have attracted increasing research attention. Previous methods mainly focus on optimizing recommendation accuracy. However, they usually ignore the diversity of the recommendation results, which often results in unsatisfying user experiences. In this paper, we propose a novel diversified recommendation model, named Diversified Contextual Combinatorial Bandit (DC2B), for interactive recommendation with users' implicit feedback. Specifically, DC2B employs a determinantal point process in the recommendation procedure to promote diversity of the recommendation results. To learn the model parameters, a Thompson sampling-type algorithm based on variational Bayesian inference is proposed. In addition, theoretical regret analysis is also provided to guarantee the performance of DC2B. Extensive experiments on real datasets are performed to demonstrate the effectiveness of the proposed method in balancing the recommendation accuracy and diversity.

【Keywords】:
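
The abstract of entry 604 says a determinantal point process (DPP) is used to promote diversity. As a rough illustration only (not the DC2B model itself), the sketch below scores a candidate set by the determinant of the Gram matrix of its item feature vectors, which is the unnormalized DPP probability of that set under a linear kernel. The feature vectors and the helper names `determinant` and `dpp_score` are hypothetical.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def dpp_score(items):
    """Unnormalized DPP probability of a set: det of the Gram matrix of its items."""
    gram = [[sum(x * y for x, y in zip(u, v)) for v in items] for u in items]
    return determinant(gram)


# Hypothetical item features: item_b is nearly a duplicate of item_a.
item_a, item_b, item_c = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
similar_pair = dpp_score([item_a, item_b])
diverse_pair = dpp_score([item_a, item_c])
```

Under this kernel, a set of near-duplicate items receives a much smaller score than a set of dissimilar items, which is the sense in which a DPP favors diverse recommendations.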

605. IPO: Interior-Point Policy Optimization under Constraints.

Paper Link】 【Pages】:4940-4947

【Authors】: Yongshuai Liu ; Jiaxin Ding ; Xin Liu

【Abstract】: In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction.

【Keywords】:
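
The abstract of entry 605 describes augmenting the objective with logarithmic barrier functions. A minimal sketch of that general idea follows, assuming each constraint has the form "cost <= limit" and that a hypothetical sharpness parameter `t` scales the barrier; the paper's exact objective and hyper-parameters are not given in the abstract and are not reproduced here.

```python
import math


def log_barrier(cost, limit, t=20.0):
    """Barrier term for the constraint cost <= limit.

    Returns log(limit - cost) / t inside the feasible region and -inf once
    the constraint is violated, so feasible points are always preferred.
    """
    slack = limit - cost
    if slack <= 0:
        return float("-inf")
    return math.log(slack) / t


def barrier_objective(reward, costs, limits, t=20.0):
    """Reward augmented with one barrier term per cumulative constraint."""
    return reward + sum(log_barrier(c, d, t) for c, d in zip(costs, limits))


feasible = barrier_objective(1.0, costs=[0.5], limits=[1.0])
violated = barrier_objective(1.0, costs=[2.0], limits=[1.0])
```

A larger `t` makes the barrier term smaller in magnitude for feasible points, so the augmented objective tracks the raw reward more closely while still diverging to negative infinity at the constraint boundary.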

606. Collaborative Sampling in Generative Adversarial Networks.

Paper Link】 【Pages】:4948-4956

【Authors】: Yuejiang Liu ; Parth Kothari ; Alexandre Alahi

【Abstract】: The standard practice in Generative Adversarial Networks (GANs) discards the discriminator during sampling. However, this sampling method loses valuable information learned by the discriminator regarding the data distribution. In this work, we propose a collaborative sampling scheme between the generator and the discriminator for improved data generation. Guided by the discriminator, our approach refines the generated samples through gradient-based updates at a particular layer of the generator, shifting the generator distribution closer to the real data distribution. Additionally, we present a practical discriminator shaping method that can smoothen the loss landscape provided by the discriminator for effective sample refinement. Through extensive experiments on synthetic and image datasets, we demonstrate that our proposed method can improve generated samples both quantitatively and qualitatively, offering a new degree of freedom in GAN sampling.

【Keywords】:

607. Uncertainty Aware Graph Gaussian Process for Semi-Supervised Learning.

Paper Link】 【Pages】:4957-4964

【Authors】: Zhao-Yang Liu ; Shao-Yuan Li ; Songcan Chen ; Yao Hu ; Sheng-Jun Huang

【Abstract】: Graph-based semi-supervised learning (GSSL) studies the problem where in addition to a set of data points with few available labels, there also exists a graph structure that describes the underlying relationship between data items. In practice, structure uncertainty often occurs in graphs when edges exist between data with different labels, which may further result in prediction uncertainty of labels. Considering that Gaussian process generalizes well with few labels and can naturally model uncertainty, in this paper, we propose an Uncertainty aware Graph Gaussian Process based approach (UaGGP) for GSSL. UaGGP exploits the prediction uncertainty and label smooth regularization to guide each other during learning. To further subdue the effect of irrelevant neighbors, UaGGP also aggregates the clean representation in the original space and the learned representation. Experiments on benchmarks demonstrate the effectiveness of the proposed approach.

【Keywords】:

608. Interactive Rare-Category-of-Interest Mining from Large Datasets.

Paper Link】 【Pages】:4965-4972

【Authors】: Zhenguang Liu ; Sihao Hu ; Yifang Yin ; Jianhai Chen ; Kevin Chiew ; Luming Zhang ; Zetian Wu

【Abstract】: In the era of big data, rare category data examples are often of key importance despite their scarcity, e.g., rare bird audio is usually more valuable than common bird audio. However, existing efforts on rare category mining consider only the statistical characteristics of rare category data examples, while ignoring their ‘true’ interestingness to the user. Moreover, current approaches are unable to support real-time user interactions due to their prohibitive computational costs for answering a single user query. In this paper, we contribute a new model named IRim, which can interactively mine rare category data examples of interest over large datasets. The mining process is carried out in two steps, namely rare category detection (RCD) followed by rare category exploration (RCE). In RCD, by introducing an offline phase and high-level knowledge abstractions, IRim reduces the time complexity of answering a user query from quadratic to logarithmic. In RCE, by proposing a collaborative-reconstruction based approach, we are able to explicitly encode both user preference and rare category characteristics. Extensive experiments on five diverse real-world datasets show that our method achieves response times in seconds for user interactions, and outperforms state-of-the-art competitors significantly in accuracy and number of queries. As a side contribution, we construct and release two benchmark datasets which to our knowledge are the first public datasets tailored for the rare category mining task.

【Keywords】:

609. Towards Fine-Grained Temporal Network Representation via Time-Reinforced Random Walk.

Paper Link】 【Pages】:4973-4980

【Authors】: Zhining Liu ; Dawei Zhou ; Yada Zhu ; Jinjie Gu ; Jingrui He

【Abstract】: Encoding a large-scale network into a low-dimensional space is a fundamental step for various network analytic problems, such as node classification, link prediction, community detection, etc. Existing methods focus on learning the network representation from either the static graphs or time-aggregated graphs (e.g., time-evolving graphs). However, many real systems are not static or time-aggregated as the nodes and edges are timestamped and dynamically changing over time. For example, in anti-money laundering analysis, cycles formed with time-ordered transactions might be red flags in online transaction networks; in novelty detection, a star-shaped structure appearing in a short burst might be an underlying hot topic in social networks. Existing embedding models might not preserve such fine-grained network dynamics well because they cannot handle continuous time and neglect fine-grained interactions. To bridge this gap, in this paper, we propose a fine-grained temporal network embedding framework named FiGTNE, which aims to learn a comprehensive network representation that preserves the rich and complex network context in the temporal network. In particular, we start from the notion of fine-grained temporal networks, where the temporal network can be represented as a series of timestamped nodes and edges. Then, we propose the time-reinforced random walk (TRRW) with a bi-level context sampling strategy to explore the essential structures and temporal contexts in temporal networks. Extensive experimental results on real graphs demonstrate the efficacy of our FiGTNE framework.

【Keywords】:

610. Incentivized Exploration for Multi-Armed Bandits under Reward Drift.

Paper Link】 【Pages】:4981-4988

【Authors】: Zhiyuan Liu ; Huazheng Wang ; Fan Shen ; Kai Liu ; Lijun Chen

【Abstract】: We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, ε-Greedy, and Thompson Sampling. Our results show that they all achieve O(log T) regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis.

【Keywords】:

611. Structured Sparsification of Gated Recurrent Neural Networks.

Paper Link】 【Pages】:4989-4996

【Authors】: Ekaterina Lobacheva ; Nadezhda Chirkova ; Alexander Markovich ; Dmitry P. Vetrov

【Abstract】: One of the most popular approaches for neural network compression is sparsification: learning sparse weight matrices. In structured sparsification, weights are set to zero by groups corresponding to structure units, e.g., neurons. We further develop the structured sparsification approach for the gated recurrent neural networks, e.g., Long Short-Term Memory (LSTM). Specifically, in addition to the sparsification of individual weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies an LSTM structure. We test our approach on the text classification and language modeling tasks. Our method improves the neuron-wise compression of the model in most of the tasks. We also observe that the resulting structure of gate sparsity depends on the task and connect the learned structures to the specifics of the particular tasks.

【Keywords】:

612. Cost-Effective Incentive Allocation via Structured Counterfactual Inference.

Paper Link】 【Pages】:4997-5004

【Authors】: Romain Lopez ; Chenchen Li ; Xiang Yan ; Junwu Xiong ; Michael I. Jordan ; Yuan Qi ; Le Song

【Abstract】: We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then subsequently uses the estimators for optimizing the policy with constraints. We also establish theoretical error bounds for our estimation procedure and we empirically show that the approach leads to significant improvement on both synthetic and real datasets.

【Keywords】:

613. Structured Output Learning with Conditional Generative Flows.

Paper Link】 【Pages】:5005-5012

【Authors】: You Lu ; Bert Huang

【Abstract】: Traditional structured prediction models try to learn the conditional likelihood, i.e., p(y|x), to capture the relationship between the structured output y and the input features x. For many models, computing the likelihood is intractable. These models are therefore hard to train, requiring the use of surrogate objectives or variational inference to approximate likelihood. In this paper, we propose conditional Glow (c-Glow), a conditional generative flow for structured output learning. C-Glow benefits from the ability of flow-based models to compute p(y|x) exactly and efficiently. Learning with c-Glow does not require a surrogate objective or performing inference during training. Once trained, we can directly and efficiently generate conditional samples. We develop a sample-based prediction method, which can use this advantage to do efficient and effective inference. In our experiments, we test c-Glow on five different tasks. C-Glow outperforms the state-of-the-art baselines in some tasks and predicts comparable outputs in the other tasks. The results show that c-Glow is versatile and is applicable to many different structured prediction problems.

【Keywords】:

614. Enhancing Nearest Neighbor Based Entropy Estimator for High Dimensional Distributions via Bootstrapping Local Ellipsoid.

Paper Link】 【Pages】:5013-5020

【Authors】: Chien Lu ; Jaakko Peltonen

【Abstract】: An ellipsoid-based, improved kNN entropy estimator based on random samples of distribution for high dimensionality is developed. We argue that the inaccuracy of the classical kNN estimator in high dimensional spaces results from the local uniformity assumption and the proposed method mitigates the local uniformity assumption by two crucial extensions, a local ellipsoid-based volume correction and a correction acceptance testing procedure. Relevant theoretical contributions are provided and several experiments from simple to complicated cases have shown that the proposed estimator can effectively reduce the bias especially in high dimensionalities, outperforming current state of the art alternative estimators.

【Keywords】:
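
For context on entry 614, the sketch below implements the classical kNN (Kozachenko-Leonenko) entropy estimator in one dimension, i.e. the baseline whose local uniformity assumption the abstract criticizes. It does not include the paper's ellipsoid-based correction or acceptance test. The function name `knn_entropy_1d`, the choice k=3, and the sample size are illustrative assumptions.

```python
import math
import random


def knn_entropy_1d(samples, k=3):
    """Classical kNN differential entropy estimate (in nats) for 1-D samples.

    Uses H ~ psi(N) - psi(k) + mean(log(2 * r_i)), where r_i is the distance
    from sample i to its k-th nearest neighbour and 2 * r_i is the length of
    the 1-D ball of radius r_i. For integers, psi(N) - psi(k) is the
    harmonic sum of 1/j for j = k .. N-1.
    """
    n = len(samples)
    digamma_diff = sum(1.0 / j for j in range(k, n))
    log_volume_sum = 0.0
    for i, x in enumerate(samples):
        distances = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        log_volume_sum += math.log(2.0 * distances[k - 1])
    return digamma_diff + log_volume_sum / n


rng = random.Random(0)
uniform_samples = [rng.random() for _ in range(500)]
gaussian_samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
uniform_estimate = knn_entropy_1d(uniform_samples)
gaussian_estimate = knn_entropy_1d(gaussian_samples)
```

The true differential entropies are 0 nats for the uniform distribution on [0, 1] and 0.5 * log(2 * pi * e), roughly 1.419 nats, for the standard normal, so the two estimates should land near those values. In one dimension the local uniformity assumption is mild; the bias that the abstract targets grows with dimensionality.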

615. Learning from the Past: Continual Meta-Learning with Bayesian Graph Neural Networks.

Paper Link】 【Pages】:5021-5028

【Authors】: Yadan Luo ; Zi Huang ; Zheng Zhang ; Ziwei Wang ; Mahsa Baktashmotlagh ; Yang Yang

【Abstract】: Meta-learning for few-shot learning allows a machine to leverage previously acquired knowledge as a prior, thus improving the performance on novel tasks with only small amounts of data. However, most mainstream models suffer from catastrophic forgetting and insufficient robustness issues, thereby failing to fully retain or exploit long-term knowledge while being prone to cause severe error accumulation. In this paper, we propose a novel Continual Meta-Learning approach with Bayesian Graph Neural Networks (CML-BGNN) that mathematically formulates meta-learning as continual learning of a sequence of tasks. With each task forming as a graph, the intra- and inter-task correlations can be well preserved via message-passing and history transition. To remedy topological uncertainty from graph initialization, we utilize Bayes by Backprop strategy that approximates the posterior distribution of task-specific parameters with amortized inference networks, which are seamlessly integrated into the end-to-end edge learning. Extensive experiments conducted on the miniImageNet and tieredImageNet datasets demonstrate the effectiveness and efficiency of the proposed method, improving the performance by 42.8% compared with state-of-the-art on the miniImageNet 5-way 1-shot classification task.

【Keywords】:

616. Unsupervised Domain Adaptation via Discriminative Manifold Embedding and Alignment.

Paper Link】 【Pages】:5029-5036

【Authors】: You-Wei Luo ; Chuan-Xian Ren ; Pengfei Ge ; Ke-Kun Huang ; Yu-Feng Yu

【Abstract】: Unsupervised domain adaptation is effective in leveraging the rich information from the source domain to the unsupervised target domain. Though deep learning and adversarial strategy make an important breakthrough in the adaptability of features, there are two issues to be further explored. First, the hard-assigned pseudo labels on the target domain are risky to the intrinsic data structure. Second, the batch-wise training manner in deep learning limits the description of the global structure. In this paper, a Riemannian manifold learning framework is proposed to achieve transferability and discriminability consistently. As to the first problem, this method establishes a probabilistic discriminant criterion on the target domain via soft labels. Further, this criterion is extended to a global approximation scheme for the second issue; such approximation is also memory-saving. The manifold metric alignment is exploited to be compatible with the embedding space. A theoretical error bound is derived to facilitate the alignment. Extensive experiments have been conducted to investigate the proposal, and the results of the comparison study demonstrate the superiority of the consistent manifold learning framework.

【Keywords】:

617. Fastened CROWN: Tightened Neural Network Robustness Certificates.

Paper Link】 【Pages】:5037-5044

【Authors】: Zhaoyang Lyu ; Ching-Yun Ko ; Zhifeng Kong ; Ngai Wong ; Dahua Lin ; Luca Daniel

【Abstract】: The rapid growth of deep learning applications in real life is accompanied by severe safety concerns. To mitigate this uneasy phenomenon, much research has been done providing reliable evaluations of the fragility level in different deep neural networks. Apart from devising adversarial attacks, quantifiers that certify safeguarded regions have also been designed in the past five years. The summarizing work in (Salman et al. 2019) unifies a family of existing verifiers under a convex relaxation framework. We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints. Given this theoretical result, the computationally expensive linear programming based method is shown to be unnecessary. We then propose an optimization-based approach FROWN (Fastened CROWN): a general algorithm to tighten robustness certificates for neural networks. Extensive experiments on various networks trained individually verify the effectiveness of FROWN in safeguarding larger robust regions.

【Keywords】:

618. Memory Augmented Graph Neural Networks for Sequential Recommendation.

Paper Link】 【Pages】:5045-5052

【Authors】: Chen Ma ; Liheng Ma ; Yingxue Zhang ; Jianing Sun ; Xue Liu ; Mark Coates

【Abstract】: The chronological order of user-item interactions can reveal time-evolving and sequential user behaviors in many recommender systems. The items that users will interact with may depend on the items accessed in the past. However, the substantial increase of users and items makes sequential recommender systems still face non-trivial challenges: (1) the hardness of modeling the short-term user interests; (2) the difficulty of capturing the long-term user interests; (3) the effective modeling of item co-occurrence patterns. To tackle these challenges, we propose a memory augmented graph neural network (MA-GNN) to capture both the long- and short-term user interests. Specifically, we apply a graph neural network to model the item contextual information within a short-term period and utilize a shared memory network to capture the long-range dependencies between items. In addition to the modeling of user interests, we employ a bilinear function to capture the co-occurrence patterns of related items. We extensively evaluate our model on five real-world datasets, comparing with several state-of-the-art methods and using a variety of performance metrics. The experimental results demonstrate the effectiveness of our model for the task of Top-K sequential recommendation.

【Keywords】:

619. Inefficiency of K-FAC for Large Batch Size Training.

Paper Link】 【Pages】:5053-5060

【Authors】: Linjian Ma ; Gabe Montague ; Jiayu Ye ; Zhewei Yao ; Amir Gholami ; Kurt Keutzer ; Michael W. Mahoney

【Abstract】: There have been several recent works claiming record times for ImageNet training. This is achieved by using large batch sizes during training to leverage parallel resources to produce faster wall-clock training times per training epoch. However, often these solutions require massive hyper-parameter tuning, which is an important cost that is often ignored. In this work, we perform an extensive analysis of large batch size training for two popular methods, namely Stochastic Gradient Descent (SGD) and the Kronecker-Factored Approximate Curvature (K-FAC) method. We evaluate the performance of these methods in terms of both wall-clock time and aggregate computational cost, and study the hyper-parameter sensitivity by performing more than 512 experiments per batch size for each of these methods. We perform experiments on multiple different models on two datasets, CIFAR-10 and SVHN. The results show that beyond a critical batch size both K-FAC and SGD significantly deviate from ideal strong scaling behavior, and that despite common belief K-FAC does not exhibit improved large-batch scalability behavior, as compared to SGD.

【Keywords】:

620. Temporal Pyramid Recurrent Neural Network.

Paper Link】 【Pages】:5061-5068

【Authors】: Qianli Ma ; Zhenxi Lin ; Enhuan Chen ; Garrison W. Cottrell

【Abstract】: Learning long-term and multi-scale dependencies in sequential data is a challenging task for recurrent neural networks (RNNs). In this paper, a novel RNN structure called temporal pyramid RNN (TP-RNN) is proposed to achieve these two goals. TP-RNN is a pyramid-like structure and generally has multiple layers. In each layer of the network, there are several sub-pyramids connected by a shortcut path to the output, which can efficiently aggregate historical information from hidden states and provide many gradient feedback short-paths. This avoids back-propagating through many hidden states as in usual RNNs. In particular, in the multi-layer structure of TP-RNN, the input sequence of the higher layer is a large-scale aggregated state sequence produced by the sub-pyramids in the previous layer, instead of the usual sequence of hidden states. In this way, TP-RNN can explicitly learn multi-scale dependencies with multi-scale input sequences of different layers, and shorten the input sequence and gradient feedback paths of each layer. This avoids the vanishing gradient problem in deep RNNs and allows the network to efficiently learn long-term dependencies. We evaluate TP-RNN on several sequence modeling tasks, including the masked addition problem, pixel-by-pixel image classification, signal recognition and speaker identification. Experimental results demonstrate that TP-RNN consistently outperforms existing RNNs for learning long-term and multi-scale dependencies in sequential data.

【Keywords】:

621. Adversarial Dynamic Shapelet Networks.

Paper Link】 【Pages】:5069-5076

【Authors】: Qianli Ma ; Wanqing Zhuang ; Sen Li ; Desen Huang ; Garrison W. Cottrell

【Abstract】: Shapelets are discriminative subsequences for time series classification. Recently, learning time-series shapelets (LTS) was proposed to learn shapelets by gradient descent directly. Although learning-based shapelet methods achieve better results than previous methods, they still have two shortcomings. First, the learned shapelets are fixed after training and cannot adapt to time series with deformations at the testing phase. Second, the shapelets learned by back-propagation may not be similar to any real subsequences, which is contrary to the original intention of shapelets and reduces model interpretability. In this paper, we propose a novel shapelet learning model called Adversarial Dynamic Shapelet Networks (ADSNs). An adversarial training strategy is employed to prevent the generated shapelets from diverging from the actual subsequences of a time series. During inference, a shapelet generator produces sample-specific shapelets, and a dynamic shapelet transformation uses the generated shapelets to extract discriminative features. Thus, ADSN can dynamically generate shapelets that are similar to the real subsequences rather than having arbitrary shapes. The proposed model has high modeling flexibility while retaining the interpretability of shapelet-based methods. Experiments conducted on extensive time series data sets show that ADSN is state-of-the-art compared to existing shapelet-based methods. The visualization analysis also shows the effectiveness of dynamic shapelet generation and adversarial training.

【Keywords】:

622. Online Planner Selection with Graph Neural Networks and Adaptive Scheduling.

Paper Link】 【Pages】:5077-5084

【Authors】: Tengfei Ma ; Patrick Ferber ; Siyu Huo ; Jie Chen ; Michael Katz

【Abstract】: Automated planning is one of the foundational areas of AI. Since no single planner can work well for all tasks and domains, portfolio-based techniques have become increasingly popular in recent years. In particular, deep learning emerges as a promising methodology for online planner selection. Owing to the recent development of structural graph representations of planning tasks, we propose a graph neural network (GNN) approach to selecting candidate planners. GNNs are advantageous over a straightforward alternative, the convolutional neural networks, in that they are invariant to node permutations and that they incorporate node labels for better inference. Additionally, for cost-optimal planning, we propose a two-stage adaptive scheduling method to further improve the likelihood that a given task is solved in time. The scheduler may switch at halftime to a different planner, conditioned on the observed performance of the first one. Experimental results validate the effectiveness of the proposed method against strong baselines, both deep learning and non-deep learning based. The code is available at https://github.com/matenure/GNN_planner.

【Keywords】:

623. The HSIC Bottleneck: Deep Learning without Back-Propagation.

Paper Link】 【Pages】:5085-5092

【Authors】: Kurt Wan-Duo Ma ; J. P. Lewis ; W. Bastiaan Kleijn

【Abstract】: We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.

【Keywords】:
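
Entry 623 builds on the Hilbert-Schmidt independence criterion. As background only (this is not the HSIC-bottleneck training procedure), the sketch below computes the standard biased empirical HSIC estimate, trace(K H L H) / (n - 1)^2, for scalar samples with Gaussian kernels. The helper names, the kernel width `sigma`, and the toy data are assumptions made for illustration.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel over scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    Because L is symmetric, trace(H K H L) equals the sum of the elementwise
    product of the centered K with L.
    """
    n = len(xs)
    k_centered = center(gaussian_gram(xs, sigma))
    l_gram = gaussian_gram(ys, sigma)
    total = sum(k_centered[i][j] * l_gram[i][j]
                for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
dependent = hsic(xs, xs)           # ys fully determined by xs
uninformative = hsic(xs, [1.0] * 6)  # constant ys carry no information
```

A constant second variable gives an HSIC of zero (up to rounding), while a variable that is a function of the first gives a strictly positive value, which is what makes HSIC usable as a dependence measure in a training objective.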

624. Projective Quadratic Regression for Online Learning.

Paper Link】 【Pages】:5093-5100

【Authors】: Wenye Ma

【Abstract】: This paper considers online convex optimization (OCO) problems, the paramount framework for online learning algorithm design. The loss function of a learning task in the OCO setting is based on streaming data, so OCO is a powerful tool to model large-scale applications such as online recommender systems. Meanwhile, real-world data are usually extremely high-dimensional due to modern feature engineering techniques, so quadratic regression is impractical. Factorization Machines and their variants are efficient models for capturing feature interactions with a low-rank matrix model, but they cannot fulfill the OCO setting due to their non-convexity. In this paper, we propose a projective quadratic regression (PQR) model. First, it can capture the important second-order feature information. Second, it is a convex model, so the requirements of OCO are fulfilled and the global optimal solution can be achieved. Moreover, existing modern online optimization methods such as Online Gradient Descent (OGD) or Follow-The-Regularized-Leader (FTRL) can be applied directly. In addition, by choosing a proper hyper-parameter, we show that it has the same order of space and time complexity as the linear model and thus can handle high-dimensional data. Experimental results demonstrate the performance of the proposed PQR model in terms of accuracy and efficiency by comparing with the state-of-the-art methods.

【Keywords】:

625. Particle Filter Recurrent Neural Networks.

Paper Link】 【Pages】:5101-5108

【Authors】: Xiao Ma ; Péter Karkus ; David Hsu ; Wee Sun Lee

【Abstract】: Recurrent neural networks (RNNs) have been extraordinarily successful for prediction with sequential data. To tackle highly variable and multi-modal real-world data, we introduce Particle Filter Recurrent Neural Networks (PF-RNNs), a new RNN family that explicitly models uncertainty in its internal structure: while an RNN relies on a long, deterministic latent state vector, a PF-RNN maintains a latent state distribution, approximated as a set of particles. For effective learning, we provide a fully differentiable particle filter algorithm that updates the PF-RNN latent state distribution according to the Bayes rule. Experiments demonstrate that the proposed PF-RNNs outperform the corresponding standard gated RNNs on a synthetic robot localization dataset and 10 real-world sequence prediction datasets for text classification, stock price prediction, etc.

【Keywords】:

626. Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance.

Paper Link】 【Pages】:5109-5116

【Authors】: Mingxuan Jing ; Xiaojian Ma ; Wenbing Huang ; Fuchun Sun ; Chao Yang ; Bin Fang ; Huaping Liu

【Abstract】: In this paper, we study Reinforcement Learning from Demonstrations (RLfD) that improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require demonstrations to be perfect and sufficient, which is unrealistic in practice. To work on imperfect demonstrations, we first define an imperfect expert setting for RLfD in a formal way, and then point out that previous methods suffer from two issues in terms of optimality and convergence, respectively. Upon the theoretical findings we have derived, we tackle these two issues by regarding the expert guidance as a soft constraint on regulating the policy exploration of the agent, which eventually leads to a constrained optimization problem. We further demonstrate that this problem can be addressed efficiently by performing a local linear search on its dual form. Considerable empirical evaluations on a comprehensive collection of benchmarks indicate our method attains consistent improvement over other RLfD counterparts.

【Keywords】:

627. PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices.

Paper Link】 【Pages】:5117-5124

【Authors】: Xiaolong Ma ; Fu-Ming Guo ; Wei Niu ; Xue Lin ; Jian Tang ; Kaisheng Ma ; Bin Ren ; Yanzhi Wang

【Abstract】: Model compression techniques on Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstreams of pruning methods representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy, but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures in pruning, but suffers from accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside the coarse-grained structures. PCONV comprises two types of sparsities, Sparse Convolution Patterns (SCP), which are generated from intra-convolution kernel pruning, and connectivity sparsity, which is generated from inter-convolution kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases pruning rate while maintaining balanced workload on filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real-time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedup up to 39.2×, 11.4×, and 6.3×, respectively, with no accuracy loss. Mobile devices can achieve real-time inference on large-scale DNNs.

【Keywords】:

628. Count-Based Exploration with the Successor Representation.

Paper Link】 【Pages】:5125-5133

【Authors】: Marlos C. Machado ; Marc G. Bellemare ; Michael Bowling

【Abstract】: In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically sample-efficient approaches. Finally, we extend these ideas to a deep RL algorithm and show that it achieves state-of-the-art performance in Atari 2600 games when in a low sample-complexity regime.

【Keywords】:
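
The idea in the abstract above (entry 628), that the norm of a partially learned successor representation can drive exploration, can be sketched in the tabular case as follows. This is a minimal illustration, not the authors' algorithm: the function names, the step size `alpha`, the discount `gamma`, and in particular the inverse-L1-norm form of the bonus are assumptions made for this example.

```python
def sr_td_update(psi, s, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference update of the tabular SR row psi[s]."""
    for j in range(len(psi)):
        indicator = 1.0 if j == s else 0.0
        target = indicator + gamma * psi[s_next][j]
        psi[s][j] += alpha * (target - psi[s][j])


def exploration_bonus(psi, s, beta=1.0, eps=1e-8):
    """Assumed bonus form: inverse L1 norm of the SR row (larger for rarely seen states)."""
    norm = sum(abs(v) for v in psi[s])
    return beta / (norm + eps)


def run_demo():
    """States 0 and 1 are visited repeatedly; state 2 is never visited."""
    n_states = 3
    psi = [[0.0] * n_states for _ in range(n_states)]
    trajectory = [0, 1] * 50 + [0]
    for s, s_next in zip(trajectory, trajectory[1:]):
        sr_td_update(psi, s, s_next)
    return [exploration_bonus(psi, s) for s in range(n_states)]
```

Because the SR starts at zero and grows as a state is visited, the never-visited state keeps the largest bonus in this toy run, which is the counting-like behaviour the abstract describes.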

629. Graph-Hist: Graph Classification from Latent Feature Histograms with Application to Bot Detection.

Paper Link】 【Pages】:5134-5141

【Authors】: Thomas Magelinski ; David M. Beskow ; Kathleen M. Carley

【Abstract】: Neural networks are increasingly used for graph classification in a variety of contexts. Social media is a critical application area in this space, however the characteristics of social media graphs differ from those seen in most popular benchmark datasets. Social networks tend to be large and sparse, while benchmarks are small and dense. Classically, large and sparse networks are analyzed by studying the distribution of local properties. Inspired by this, we introduce Graph-Hist: an end-to-end architecture that extracts a graph's latent local features, bins nodes together along 1-D cross sections of the feature space, and classifies the graph based on this multi-channel histogram. We show that Graph-Hist improves state of the art performance on true social media benchmark datasets, while still performing well on other benchmarks. Finally, we demonstrate Graph-Hist's performance by conducting bot detection in social media. While sophisticated bot and cyborg accounts increasingly evade traditional detection methods, they leave artificial artifacts in their conversational graph that are detected through graph classification. We apply Graph-Hist to classify these conversational graphs. In the process, we confirm that social media graphs are different than most baselines and that Graph-Hist outperforms existing bot-detection models.

【Keywords】:

630. Learning Agent Communication under Limited Bandwidth by Message Pruning.

Paper Link】 【Pages】:5142-5149

【Authors】: Hangyu Mao ; Zhengchao Zhang ; Zhen Xiao ; Zhibo Gong ; Yan Ni

【Abstract】: Communication is a crucial factor for the big multi-agent world to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been applied to learn the communication strategy and the control policy for multiple agents. However, the practical limited bandwidth in multi-agent communication has been largely ignored by the existing DRL methods. Specifically, many methods keep sending messages incessantly, which consumes too much bandwidth. As a result, they are inapplicable to multi-agent systems with limited bandwidth. To handle this problem, we propose a gating mechanism to adaptively prune less beneficial messages. We evaluate the gating mechanism on several tasks. Experiments demonstrate that it can prune a lot of messages with little impact on performance. In fact, the performance may be greatly improved by pruning redundant messages. Moreover, the proposed gating mechanism is applicable to several previous methods, equipping them with the ability to address bandwidth-restricted settings.

【Keywords】:

631. Multi-Zone Unit for Recurrent Neural Networks.

Paper Link】 【Pages】:5150-5157

【Authors】: Fandong Meng ; Jinchao Zhang ; Yang Liu ; Jie Zhou

【Abstract】: Recurrent neural networks (RNNs) have been widely used to deal with sequence learning problems. The input-dependent transition function, which folds new observations into hidden states to sequentially construct fixed-length representations of arbitrary-length sequences, plays a critical role in RNNs. Based on single space composition, transition functions in existing RNNs often have difficulty in capturing complicated long-range dependencies. In this paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to design a transition function that is capable of modeling multiple space composition. The MZU consists of three components: zone generation, zone composition, and zone aggregation. Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU.

【Keywords】:

632. Neural Inheritance Relation Guided One-Shot Layer Assignment Search.

Paper Link】 【Pages】:5158-5165

【Authors】: Rang Meng ; Weijie Chen ; Di Xie ; Yuan Zhang ; Shiliang Pu

【Abstract】: Layer assignment is seldom picked out as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments on network performance by building an architecture dataset of layer assignments on CIFAR-100. Through analyzing this dataset, we discover a neural inheritance relation among the networks with different layer assignments, that is, the optimal layer assignments for deeper networks always inherit from those for shallow networks. Inspired by this neural inheritance relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment found in the shallow network can be provided as a strong sampling prior to train and search the deeper ones in the supernet, which greatly reduces the network search space. Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet. Our searched results are remarkably superior to the handcrafted ones under unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights for universal neural architecture search.

【Keywords】:

633. Regularized Wasserstein Means for Aligning Distributional Data.

Paper Link】 【Pages】:5166-5173

【Authors】: Liang Mi ; Wen Zhang ; Yalin Wang

【Abstract】: We propose to align distributional data from the perspective of Wasserstein means. We raise the problem of regularizing Wasserstein means and propose several terms tailored to tackle different problems. Our formulation is based on the variational transportation to distribute a sparse discrete measure into the target domain. The resulting sparse representation well captures the desired property of the domain while reducing the mapping cost. We demonstrate the scalability and robustness of our method with examples in domain adaptation, point set registration, and skeleton layout.

【Keywords】:

634. Deep Embedded Non-Redundant Clustering.

Paper Link】 【Pages】:5174-5181

【Authors】: Lukas Miklautz ; Dominik Mautz ; Muzaffer Can Altinigneli ; Christian Böhm ; Claudia Plant

【Abstract】: Complex data types like images can be clustered in multiple valid ways. Non-redundant clustering aims at extracting those meaningful groupings by discouraging redundancy between clusterings. Unfortunately, clustering images in pixel space directly has been shown to work unsatisfactorily. This has increased interest in combining the high representational power of deep learning with clustering, termed deep clustering. Algorithms of this type combine the non-linear embedding of an autoencoder with a clustering objective and optimize both simultaneously. None of these algorithms try to find multiple non-redundant clusterings. In this paper, we propose the novel Embedded Non-Redundant Clustering algorithm (ENRC). It is the first algorithm that combines neural-network-based representation learning with non-redundant clustering. ENRC can find multiple highly non-redundant clusterings of different dimensionalities within a data set. This is achieved by (softly) assigning each dimension of the embedded space to the different clusterings. For instance, in image data sets it can group the objects by color, material and shape, without the need for explicit feature engineering. We show the viability of ENRC in extensive experiments and empirically demonstrate the advantage of combining non-linear representation learning with non-redundant clustering.

【Keywords】:

635. Differentiable Reasoning on Large Knowledge Bases and Natural Language.

Paper Link】 【Pages】:5182-5190

【Authors】: Pasquale Minervini ; Matko Bosnjak ; Tim Rocktäschel ; Sebastian Riedel ; Edward Grefenstette

【Abstract】: Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper we first propose Greedy NTPs (GNTPs), an extension to NTPs addressing their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining orders of magnitude more efficient models. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions, by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models.

【Keywords】:

636. Improved Knowledge Distillation via Teacher Assistant.

Paper Link】 【Pages】:5191-5198

【Authors】: Seyed-Iman Mirzadeh ; Mehrdad Farajtabar ; Ang Li ; Nir Levine ; Akihiro Matsukawa ; Hassan Ghasemzadeh

【Abstract】: Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this paper, we show that the student network performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher, or in other words, a teacher can effectively transfer its knowledge to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach.

【Keywords】:
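
The teacher, teacher assistant, and student chain described in the abstract above (entry 636) can be sketched as below. This is only an illustration under stated assumptions: the soft-target KL loss with a temperature is the commonly used distillation objective and is not taken from the abstract, and `distillation_chain` with its integer model sizes is a hypothetical helper for showing the multi-step ordering.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_chain(sizes):
    """Pair each model with the next smaller one: teacher -> assistant(s) -> student."""
    ordered = sorted(sizes, reverse=True)
    return list(zip(ordered, ordered[1:]))
```

For example, `distillation_chain([110, 8, 20])` pairs the size-110 teacher with a size-20 assistant, and that assistant with the size-8 student, so no single distillation step has to span the full capacity gap.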

637. On Adaptivity in Information-Constrained Online Learning.

Paper Link】 【Pages】:5199-5206

【Authors】: Siddharth Mitra ; Aditya Gopalan

【Abstract】: We study how to adapt to smoothly-varying (‘easy’) environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with expert advice, we present an online algorithm whose regret depends optimally on the number of labels allowed and Q (the quadratic variation of the losses of the best action in hindsight), along with a parameter-free counterpart whose regret depends optimally on Q (the quadratic variation of the losses of all the actions). These quantities can be significantly smaller than T (the total time horizon), yielding an improvement over existing, variation-independent results for the problem. We then extend our analysis to handle label efficient prediction with bandit (partial) feedback, i.e., label efficient bandits. Our work builds upon the framework of optimistic online mirror descent, and leverages second order corrections along with a carefully designed hybrid regularizer that encodes the constrained information structure of the problem. We then consider revealing action-partial monitoring games – a version of label efficient prediction with additive information costs – which in general are known to lie in the hard class of games having minimax regret of order $T^{2/3}$. We provide a strategy with an $O((QT)^{1/3})$ bound for revealing action games, along with one with an $O((QT)^{1/3})$ bound for the full class of hard partial monitoring games, both being strict improvements over current bounds.

【Keywords】:

638. Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations.

Paper Link】 【Pages】:5207-5215

【Authors】: Aditya Modi ; Debadeepta Dey ; Alekh Agarwal ; Adith Swaminathan ; Besmira Nushi ; Sean Andrist ; Eric Horvitz

【Abstract】: Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. The challenge of doing system-wide optimization is a combinatorial problem. Local attempts to boost the performance of a specific module by modifying its configuration often leads to losses in overall utility of the system's performance as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.

【Keywords】:

639. Self-Supervised Learning for Generalizable Out-of-Distribution Detection.

Paper Link】 【Pages】:5216-5223

【Authors】: Sina Mohseni ; Mandar Pitale ; J. B. S. Yadawa ; Zhangyang Wang

【Abstract】: The real-world deployment of Deep Neural Networks (DNNs) in safety-critical applications such as autonomous vehicles needs to address a variety of DNNs' vulnerabilities, one of which is detecting and rejecting out-of-distribution outliers that might result in unpredictable fatal errors. We propose a new technique relying on self-supervision for generalizable out-of-distribution (OOD) feature learning and for rejecting those samples at inference time. Our technique does not need to know the distribution of targeted OOD samples in advance and incurs no extra overhead compared to other methods. We perform multiple image classification experiments and observe that our technique performs favorably against state-of-the-art OOD detection methods. Interestingly, we find that our method also reduces in-distribution classification risk by rejecting samples near the boundaries of the training set distribution.

【Keywords】:

640. Learning Weighted Model Integration Distributions.

Paper Link】 【Pages】:5224-5231

【Authors】: Paolo Morettin ; Samuel Kolb ; Stefano Teso ; Andrea Passerini

【Abstract】: Weighted model integration (WMI) is a framework for probabilistic inference over distributions with discrete and continuous variables and structured supports. Despite the growing popularity of WMI, existing density estimators ignore the problem of learning a structured support, and thus fail to handle infeasible configurations and piecewise-linear relations between continuous variables. We propose lariat, a novel method to tackle this challenging problem. In a first step, our approach induces an SMT(LRA) formula representing the support of the structured distribution. Next, it combines the latter with a density learned using a state-of-the-art estimation method. The overall model automatically accounts for the discontinuous nature of the underlying structured distribution. Our experimental results with synthetic and real-world data highlight the promise of the approach.

【Keywords】:

641. An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies.

Paper Link】 【Pages】:5232-5239

【Authors】: Mirco Mutti ; Marcello Restelli

【Abstract】: What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to get a policy driving towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), in order to ease efficient learning of any goal-conditioned policy later on. Unfortunately, it is remarkably arduous to directly learn an optimal policy of this nature. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds, that lead to as many optimization problems, that tradeoff the theoretical guarantees with computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE3AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks.

【Keywords】:

642. Efficiently Enumerating Substrings with Statistically Significant Frequencies of Locally Optimal Occurrences in Gigantic String.

Paper Link】 【Pages】:5240-5247

【Authors】: Atsuyoshi Nakamura ; Ichigaku Takigawa ; Hiroshi Mamitsuka

【Abstract】: We propose new frequent substring pattern mining which can enumerate all substrings with statistically significant frequencies of their locally optimal occurrences from a given single sequence. Our target application is genome sequences, around a half being said to be covered by interspersed and consecutive (tandem) repeats, and detecting these repeats is an important task in molecular life sciences. We evaluate the statistical significance of frequent substrings by using a string generation model with a memoryless stationary information source. We combine this idea with an existing algorithm, ESFLOO-0G.C (Nakamura et al. 2016), to enumerate all statistically significant substrings with locally optimal occurrences. We further develop a parallelized version of our algorithm. Experimental results using synthetic datasets showed the proposed algorithm achieved far higher F-measure in extracting substrings (with various lengths and frequencies) embedded in a randomly generated string with noise, than conventional algorithms. The large-scale experiment using the whole human genome sequence with 3,095,677,412 bases (letters) showed that our parallel algorithm covers 75% of the whole positions analyzed, around 4% and 24% higher than the recent report and the current cutting-edge knowledge, implying a biologically unique finding.

【Keywords】:

643. Pairwise Fairness for Ranking and Regression.

Paper Link】 【Pages】:5248-5255

【Authors】: Harikrishna Narasimhan ; Andrew Cotter ; Maya R. Gupta ; Serena Wang

【Abstract】: We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using existing constrained optimization and robust optimization techniques developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods.

【Keywords】:
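
To make the pairwise view in the abstract above (entry 643) concrete, the sketch below computes a per-group pairwise accuracy: among pairs where the higher-labelled example belongs to a given group, how often does the model score it above the lower-labelled example? The function name and the exact conditioning on the group of the higher-labelled example are assumptions chosen for illustration; the abstract does not spell out the metric definitions.

```python
def group_pairwise_accuracy(scores, labels, groups, group):
    """Fraction of correctly ordered pairs whose higher-labelled item is in `group`."""
    correct = 0
    total = 0
    for i in range(len(scores)):
        if groups[i] != group:
            continue
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += 1
                if scores[i] > scores[j]:
                    correct += 1
    return correct / total if total else float("nan")


# Toy data: two groups, binary relevance labels, model scores.
scores = [0.9, 0.2, 0.4, 0.7]
labels = [1, 0, 1, 0]
groups = ["a", "a", "b", "b"]
acc_a = group_pairwise_accuracy(scores, labels, groups, "a")
acc_b = group_pairwise_accuracy(scores, labels, groups, "b")
```

A fairness constraint in this style would bound the gap between `acc_a` and `acc_b`; on the toy data the relevant item from group "b" is outranked by an irrelevant one, so the two accuracies differ.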

644. Bayesian Optimization for Categorical and Category-Specific Continuous Inputs.

Paper Link】 【Pages】:5256-5263

【Authors】: Dang Nguyen ; Sunil Gupta ; Santu Rana ; Alistair Shilton ; Svetha Venkatesh

【Abstract】: Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that helps connect multi-armed bandits and BO in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a. automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed method.

【Keywords】:

645. Reliable Multilabel Classification: Prediction with Partial Abstention.

Paper Link】 【Pages】:5264-5271

【Authors】: Vu-Linh Nguyen ; Eyke Hüllermeier

【Abstract】: In contrast to conventional (single-label) classification, the setting of multilabel classification (MLC) allows an instance to belong to several classes simultaneously. Thus, instead of selecting a single class label, predictions take the form of a subset of all labels. In this paper, we study an extension of the setting of MLC, in which the learner is allowed to partially abstain from a prediction, that is, to deliver predictions on some but not necessarily all class labels. We propose a formalization of MLC with abstention in terms of a generalized loss minimization problem and present first results for the case of the Hamming loss, rank loss, and F-measure, both theoretical and experimental.

【Keywords】:

646. On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models.

Paper Link】 【Pages】:5272-5280

【Authors】: Erik Nijkamp ; Mitch Hill ; Tian Han ; Song-Chun Zhu ; Ying Nian Wu

【Abstract】: This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is restricted to the family of unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. We find that many of the techniques used to stabilize training in previous studies are not necessary. ML learning with a ConvNet potential requires only a few hyper-parameters and no regularization. Using this minimal framework, we identify a variety of ML learning outcomes that depend solely on the implementation of MCMC sampling. On one hand, we show that it is easy to train an energy-based model which can sample realistic images with short-run Langevin. ML can be effective and stable even when MCMC samples have much higher energy than true steady-state samples throughout training. Based on this insight, we introduce an ML method with purely noise-initialized MCMC, high-quality short-run synthesis, and the same budget as ML with informative MCMC initialization such as CD or PCD. Unlike previous models, our energy model can obtain realistic high-diversity samples from a noise signal after training. On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images. We show that it is much harder to train a ConvNet potential to learn a steady-state over realistic images. To our knowledge, long-run MCMC samples of all previous models lose the realism of short-run samples. With correct tuning of Langevin noise, we train the first ConvNet potentials for which long-run and steady-state MCMC samples are realistic images.

【Keywords】:

647. Brain-Mediated Transfer Learning of Convolutional Neural Networks.

Paper Link】 【Pages】:5281-5288

【Authors】: Satoshi Nishida ; Yusuke Nakano ; Antoine Blanc ; Naoya Maeda ; Masataka Kado ; Shinji Nishimoto

【Abstract】: The human brain can effectively learn a new task from a small number of samples, which indicates that the brain can transfer its prior knowledge to solve tasks in different domains. This function is analogous to transfer learning (TL) in the field of machine learning. TL uses a well-trained feature space in a specific task domain to improve performance in new tasks with insufficient training data. TL with rich feature representations, such as features of convolutional neural networks (CNNs), shows high generalization ability across different task domains. However, such TL is still insufficient in making machine learning attain generalization ability comparable to that of the human brain. To examine if the internal representation of the brain could be used to achieve more efficient TL, we introduce a method for TL mediated by human brains. Our method transforms feature representations of audiovisual inputs in CNNs into those in activation patterns of individual brains via their association learned ahead using measured brain responses. Then, to estimate labels reflecting human cognition and behavior induced by the audiovisual inputs, the transformed representations are used for TL. We demonstrate that our brain-mediated TL (BTL) shows higher performance in the label estimation than the standard TL. In addition, we illustrate that the estimations mediated by different brains vary from brain to brain, and the variability reflects the individual variability in perception. Thus, our BTL provides a framework to improve the generalization ability of machine-learning feature representations and enable machine learning to estimate human-like cognition and behavior, including individual variability.

【Keywords】:

648. Maximum Likelihood Embedding of Logistic Random Dot Product Graphs.

Paper Link】 【Pages】:5289-5297

【Authors】: Luke J. O'Connor ; Muriel Médard ; Soheil Feizi

【Abstract】: A latent space model for a family of random graphs assigns real-valued vectors to nodes of the graph such that edge probabilities are determined by latent positions. Latent space models provide a natural statistical framework for graph visualizing and clustering. A latent space model of particular interest is the Random Dot Product Graph (RDPG), which can be fit using an efficient spectral method; however, this method is based on a heuristic that can fail, even in simple cases. Here, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function to map from latent positions to edge likelihoods. Over this model, we show that asymptotically exact maximum likelihood inference of latent position vectors can be achieved using an efficient spectral method. Our method involves computing top eigenvectors of a normalized adjacency matrix and scaling eigenvectors using a regression step. The novel regression scaling step is an essential part of the proposed method. In simulations, we show that our proposed method is more accurate and more robust than common practices. We also show the effectiveness of our approach over standard real networks of the karate club and political blogs.

【Keywords】:
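
The generative side of the model in the abstract above (entry 648) can be sketched as follows: each node has a latent vector, and an edge appears with probability given by a logistic function of the dot product of the two endpoints' vectors. This is a minimal sketch of the model only, not of the spectral inference method; the function names are hypothetical and the absence of any offset term in the link function is an assumption.

```python
import math
import random


def edge_probability(x_i, x_j):
    """Logistic link applied to the dot product of two latent positions."""
    dot = sum(a * b for a, b in zip(x_i, x_j))
    return 1.0 / (1.0 + math.exp(-dot))


def sample_graph(positions, seed=0):
    """Sample a symmetric adjacency matrix with no self-loops."""
    rng = random.Random(seed)
    n = len(positions)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probability(positions[i], positions[j]):
                adj[i][j] = adj[j][i] = 1
    return adj
```

Nodes whose latent vectors point in similar directions get a dot product above zero and hence an edge probability above one half, which is what makes the latent positions usable for clustering and visualization.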

649. Radial and Directional Posteriors for Bayesian Deep Learning.

Paper Link】 【Pages】:5298-5305

【Authors】: ChangYong Oh ; Kamil Adamczewski ; Mijung Park

【Abstract】: We propose a new variational family for Bayesian neural networks. We decompose the variational posterior into two components, where the radial component captures the strength of each neuron in terms of its magnitude; while the directional component captures the statistical dependencies among the weight parameters. The dependencies learned via the directional density provide better modeling performance compared to the widely-used Gaussian mean-field-type variational family. In addition, the strength of input and output neurons learned via our posterior provides a structured way to compress neural networks. Indeed, experiments show that our variational family improves predictive performance and yields compressed networks simultaneously.

【Keywords】:

650. Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces.

Paper Link】 【Pages】:5306-5314

【Authors】: Takamasa Okudono ; Masaki Waga ; Taro Sekiyama ; Ichiro Hasuo

【Abstract】: We present a method to extract a weighted finite automaton (WFA) from a recurrent neural network (RNN). Our method is based on the WFA learning algorithm by Balle and Mohri, which is in turn an extension of Angluin's classic L* algorithm. Our technical novelty is in the use of regression methods for the so-called equivalence queries, thus exploiting the internal state space of an RNN to prioritize counterexample candidates. This way we achieve a quantitative/weighted extension of the recent work by Weiss, Goldberg and Yahav that extracts DFAs. We experimentally evaluate the accuracy, expressivity and efficiency of the extracted WFAs.

【Keywords】:

651. Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data.

Paper Link】 【Pages】:5315-5322

【Authors】: Kyoung-Woon On ; Eun-Sol Kim ; Yu-Jung Heo ; Byoung-Tak Zhang

【Abstract】: Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e. first-order Markovian dependency. However, most sequential data, as seen with videos, have complex dependency structures that imply variable-length semantic flows and their compositions, and those are hard to capture with conventional methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video. The CB-GLNs represent video data as a graph, with nodes and edges corresponding to frames of the video and their dependencies respectively. The CB-GLNs find compositional dependencies of the data in multilevel graph forms via a parameterized kernel with graph-cut and a message passing framework. We evaluate the proposed method on two different tasks for video understanding: Video theme classification (Youtube-8M dataset (Abu-El-Haija et al. 2016)) and Video Question and Answering (TVQA dataset (Lei et al. 2018)). The experimental results show that our model efficiently learns the semantic compositional structure of video data. Furthermore, our model achieves the highest performance in comparison to other baseline methods.

【Keywords】:

652. Uncorrected Least-Squares Temporal Difference with Lambda-Return.

Paper Link】 【Pages】:5323-5330

【Authors】: Takayuki Osogami

【Abstract】: Temporal difference, TD(λ), learning is a foundation of reinforcement learning and also of interest in its own right for the tasks of prediction. Recently, true online TD(λ) has been shown to closely approximate the “forward view” at every step, while conventional TD(λ) does this only at the end of an episode. We re-examine least-squares temporal difference, LSTD(λ), which has been derived from conventional TD(λ). We design Uncorrected LSTD(λ) in such a way that, when λ = 1, Uncorrected LSTD(1) is equivalent to the least-squares method for the linear regression of Monte Carlo (MC) return at every step, while conventional LSTD(1) has this equivalence only at the end of an episode, since the MC return is corrected to be unbiased. We prove that Uncorrected LSTD(λ) can have smaller variance than conventional LSTD(λ), and this allows Uncorrected LSTD(λ) to sometimes outperform conventional LSTD(λ) in practice. When λ = 0, however, Uncorrected LSTD(0) is not equivalent to LSTD. We thus also propose Mixed LSTD(λ), which mixes the two LSTD(λ)s in a way that it matches conventional LSTD(λ) at λ = 0 and Uncorrected LSTD(λ) at λ = 1. In numerical experiments, we study how the three LSTD(λ)s behave under limited training data.

【Keywords】:
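
For orientation, the conventional LSTD(λ) estimator that the abstract above takes as its starting point is usually written as below. This is the textbook form, not the paper's Uncorrected or Mixed variant. The symbols (feature vector \phi_t, reward r_{t+1} received on the transition out of step t, discount \gamma, eligibility trace z_t, weight vector \theta_t) are standard notation assumed here and do not appear in the abstract.

```latex
\begin{align*}
z_t &= \gamma\lambda\, z_{t-1} + \phi_t, \qquad z_{-1} = 0, \\
A_t &= \sum_{s=0}^{t} z_s \left(\phi_s - \gamma\,\phi_{s+1}\right)^{\top}, \qquad
b_t = \sum_{s=0}^{t} z_s\, r_{s+1}, \\
\theta_t &= A_t^{-1} b_t .
\end{align*}
```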

653. Linear Bandits with Feature Feedback.

Paper Link】 【Pages】:5331-5338

【Authors】: Urvashi Oswal ; Aniruddha Bhargava ; Robert Nowak

【Abstract】: This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards, the latter feedback being the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback which can achieve regret over time horizon T that scales like k√T, without prior knowledge of which features are relevant nor the number k of relevant features. In comparison, the regret of traditional linear bandits is d√T, where d is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if k ≪ d. The computational complexity of the algorithm is proportional to k rather than d, making it much more suitable for real-world applications compared to traditional linear bandits. We demonstrate the performance of the algorithm with synthetic and real human-labeled data.

【Keywords】:

654. Overcoming Catastrophic Forgetting by Neuron-Level Plasticity Control.

Paper Link】 【Pages】:5339-5346

【Authors】: Inyoung Paik ; Sangjun Oh ; Taeyeong Kwak ; Injung Kim

【Abstract】: To address the issue of catastrophic forgetting in neural networks, we propose a novel, simple, and effective solution called neuron-level plasticity control (NPC). While learning a new task, the proposed method preserves the existing knowledge from the previous tasks by controlling the plasticity of the network at the neuron level. NPC estimates the importance value of each neuron and consolidates important neurons by applying lower learning rates, rather than restricting individual connection weights to stay close to the values optimized for the previous tasks. The experimental results on several datasets show that neuron-level consolidation is substantially more effective than connection-level consolidation approaches.

【Keywords】:
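
The idea of consolidating important neurons through lower learning rates can be sketched roughly as follows. This is a minimal illustration and not the NPC algorithm itself: the abstract does not say how importance is estimated or how it maps to a learning rate, so the `importance` values and the rule `base_lr * (1 - importance)` are assumptions made purely for the example, and the function name is hypothetical.

```python
def neuron_level_sgd_step(weights, grads, importance, base_lr=0.1):
    """One SGD step with a separate learning rate per neuron (one row each).

    weights, grads: lists of rows; each row holds a neuron's incoming weights.
    importance: one value in [0, 1] per neuron (a hypothetical estimate).
    """
    new_weights = []
    for row, grad_row, imp in zip(weights, grads, importance):
        # Assumed rule: the more important the neuron, the smaller its step.
        lr = base_lr * (1.0 - imp)
        new_weights.append([w - lr * g for w, g in zip(row, grad_row)])
    return new_weights


weights = [[1.0, 1.0], [1.0, 1.0]]
grads = [[1.0, 1.0], [1.0, 1.0]]
importance = [0.9, 0.0]  # neuron 0 matters for earlier tasks, neuron 1 does not
updated = neuron_level_sgd_step(weights, grads, importance)
```

With identical gradients, the weights of the important neuron move by about 0.01 while those of the unimportant neuron move by 0.1, which is the neuron-level (rather than connection-level) consolidation the abstract describes.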

655. Adversarial Localized Energy Network for Structured Prediction.

Paper Link】 【Pages】:5347-5354

【Authors】: Pingbo Pan ; Ping Liu ; Yan Yan ; Tianbao Yang ; Yi Yang

【Abstract】: This paper focuses on energy model based structured output prediction. Though inheriting the benefits from energy-based models to handle the sophisticated cases, previous deep energy-based methods suffered from the substantial computation cost introduced by the enormous amounts of gradient steps in the inference process. To boost the efficiency and accuracy of the energy-based models on structured output prediction, we propose a novel method analogous to the adversarial learning framework. Specifically, in our proposed framework, the generator consists of an inference network while the discriminator is comprised of an energy network. The two sub-modules, i.e., the inference network and the energy network, can benefit each other mutually during the whole computation process. On the one hand, our modified inference network can boost the efficiency by predicting good initializations and reducing the searching space for the inference process; on the other hand, inheriting the benefits of the energy network, the energy module in our network can evaluate the quality of the generated output from the inference network and correspondingly provide a resourceful guide to the training of the inference network. In the ideal case, the adversarial learning strategy makes sure the two sub-modules can achieve an equilibrium state after steps. We conduct extensive experiments to verify the effectiveness and efficiency of our proposed method.

【Keywords】:

656. Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks.

Paper Link】 【Pages】:5355-5362

【Authors】: Fabio Pardo ; Vitaly Levdik ; Petar Kormushev

【Abstract】: Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the numerous expensive parallel updates have so far limited the approach to small tabular cases. To tackle this problem we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example of application we show that replacing the random actions in ε-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.

【Keywords】:

657. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs.

Paper Link】 【Pages】:5363-5370

【Authors】: Aldo Pareja ; Giacomo Domeniconi ; Jie Chen ; Tengfei Ma ; Toyotaro Suzumura ; Hiroki Kanezashi ; Tim Kaler ; Tao B. Schardl ; Charles E. Leiserson

【Abstract】: Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence through using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at https://github.com/IBM/EvolveGCN.

【Keywords】:

658. Unsupervised Attributed Multiplex Network Embedding.

Paper Link】 【Pages】:5371-5378

【Authors】: Chanyoung Park ; Donghyun Kim ; Jiawei Han ; Hwanjo Yu

【Abstract】: Nodes in a multiplex network are connected by multiple types of relations. However, most existing network embedding methods assume that only a single type of relation exists between nodes. Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph. We present a simple yet effective unsupervised network embedding method for attributed multiplex network called DMGI, inspired by Deep Graph Infomax (DGI) that maximizes the mutual information between local patches of a graph, and the global representation of the entire graph. We devise a systematic way to jointly integrate the node embeddings from multiple graphs by introducing 1) the consensus regularization framework that minimizes the disagreements among the relation-type specific node embeddings, and 2) the universal discriminator that discriminates true samples regardless of the relation types. We also show that the attention mechanism infers the importance of each relation type, and thus can be useful for filtering unnecessary relation types as a preprocessing step. Extensive experiments on various downstream tasks demonstrate that DMGI outperforms the state-of-the-art methods, even though DMGI is fully unsupervised.

【Keywords】:

659. Achieving Fairness in the Stochastic Multi-Armed Bandit Problem.

Paper Link】 【Pages】:5379-5386

【Authors】: Vishakha Patil ; Ganesh Ghalme ; Vineet Nair ; Y. Narahari

【Abstract】: We study an interesting variant of the stochastic multi-armed bandit problem, which we call the Fair-MAB problem, where, in addition to the objective of maximizing the sum of expected rewards, the algorithm also needs to ensure that at any time, each arm is pulled at least a pre-specified fraction of times. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, which we call r-Regret, that takes into account the above fairness constraints and extends the conventional notion of regret in a natural way. Our primary contribution is to obtain a complete characterization of a class of Fair-MAB algorithms via two parameters: the unfairness tolerance and the learning algorithm used as a black-box. For this class of algorithms, we provide a fairness guarantee that holds uniformly over time, irrespective of the choice of the learning algorithm. Further, when the learning algorithm is UCB1, we show that our algorithm achieves constant r-Regret for a large enough time horizon. Finally, we analyze the cost of fairness in terms of the conventional notion of regret. We conclude by experimentally validating our theoretical results.

【Keywords】:
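
The two ingredients named in the Fair-MAB abstract above, an unfairness tolerance and a black-box learner, can be combined roughly as in the sketch below. It is one plausible reading of the abstract, not a confirmed transcription of the paper's algorithm: whenever some arm lags its guaranteed share by more than the tolerance `alpha`, that arm is pulled; otherwise the choice is left to UCB1. The function names, the Bernoulli rewards and all parameter values are assumptions made for illustration.

```python
import math
import random


def ucb1_choice(counts, sums, t):
    """Standard UCB1: try each arm once, then maximize mean plus bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
              for a in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)


def fair_ucb(means, fractions, alpha, horizon, seed=0):
    """Run the fairness wrapper; return pull counts and the worst deficit seen."""
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    worst_deficit = float("-inf")
    for t in range(1, horizon + 1):
        # How far each arm lags behind its guaranteed share of past rounds.
        deficits = [fractions[a] * (t - 1) - counts[a] for a in range(k)]
        if max(deficits) > alpha:
            arm = max(range(k), key=deficits.__getitem__)   # fairness pull
        else:
            arm = ucb1_choice(counts, sums, t)              # learning pull
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        worst_deficit = max(
            worst_deficit,
            max(fractions[a] * t - counts[a] for a in range(k)))
    return counts, worst_deficit


counts, worst_deficit = fair_ucb(
    means=[0.2, 0.5, 0.8], fractions=[0.2, 0.2, 0.2], alpha=0.0, horizon=2000)
```

In this toy run every arm is guaranteed a fifth of the pulls, so each ends with at least 400 of the 2000 rounds, and the remaining rounds are left to UCB1. The fairness check never looks at rewards, which is why the guarantee does not depend on the choice of learner.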

660. Motif-Matching Based Subgraph-Level Attentional Convolutional Network for Graph Classification.

Paper Link】 【Pages】:5387-5394

【Authors】: Hao Peng ; Jianxin Li ; Qiran Gong ; Yuanxing Ning ; Senzhang Wang ; Lifang He

【Abstract】: Graph classification is critically important to many real-world applications that are associated with graph data such as chemical drug analysis and social network mining. Traditional methods usually require feature engineering to extract the graph features that can help discriminate the graphs of different classes. Although recently deep learning based graph embedding approaches are proposed to automatically learn graph features, they mostly use a few vertex arrangements extracted from the graph for feature learning, which may lose some structural information. In this work, we present a novel motif-based attentional graph convolution neural network for graph classification, which can learn more discriminative and richer graph features. Specifically, a motif-matching guided subgraph normalization method is developed to better preserve the spatial information. A novel subgraph-level self-attention network is also proposed to capture the different impacts or weights of different subgraphs. Experimental results on both bioinformatics and social network datasets show that the proposed models significantly improve graph classification performance over both traditional graph kernel methods and recent deep learning approaches.

【Keywords】:

661. A Bayesian Approach for Estimating Causal Effects from Observational Data.

Paper Link】 【Pages】:5395-5402

【Authors】: Johan Pensar ; Topi Talvitie ; Antti Hyttinen ; Mikko Koivisto

【Abstract】: We present a novel Bayesian method for the challenging task of estimating causal effects from passively observed data when the underlying causal DAG structure is unknown. To rigorously capture the inherent uncertainty associated with the estimate, our method builds a Bayesian posterior distribution of the linear causal effect, by integrating Bayesian linear regression and averaging over DAGs. For computing the exact posterior for all cause-effect variable pairs, we give an algorithm that runs in time O(3^d d) for d variables, being feasible up to 20 variables. We also give a variant that computes the posterior probabilities of all pairwise ancestor relations within the same time complexity, significantly improving the fastest previous algorithm. In simulations, our Bayesian method outperforms previous methods in estimation accuracy, especially for small sample sizes. We further show that our method for effect estimation is well-adapted for detecting strong causal effects markedly deviating from zero, while our variant for computing posteriors of ancestor relations is the method of choice for detecting the mere existence of a causal relation. Finally, we apply our method on observational flow cytometry data, detecting several causal relations that concur with previous findings from experimental data.

【Keywords】:
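
To get a feel for why about 20 variables is the practical limit, the running-time bound in the abstract above, read as three to the power d, times d, can be evaluated at d = 20. This is back-of-the-envelope arithmetic that ignores the constant factors hidden by the O-notation.

```latex
\[
3^{d}\, d \,\Big|_{d=20} \;=\; 3^{20} \cdot 20 \;=\; 3\,486\,784\,401 \cdot 20 \;=\; 69\,735\,688\,020 \;\approx\; 7 \times 10^{10}.
\]
```

Each additional variable multiplies this count by a little more than three, so the cost grows by roughly an order of magnitude for every two variables added.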

662. Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials.

Paper Link】 【Pages】:5403-5411

【Authors】: Christian F. Perez ; Felipe Petroski Such ; Theofanis Karaletsos

【Abstract】: There is broad interest in creating RL agents that can solve many (related) tasks and adapt to new tasks and environments after initial training. Model-based RL leverages learned surrogate models that describe dynamics and rewards of individual tasks, such that planning in a good surrogate can lead to good control of the true system. Rather than solving each task individually from scratch, hierarchical models can exploit the fact that tasks are often related by (unobserved) causal factors of variation in order to achieve efficient generalization, as in learning how the mass of an item affects the force required to lift it can generalize to previously unobserved masses. We propose Generalized Hidden Parameter MDPs (GHP-MDPs) that describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks. The GHP-MDP augments model-based RL with latent variables that capture these hidden parameters, facilitating transfer across tasks. We also explore a variant of the model that incorporates explicit latent structure mirroring the causal factors of variation across tasks (for instance: agent properties, environmental factors, and goals). We experimentally demonstrate state-of-the-art performance and sample-efficiency on a new challenging MuJoCo task using reward and dynamics latent spaces, while beating a previous state-of-the-art baseline with > 10× less data. Using test-time inference of the latent variables, our approach generalizes in a single episode to novel combinations of dynamics and reward, and to novel rewards.

【Keywords】:

663. CAG: A Real-Time Low-Cost Enhanced-Robustness High-Transferability Content-Aware Adversarial Attack Generator.

Paper Link】 【Pages】:5412-5419

【Authors】: Huy Phan ; Yi Xie ; Siyu Liao ; Jie Chen ; Bo Yuan

【Abstract】: Deep neural networks (DNNs) are vulnerable to adversarial attack despite their tremendous success in many artificial intelligence fields. Adversarial attack is a method that causes the intended misclassification by adding imperceptible perturbations to legitimate inputs. To date, researchers have developed numerous types of adversarial attack methods. However, from the perspective of practical deployment, these methods suffer from several drawbacks such as long attack generating time, high memory cost, insufficient robustness and low transferability. To address the drawbacks, we propose a Content-aware Adversarial Attack Generator (CAG) to achieve real-time, low-cost, enhanced-robustness and high-transferability adversarial attack. First, as a type of generative model-based attack, CAG shows significant speedup (at least 500 times) in generating adversarial examples compared to the state-of-the-art attacks such as PGD and C&W. Furthermore, CAG only needs a single generative model to perform targeted attack to any targeted class. Because CAG encodes the label information into a trainable embedding layer, it differs from prior generative model-based adversarial attacks that use n different copies of generative models for n different targeted classes. As a result, CAG significantly reduces the required memory cost for generating adversarial examples. Moreover, CAG can generate adversarial perturbations that focus on the critical areas of input by integrating the class activation maps information in the training process, and hence improve the robustness of CAG attack against the state-of-the-art adversarial defenses. In addition, CAG exhibits high transferability across different DNN classifier models in black-box attack scenario by introducing random dropout in the process of generating perturbations. Extensive experiments on different datasets and DNN models have verified the real-time, low-cost, enhanced-robustness, and high-transferability benefits of CAG.

【Keywords】:

664. Diversified Bayesian Nonnegative Matrix Factorization.

Paper Link】 【Pages】:5420-5427

【Authors】: Maoying Qiao ; Jun Yu ; Tongliang Liu ; Xinchao Wang ; Dacheng Tao

【Abstract】: Nonnegative matrix factorization (NMF) has been widely employed in a variety of scenarios due to its capability of inducing semantic part-based representation. However, because of the non-convexity of its objective, the factorization is generally not unique and may inaccurately discover intrinsic “parts” from the data. In this paper, we approach this issue using a Bayesian framework. We propose to assign a diversity prior to the parts of the factorization to induce correctness based on the assumption that useful parts should be distinct and thus well-spread. A Bayesian framework including this diversity prior is then established. This framework aims at inducing factorizations embracing both good data fitness from maximizing likelihood and large separability from the diversity prior. Specifically, the diversity prior is formulated with determinantal point processes (DPP) and is seamlessly embedded into a Bayesian NMF framework. To carry out the inference, a Markov chain Monte Carlo (MCMC) based procedure is derived. Experiments conducted on a synthetic dataset and a real-world MULAN dataset for multi-label learning (MLL) task demonstrate the superiority of the proposed method.

【Keywords】:

665. Stochastic Approximate Gradient Descent via the Langevin Algorithm.

Paper Link】 【Pages】:5428-5435

【Authors】: Yixuan Qiu ; Xiao Wang

【Abstract】: We introduce a novel and efficient algorithm called the stochastic approximate gradient descent (SAGD), as an alternative to the stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically requires manual intervention for tuning parameters and does not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to theoretically establish the convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and the variational autoencoders.

【Keywords】:
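
For readers unfamiliar with the Langevin algorithm mentioned in the abstract above, its standard (unadjusted) form for sampling from a density \pi is the iteration below. The step size \delta and the notation are assumptions of this note; how SAGD uses the resulting samples to build gradient estimates is not specified in the abstract and is not reproduced here.

```latex
\[
x_{k+1} \;=\; x_k \;+\; \delta\, \nabla \log \pi(x_k) \;+\; \sqrt{2\delta}\;\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I).
\]
```

After finitely many steps the law of x_k only approximates \pi, which matches the abstract's remark that the resulting stochastic gradients are biased in finite steps.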

666. Temporal Network Embedding with High-Order Nonlinear Information.

Paper Link】 【Pages】:5436-5443

【Authors】: Zhenyu Qiu ; Wenbin Hu ; Jia Wu ; Weiwei Liu ; Bo Du ; Xiaohua Jia

【Abstract】: Temporal network embedding, which aims to learn the low-dimensional representations of nodes in temporal networks that can capture and preserve the network structure and evolution pattern, has attracted much attention from the scientific community. However, existing methods suffer from two main disadvantages: 1) they cannot preserve the node temporal proximity that captures important properties of the network structure; and 2) they cannot represent the nonlinear structure of temporal networks. In this paper, we propose a high-order nonlinear information preserving (HNIP) embedding method to address these issues. Specifically, we define three orders of temporal proximities by exploring network historical information with a time exponential decay model to quantify the temporal proximity between nodes. Then, we propose a novel deep guided auto-encoder to capture the highly nonlinear structure. Meanwhile, the training set of the guided auto-encoder is generated by the temporal random walk (TRW) algorithm. By training the proposed deep guided auto-encoder with a specific mini-batch stochastic gradient descent algorithm, HNIP can efficiently preserve the temporal proximities and highly nonlinear structure of temporal networks. Experimental results on four real-world networks demonstrate the effectiveness of the proposed method.

【Keywords】:
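
A time exponential decay model of the kind the HNIP abstract above mentions can be illustrated with a small sketch in which each past interaction between two nodes contributes a weight that shrinks exponentially with its age. The paper's actual proximity definitions are not given in the abstract, so the function name, the edge format and the `decay_rate` value are all hypothetical.

```python
import math


def decayed_proximity(edges, u, v, now, decay_rate=0.1):
    """Sum exp(-decay_rate * age) over all timestamped u-v interactions."""
    total = 0.0
    for a, b, timestamp in edges:
        # Treat edges as undirected and ignore interactions from the future.
        if {a, b} == {u, v} and timestamp <= now:
            total += math.exp(-decay_rate * (now - timestamp))
    return total


# (node, node, time) triples; a-b interacted twice, once very recently.
edges = [("a", "b", 0.0), ("a", "b", 9.0), ("a", "c", 1.0)]
recent = decayed_proximity(edges, "a", "b", now=10.0)
stale = decayed_proximity(edges, "a", "c", now=10.0)
```

The pair with a recent interaction scores higher than the pair whose only interaction is old, which is the ordering a decay-weighted first-order proximity is meant to produce.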

667. A New Burrows Wheeler Transform Markov Distance.

Paper Link】 【Pages】:5444-5453

【Authors】: Edward Raff ; Charles Nicholas ; Mark McLean

【Abstract】: Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.

【Keywords】:
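
For reference, the Burrows Wheeler Transform that the abstract above builds on can be computed with the textbook sorted-rotations construction. The sketch shows only the transform and its inverse, not the BWMD distance proposed in the paper. The sentinel character `$` is an assumption of the example, and the quadratic-time construction is chosen for readability rather than speed.

```python
def bwt(text, sentinel="$"):
    """Burrows Wheeler Transform via sorted rotations (for clarity, not speed)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the transform by repeatedly prepending the column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


encoded = bwt("banana")         # "annb$aa"
decoded = inverse_bwt(encoded)  # "banana"
```

The transform tends to group identical characters together, which is what makes it useful for compression.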

668. How Should an Agent Practice?

Paper Link】 【Pages】:5454-5461

【Authors】: Janarthanan Rajendran ; Richard L. Lewis ; Vivek Veeriah ; Honglak Lee ; Satinder Singh

【Abstract】: We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing an analogy to regimes of skill acquisition common for humans in sports and games. The agent must effectively use periods in the practice environment so that performance improves during matches. In the proposed method the intrinsic practice reward is learned through a meta-gradient approach that adapts the practice reward parameters to reduce the extrinsic match reward loss computed from matches. We illustrate the method on a simple grid world, and evaluate it in two games in which the practice environment differs from match: Pong with practice against a wall without an opponent, and PacMan with practice in a maze without ghosts. The results show gains from learning in practice in addition to match periods over learning in matches only.

【Keywords】:

669. Synthesizing Action Sequences for Modifying Model Decisions.

Paper Link】 【Pages】:5462-5469

【Authors】: Goutham Ramakrishnan ; Yun Chan Lee ; Aws Albarghouthi

【Abstract】: When a model makes a consequential decision, e.g., denying someone a loan, it needs to additionally generate actionable, realistic feedback on what the person can do to favorably change the decision. We cast this problem through the lens of program synthesis, in which our goal is to synthesize an optimal (realistically cheapest or simplest) sequence of actions that if a person executes successfully can change their classification. We present a novel and general approach that combines search-based program synthesis and test-time adversarial attacks to construct action sequences over a domain-specific set of actions. We demonstrate the effectiveness of our approach on a number of deep neural networks.

【Keywords】:

670. ASAP: Adaptive Structure Aware Pooling for Learning Hierarchical Graph Representations.

Paper Link】 【Pages】:5470-5477

【Authors】: Ekagra Ranjan ; Soumya Sanyal ; Partha P. Talukdar

【Abstract】: Graph Neural Networks (GNN) have been shown to work effectively for modeling graph structured data to solve tasks such as node classification, link prediction and graph classification. There has been some recent progress in defining the notion of pooling in graphs whereby the model tries to generate a graph level representation by downsampling and summarizing the information present in the nodes. Existing pooling methods either fail to effectively capture the graph substructure or do not easily scale to large graphs. In this work, we propose ASAP (Adaptive Structure Aware Pooling), a sparse and differentiable pooling method that addresses the limitations of previous graph pooling architectures. ASAP utilizes a novel self-attention network along with a modified GNN formulation to capture the importance of each node in a given graph. It also learns a sparse soft cluster assignment for nodes at each layer to effectively pool the subgraphs to form the pooled graph. Through extensive experiments on multiple datasets and theoretical analysis, we motivate our choice of the components used in ASAP. Our experimental results show that combining existing GNN architectures with ASAP leads to state-of-the-art results on multiple graph classification benchmarks. ASAP has an average improvement of 4%, compared to the current sparse hierarchical state-of-the-art method. We make the source code of ASAP available to encourage reproducible research.

【Keywords】:

671. Abstract Interpretation of Decision Tree Ensemble Classifiers.

Paper Link】 【Pages】:5478-5486

【Authors】: Francesco Ranzato ; Marco Zanella

【Abstract】: We study the problem of formally and automatically verifying robustness properties of decision tree ensemble classifiers such as random forests and gradient boosted decision tree models. A recent stream of works showed how abstract interpretation, which is ubiquitously used in static program analysis, can be successfully deployed to formally verify (deep) neural networks. In this work we push forward this line of research by designing a general and principled abstract interpretation-based framework for the formal verification of robustness and stability properties of decision tree ensemble models. Our abstract interpretation-based method may induce complete robustness checks of standard adversarial perturbations and output concrete adversarial attacks. We implemented our abstract verification technique in a tool called silva, which leverages an abstract domain of not necessarily closed real hyperrectangles and is instantiated to verify random forests and gradient boosted decision trees. Our experimental evaluation on the MNIST dataset shows that silva provides a precise and efficient tool which advances the current state of the art in tree ensembles verification.

【Keywords】:

672. Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization Offers Significant Performance and Efficiency Gains.

Paper Link】 【Pages】:5487-5494

【Authors】: Sathya N. Ravi ; Abhay Venkatesh ; Glenn M. Fung ; Vikas Singh

【Abstract】: Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The Fβ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium to large sized datasets, scalability issues severely limit our ability in leveraging the benefits of such regularizers. Importantly, the key technical impediment, despite some recent progress, is that such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem still remains elusive, in this paper, we show that for many data-dependent nondecomposable regularizers that are relevant in applications, sizable gains in efficiency are possible with minimal code-level changes; in other words, no specialized tools or numerical schemes are needed. Our procedure involves a reparameterization followed by a partial dualization – this leads to a formulation that has provably cheap projection operators. We present a detailed analysis of runtime and convergence properties of our algorithm. On the experimental side, we show that a direct use of our scheme significantly improves the state of the art IOU measures reported for MSCOCO Stuff segmentation dataset.

【Keywords】:

673. DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks.

Paper Link】 【Pages】:5495-5502

【Authors】: Ao Ren ; Tao Zhang ; Yuhao Wang ; Sheng Lin ; Peiyan Dong ; Yen-Kuang Chen ; Yuan Xie ; Yanzhi Wang

【Abstract】: The rapidly growing parameter volume of deep neural networks (DNNs) hinders the artificial intelligence applications on resource constrained devices, such as mobile and wearable devices. Neural network pruning, as one of the mainstream model compression techniques, is under extensive study to reduce the model size and thus the amount of computation. And thereby, the state-of-the-art DNNs are able to be deployed on those devices with high runtime energy efficiency. In contrast to irregular pruning that incurs high index storage and decoding overhead, structured pruning techniques have been proposed as the promising solutions. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without diving into the deep to analyze the characteristics of sparse neural networks. The neglect on the study of sparse neural networks causes inefficient trade-off between regularity and pruning ratio. Consequently, the potential of structurally pruning neural networks is not sufficiently mined. In this work, we examine the structural characteristics of the irregularly pruned weight matrices, such as the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the position characteristics of retained weights. By leveraging the gained insights as a guidance, we first propose the novel block-max weight masking (BMWM) method, which can effectively retain the salient weights while imposing high regularity to the weight matrix. As a further optimization, we propose a density-adaptive regular-block (DARB) pruning that can effectively take advantage of the intrinsic characteristics of neural networks, and thereby outperform prior structured pruning work with high pruning ratio and decoding efficiency. Our experimental results show that DARB can achieve 13× to 25× pruning ratio, which is a 2.8× to 4.3× improvement over the state-of-the-art counterparts on multiple neural network models and tasks. Moreover, DARB can achieve 14.3× the decoding efficiency of block pruning, with a higher pruning ratio.

【Keywords】:
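
The abstract does not spell out how block-max weight masking works, so the following is only a minimal sketch of one plausible reading of the name: each row is split into fixed-size blocks and only the largest-magnitude weight in each block is kept. The function name `block_max_mask`, the `block_size` parameter, and the sample row are illustrative assumptions, not taken from the paper.

```python
def block_max_mask(row, block_size):
    """Keep the largest-magnitude weight in each block of a row; zero the rest.

    Assumed reading of "block-max" masking, not the authors' exact method.
    """
    masked = [0.0] * len(row)
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        # Position (within the block) of the weight with the largest |value|.
        keep = max(range(len(block)), key=lambda i: abs(block[i]))
        masked[start + keep] = block[keep]
    return masked


row = [0.1, -0.9, 0.3, 0.2, 0.5, 0.4, -0.6, 0.0]
masked_row = block_max_mask(row, block_size=4)
print(masked_row)  # one surviving weight per block of four
```

Under this reading, every block holds exactly one nonzero entry, so the position of each retained weight can be stored as a short offset within its block, which is the kind of regularity the abstract credits with low decoding overhead.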

674. Delay-Adaptive Distributed Stochastic Optimization.

Paper Link】 【Pages】:5503-5510

【Authors】: Zhaolin Ren ; Zhengyuan Zhou ; Linhai Qiu ; Ajay Deshpande ; Jayant Kalagnanam

【Abstract】: In large-scale optimization problems, distributed asynchronous stochastic gradient descent (DASGD) is a commonly used algorithm. In most applications, a large number of computing nodes compute gradient information asynchronously. As such, the gradient information received at a given iteration is often stale. In the presence of such delays, which can be unbounded, the convergence of DASGD is uncertain. The contribution of this paper is twofold. First, we propose a delay-adaptive variant of DASGD where we adjust each iteration's step-size based on the size of the delay, and prove asymptotic convergence of the algorithm on variationally coherent stochastic problems, a class of functions which properly includes convex, quasi-convex and star-convex functions. Second, we extend the convergence results of standard DASGD, usually used for problems with bounded domains, to problems with unbounded domains. In this way, we extend the frontier of theoretical guarantees for distributed asynchronous optimization, and provide new insights for practitioners working on large-scale optimization problems.

【Keywords】:
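
The idea of shrinking the step-size when a gradient is stale can be sketched as follows. The specific rule `eta0 / (1 + delay)`, the function name `delay_adaptive_sgd`, the noise-free quadratic objective, and the periodic delay pattern are all assumptions made for illustration; the abstract only states that the step-size depends on the size of the delay.

```python
def delay_adaptive_sgd(grad, x0, delays, eta0):
    """Gradient descent where step t uses a gradient evaluated at a stale iterate.

    history[t] is the iterate at step t; delays[t] says how many steps old
    the iterate used for the gradient is.
    """
    history = [x0]
    for t, d in enumerate(delays):
        d = min(d, t)                # cannot be staler than the initial point
        stale_x = history[t - d]     # iterate the worker actually saw
        step = eta0 / (1 + d)        # assumed rule: larger delay, smaller step
        history.append(history[-1] - step * grad(stale_x))
    return history


# f(x) = x^2 / 2, so grad f(x) = x; delays cycle through 0, 1, 2, 3.
delays = [t % 4 for t in range(300)]
trajectory = delay_adaptive_sgd(lambda x: x, x0=1.0, delays=delays, eta0=0.2)
print(trajectory[-1])
```

In this toy run the iterate decays towards the minimizer at 0 even though most gradients are stale. A real stochastic setting would add gradient noise and a decaying base step-size.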

675. Fairness for Robust Log Loss Classification.

Paper Link】 【Pages】:5511-5518

【Authors】: Ashkan Rezaei ; Rizal Fathony ; Omid Memarrast ; Brian D. Ziebart

【Abstract】: Developing classification methods with high accuracy that also avoid unfair treatment of different groups has become increasingly important for data-driven decision making in social applications. Many existing methods enforce fairness constraints on a selected classifier (e.g., logistic regression) by directly forming constrained optimizations. We instead re-derive a new classifier from the first principles of distributional robustness that incorporates fairness criteria into a worst-case logarithmic loss minimization. This construction takes the form of a minimax game and produces a parametric exponential family conditional distribution that resembles truncated logistic regression. We present the theoretical benefits of our approach in terms of its convexity and asymptotic convergence. We then demonstrate the practical advantages of our approach on three benchmark fairness datasets.

【Keywords】:

676. On the Role of Weight Sharing During Deep Option Learning.

Paper Link】 【Pages】:5519-5526

【Authors】: Matthew Riemer ; Ignacio Cases ; Clemens Rosenbaum ; Miao Liu ; Gerald Tesauro

【Abstract】: The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped up by focusing the policy over options on states where options are actually likely to terminate. We put our new algorithms to the test in application to sample efficient learning of Atari games, and demonstrate significantly improved stability and faster convergence when learning long options.

【Keywords】:

677. Ensembles of Locally Independent Prediction Models.

Paper Link】 【Pages】:5527-5536

【Authors】: Andrew Slavin Ross ; Weiwei Pan ; Leo A. Celi ; Finale Doshi-Velez

【Abstract】: Ensembles depend on diversity for improved performance. Many ensemble training methods, therefore, attempt to optimize for diversity, which they almost always define in terms of differences in training set predictions. In this paper, however, we demonstrate the diversity of predictions on the training set does not necessarily imply diversity under mild covariate shift, which can harm generalization in practical settings. To address this issue, we introduce a new diversity metric and associated method of training ensembles of models that extrapolate differently on local patches of the data manifold. Across a variety of synthetic and real-world tasks, we find that our method improves generalization and diversity in qualitatively novel ways, especially under data limits and covariate shift.

【Keywords】:

678. Actionable Ethics through Neural Learning.

Paper Link】 【Pages】:5537-5544

【Authors】: Daniele Rossini ; Danilo Croce ; Sara Mancini ; Massimo Pellegrino ; Roberto Basili

【Abstract】: While AI is going to produce a great impact on society, its alignment with human values and expectations is an essential step towards a correct harnessing of AI potentials for good. There is a corresponding growing need for mature and established technical standards to enable the assessment of an AI application as the evaluation of its graded adherence to formalized ethics. This is clearly dependent on methods to inject ethical awareness at all stages of an AI application development and use. For this reason we introduce the notion of Embedding Principles of ethics by Design (EPbD) as a comprehensive inductive framework. Although extending generic AI applications, it mainly aims at learning the ethical behaviour through numerical optimization, i.e. deep neural models. The core idea is to support ethics by integrating automated reasoning over formal knowledge and induction from ethically enriched training data. A deep neural network is proposed here to model both the functional as well as the ethical conditions characterizing a target decision. In this way, the discovery of latent ethical knowledge is enabled and made available to the learning process. The application of the above framework to a banking application, i.e. AI-driven Digital Lending, is used to show how accurate classification can be achieved without neglecting the ethical dimension. Results over existing datasets demonstrate that the ethical compliance of the sources can be used to output models able to optimally fine tune the balance between business and ethical accuracy.

【Keywords】:

679. Generative Continual Concept Learning.

Paper Link】 【Pages】:5545-5552

【Authors】: Mohammad Rostami ; Soheil Kolouri ; Praveen K. Pilly ; James L. McClelland

【Abstract】: After learning a concept, humans are also able to continually generalize their learned concepts to new domains by observing only a few labeled instances without any interference with the past learned knowledge. In contrast, learning concepts efficiently in a continual learning setting remains an open challenge for current Artificial Intelligence algorithms as persistent model retraining is necessary. Inspired by the Parallel Distributed Processing learning and the Complementary Learning Systems theories, we develop a computational model that is able to expand its previously learned concepts efficiently to new domains using a few labeled samples. We couple the new form of a concept to its past learned forms in an embedding space for effective continual learning. Doing so, a generative distribution is learned such that it is shared across the tasks in the embedding space and models the abstract concepts. This procedure enables the model to generate pseudo-data points to replay the past experience to tackle catastrophic forgetting.

【Keywords】:

680. Linear Context Transform Block.

Paper Link】 【Pages】:5553-5560

【Authors】: Dongsheng Ruan ; Jun Wen ; Nenggan Zheng ; Min Zheng

【Abstract】: The Squeeze-and-Excitation (SE) block presents a channel attention mechanism for modeling global context via explicitly capturing dependencies across channels. However, we are still far from understanding how the SE block works. In this work, we first revisit the SE block, and then present a detailed empirical study of the relationship between global context and attention distribution, based on which we propose a simple yet effective module, called the Linear Context Transform (LCT) block. We divide all channels into different groups and normalize the globally aggregated context features within each channel group, reducing the disturbance from irrelevant channels. Through a linear transform of the normalized context features, we model global context for each channel independently. The LCT block is extremely lightweight and easy to plug into different backbone models, with a negligible increase in parameters and computational burden. Extensive experiments show that the LCT block outperforms the SE block on image classification on ImageNet and on object detection/segmentation on the COCO dataset with different backbone models. Moreover, LCT yields consistent performance gains over existing state-of-the-art detection architectures, e.g., 1.5% to 1.7% AP^bbox and 1.0% to 1.2% AP^mask improvements on the COCO benchmark, irrespective of different baseline models of varied capacities. We hope our simple yet effective approach will shed some light on future research of attention-based models.

【Keywords】:
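
The three steps named in the abstract (group the channels, normalize the aggregated context within each group, apply a per-channel linear transform) can be sketched on a plain vector of per-channel context values. The sigmoid gate at the end, the `eps` constant, and the name `lct_gates` are assumptions borrowed from SE-style attention, not details confirmed by the abstract; the sketch also assumes the channel count is divisible by the number of groups.

```python
import math


def lct_gates(context, num_groups, weights, biases, eps=1e-5):
    """Per-channel attention gates from globally aggregated context values.

    context[c] is one aggregated value (e.g. a global average) for channel c.
    """
    group_size = len(context) // num_groups
    gates = []
    for g in range(num_groups):
        idx = range(g * group_size, (g + 1) * group_size)
        group = [context[c] for c in idx]
        mean = sum(group) / group_size
        var = sum((v - mean) ** 2 for v in group) / group_size
        for c in idx:
            z = (context[c] - mean) / math.sqrt(var + eps)  # group normalization
            t = weights[c] * z + biases[c]                  # per-channel linear map
            gates.append(1.0 / (1.0 + math.exp(-t)))        # assumed sigmoid gate
    return gates


context = [1.0, 3.0, 2.0, 6.0]
gates = lct_gates(context, num_groups=2, weights=[1.0] * 4, biases=[0.0] * 4)
print(gates)
```

Each channel has only one weight and one bias, which is consistent with the abstract's claim that the block adds very few parameters.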

681. Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations.

Paper Link】 【Pages】:5561-5569

【Authors】: Nadine Rueegg ; Christoph Lassner ; Michael J. Black ; Konrad Schindler

【Abstract】: The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.

【Keywords】:

682. Weakly Supervised Sequence Tagging from Noisy Rules.

Paper Link】 【Pages】:5570-5578

【Authors】: Esteban Safranchik ; Shiying Luo ; Stephen H. Bach

【Abstract】: We propose a framework for training sequence tagging models with weak supervision consisting of multiple heuristic rules of unknown accuracy. In addition to supporting rules that vote on tags in the output sequence, we introduce a new type of weak supervision, called linking rules, that vote on how sequence elements should be grouped into spans with the same tag. These rules are an alternative to candidate span generators that require significantly more human effort. To estimate the accuracies of the rules and combine their conflicting outputs into training data, we introduce a new type of generative model, linked hidden Markov models (linked HMMs), and prove they are generically identifiable (up to a tag permutation) without any observed training labels. We find that linked HMMs provide an average 7 F1 point boost on benchmark named entity recognition tasks versus generative models that assume the tags are i.i.d. Further, neural sequence taggers trained with these structure-aware generative models outperform comparable state-of-the-art approaches to weak supervision by an average of 2.6 F1 points.

【Keywords】:

683. Random Intersection Graphs and Missing Data.

Paper Link】 【Pages】:5579-5585

【Authors】: Dror Salti ; Yakir Berchenko

【Abstract】: Random-graphs and statistical inference with missing data are two separate topics that have been widely explored each in its field. In this paper we demonstrate the relationship between these two different topics and take a novel view of the data matrix as a random intersection graph. We use graph properties and theoretical results from random-graph theory, such as connectivity and the emergence of the giant component, to identify two threshold phenomena in statistical inference with missing data: loss of identifiability and slower convergence of algorithms that are pertinent to statistical inference such as expectation-maximization (EM). We provide two examples corresponding to these threshold phenomena and illustrate the theoretical predictions with simulations that are consistent with our reduction.

【Keywords】:

684. Rank3DGAN: Semantic Mesh Generation Using Relative Attributes.

Paper Link】 【Pages】:5586-5594

【Authors】: Yassir Saquil ; Qun-Ce Xu ; Yong-Liang Yang ; Peter Hall

【Abstract】: In this paper, we investigate a novel problem of using generative adversarial networks in the task of 3D shape generation according to semantic attributes. Recent works map 3D shapes into 2D parameter domain, which enables training Generative Adversarial Networks (GANs) for 3D shape generation task. We extend these architectures to the conditional setting, where we generate 3D shapes with respect to subjective attributes defined by the user. Given pairwise comparisons of 3D shapes, our model performs two tasks: it learns a generative model with a controlled latent space, and a ranking function for the 3D shapes based on their multi-chart representation in 2D. The capability of the model is demonstrated with experiments on HumanShape, Basel Face Model and reconstructed 3D CUB datasets. We also present various applications that benefit from our model, such as multi-attribute exploration, mesh editing, and mesh attribute transfer.

【Keywords】:

685. Weighted Sampling for Combined Model Selection and Hyperparameter Tuning.

Paper Link】 【Pages】:5595-5603

【Authors】: Dimitrios Sarigiannis ; Thomas P. Parnell ; Haralampos Pozidis

【Abstract】: The combined algorithm selection and hyperparameter tuning (CASH) problem is characterized by large hierarchical hyperparameter spaces. Model-free hyperparameter tuning methods can explore such large spaces efficiently since they are highly parallelizable across multiple machines. When no prior knowledge or meta-data exists to boost their performance, these methods commonly sample random configurations following a uniform distribution. In this work, we propose a novel sampling distribution as an alternative to uniform sampling and prove theoretically that it has a better chance of finding the best configuration in a worst-case setting. In order to compare competing methods rigorously in an experimental setting, one must perform statistical hypothesis testing. We show that there is little-to-no agreement in the automated machine learning literature regarding which methods should be used. We contrast this disparity with the methods recommended by the broader statistics literature, and identify a suitable approach. We then select three popular model-free solutions to CASH and evaluate their performance, with uniform sampling as well as the proposed sampling scheme, across 67 datasets from the OpenML platform. We investigate the trade-off between exploration and exploitation across the three algorithms, and verify empirically that the proposed sampling distribution improves performance in all cases.

【Keywords】:

686. Graph Representation Learning via Ladder Gamma Variational Autoencoders.

Paper Link】 【Pages】:5604-5611

【Authors】: Arindam Sarkar ; Nikhil Mehta ; Piyush Rai

【Abstract】: We present a probabilistic framework for community discovery and link prediction for graph-structured data, based on a novel, gamma ladder variational autoencoder (VAE) architecture. We model each node in the graph via a deep hierarchy of gamma-distributed embeddings, and define each link probability via a nonlinear function of the bottom-most layer's embeddings of its associated nodes. In addition to leveraging the representational power of multiple layers of stochastic variables via the ladder VAE architecture, our framework offers the following benefits: (1) Unlike existing ladder VAE architectures based on real-valued latent variables, the gamma-distributed latent variables naturally result in non-negativity and sparsity of the learned embeddings, and facilitate their direct interpretation as membership of nodes into (possibly multiple) communities/topics; (2) A novel recognition model for our gamma ladder VAE architecture allows fast inference of node embeddings; and (3) The framework also extends naturally to incorporate node side information (features and/or labels). Our framework is also fairly modular and can leverage a wide variety of graph neural networks as the VAE encoder. We report both quantitative and qualitative results on several benchmark datasets and compare our model with several state-of-the-art methods.

【Keywords】:

687. Learning Counterfactual Representations for Estimating Individual Dose-Response Curves.

Paper Link】 【Pages】:5612-5619

【Authors】: Patrick Schwab ; Lorenz Linhardt ; Stefan Bauer ; Joachim M. Buhmann ; Walter Karlen

【Abstract】: Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response.

【Keywords】:

688. Uncertainty-Aware Deep Classifiers Using Generative Models.

Paper Link】 【Pages】:5620-5627

【Authors】: Murat Sensoy ; Lance M. Kaplan ; Federico Cerutti ; Maryam Saleki

【Abstract】: Deep neural networks are often ignorant about what they do not know and overconfident when they make uninformed predictions. Some recent approaches quantify classification uncertainty directly by training the model to output high uncertainty for the data samples close to class boundaries or from the outside of the training distribution. These approaches use an auxiliary data set during training to represent out-of-distribution samples. However, selection or creation of such an auxiliary data set is non-trivial, especially for high dimensional data such as images. In this work we develop a novel neural network model that is able to express both aleatoric and epistemic uncertainty to distinguish decision boundary and out-of-distribution regions of the feature space. To this end, variational autoencoders and generative adversarial networks are incorporated to automatically generate out-of-distribution exemplars for training. Through extensive analysis, we demonstrate that the proposed approach provides better estimates of uncertainty for in- and out-of-distribution samples, and adversarial examples on well-known data sets against state-of-the-art approaches including recent Bayesian approaches for neural networks and anomaly detection methods.

【Keywords】:

689. Empirical Bounds on Linear Regions of Deep Rectifier Networks.

Paper Link】 【Pages】:5628-5635

【Authors】: Thiago Serra ; Srikumar Ramalingam

【Abstract】: We can compare the expressiveness of neural networks that use rectified linear units (ReLUs) by the number of linear regions, which reflect the number of pieces of the piecewise linear functions modeled by such networks. However, enumerating these regions is prohibitive and the known analytical bounds are identical for networks with same dimensions. In this work, we approximate the number of linear regions through empirical bounds based on features of the trained network and probabilistic inference. Our first contribution is a method to sample the activation patterns defined by ReLUs using universal hash functions. This method is based on a Mixed-Integer Linear Programming (MILP) formulation of the network and an algorithm for probabilistic lower bounds of MILP solution sets that we call MIPBound, which is considerably faster than exact counting and reaches values in similar orders of magnitude. Our second contribution is a tighter activation-based bound for the maximum number of linear regions, which is particularly stronger in networks with narrow layers. Combined, these bounds yield a fast proxy for the number of linear regions of a deep neural network.

【Keywords】:

690. Universal Adversarial Training.

Paper Link】 【Pages】:5636-5643

【Authors】: Ali Shafahi ; Mahyar Najibi ; Zheng Xu ; John P. Dickerson ; Larry S. Davis ; Tom Goldstein

【Abstract】: Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13× faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game, and produces robust models with only 2× the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.

【Keywords】:

691. Sequential Mode Estimation with Oracle Queries.

Paper Link】 【Pages】:5644-5651

【Authors】: Dhruti Shah ; Tuhinangshu Choudhury ; Nikhil Karamchandani ; Aditya Gopalan

【Abstract】: We consider the problem of adaptively PAC-learning a probability distribution 𝒫's mode by querying an oracle for information about a sequence of i.i.d. samples X_1, X_2, … generated from 𝒫. We consider two different query models: (a) each query is an index i for which the oracle reveals the value of the sample X_i, (b) each query is comprised of two indices i and j for which the oracle reveals if the samples X_i and X_j are the same or not. For these query models, we give sequential mode-estimation algorithms which, at each time t, either make a query to the corresponding oracle based on past observations, or decide to stop and output an estimate for the distribution's mode, required to be correct with a specified confidence. We analyze the query complexity of these algorithms for any underlying distribution 𝒫, and derive corresponding lower bounds on the optimal query complexity under the two querying models.

【Keywords】:
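
Query model (b), where the oracle only says whether two samples are equal, can be illustrated with a small sketch that groups a fixed batch of samples by comparing each new sample against one representative per group. This is a simplified illustration, not the paper's algorithm: it processes a fixed, non-empty batch and omits the confidence-based stopping rule the abstract describes. The function name and the use of `==` as a stand-in for the oracle are assumptions.

```python
def mode_by_pairwise_queries(samples):
    """Estimate the mode using only 'are samples i and j equal?' queries.

    Returns (index of a sample from the largest group, group size, queries used).
    """
    def same(i, j):
        # Stand-in for the pairwise oracle; sample values are never read elsewhere.
        return samples[i] == samples[j]

    clusters = []  # each cluster is a list of indices; cluster[0] is its representative
    queries = 0
    for i in range(len(samples)):
        for cluster in clusters:
            queries += 1
            if same(i, cluster[0]):
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no representative matched: start a new cluster
    largest = max(clusters, key=len)
    return largest[0], len(largest), queries


result = mode_by_pairwise_queries(["a", "b", "a", "c", "a", "b"])
print(result)  # (0, 3, 7)
```

The query count depends on the order in which representatives are tried, which hints at why query complexity under this model depends on the shape of the underlying distribution.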

692. Online Active Learning of Reject Option Classifiers.

Paper Link】 【Pages】:5652-5659

【Authors】: Kulin Shah ; Naresh Manwani

【Abstract】: Active learning is an important technique to reduce the number of labeled examples in supervised learning. Active learning for binary classification has been well addressed in machine learning. However, active learning of the reject option classifier remains unaddressed. In this paper, we propose novel algorithms for active learning of reject option classifiers. We develop an active learning algorithm using the double ramp loss function and provide mistake bounds for this algorithm. We also propose a new loss function for the reject option, called the double sigmoid loss function, and a corresponding active learning algorithm, for which we offer a convergence guarantee. We provide extensive experimental results to show the effectiveness of the proposed algorithms. The proposed algorithms efficiently reduce the number of labeled examples required.

【Keywords】:

693. Improved PAC-Bayesian Bounds for Linear Regression.

Paper Link】 【Pages】:5660-5667

【Authors】: Vera Shalaeva ; Alireza Fakhrizadeh Esfahani ; Pascal Germain ; Mihály Petreczky

【Abstract】: In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. (2016). The improvements are two-fold. First, the proposed error bound is tighter, and converges to the generalization loss with a well-chosen temperature parameter. Second, the error bound also holds for training data that are not independently sampled. In particular, the error bound applies to certain time series generated by well-known classes of dynamical models, such as ARX models.

【Keywords】:

694. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs.

Paper Link】 【Pages】:5668-5675

【Authors】: Lior Shani ; Yonathan Efroni ; Shie Mannor

【Abstract】: Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts consecutive policies to be ‘close’ to one another, is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural “RL version” of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish Õ(1/√N) convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs for which we prove fast rates of Õ(1/N), much like results in convex optimization. This is the first result in RL of better rates when regularizing the instantaneous cost or reward.

【Keywords】:

695. Transfer Value Iteration Networks.

Paper Link】 【Pages】:5676-5683

【Authors】: Junyi Shen ; Hankz Hankui Zhuo ; Jin Xu ; Bin Zhong ; Sinno Jialin Pan

【Abstract】: Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains. However, based on our experiments, a policy learned by VINs still fails to generalize well to a domain whose action space and feature space are not identical to those in the domain where it is trained. In this paper, we propose a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such that a learned policy from a source domain can be generalized to a target domain with only limited training data, even if the source domain and the target domain have domain-specific actions and features. We empirically verify that our proposed TVINs outperform VINs when the source and the target domains have similar but not identical action and feature spaces. Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as the number of iterations and kernel size.

【Keywords】:

696. AUC Optimization with a Reject Option.

Paper Link】 【Pages】:5684-5691

【Authors】: Song-Qing Shen ; Bin-Bin Yang ; Wei Gao

【Abstract】: Making an erroneous decision may have serious consequences in diverse mission-critical tasks such as medical diagnosis and bioinformatics. Previous work focuses on classification with a reject option, i.e., abstaining rather than classifying an instance of low confidence. Most mission-critical tasks are accompanied by class imbalance and cost sensitivity, where AUC has been shown to be a preferable measure to accuracy in classification. In this work, we propose the framework of AUC optimization with a reject option, and the basic idea is to withhold the decision of ranking a pair of positive and negative instances with a lower cost, rather than mis-ranking. We obtain the Bayes optimal solution for ranking, and learn the reject function and score function for ranking simultaneously. An online algorithm has been developed for AUC optimization with a reject option, by considering the convex relaxation and plug-in rule. We verify, both theoretically and empirically, the effectiveness of the proposed algorithm.

【Keywords】:

697. Stable Learning via Sample Reweighting.

Paper Link】 【Pages】:5692-5699

【Authors】: Zheyan Shen ; Peng Cui ; Tong Zhang ; Kun Kuang

【Abstract】: We consider the problem of learning linear prediction models with model misspecification bias. In such a case, the collinearity among input variables may inflate the error of parameter estimation, resulting in instability of prediction results when training and test distributions do not match. In this paper we theoretically analyze this fundamental problem and propose a sample reweighting method that reduces collinearity among input variables. Our method can be seen as a pretreatment of data to improve the condition of the design matrix, and it can then be combined with any standard learning method for parameter estimation and variable selection. Empirical studies on both simulated and real datasets demonstrate the effectiveness of our method in terms of more stable performance across differently distributed data.

【Keywords】:

698. Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference.

Paper Link】 【Pages】:5700-5708

【Authors】: Jianghao Shen ; Yue Wang ; Pengfei Xu ; Yonggan Fu ; Zhangyang Wang ; Yingyan Lin

【Abstract】: While increasingly deep networks are still in general desired for achieving state-of-the-art performance, for many specific inputs a simpler network might already suffice. Existing works exploited this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue their binary decision scheme, i.e., either fully executing or completely bypassing one layer for a specific input, can be enhanced by introducing finer-grained, “softer” decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to hypothesize layer-wise quantization (to different bitwidths) as intermediate “soft” choices to be made between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both weights and activations of each layer, where fully executing and skipping could be viewed as two “extremes” (i.e., full bitwidth and zero bitwidth). In this way, DFS can “fractionally” exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs. It presents a unified view to link input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior tradeoff between computational cost and model expressive power (accuracy) achieved by DFS. More visualizations also indicate a smooth and consistent transition in the DFS behaviors, especially the learned choices between layer skipping and different quantizations when the total computational budgets vary, validating our hypothesis that layer quantization could be viewed as intermediate variants of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.

【Keywords】:
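
The "fractional" execution idea in the DFS abstract above (a bitwidth per layer, with skipping and full execution as the two extremes) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the uniform quantizer, the elementwise "layer", and the names `quantize` and `fractional_layer` are hypothetical and are not taken from the authors' code.

```python
def quantize(values, bits, max_abs=1.0):
    """Uniformly quantize values clipped to [-max_abs, max_abs] onto 2**bits levels."""
    levels = 2 ** bits - 1            # number of steps between the two extremes
    step = 2 * max_abs / levels
    out = []
    for v in values:
        clipped = max(-max_abs, min(max_abs, v))
        out.append(round((clipped + max_abs) / step) * step - max_abs)
    return out


def fractional_layer(x, weights, bits, full_bits=32):
    """Toy layer (elementwise scaling) executed at a chosen bitwidth.

    bits == 0         -> skip the layer entirely (identity, an assumption here)
    bits >= full_bits -> run with unquantized weights
    otherwise         -> run with weights quantized to `bits` bits
    """
    if bits == 0:
        return list(x)
    w = weights if bits >= full_bits else quantize(weights, bits)
    return [xi * wi for xi, wi in zip(x, w)]
```

In the framework described by the abstract the bitwidth is chosen per input and also applies to activations; this sketch fixes it by hand and only quantizes weights, to show how intermediate bitwidths sit between the two extremes.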

699. Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning.

Paper Link】 【Pages】:5709-5716

【Authors】: Kekai Sheng ; Weiming Dong ; Menglei Chai ; Guohui Wang ; Peng Zhou ; Feiyue Huang ; Bao-Gang Hu ; Rongrong Ji ; Chongyang Ma

【Abstract】: Visual aesthetic assessment has been an active research field for decades. Although latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close relationships with negative aesthetic effects. To this end, we design two novel pretext tasks to identify the types and parameters of editing operations applied to synthetic instances. The features from our pretext tasks are then adapted for a one-layer linear classifier to evaluate the performance in terms of binary aesthetic classification. We conduct extensive quantitative experiments on three benchmark datasets and demonstrate that our approach can faithfully extract aesthetics-aware features and outperform alternative pretext schemes. Moreover, we achieve comparable results to state-of-the-art supervised methods that use 10 million labels from ImageNet.

【Keywords】:

700. Gamma-Nets: Generalizing Value Estimation over Timescale.

Paper Link】 【Pages】:5717-5725

【Authors】: Craig Sherstan ; Shibhansh Dohare ; James MacGlashan ; Johannes Günther ; Patrick M. Pilarski

【Abstract】: Temporal abstraction is a key requirement for agents making decisions over long time horizons—a fundamental challenge in reinforcement learning. There are many reasons why value estimates at multiple timescales might be useful; recent work has shown that value estimates at different time scales can be the basis for creating more advanced discounting functions and for driving representation learning. Further, predictions at many different timescales serve to broaden an agent's model of its environment. One predictive approach of interest within an online learning setting is general value functions (GVFs), which represent models of an agent's world as a collection of predictive questions each defined by a policy, a signal to be predicted, and a prediction timescale. In this paper we present Γ-nets, a method for generalizing value function estimation over timescale, allowing a given GVF to be trained and queried for arbitrary timescales so as to greatly increase the predictive ability and scalability of a GVF-based model. The key to our approach is to use timescale as one of the value estimator's inputs. As a result, the prediction target for any timescale is available at every timestep and we are free to train on any number of timescales. We first provide two demonstrations by 1) predicting a square wave and 2) predicting sensorimotor signals on a robot arm using a linear function approximator. Next, we empirically evaluate Γ-nets in the deep reinforcement learning setting using policy evaluation on a set of Atari video games. Our results show that Γ-nets can be effective for predicting arbitrary timescales, with only a small cost in accuracy as compared to learning estimators for fixed timescales. Γ-nets provide a method for accurately and compactly making predictions at many timescales without requiring a priori knowledge of the task, making it a valuable contribution to ongoing work on model-based planning, representation learning, and lifelong learning algorithms.

【Keywords】:
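
The point in the Γ-nets abstract above, that a target exists for every timescale once the timescale is an input to the estimator, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses full Monte Carlo discounted returns rather than online temporal-difference updates, and the function names are hypothetical.

```python
def discounted_returns(cumulants, gamma):
    """Return G_t = c_t + gamma * G_{t+1} for every t, with G = 0 past the end."""
    returns, g = [], 0.0
    for c in reversed(cumulants):
        g = c + gamma * g
        returns.append(g)
    return returns[::-1]


def training_pairs(observations, cumulants, gammas):
    """Build ((observation, gamma), target) pairs for several timescales.

    The discount gamma is part of the input, so a single estimator could be
    fit to all pairs and later queried at a gamma it was not trained on.
    """
    pairs = []
    for gamma in gammas:
        targets = discounted_returns(cumulants, gamma)
        pairs.extend(((obs, gamma), tgt) for obs, tgt in zip(observations, targets))
    return pairs
```

For example, `discounted_returns([1, 1, 1], 0.5)` gives `[1.75, 1.5, 1.0]`, while the same signal with `gamma = 0` gives the immediate cumulants.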

701. Deep Time-Stream Framework for Click-through Rate Prediction by Tracking Interest Evolution.

Paper Link】 【Pages】:5726-5733

【Authors】: Shu-Ting Shi ; Wenhao Zheng ; Jun Tang ; Qing-Guo Chen ; Yao Hu ; Jianke Zhu ; Ming Li

【Abstract】: Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation. Recently, deep learning models have been proposed to learn the representation of users' overall interests, while ignoring the fact that interests may dynamically change over time. We argue that it is necessary to consider the continuous-time information in CTR models to track user interest trends from rich historical behaviors. In this paper, we propose a novel Deep Time-Stream framework (DTS) which introduces the time information by an ordinary differential equation (ODE). DTS continuously models the evolution of interests using a neural network, and thus is able to tackle the challenge of dynamically representing users' interests based on their historical behaviors. In addition, our framework can be seamlessly applied to any existing deep CTR models by leveraging the additional Time-Stream Module, while no changes are made to the original CTR models. Experiments on a public dataset as well as a real industry dataset with billions of samples demonstrate the effectiveness of the proposed approach, which achieves superior performance compared with existing methods.

【Keywords】:
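
As a rough illustration of evolving a latent interest state in continuous time, as the DTS abstract above describes, the sketch below integrates dh/dt = f(h) over an arbitrary time gap with explicit Euler steps. The integrator choice and the hand-written `decay_dynamics` (standing in for a learned neural network) are assumptions made for this example only.

```python
def euler_evolve(h, dt, f, steps=100):
    """Advance state h by time dt under dh/dt = f(h) using explicit Euler steps."""
    step = dt / steps
    for _ in range(steps):
        h = [hi + step * di for hi, di in zip(h, f(h))]
    return h


def decay_dynamics(h):
    """Hypothetical dynamics: every interest component decays toward zero."""
    return [-v for v in h]
```

Because `dt` can be any non-negative gap, the same routine covers irregularly spaced user events; a gap of zero leaves the state unchanged.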

702. Quadruply Stochastic Gradient Method for Large Scale Nonlinear Semi-Supervised Ordinal Regression AUC Optimization.

Paper Link】 【Pages】:5734-5741

【Authors】: Wanli Shi ; Bin Gu ; Xiang Li ; Heng Huang

【Abstract】: Semi-supervised ordinal regression (S2OR) problems are ubiquitous in real-world applications, where only a few ordered instances are labeled and massive instances remain unlabeled. Recent research has shown that directly optimizing the concordance index or AUC can impose a better ranking on the data than optimizing the traditional error rate in ordinal regression (OR) problems. In this paper, we propose an unbiased objective function for S2OR AUC optimization based on the ordinal binary decomposition approach. Besides, to handle the large-scale kernelized learning problems, we propose a scalable algorithm called QS3ORAO using the doubly stochastic gradients (DSG) framework for functional optimization. Theoretically, we prove that our method can converge to the optimal solution at the rate of O(1/t), where t is the number of iterations for stochastic data sampling. Extensive experimental results on various benchmark and real-world datasets also demonstrate that our method is efficient and effective while retaining similar generalization performance.

【Keywords】:
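
To make the AUC and binary-decomposition terms in the abstract above concrete, here is a small fully supervised sketch. It computes the empirical pairwise AUC and then averages it over the K-1 "label greater than k" splits of an ordinal problem. The averaging rule and both function names are illustrative assumptions; this is not the semi-supervised objective or the QS3ORAO algorithm from the paper.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


def ordinal_auc(scores, labels, num_classes):
    """Average the binary AUC over the splits 'label > k' for k = 0..K-2."""
    aucs = []
    for k in range(num_classes - 1):
        binary = [1 if y > k else 0 for y in labels]
        aucs.append(pairwise_auc(scores, binary))
    return sum(aucs) / len(aucs)
```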

703. Loss-Based Attention for Deep Multiple Instance Learning.

Paper Link】 【Pages】:5742-5749

【Authors】: Xiaoshuang Shi ; Fuyong Xing ; Yuanpu Xie ; Zizhao Zhang ; Lei Cui ; Lin Yang

【Abstract】: Although attention mechanisms have been widely used in deep learning for many tasks, they are rarely utilized to solve multiple instance learning (MIL) problems, where only a general category label is given for multiple instances contained in one bag. Additionally, previous deep MIL methods first utilize the attention mechanism to learn instance weights and then employ a fully connected layer to predict the bag label, so that the bag prediction is largely determined by the effectiveness of learned instance weights. To alleviate this issue, in this paper, we propose a novel loss-based attention mechanism, which simultaneously learns instance weights and predictions, and bag predictions for deep multiple instance learning. Specifically, it calculates instance weights based on the loss function, e.g. softmax+cross-entropy, and shares the parameters with the fully connected layer, which produces instance and bag predictions. Additionally, a regularization term consisting of learned weights and cross-entropy functions is utilized to boost the recall of instances, and a consistency cost is used to smooth the training process of neural networks for boosting the model generalization performance. Extensive experiments on multiple types of benchmark databases demonstrate that the proposed attention mechanism is a general, effective and efficient framework, which can achieve superior bag and image classification performance over other state-of-the-art MIL methods, while obtaining higher instance precision and recall than previous attention mechanisms. Source code is available at https://github.com/xsshi2015/Loss-Attention.

【Keywords】:

704. Deep Message Passing on Sets.

Paper Link】 【Pages】:5750-5757

【Authors】: Yifeng Shi ; Junier Oliva ; Marc Niethammer

【Abstract】: Modern methods for learning over graph input data have shown the fruitfulness of accounting for relationships among elements in a collection. However, most methods that learn over set input data use only rudimentary approaches to exploit intra-collection relationships. In this work we introduce Deep Message Passing on Sets (DMPS), a novel method that incorporates relational learning for sets. DMPS not only connects learning on graphs with learning on sets via deep kernel learning, but it also bridges message passing on sets and traditional diffusion dynamics commonly used in denoising models. Based on these connections, we develop two new blocks for relational learning on sets: the set-denoising block and the set-residual block. The former is motivated by the connection between message passing on general graphs and diffusion-based denoising models, whereas the latter is inspired by the well-known residual network. In addition to demonstrating the interpretability of our model by learning the true underlying relational structure experimentally, we also show the effectiveness of our approach on both synthetic and real-world datasets by achieving results that are competitive with or outperform the state-of-the-art. Readers who are interested in the detailed derivations of several results that we present in this work can find them in the supplementary material at: https://arxiv.org/abs/1909.09877.

【Keywords】:

705. Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting.

Paper Link】 【Pages】:5758-5766

【Authors】: Qiquan Shi ; Jiaming Yin ; Jiajun Cai ; Andrzej Cichocki ; Tatsuya Yokota ; Lei Chen ; Mingxuan Yuan ; Jia Zeng

【Abstract】: This work proposes a novel approach for multiple time series forecasting. At first, multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) is explicitly used on consecutive core tensors to predict future samples. In this manner, the proposed approach tactically incorporates the unique advantages of MDT tensorization (to exploit mutual correlations) and tensor ARIMA coupled with low-rank Tucker decomposition into a unified framework. This framework exploits the low-rank structure of block Hankel tensors in the embedded space and captures the intrinsic correlations among multiple TS, which thus can improve the forecasting results, especially for multiple short time series. Experiments conducted on three public datasets and two industrial datasets verify that the proposed BHT-ARIMA effectively improves forecasting accuracy and reduces computational cost compared with the state-of-the-art methods.

【Keywords】:
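
The delay-embedding step mentioned in the BHT-ARIMA abstract above can be pictured on a single series: sliding a window of length tau over the data yields a Hankel matrix whose anti-diagonals are constant. The sketch below covers only this one-dimensional case; the block Hankel tensors built from multiple series, the Tucker decomposition, and the tensor ARIMA step are not shown, and `hankelize` is a hypothetical helper name.

```python
def hankelize(series, tau):
    """Delay-embed a 1-D series into a tau x (n - tau + 1) Hankel matrix.

    Entry (i, j) equals series[i + j], so every anti-diagonal is constant.
    """
    n = len(series)
    if not 1 <= tau <= n:
        raise ValueError("tau must be between 1 and len(series)")
    return [[series[i + j] for j in range(n - tau + 1)] for i in range(tau)]
```

With `series = [1, 2, 3, 4, 5]` and `tau = 2`, the result is `[[1, 2, 3, 4], [2, 3, 4, 5]]`; each row is a shifted copy of the series, which is the redundancy that a low-rank model can exploit.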

706. Morphism-Based Learning for Structured Data.

Paper Link】 【Pages】:5767-5775

【Authors】: Kilho Shin ; Dave Shepard

【Abstract】: In mathematics, morphism is a term that indicates structure-preserving mappings between mathematical structures of the same type. Linear transformations for linear spaces, homomorphisms for algebraic structures and continuous functions for topological spaces are examples. Many kinds of data studied in machine learning, on the other hand, can include mathematical structures in them. Strings are totally ordered sets, and trees can be understood not only as graphs but also as partially ordered sets with respect to an ancestor-to-descendant order and semigroups with respect to the binary operation to determine nearest common ancestor. In this paper, we propose a generic and theoretic framework to investigate similarity of structured data through structure-preserving one-to-one partial mappings, which we call morphisms. Through morphisms, useful and important methods studied in the literature can be abstracted into common concepts, although they have been studied separately. When we study new structures of data, we will be able to extend the legacy methods for the purpose of studying the new structure, if we can define morphisms properly. Also, this view reveals hidden relations between methods known in the literature and can let us understand them more clearly. For example, we see that the center star algorithm, which was originally developed to compute sequential multiple alignments, can be abstracted so that it not only applies to data structures other than strings but also can be used to solve problems of pattern extraction. The methods that we study in this paper include edit distance, multiple alignment, pattern extraction and kernel, but there are certainly many more methods that can be abstracted within our framework.

【Keywords】:
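
Edit distance is one of the classical methods the abstract above says can be abstracted through morphisms. For reference, the standard string version (Levenshtein distance with unit costs) is sketched below; this is the textbook dynamic program, not the paper's morphism-based generalization.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                        # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

An optimal edit script pairs up the characters that are kept or substituted, and that pairing is an order-preserving one-to-one partial mapping between the two strings, which is the kind of mapping the abstract calls a morphism.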

707. Hierarchically Clustered Representation Learning.

Paper Link】 【Pages】:5776-5783

【Authors】: Su-Jin Shin ; Kyungwoo Song ; Il-Chul Moon

【Abstract】: The joint optimization of representation learning and clustering in the embedding space has experienced a breakthrough in recent years. In spite of the advance, clustering with representation learning has been limited to flat-level categories, which often involves cohesive clustering with a focus on instance relations. To overcome the limitations of flat clustering, we introduce hierarchically-clustered representation learning (HCRL), which simultaneously optimizes representation learning and hierarchical clustering in the embedding space. Compared with a few prior works, HCRL firstly attempts to consider a generation of deep embeddings from every component of the hierarchy, not just leaf components. In addition to obtaining hierarchically clustered embeddings, we can reconstruct data by the various abstraction levels, infer the intrinsic hierarchical structure, and learn the level-proportion features. We conducted evaluations with image and text domains, and our quantitative analyses showed competent likelihoods and the best accuracies compared with the baselines.

【Keywords】:

708. HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface.

Paper Link】 【Pages】:5784-5791

【Authors】: Sungho Shin ; Jinhwan Park ; Yoonho Boo ; Wonyong Sung

【Abstract】: Quantization of deep neural networks is extremely essential for efficient implementations. Low-precision networks are typically designed to represent original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed training scheme employs high-low-high-low precision in an alternating manner for network training. The learning rate is also abruptly changed at each stage for coarse- or fine-tuning. With the proposed training technique, we show quite good performance improvements for convolutional neural networks when compared to the previous fine-tuning based quantization scheme. We achieve the state-of-the-art results for recurrent neural network based language modeling with 2-bit weight and activation.

【Keywords】:

709. Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents.

Paper Link】 【Pages】:5792-5799

【Authors】: Felipe Leno da Silva ; Pablo Hernandez-Leal ; Bilal Kartal ; Matthew E. Taylor

【Abstract】: Although Reinforcement Learning (RL) has been one of the most successful approaches for learning in sequential decision making problems, the sample-complexity of RL techniques still represents a major challenge for practical applications. To combat this challenge, whenever a competent policy (e.g., either a legacy system or a human demonstrator) is available, the agent could leverage samples from this policy (advice) to improve sample-efficiency. However, advice is normally limited, hence it should ideally be directed to states where the agent is uncertain on the best action to execute. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework where the agent asks for advice when its epistemic uncertainty is high for a certain state. RCMP takes into account that the advice is limited and might be suboptimal. We also describe a technique to estimate the agent uncertainty by performing minor modifications in standard value-function-based RL methods. Our empirical evaluations show that RCMP performs better than Importance Advising, not receiving advice, and receiving it at random states in Gridworld and Atari Pong scenarios.

【Keywords】:
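
The advice rule in the RCMP abstract above (ask only when epistemic uncertainty is high and advice is still available) can be sketched as below. The abstract does not say how uncertainty is estimated, so this example assumes one common choice, the variance of Q-value estimates across several value heads; the function names, threshold, and budget handling are all hypothetical.

```python
from statistics import pvariance


def epistemic_uncertainty(head_q_values):
    """Mean over actions of the variance across heads.

    head_q_values: one list of per-action Q-values for each head.
    """
    variances = [pvariance(qs) for qs in zip(*head_q_values)]
    return sum(variances) / len(variances)


def should_ask_advice(head_q_values, threshold, budget):
    """Ask only if advice budget remains and the heads disagree enough."""
    return budget > 0 and epistemic_uncertainty(head_q_values) > threshold
```

When all heads agree the estimate is zero and no advice is requested, so the limited budget is spent on states where the heads disagree.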

710. Efficient Facial Feature Learning with Wide Ensemble-Based Convolutional Neural Networks.

Paper Link】 【Pages】:5800-5809

【Authors】: Henrique Siqueira ; Sven Magg ; Stefan Wermter

【Abstract】: Ensemble methods, traditionally built with independently trained de-correlated models, have proven to be efficient methods for reducing the remaining residual generalization error, which results in robust and accurate methods for real-world applications. In the context of deep learning, however, training an ensemble of deep networks is costly and generates high redundancy which is inefficient. In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, which are both important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the remaining residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts.

【Keywords】:

711. Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers.

Paper Link】 【Pages】:5810-5817

【Authors】: Masoumeh Soflaei ; Hongyu Guo ; Ali Al-Bashabsheh ; Yongyi Mao ; Richong Zhang

【Abstract】: We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call “IB learning”. We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a “vector quantization” approach, namely, simultaneously learning the representations of multiple input objects. Such an approach, assisted with some variational techniques, results in a novel learning framework, “Aggregated Learning”, for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.

【Keywords】:

712. Bivariate Beta-LSTM.

Paper Link】 【Pages】:5818-5825

【Authors】: Kyungwoo Song ; JoonHo Jang ; Seungjae Shin ; Il-Chul Moon

【Abstract】: Long Short-Term Memory (LSTM) infers the long term dependency through a cell state maintained by the input and the forget gate structures, which models a gate output as a value in [0,1] through a sigmoid function. However, due to the graduality of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. Besides, the previous models lack modeling on the correlation between the gates, which would be a new method to adopt inductive bias for a relationship between previous and current input. This paper proposes a new gate structure with the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling on the gates within the LSTM cell so that the modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show the higher upper bound of the gradient compared to the sigmoid function, and we empirically observed that the bivariate Beta distribution gate structure provides higher gradient values in training. We demonstrate the effectiveness of the bivariate Beta gate structure on the sentence classification, image classification, polyphonic music modeling, and image caption generation.

【Keywords】:

713. Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards.

Paper Link】 【Pages】:5826-5833

【Authors】: Yuhang Song ; Jianyi Wang ; Thomas Lukasiewicz ; Zhenghua Xu ; Shangtong Zhang ; Andrzej Wojcicki ; Mai Xu

【Abstract】: Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (called mega-reward), which, to our knowledge, is the first approach that achieves human-level performance in intrinsically-motivated play. Intuitively, mega-reward comes from the observation that infants' intelligence develops when they try to gain more control on entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents on given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gaps between direct and latent control. Experimental studies show that mega-reward (i) can greatly outperform all state-of-the-art intrinsic reward approaches, (ii) generally achieves the same level of performance as Ex-PPO and professional human-level scores, and (iii) has also a superior performance when it is incorporated with extrinsic rewards.

【Keywords】:

714. Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip.

Paper Link】 【Pages】:5834-5841

【Authors】: Yuxuan Song ; Minkai Xu ; Lantao Yu ; Hao Zhou ; Shuo Shao ; Yong Yu

【Abstract】: Although Shannon theory states that it is asymptotically optimal to separate the source and channel coding as two independent processes, in many practical communication scenarios this decomposition is limited by the finite bit-length and computational power for decoding. Recently, neural joint source-channel coding (NECST) (Choi et al. 2018) is proposed to sidestep this problem. While it leverages the advancements of amortized inference and deep learning (Kingma and Welling 2013; Grover and Ermon 2018) to improve the encoding and decoding process, it still cannot always achieve compelling results in terms of compression and error correction performance due to the limited robustness of its learned coding networks. In this paper, motivated by the inherent connections between neural joint source-channel coding and discrete representation learning, we propose a novel regularization method called Infomax Adversarial-Bit-Flip (IABF) to improve the stability and robustness of the neural joint source-channel coding scheme. More specifically, on the encoder side, we propose to explicitly maximize the mutual information between the codeword and data; while on the decoder side, the amortized reconstruction is regularized within an adversarial framework. Extensive experiments conducted on various real-world datasets evidence that our IABF can achieve state-of-the-art performances on both compression and error correction benchmarks and outperform the baselines by a significant margin.

【Keywords】:

715. Benign Examples: Imperceptible Changes Can Enhance Image Translation Performance.

Paper Link】 【Pages】:5842-5850

【Authors】: Vignesh Srinivasan ; Klaus-Robert Müller ; Wojciech Samek ; Shinichi Nakajima

【Abstract】: Unpaired image-to-image domain translation involves the task of transferring an image in one domain to another domain without having pairs of data for supervision. Several methods have been proposed to address this task using Generative Adversarial Networks (GANs) and cycle consistency constraint enforcing the translated image to be mapped back to the original domain. This way, a Deep Neural Network (DNN) learns mapping such that the input training distribution transferred to the target domain matches the target training distribution. However, not all test images are expected to fall inside the data manifold in the input space where the DNN has learned to perform the mapping very well. Such images can have a poor mapping to the target domain. In this paper, we propose to perform Langevin dynamics, which makes a subtle change in the input space bringing them close to the data manifold, producing benign examples. The effect is significant improvement of the mapped image on the target domain. We also show that the score function estimation by denoising autoencoder (DAE), can practically be replaced with any autoencoding structure, which most image-to-image translation methods contain intrinsically due to the cycle consistency constraint. Thus, no additional training is required. We show advantages of our approach for several state-of-the-art image-to-image domain translation models. Quantitative evaluation shows that our proposed method leads to a substantial increase in the accuracy to the target label on multiple state-of-the-art image classifiers, while qualitative user study proves that our method better represents the target domain, achieving better human preference scores.

【Keywords】:
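
The Langevin dynamics update referred to in the abstract above moves an input along an estimated score (the gradient of the log-density) plus Gaussian noise. The sketch below uses the standard update x ← x + (ε/2)·score(x) + √ε·noise with a hand-written score for a standard Gaussian. That score is a stand-in assumption; the paper estimates it from an autoencoding structure, which is not reproduced here.

```python
import math
import random


def gaussian_score(x):
    """Score of a standard Gaussian density: grad log p(x) = -x (stand-in)."""
    return [-v for v in x]


def langevin_step(x, score, step_size, rng, noise_scale=1.0):
    """One update: x + (step_size / 2) * score(x) + sqrt(step_size) * noise."""
    grad = score(x)
    return [
        xi + 0.5 * step_size * gi
        + noise_scale * math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        for xi, gi in zip(x, grad)
    ]


# Usage: start far from the high-density region and take a few small steps.
rng = random.Random(0)
x = [4.0, -4.0]
for _ in range(50):
    x = langevin_step(x, gaussian_score, 0.1, rng)
```

Setting `noise_scale=0.0` reduces the update to plain gradient ascent on the log-density, which makes the drift toward the high-density region easy to inspect.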

716. Scalable Probabilistic Matrix Factorization with Graph-Based Priors.

Paper Link】 【Pages】:5851-5858

【Authors】: Jonathan Strahl ; Jaakko Peltonen ; Hiroshi Mamitsuka ; Samuel Kaski

【Abstract】: In matrix factorization, available graph side-information may not be well suited for the matrix completion problem, having edges that disagree with the latent-feature relations learnt from the incomplete data matrix. We show that removing these contested edges improves prediction accuracy and scalability. We identify the contested edges through a highly-efficient graphical lasso approximation. The identification and removal of contested edges adds no computational complexity to state-of-the-art graph-regularized matrix factorization, remaining linear with respect to the number of non-zeros. Computational load even decreases proportional to the number of edges removed. Formulating a probabilistic generative model and using expectation maximization to extend graph-regularised alternating least squares (GRALS) guarantees convergence. Rich simulated experiments illustrate the desired properties of the resulting algorithm. On real data experiments we demonstrate improved prediction accuracy with fewer graph edges (empirical evidence that graph side-information is often inaccurate). A 300 thousand dimensional graph with three million edges (Yahoo music side-information) can be analyzed in under ten minutes on a standard laptop computer demonstrating the efficiency of our graph update.

【Keywords】:

717. Learning Efficient Representations for Fake Speech Detection.

Paper Link】 【Pages】:5859-5866

【Authors】: Nishant Subramani ; Delip Rao

【Abstract】: Synthetic speech or “fake speech” which matches personal vocal traits has become better and cheaper due to advances in deep learning-based speech synthesis and voice conversion approaches. This increased accessibility of synthetic speech systems and the growing misuse of them highlights the critical need to build countermeasures. Furthermore, new synthesis models evolve all the time and the efficacy of previously trained detection models on these unseen attack vectors is poor. In this paper, we focus on: 1) How can we build highly accurate, yet parameter and sample-efficient models for fake speech detection? 2) How can we rapidly adapt detection models to new sources of fake speech? We present four parameter-efficient convolutional architectures for fake speech detection with best detection F1 scores of around 97 points on a large dataset of fake and bonafide speech. We show how the fake speech detection task naturally lends itself to a novel multi-task problem further improving F1 scores for a mere 0.5% increase in model parameters. Our multi-task setting also helps in data-sparse situations, commonplace in adversarial settings. We investigate an alternative approach to the data-sparsity problem using transfer learning and show that it is possible to meet purely supervised detection performance for unseen attack vectors with as little as 6.25% of the training data. This is the first known application of transfer learning in adversarial settings for speech. Finally, we show how well our transfer learning approach adapts in an instance-efficient way to new attack vectors using the Real-Time Voice Cloning toolkit. We exceed the purely supervised detection performance (99.18 F1) with as little as 6.25% of the data.

【Keywords】:

718. Lifelong Spectral Clustering.

Paper Link】 【Pages】:5867-5874

【Authors】: Gan Sun ; Yang Cong ; Qianqian Wang ; Jun Li ; Yun Fu

【Abstract】: In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering tasks with a fixed task set, which cannot incorporate a new spectral clustering task without access to previously learned tasks. In this paper, we aim to explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) orthogonal basis library: capturing latent cluster centers among the clusters in each pair of tasks; 2) feature embedding library: embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and further redefines the library base over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and feature library. Finally, the empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve the clustering performance compared with other state-of-the-art spectral clustering algorithms.

【Keywords】:

719. New Interpretations of Normalization Methods in Deep Learning.

Paper Link】 【Pages】:5875-5882

【Authors】: Jiacheng Sun ; Xiangyong Cao ; Hanwen Liang ; Weiran Huang ; Zewei Chen ; Zhenguo Li

【Abstract】: In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However, some necessary tools to analyze all these normalization methods are lacking. In this paper, we first propose a lemma to define some necessary tools. Then, we use these tools to make a deep analysis of popular normalization methods and obtain the following conclusions: 1) Most of the normalization methods can be interpreted in a unified framework, namely normalizing pre-activations or weights onto a sphere; 2) Since most of the existing normalization methods are scaling invariant, we can conduct optimization on a sphere with scaling symmetry removed, which can help to stabilize the training of the network; 3) We prove that training with these normalization methods can make the norm of weights increase, which could cause adversarial vulnerability as it amplifies the attack. Finally, a series of experiments are conducted to verify these claims.

【Keywords】:
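
The first conclusion in the abstract above (normalization maps pre-activations onto a sphere) can be checked numerically for layer normalization. The following is a minimal sketch, not the authors' analysis: it assumes layer normalization without the learnable affine parameters and without the usual epsilon term, and the names `layer_norm` and `l2_norm` are illustrative only.

```python
import math

def layer_norm(x):
    """Center x and divide by its population standard deviation.

    Assumption: no epsilon and no learnable scale/shift, so the
    geometry is exact rather than approximate.
    """
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    std = math.sqrt(var)
    return [(v - mean) / std for v in x]

def l2_norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))

pre_activations = [0.5, -1.2, 3.0, 0.7]
normalized = layer_norm(pre_activations)

# Rescaling the input by a positive constant leaves the output unchanged,
# which is the scaling invariance mentioned in the abstract.
rescaled = layer_norm([10.0 * v for v in pre_activations])

# The normalized vector has squared norm equal to its dimension d,
# so it lies on a sphere of radius sqrt(d).
radius = l2_norm(normalized)
```

Under these assumptions the output always has norm `sqrt(d)` regardless of the input scale, which is why optimization can be viewed as taking place on a sphere.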

720. Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning.

Paper Link】 【Pages】:5883-5891

【Authors】: Jianwen Sun ; Tianwei Zhang ; Xiaofei Xie ; Lei Ma ; Yan Zheng ; Kangjie Chen ; Yang Liu

【Abstract】: Adversarial attacks against conventional Deep Learning (DL) systems and algorithms have been widely studied, and various defenses were proposed. However, the possibility and feasibility of such attacks against Deep Reinforcement Learning (DRL) are less explored. As DRL has achieved great success in various complex tasks, designing effective adversarial attacks is an indispensable prerequisite towards building robust DRL algorithms. In this paper, we introduce two novel adversarial attack techniques to stealthily and efficiently attack the DRL agents. These two techniques enable an adversary to inject adversarial samples in a minimal set of critical moments while causing the most severe damage to the agent. The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one. The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode. Experimental results demonstrate the effectiveness of our techniques. Specifically, to successfully attack the DRL agent, our critical point technique only requires 1 (TORCS) or 2 (Atari Pong and Breakout) steps, and the antagonist technique needs fewer than 5 steps (4 Mujoco tasks), which are significant improvements over state-of-the-art methods.

【Keywords】:

721. Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labeled Nodes.

Paper Link】 【Pages】:5892-5899

【Authors】: Ke Sun ; Zhouchen Lin ; Zhanxing Zhu

【Abstract】: Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, which is combined with a self-supervised learning approach and focuses on improving the generalization performance of GCNs on graphs with few labeled nodes. First, a Multi-Stage Training Framework is provided as the basis of the M3S training method. Then we leverage the DeepCluster technique, a popular form of self-supervised learning, and design a corresponding aligning mechanism on the embedding space to refine the Multi-Stage Training Framework, resulting in the M3S Training Algorithm. Finally, extensive experimental results verify the superior performance of our algorithm on graphs with few labeled nodes under different label rates compared with other state-of-the-art approaches.

【Keywords】:

722. Attentive Experience Replay.

Paper Link】 【Pages】:5900-5907

【Authors】: Peiquan Sun ; Wengang Zhou ; Houqiang Li

【Abstract】: Experience replay (ER) has become an important component of deep reinforcement learning (RL) algorithms. ER enables RL algorithms to reuse past experiences for the update of current policy. By reusing a previous state for training, the RL agent would learn more accurate value estimation and better decision on that state. However, as the policy is continually updated, some states in past experiences become rarely visited, and optimization over these states might not improve the overall performance of current policy. To tackle this issue, we propose a new replay strategy to prioritize the transitions that contain states frequently visited by current policy. We introduce Attentive Experience Replay (AER), a novel experience replay algorithm that samples transitions according to the similarities between their states and the agent's state. We couple AER with different off-policy algorithms and demonstrate that AER makes consistent improvements on the suite of OpenAI gym tasks.

【Keywords】:
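
The sampling idea in the abstract above (prefer transitions whose states resemble the agent's current state) can be sketched as follows. This is one plausible reading, not the authors' implementation: the two-step scheme (draw a uniform candidate pool, keep the most similar entries), the cosine similarity measure, and the names `attentive_sample` and `candidate_factor` are all assumptions made for illustration.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two state vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def attentive_sample(buffer, current_state, batch_size,
                     candidate_factor=4, rng=random):
    """Draw a uniform candidate pool, then keep the transitions whose
    stored state is most similar to the agent's current state.

    Hypothetical helper: `buffer` is a list of dicts with a "state" key.
    """
    n_candidates = min(len(buffer), candidate_factor * batch_size)
    candidates = rng.sample(buffer, n_candidates)
    candidates.sort(
        key=lambda t: cosine_similarity(t["state"], current_state),
        reverse=True,
    )
    return candidates[:batch_size]

replay_buffer = [
    {"state": [1.0, 0.0]},
    {"state": [0.0, 1.0]},
    {"state": [1.0, 1.0]},
    {"state": [-1.0, 0.0]},
]
batch = attentive_sample(replay_buffer, current_state=[1.0, 0.1], batch_size=2)
```

Setting `candidate_factor` to 1 recovers plain uniform sampling, so the parameter controls how strongly the batch is biased toward states near the current one.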

723. Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection.

Paper Link】 【Pages】:5908-5915

【Authors】: Yuan Sun ; Wei Wang ; Michael Kirley ; Xiaodong Li ; Jeffrey Chan

【Abstract】: Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets and class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions made on the probability distributions of features such that these distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and illustrate improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.

【Keywords】:

724. Adversarial Transformations for Semi-Supervised Learning.

Paper Link】 【Pages】:5916-5923

【Authors】: Teppei Suzuki ; Ikuro Sato

【Abstract】: We propose a Regularization framework based on Adversarial Transformations (RAT) for semi-supervised learning. RAT is designed to enhance the robustness of the output distribution of class prediction for given data against input perturbation. RAT is an extension of Virtual Adversarial Training (VAT) in such a way that RAT adversarially transforms data along the underlying data distribution by a rich set of data transformation functions that leave the class label invariant, whereas VAT simply produces adversarial additive noise. In addition, we verified that a technique of gradually increasing the perturbation region further improves the robustness. In experiments, we show that RAT significantly improves classification performance on CIFAR-10 and SVHN compared to existing regularization methods under standard semi-supervised image classification settings.

【Keywords】:

725. CGD: Multi-View Clustering via Cross-View Graph Diffusion.

Paper Link】 【Pages】:5924-5931

【Authors】: Chang Tang ; Xinwang Liu ; Xinzhong Zhu ; En Zhu ; Zhigang Luo ; Lizhe Wang ; Wen Gao

【Abstract】: Graph-based multi-view clustering has received great attention for exploring the neighborhood relationship among data points from multiple views. Though these methods achieve great success in various applications, we observe that most previous methods learn a consensus graph by building certain data representation models, which has at least the following drawbacks. First, their clustering performance highly depends on the data representation capability of the model. Second, solving the resultant optimization models usually results in high computational complexity. Third, these models often have hyper-parameters that need to be tuned to obtain optimal results. In this work, we propose a general, effective and parameter-free method with a convergence guarantee to learn a unified graph for multi-view data clustering via cross-view graph diffusion (CGD), which is the first attempt to employ a diffusion process for multi-view clustering. The proposed CGD takes the traditional predefined graph matrices of different views as input, and learns an improved graph for each single view via an iterative cross diffusion process by 1) capturing the underlying manifold geometry structure of the original data points, and 2) leveraging the complementary information among multiple graphs. The final unified graph used for clustering is obtained by averaging the improved view-associated graphs. Extensive experiments on several benchmark datasets are conducted to demonstrate the effectiveness of the proposed method in terms of seven clustering evaluation metrics.

【Keywords】:

726. Label Enhancement with Sample Correlations via Low-Rank Representation.

Paper Link】 【Pages】:5932-5939

【Authors】: Haoyu Tang ; Jihua Zhu ; Qinghai Zheng ; Jun Wang ; Shanmin Pang ; Zhongyu Li

【Abstract】: Compared with single-label and multi-label annotations, label distribution describes the instance by multiple labels with different intensities and accommodates to more-general conditions. Nevertheless, label distribution learning is unavailable in many real-world applications because most existing datasets merely provide logical labels. To handle this problem, a novel label enhancement method, Label Enhancement with Sample Correlations via low-rank representation, is proposed in this paper. Unlike most existing methods, a low-rank representation method is employed so as to capture the global relationships of samples and predict implicit label correlation to achieve label enhancement. Extensive experiments on 14 datasets demonstrate that the algorithm accomplishes state-of-the-art results as compared to previous label enhancement baselines.

【Keywords】:

727. Discriminative Adversarial Domain Adaptation.

Paper Link】 【Pages】:5940-5947

【Authors】: Hui Tang ; Kui Jia

【Abstract】: Given labeled instances on a source domain and unlabeled ones on a target domain, unsupervised domain adaptation aims to learn a task classifier that can well classify target instances. Recent advances rely on domain-adversarial training of deep networks to learn domain-invariant features. However, due to an issue of mode collapse induced by the separate design of task and domain classifiers, these methods are limited in aligning the joint distributions of feature and category across domains. To overcome this, we propose a novel adversarial learning method termed Discriminative Adversarial Domain Adaptation (DADA). Based on an integrated category and domain classifier, DADA has a novel adversarial objective that encourages a mutually inhibitory relation between category and domain predictions for any input instance. We show that under practical conditions, it defines a minimax game that can promote the joint distribution alignment. Besides the traditional closed set domain adaptation, we also extend DADA to the extremely challenging problem settings of partial and open set domain adaptation. Experiments show the efficacy of our proposed methods, and we achieve a new state of the art in all three settings on benchmark datasets.

【Keywords】:

728. Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning.

Paper Link】 【Pages】:5948-5955

【Authors】: Tian Tan ; Zhihan Xiong ; Vikranth R. Dwaracherla

【Abstract】: It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.

【Keywords】:

729. Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values.

Paper Link】 【Pages】:5956-5963

【Authors】: Xianfeng Tang ; Huaxiu Yao ; Yiwei Sun ; Charu C. Aggarwal ; Prasenjit Mitra ; Suhang Wang

【Abstract】: Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them rely solely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns become less useful when the missing ratio is high or the data have consecutive missing values, while exploring global patterns can alleviate this problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework, which leverages a memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of the global temporal distribution. Experimental results on real-world datasets show the effectiveness of the proposed framework for MTS forecasting with missing values and its robustness under various missing ratios.

【Keywords】:

730. Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks.

Paper Link】 【Pages】:5964-5971

【Authors】: Yehui Tang ; Yunhe Wang ; Yixing Xu ; Boxin Shi ; Chao Xu ; Chunjing Xu ; Chang Xu

【Abstract】: Deep neural networks often consist of a great number of trainable parameters for extracting powerful features from given datasets. On one hand, massive trainable parameters significantly enhance the performance of these deep networks. On the other hand, they bring the problem of over-fitting. To this end, dropout based methods disable some elements in the output feature maps during the training phase for reducing the co-adaptation of neurons. Although the generalization ability of the resulting models can be enhanced by these approaches, the conventional binary dropout is not the optimal solution. Therefore, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks and propose a feature distortion method for addressing the aforementioned problem. In the training period, randomly selected elements in the feature maps will be replaced with specific values by exploiting the generalization error bound. The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated on several benchmark image datasets.

【Keywords】:

731. Reborn Filters: Pruning Convolutional Neural Networks with Limited Data.

Paper Link】 【Pages】:5972-5980

【Authors】: Yehui Tang ; Shan You ; Chang Xu ; Jin Han ; Chen Qian ; Boxin Shi ; Chao Xu ; Changshui Zhang

【Abstract】: Channel pruning is effective in compressing the pretrained CNNs for their deployment on low-end edge devices. Most existing methods independently prune some of the original channels and need the complete original dataset to fix the performance drop after pruning. However, due to commercial protection or data privacy, users may only have access to a tiny portion of training examples, which could be insufficient for the performance recovery. In this paper, for pruning with limited data, we propose to use all original filters to directly develop new compact filters, named reborn filters, so that all useful structure priors in the original filters can be well preserved into the pruned networks, alleviating the performance drop accordingly. During training, reborn filters can be easily implemented via 1×1 convolutional layers and then be fused in the inference stage for acceleration. Based on reborn filters, the proposed channel pruning algorithm shows its effectiveness and superiority on extensive experiments.

【Keywords】:

732. Discretizing Continuous Action Space for On-Policy Optimization.

Paper Link】 【Pages】:5981-5988

【Authors】: Yunhao Tang ; Shipra Agrawal

【Abstract】: In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be efficiently addressed by a policy with factorized distribution across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR) especially on high-dimensional tasks with complex dynamics. Additionally, we show that an ordinal parameterization of the discrete distribution can introduce the inductive bias that encodes the natural ordering between discrete actions. This ordinal architecture further significantly improves the performance of PPO/TRPO.

【Keywords】:
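
The factorized discrete policy described in the abstract above can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the paper: actions are assumed to lie in a box `[low, high]` per dimension, bins are evenly spaced, and the helper names `make_bins` and `sample_action` are hypothetical. With `D` action dimensions and `K` bins per dimension, the joint action set has `K**D` elements, but the factorized policy only needs `D * K` logits.

```python
import math
import random

def make_bins(low, high, num_bins):
    """Evenly spaced atomic actions covering [low, high] (num_bins >= 2)."""
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits_per_dim, bins, rng=random):
    """Sample each action dimension independently from its own categorical
    distribution and return the action with its joint log-probability."""
    action = []
    log_prob = 0.0
    for logits in logits_per_dim:
        probs = softmax(logits)
        idx = rng.choices(range(len(bins)), weights=probs, k=1)[0]
        action.append(bins[idx])
        # Independence across dimensions means log-probabilities add up.
        log_prob += math.log(probs[idx])
    return action, log_prob

bins = make_bins(-1.0, 1.0, 5)
# Three action dimensions, five logits each (values here are arbitrary).
logits_per_dim = [[0.0] * 5, [1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0, 0.0]]
action, log_prob = sample_action(logits_per_dim, bins)
```

In a policy-gradient method the logits would come from a network conditioned on the state; the joint log-probability returned here is the quantity such methods differentiate.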

733. Bi-Objective Continual Learning: Learning 'New' While Consolidating 'Known'.

Paper Link】 【Pages】:5989-5996

【Authors】: Xiaoyu Tao ; Xiaopeng Hong ; Xinyuan Chang ; Yihong Gong

【Abstract】: In this paper, we propose a novel single-task continual learning framework named Bi-Objective Continual Learning (BOCL). BOCL aims at both consolidating historical knowledge and learning from new data. On one hand, we propose to preserve the old knowledge using a small set of pillars, and develop the pillar consolidation (PLC) loss to preserve the old knowledge and to alleviate the catastrophic forgetting problem. On the other hand, we develop the contrastive pillar (CPL) loss term to improve the classification performance, and examine several data sampling strategies for efficient onsite learning from ‘new’ with a reasonable amount of computational resources. Comprehensive experiments on CIFAR10/100, CORe50 and a subset of ImageNet validate the BOCL framework. We also reveal the performance accuracy of different sampling strategies when used to finetune a given CNN model. The code will be released.

【Keywords】:

734. Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression.

Paper Link】 【Pages】:5997-6004

【Authors】: Tong Teng ; Jie Chen ; Yehong Zhang ; Bryan Kian Hsiang Low

【Abstract】: This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due to many kernels being evaluated simultaneously by our VBKS algorithm, we show that the variational lower bound of the log-marginal likelihood can be decomposed into an additive form such that each additive term depends only on a disjoint subset of the variational variables and can thus be optimized independently. Stochastic optimization is then used to maximize the variational lower bound by iteratively improving the variational approximation of the exact posterior belief via stochastic gradient ascent, which incurs constant time per iteration and hence scales to big data. We empirically evaluate the performance of our VBKS algorithm on synthetic and massive real-world datasets.

【Keywords】:

735. Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors.

Paper Link】 【Pages】:6005-6012

【Authors】: Jayaraman J. Thiagarajan ; Bindya Venkatesh ; Prasanna Sattigeri ; Peer-Timo Bremer

【Abstract】: With rapid adoption of deep learning in critical applications, the question of when and how much to trust these models often arises, which drives the need to quantify the inherent uncertainties. While identifying all sources that account for the stochasticity of models is challenging, it is common to augment predictions with confidence intervals to convey the expected variations in a model's behavior. We require prediction intervals to be well-calibrated, reflect the true uncertainties, and to be sharp. However, existing techniques for obtaining prediction intervals are known to produce unsatisfactory results in at least one of these criteria. To address this challenge, we develop a novel approach for building calibrated estimators. More specifically, we use separate models for prediction and interval estimation, and pose a bi-level optimization problem that allows the former to leverage estimates from the latter through an uncertainty matching strategy. Using experiments in regression, time-series forecasting, and object localization, we show that our approach achieves significant improvements over existing uncertainty quantification methods, both in terms of model fidelity and calibration error.

【Keywords】:

736. Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance.

Paper Link】 【Pages】:6013-6020

【Authors】: Kai Tian ; Yi Xu ; Jihong Guan ; Shuigeng Zhou

【Abstract】: Despite powerful representation ability, deep neural networks (DNNs) are prone to over-fitting, because of over-parametrization. Existing works have explored various regularization techniques to tackle the over-fitting problem. Some of them employed soft targets rather than one-hot labels to guide network training (e.g. label smoothing in classification tasks), which are called target-based regularization approaches in this paper. To alleviate the over-fitting problem, here we propose a new and general regularization framework that introduces an auxiliary network to dynamically incorporate guided semantic disturbance to the labels. We call it Network as Regularization (NaR in short). During training, the disturbance is constructed by a convex combination of the predictions of the target network and the auxiliary network. These two networks are initialized separately. And the auxiliary network is trained independently from the target network, while providing instance-level and class-level semantic information to the latter progressively. We conduct extensive experiments to validate the effectiveness of the proposed method. Experimental results show that NaR outperforms many state-of-the-art target-based regularization methods, and other regularization approaches (e.g. mixup) can also benefit from combining with NaR.

【Keywords】:

737. Sanity Checks for Saliency Metrics.

Paper Link】 【Pages】:6021-6029

【Authors】: Richard Tomsett ; Dan Harborne ; Supriyo Chakraborty ; Prudhvi Gurram ; Alun D. Preece

【Abstract】: Saliency maps are a popular approach to creating post-hoc explanations of image classifier outputs. These methods produce estimates of the relevance of each pixel to the classification output score, which can be displayed as a saliency map that highlights important pixels. Despite a proliferation of such methods, little effort has been made to quantify how good these saliency maps are at capturing the true relevance of the pixels to the classifier output (i.e. their “fidelity”). We therefore investigate existing metrics for evaluating the fidelity of saliency methods (i.e. saliency metrics). We find that there is little consistency in the literature in how such metrics are calculated, and show that such inconsistencies can have a significant effect on the measured fidelity. Further, we apply measures of reliability developed in the psychometric testing literature to assess the consistency of saliency metrics when applied to individual saliency maps. Our results show that saliency metrics can be statistically unreliable and inconsistent, indicating that comparative rankings between saliency methods generated using such metrics can be untrustworthy.

【Keywords】:

738. Differential Equation Units: Learning Functional Forms of Activation Functions from Data.

Paper Link】 【Pages】:6030-6037

【Authors】: MohamadAli Torkamani ; Shiv Shankar ; Amirmohammad Rooshenas ; Phillip Wallis

【Abstract】: Most deep neural networks use simple, fixed activation functions, such as sigmoids or rectified linear units, regardless of domain or network structure. We introduce differential equation units (DEUs), an improvement to modern neural networks, which enables each neuron to learn a particular nonlinear activation function from a family of solutions to an ordinary differential equation. Specifically, each neuron may change its functional form during training based on the behavior of the other parts of the network. We show that using neurons with DEU activation functions results in a more compact network capable of achieving comparable, if not superior, performance when compared to much larger networks.

【Keywords】:

739. Order-Free Learning Alleviating Exposure Bias in Multi-Label Classification.

Paper Link】 【Pages】:6038-6045

【Authors】: Che-Ping Tsai ; Hung-yi Lee

【Abstract】: Multi-label classification (MLC) assigns multiple labels to each sample. Prior studies show that MLC can be transformed to a sequence prediction problem with a recurrent neural network (RNN) decoder to model the label dependency. However, training an RNN decoder requires a predefined order of labels, which is not directly available in the MLC specification. Besides, an RNN trained this way tends to overfit the label combinations in the training set and has difficulty generating unseen label sequences. In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias. The experimental results on three multi-label classification benchmark datasets show that our method outperforms competitive baselines by a large margin. We also find the proposed approach has a higher probability of generating label combinations not seen during training than the baseline models. The result shows that the proposed approach has better generalization capability.

【Keywords】:

740. Learning to Crawl.

Paper Link】 【Pages】:6046-6053

【Authors】: Utkarsh Upadhyay ; Róbert Busa-Fekete ; Wojciech Kotlowski ; Dávid Pál ; Balázs Szörényi

【Abstract】: Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. (2018) under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between the observations. Finally, we show that the explore-and-commit algorithm achieves an O(√T) regret with a carefully chosen exploration horizon. Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of parameters.

【Keywords】:
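
To make the partial-observation setting in the abstract above concrete, here is a minimal sketch of estimating a change rate from single-bit signals. It is not the estimator proposed in the paper: it assumes page changes follow a Poisson process with rate λ and, as a simplification, that refreshes are equally spaced by a fixed `interval`. Under those assumptions the probability of seeing a change is `1 - exp(-λ * interval)`, which gives the maximum-likelihood estimate computed below. The function name `estimate_change_rate` is illustrative.

```python
import math

def estimate_change_rate(changed_flags, interval):
    """Maximum-likelihood change rate from binary 'page changed' signals.

    Assumptions (not from the source): Poisson changes and a fixed
    refresh interval, so each flag is Bernoulli(1 - exp(-rate * interval)).
    """
    n = len(changed_flags)
    if n == 0 or interval <= 0:
        raise ValueError("need at least one observation and a positive interval")
    fraction_changed = sum(1 for flag in changed_flags if flag) / n
    if fraction_changed >= 1.0:
        # Every refresh saw a change: the likelihood has no finite maximizer.
        return math.inf
    return -math.log(1.0 - fraction_changed) / interval

# Two of four refreshes, spaced 2 time units apart, observed a change.
rate = estimate_change_rate([True, False, True, False], interval=2.0)
```

The degenerate case where every refresh observes a change shows why partial observability is delicate: the data then only bound the rate from below, which is one reason identifiability conditions matter in this setting.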

741. Transfer Learning for Anomaly Detection through Localized and Unsupervised Instance Selection.

Paper Link】 【Pages】:6054-6061

【Authors】: Vincent Vercruyssen ; Wannes Meert ; Jesse Davis

【Abstract】: Anomaly detection attempts to identify instances that deviate from expected behavior. Constructing performant anomaly detectors on real-world problems often requires some labeled data, which can be difficult and costly to obtain. However, often one considers multiple, related anomaly detection tasks. Therefore, it may be possible to transfer labeled instances from a related anomaly detection task to the problem at hand. This paper proposes a novel transfer learning algorithm for anomaly detection that selects and transfers relevant labeled instances from a source anomaly detection task to a target one. Then, it classifies target instances using a novel semi-supervised nearest-neighbors technique that considers both unlabeled target and transferred, labeled source instances. The algorithm outperforms a multitude of state-of-the-art transfer learning methods and unsupervised anomaly detection methods on a large benchmark. Furthermore, it outperforms its rivals on a real-world task of detecting anomalous water usage in retail stores.

【Keywords】:

742. Meta-Learning for Generalized Zero-Shot Learning.

Paper Link】 【Pages】:6062-6069

【Authors】: Vinay Kumar Verma ; Dhanajit Brahma ; Piyush Rai

【Abstract】: Learning to classify unseen class samples at test time is popularly referred to as zero-shot learning (ZSL). If test samples can be from training (seen) as well as unseen classes, it is a more challenging problem due to the existence of strong bias towards seen classes. This problem is generally known as generalized zero-shot learning (GZSL). Thanks to the recent advances in generative models such as VAEs and GANs, sample synthesis based approaches have gained considerable attention for solving this problem. These approaches are able to handle the problem of class bias by synthesizing unseen class samples. However, these ZSL/GZSL models suffer due to the following key limitations: (i) Their training stage learns a class-conditioned generator using only seen class data and the training stage does not explicitly learn to generate the unseen class samples; (ii) They do not learn a generic optimal parameter which can easily generalize for both seen and unseen class generation; and (iii) If we only have access to very few samples per seen class, these models tend to perform poorly. In this paper, we propose a meta-learning based generative model that naturally handles these limitations. The proposed model is based on integrating model-agnostic meta learning with a Wasserstein GAN (WGAN) to handle (i) and (iii), and uses a novel task distribution to handle (ii). Our proposed model yields significant improvements on standard ZSL as well as the more challenging GZSL setting. In the ZSL setting, our model yields 4.5%, 6.0%, 9.8%, and 27.9% relative improvements over the current state-of-the-art on CUB, AWA1, AWA2, and aPY datasets, respectively.

【Keywords】:

743. Deep Conservative Policy Iteration.

Paper Link】 【Pages】:6070-6077

【Authors】: Nino Vieillard ; Olivier Pietquin ; Matthieu Geist

【Abstract】: Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees, and has inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely been implemented, never with neural networks, and only tested on toy problems. In this paper, we show how CPI can be practically combined with deep RL with discrete actions, in an off-policy manner. We also introduce adaptive mixture rates inspired by the theory. We thoroughly evaluate the resulting algorithm on the simple Cartpole problem, and validate the proposed method on a representative subset of Atari games. Overall, this work suggests that revisiting classic ADP may lead to improved and more stable deep RL algorithms.

【Keywords】:
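
As a rough illustration of the "stochastic mixtures of consecutive policies" mentioned in the abstract above, the sketch below blends the current policy with the greedy policy for a single state. The function names, the Q-values and the fixed mixture rate `alpha` are illustrative assumptions, not the authors' implementation (the paper works with neural networks and adaptive rates).

```python
def greedy_policy(q_values):
    """Return a one-hot distribution on the action with the highest Q-value."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def conservative_update(policy, q_values, alpha):
    """Mix the current policy with the greedy one: (1 - alpha) * pi + alpha * greedy."""
    greedy = greedy_policy(q_values)
    return [(1.0 - alpha) * p + alpha * g for p, g in zip(policy, greedy)]


# Hypothetical single-state example with three discrete actions.
policy = [1 / 3, 1 / 3, 1 / 3]
q_values = [0.2, 1.5, -0.3]
new_policy = conservative_update(policy, q_values, alpha=0.1)
```

With a small `alpha` the new policy stays close to the previous one, which is the stabilizing effect the abstract refers to; `alpha = 1` would recover a fully greedy update.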

744. Justification-Based Reliability in Machine Learning.

Paper Link】 【Pages】:6078-6085

【Authors】: Nurali Virani ; Naresh Iyer ; Zhaoyuan Yang

【Abstract】: With the advent of Deep Learning, the field of machine learning (ML) has surpassed human-level performance on diverse classification tasks. At the same time, there is a stark need to characterize and quantify the reliability of a model's prediction on individual samples. This is especially true in applications of such models in safety-critical domains of industrial control and healthcare. To address this need, we link the question of reliability of a model's individual prediction to the epistemic uncertainty of the model's prediction. More specifically, we extend the theory of Justified True Belief (JTB) in epistemology, created to study the validity and limits of human-acquired knowledge, towards characterizing the validity and limits of knowledge in supervised classifiers. We present an analysis of neural network classifiers linking the reliability of their predictions on a test input to characteristics of the support gathered from the input and hidden layers of the network. We hypothesize that the JTB analysis exposes the epistemic uncertainty (or ignorance) of a model with respect to its inference, thereby allowing for the inference to be only as strong as the justification permits. We explore various forms of support (e.g., k-nearest neighbors (k-NN) and ℓp-norm based) generated for an input, using the training data to construct a justification for the prediction with that input. Through experiments conducted on simulated and real datasets, we demonstrate that our approach can provide reliability for individual predictions and characterize regions where such reliability cannot be ascertained.

【Keywords】:

745. Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation.

Paper Link】 【Pages】:6086-6093

【Authors】: Changlin Wan ; Wennan Chang ; Tong Zhao ; Mengya Li ; Sha Cao ; Chi Zhang

【Abstract】: Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low-rank Boolean matrices, which could generate a vast amount of information about the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach, called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopts a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's lies in the upper triangular area of the permuted matrix, and its location is determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different density and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate density patterns than popular methods such as ASSO, PANDA and Message Passing. We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieval and data denoising.

【Keywords】:
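
To make the notion of a Boolean product in the abstract above concrete, here is a minimal sketch that multiplies two small binary factor matrices and counts the reconstruction error. It only illustrates the BMF objective; it is not the MEBF algorithm, and the matrices `x`, `a` and `b` are made-up examples.

```python
def boolean_product(a, b):
    """Boolean matrix product: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [int(any(a[i][k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(rows)
    ]


def reconstruction_error(x, approx):
    """Count the entries where the approximation disagrees with the original."""
    return sum(xv != av for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))


# Hypothetical 4x4 binary matrix containing two overlapping dense blocks.
x = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
a = [[1, 0], [1, 0], [0, 1], [0, 0]]  # 4x2: which rows belong to each pattern
b = [[1, 1, 0, 0], [0, 1, 1, 1]]      # 2x4: which columns belong to each pattern
approx = boolean_product(a, b)
error = reconstruction_error(x, approx)
```

Unlike an ordinary matrix product, overlapping patterns do not add up: an entry covered by several patterns is still 1.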

746. Reinforcement Learning Based Meta-Path Discovery in Large-Scale Heterogeneous Information Networks.

Paper Link】 【Pages】:6094-6101

【Authors】: Guojia Wan ; Bo Du ; Shirui Pan ; Gholamreza Haffari

【Abstract】: Meta-paths are important tools for a wide variety of data mining and network analysis tasks in Heterogeneous Information Networks (HINs), due to their flexibility and interpretability in capturing the complex semantic relations among objects. To date, most HIN analysis still relies on hand-crafting meta-paths, which requires rich domain knowledge that is extremely difficult to obtain in complex, large-scale, and schema-rich HINs. In this work, we present a novel framework, Meta-path Discovery with Reinforcement Learning (MPDRL), to identify informative meta-paths from complex and large-scale HINs. To capture different semantic information between objects, we propose a novel multi-hop reasoning strategy in a reinforcement learning framework which aims to infer the next promising relation that links a source entity to a target entity. Moreover, to improve efficiency, we develop a type context representation embedded approach to scale the RL framework to handle million-scale HINs. As multi-hop reasoning generates rich meta-paths of various lengths, we further perform a meta-path induction step to summarize the important meta-paths using the Lowest Common Ancestor principle. Experimental results on two large-scale HINs, Yago and NELL, validate our approach and demonstrate that our algorithm not only achieves superior performance in the link prediction task, but also identifies useful meta-paths that would have been ignored by human experts.

【Keywords】:

747. Robust Tensor Decomposition via Orientation Invariant Tubal Nuclear Norms.

Paper Link】 【Pages】:6102-6109

【Authors】: Andong Wang ; Chao Li ; Zhong Jin ; Qibin Zhao

【Abstract】: Low-rank tensor recovery has been widely applied to computer vision and machine learning. Recently, tubal nuclear norm (TNN) based optimization has been proposed with superior performance as compared to other tensor nuclear norms. However, one major limitation is its orientation sensitivity, due to low-rankness being strictly defined along the tubal orientation, and it cannot simultaneously model spectral low-rankness in multiple orientations. To this end, we introduce two new tensor norms called OITNN-O and OITNN-L to exploit multi-orientational spectral low-rankness for arbitrary K-way (K ≥ 3) tensors. We further formulate two robust tensor decomposition models via the proposed norms and develop two algorithms as the solutions. Theoretically, we establish non-asymptotic error bounds which can predict the scaling behavior of the estimation error. Experiments on real-world datasets demonstrate the superiority and effectiveness of the proposed norms.

【Keywords】:

748. Robust Self-Weighted Multi-View Projection Clustering.

Paper Link】 【Pages】:6110-6117

【Authors】: Beilei Wang ; Yun Xiao ; Zhihui Li ; Xuanhong Wang ; Xiaojiang Chen ; Dingyi Fang

【Abstract】: Many real-world applications involve data collected from different views and with high data dimensionality. Furthermore, multi-view data always has unavoidable noise. Clustering on this kind of high-dimensional and noisy multi-view data remains a challenge due to the curse of dimensionality and ineffective de-noising and integration of multiple views. Aiming at this problem, in this paper, we propose a Robust Self-weighted Multi-view Projection Clustering (RSwMPC) based on ℓ2,1-norm, which can simultaneously reduce dimensionality, suppress noise and learn local structure graph. Then the obtained optimal graph can be directly used for clustering while no further processing is required. In addition, a new method is introduced to automatically learn the optimal weight of each view with no need to generate additional parameters to adjust the weight. Extensive experimental results on different synthetic datasets and real-world datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods on clustering performance and robustness.

【Keywords】:
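
The ℓ2,1-norm named in the abstract above can be illustrated with a few lines of code. The sketch assumes the common row-wise convention (the sum of the Euclidean norms of the rows); the paper may use a column-wise or transposed variant, and the `residual` matrix is a made-up example.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of `matrix`."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def squared_frobenius(matrix):
    """Sum of squared entries, shown for comparison."""
    return sum(v * v for row in matrix for v in row)


# Hypothetical residual matrix: two small rows and one outlier row.
residual = [[0.1, -0.1], [0.0, 0.2], [3.0, 4.0]]
robust_loss = l21_norm(residual)
squared_loss = squared_frobenius(residual)
```

Because row norms are not squared, a single outlier row contributes linearly rather than quadratically to the loss, which is the usual reason this norm is chosen for noise suppression.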

749. Learning General Latent-Variable Graphical Models with Predictive Belief Propagation.

Paper Link】 【Pages】:6118-6126

【Authors】: Borui Wang ; Geoffrey Gordon

【Abstract】: Learning general latent-variable probabilistic graphical models is a key theoretical challenge in machine learning and artificial intelligence. All previous methods, including the EM algorithm and the spectral algorithms, face severe limitations that largely restrict their applicability and affect their performance. In order to overcome these limitations, in this paper we introduce a novel formulation of message-passing inference over junction trees named predictive belief propagation, and propose a new learning and inference algorithm for general latent-variable graphical models based on this formulation. Our proposed algorithm reduces the hard parameter learning problem into a sequence of supervised learning problems, and unifies the learning of different kinds of latent graphical models into a single learning framework, which is local-optima-free and statistically consistent. We then give a proof of the correctness of our algorithm and show in experiments on both synthetic and real datasets that our algorithm significantly outperforms both the EM algorithm and the spectral algorithm while also being orders of magnitude faster to compute.

【Keywords】:

750. SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback.

Paper Link】 【Pages】:6127-6136

【Authors】: Chao Wang ; Hengshu Zhu ; Chen Zhu ; Chuan Qin ; Hui Xiong

【Abstract】: The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches are still limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate “ties” due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to √M/N, where M and N are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

【Keywords】:

751. Estimating Stochastic Linear Combination of Non-Linear Regressions.

Paper Link】 【Pages】:6137-6144

【Authors】: Di Wang ; Xiangyu Guo ; Chaowen Guan ; Shi Li ; Jinhui Xu

【Abstract】: In this paper we study the problem of estimating stochastic linear combination of non-linear regressions, which has a close connection with many machine learning and statistical models such as non-linear regressions, the Single Index, Multi-index, Varying Coefficient Index Models and Two-layer Neural Networks. Specifically, we first show that with some mild assumptions, if the variate vector x is multivariate Gaussian, then there is an algorithm whose output vectors have ℓ2-norm estimation errors of O(√p/n) with high probability, where p is the dimension of x and n is the number of samples. Then we extend our result to the case where x is sub-Gaussian using the zero-bias transformation, which could be seen as a generalization of the classic Stein's lemma. We also show that with some additional assumptions there is an algorithm whose output vectors have ℓ∞-norm estimation errors of O(1/√p + √p/n) with high probability. Finally, for both Gaussian and sub-Gaussian cases we propose a faster sub-sampling based algorithm and show that when the sub-sample sizes are large enough then the estimation errors will not be sacrificed by too much. Experiments for both cases support our theoretical results. To the best of our knowledge, this is the first work that studies and provides theoretical guarantees for the stochastic linear combination of non-linear regressions model.

【Keywords】:

752. Compact Autoregressive Network.

Paper Link】 【Pages】:6145-6152

【Authors】: Di Wang ; Feiqing Huang ; Jingyu Zhao ; Guodong Li ; Guangjian Tian

【Abstract】: Autoregressive networks can achieve promising performance in many sequence modeling tasks with short-range dependence. However, when handling high-dimensional inputs and outputs, the massive number of parameters in the network leads to expensive computational cost and low learning efficiency. The problem can be alleviated slightly by introducing one more narrow hidden layer to the network, but the sample size required to achieve a certain training error is still substantial. To address this challenge, we rearrange the weight matrices of a linear autoregressive network into a tensor form, and then make use of Tucker decomposition to represent low-rank structures. This leads to a novel compact autoregressive network, called Tucker AutoRegressive (TAR) net. Interestingly, the TAR net can be applied to sequences with long-range dependence since the dimension along the sequential order is reduced. Theoretical studies show that the TAR net improves the learning efficiency, and requires far fewer samples for model training. Experiments on synthetic and real-world datasets demonstrate the promising performance of the proposed compact network.

【Keywords】:
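
The parameter saving behind the Tucker-based compression described in the abstract above can be sketched as follows. The tensor layout (output dimension × input dimension × number of lags), the function names and the example sizes are assumptions made for illustration; this is not the TAR net itself.

```python
def tucker_reconstruct(core, u1, u2, u3):
    """Rebuild W[i][j][k] = sum over a, b, c of core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]."""
    r1, r2, r3 = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(
                    core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]
                    for a in range(r1)
                    for b in range(r2)
                    for c in range(r3)
                )
                for k in range(len(u3))
            ]
            for j in range(len(u2))
        ]
        for i in range(len(u1))
    ]


def parameter_counts(n, p, r1, r2, r3):
    """Return (dense, tucker) parameter counts for an n x n x p weight tensor."""
    dense = n * n * p
    tucker = r1 * r2 * r3 + n * r1 + n * r2 + p * r3
    return dense, tucker


# Hypothetical sizes: 100-dimensional series, 20 lags, Tucker ranks (3, 3, 2).
dense, compact = parameter_counts(n=100, p=20, r1=3, r2=3, r3=2)

# Tiny rank-(1, 1, 1) example: every entry is 2 * u1[i] * u2[j] * u3[k].
weights = tucker_reconstruct([[[2.0]]], [[1.0], [2.0]], [[1.0], [0.0]], [[1.0], [3.0]])
```

The dense count grows with the product of all three dimensions, while the Tucker count grows with their sum times the (small) ranks, which is why the lag dimension can be long without a matching growth in parameters.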

753. Neural Cognitive Diagnosis for Intelligent Education Systems.

Paper Link】 【Pages】:6153-6161

【Authors】: Fei Wang ; Qi Liu ; Enhong Chen ; Zhenya Huang ; Yuying Chen ; Yu Yin ; Zai Huang ; Shijin Wang

【Abstract】: Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions in the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, for getting both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers for modeling their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., the NeuralCDM with traditional Q-matrix and the improved NeuralCDM+ exploring the rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.

【Keywords】:

754. Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization.

Paper Link】 【Pages】:6162-6169

【Authors】: Guanghui Wang ; Shiyin Lu ; Yao Hu ; Lijun Zhang

【Abstract】: We aim to design universal algorithms for online convex optimization, which can handle multiple common types of loss functions simultaneously. The previous state-of-the-art universal method has achieved the minimax optimality for general convex, exponentially concave and strongly convex loss functions. However, it remains an open problem whether smoothness can be exploited to further improve the theoretical guarantees. In this paper, we provide an affirmative answer by developing a novel algorithm, namely UFO, which achieves O(√L*), O(d log L*) and O(log L*) regret bounds for the three types of loss functions respectively under the assumption of smoothness, where L* is the cumulative loss of the best comparator in hindsight, and d is dimensionality. Thus, our regret bounds are much tighter when the comparator has a small loss, and ensure the minimax optimality in the worst case. In addition, it is worth pointing out that UFO is the first to achieve the O(log L*) regret bound for strongly convex and smooth functions, which is tighter than the existing small-loss bound by an O(d) factor.

【Keywords】:

755. Repetitive Reprediction Deep Decipher for Semi-Supervised Learning.

Paper Link】 【Pages】:6170-6177

【Authors】: Guo-Hua Wang ; Jianxin Wu

【Abstract】: Most recent semi-supervised deep learning (deep SSL) methods used a similar paradigm: use network predictions to update pseudo-labels and use pseudo-labels to update network parameters iteratively. However, they lack theoretical support and cannot explain why predictions are good candidates for pseudo-labels. In this paper, we propose a principled end-to-end framework named deep decipher (D2) for SSL. Within the D2 framework, we prove that pseudo-labels are related to network predictions by an exponential link function, which gives a theoretical support for using predictions as pseudo-labels. Furthermore, we demonstrate that updating pseudo-labels by network predictions will make them uncertain. To mitigate this problem, we propose a training strategy called repetitive reprediction (R2). Finally, the proposed R2-D2 method is tested on the large-scale ImageNet dataset and outperforms state-of-the-art methods by 5 percentage points.

【Keywords】:

756. Incorporating Label Embedding and Feature Augmentation for Multi-Dimensional Classification.

Paper Link】 【Pages】:6178-6185

【Authors】: Haobo Wang ; Chen Chen ; Weiwei Liu ; Ke Chen ; Tianlei Hu ; Gang Chen

【Abstract】: Feature augmentation, which manipulates the feature space by integrating the label information, is one of the most popular strategies for solving Multi-Dimensional Classification (MDC) problems. However, the vanilla feature augmentation approaches fail to consider the intra-class exclusiveness, and may achieve degenerated performance. To fill this gap, a novel neural network based model is proposed which seamlessly integrates the Label Embedding and Feature Augmentation (LEFA) techniques to learn label correlations. Specifically, based on attentional factorization machine, a cross correlation aware network is introduced to learn a low-dimensional label representation that simultaneously depicts the inter-class correlations and the intra-class exclusiveness. Then the learned latent label vector can be used to augment the original feature space. Extensive experiments on seven real-world datasets demonstrate the superiority of LEFA over state-of-the-art MDC approaches.

【Keywords】:

757. M-NAS: Meta Neural Architecture Search.

Paper Link】 【Pages】:6186-6193

【Authors】: Jiaxing Wang ; Jiaxiang Wu ; Haoli Bai ; Jian Cheng

【Abstract】: Neural Architecture Search (NAS) has recently outperformed hand-crafted networks in various areas. However, most prevalent NAS methods only focus on a pre-defined task. For a previously unseen task, the architecture is either searched from scratch, which is inefficient, or transferred from the one obtained on some other task, which might be sub-optimal. In this paper, we investigate a previously unexplored problem: whether a universal NAS method exists, such that task-aware architectures can be effectively generated? Towards this problem, we propose Meta Neural Architecture Search (M-NAS). To obtain task-specific architectures, M-NAS adopts a task-aware architecture controller for child model generation. Since optimal weights for different tasks and architectures span diversely, we resort to meta-learning, and learn meta-weights that efficiently adapt to a new task on the corresponding architecture with only several gradient descent steps. Experimental results demonstrate the superiority of M-NAS against a number of competitive baselines on both toy regression and few shot classification problems.

【Keywords】:

758. Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification.

Paper Link】 【Pages】:6194-6201

【Authors】: Jing Wang ; Weiqing Min ; Sujuan Hou ; Shengnan Ma ; Yuanjie Zheng ; Haishuai Wang ; Shuqiang Jiang

【Abstract】: Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, real-world logo images have a larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support efforts towards the scalable logo classification task, we have curated a dataset, Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ has more comprehensive coverage of logo categories and a larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selects informative logo-relevant regions guided by the teacher sub-network, which can evaluate its confidence belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region cropping and region dropping. Finally, the scrutinizer sub-network fuses features from augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and three other existing benchmark datasets demonstrate the effectiveness of the proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset.

【Keywords】:

759. Reinforcement Learning with Perturbed Rewards.

Paper Link】 【Pages】:6202-6209

【Authors】: Jingkang Wang ; Yang Liu ; Bo Li

【Abstract】: Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and is the first to address the biased noisy reward setting without any assumptions on the true distribution (e.g., zero-mean Gaussian noise as made in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30% respectively.

【Keywords】:
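
The idea of unbiased surrogate rewards built from a reward confusion matrix, mentioned in the abstract above, can be sketched for the simplest case of two reward levels. The closed form below is the standard construction obtained by inverting a 2×2 confusion matrix; the noise rates `e0` and `e1` are assumed to be known here (the paper estimates them), and all names are hypothetical.

```python
def surrogate_rewards(r0, r1, e0, e1):
    """Solve C * s = r for the 2x2 confusion matrix C = [[1 - e0, e0], [e1, 1 - e1]].

    e0 is the probability that a true r0 is observed as r1, and e1 the
    probability that a true r1 is observed as r0. The returned pair (s0, s1)
    gives the surrogate reward to use when r0 or r1 is *observed*.
    """
    det = 1.0 - e0 - e1
    if abs(det) < 1e-12:
        raise ValueError("e0 + e1 must differ from 1 for the matrix to be invertible")
    s0 = ((1.0 - e1) * r0 - e0 * r1) / det
    s1 = ((1.0 - e0) * r1 - e1 * r0) / det
    return s0, s1


def expected_surrogate(true_is_r1, s0, s1, e0, e1):
    """Expected surrogate reward under the noise model, given the true reward level."""
    if true_is_r1:
        return e1 * s0 + (1.0 - e1) * s1
    return (1.0 - e0) * s0 + e0 * s1


# Hypothetical setting: rewards -1 / +1 with asymmetric flip rates.
s0, s1 = surrogate_rewards(r0=-1.0, r1=1.0, e0=0.1, e1=0.3)
mean_when_true_r0 = expected_surrogate(False, s0, s1, e0=0.1, e1=0.3)
mean_when_true_r1 = expected_surrogate(True, s0, s1, e0=0.1, e1=0.3)
```

Averaged over the noise, the surrogate equals the true reward in both cases, which is what "unbiased" means here; the price is a larger variance, since the surrogate values are more extreme than the original rewards.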

760. Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach.

Paper Link】 【Pages】:6210-6218

【Authors】: Jun Wang ; Hefu Zhang ; Qi Liu ; Zhen Pan ; Hanqing Tao

【Abstract】: Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still under exploration. Existing studies either fit the fluctuations of time-series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that could interact with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characters) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching between different kinds of patterns, the actor component of TC3 is extended with a structure of options, which yields TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed the U-shaped one.

【Keywords】:

761. Differentially Private Learning with Small Public Data.

Paper Link】 【Pages】:6219-6226

【Authors】: Jun Wang ; Zhi-Hua Zhou

【Abstract】: Differentially private learning tackles tasks where the data are private and the learning process is subject to differential privacy requirements. In real applications, however, some public data are generally available in addition to private data, and it is interesting to consider how to exploit them. In this paper, we study a common situation where a small amount of public data can be used when solving the Empirical Risk Minimization problem over a private database. Specifically, we propose Private-Public Stochastic Gradient Descent, which utilizes such public information to adjust parameters in differentially private stochastic gradient descent and fine-tunes the final result with model reuse. Our method keeps differential privacy for the private database, and empirical study validates its superiority compared with existing approaches.

【Keywords】:

762. Dual Relation Semi-Supervised Multi-Label Learning.

Paper Link】 【Pages】:6227-6234

【Authors】: Lichen Wang ; Yunyu Liu ; Can Qin ; Gan Sun ; Yun Fu

【Abstract】: Multi-label learning (MLL) solves the problem that one single sample corresponds to multiple labels. It is a challenging task due to the long-tail label distribution and the sophisticated label relations. Semi-supervised MLL methods utilize small-scale labeled samples and large-scale unlabeled samples to enhance the performance. However, these approaches mainly focus on exploring the data distribution in feature space while ignoring the label relations inside each instance. To this end, we propose a Dual Relation Semi-supervised Multi-label Learning (DRML) approach which jointly explores the feature distribution and the label relation. A dual-classifier domain adaptation strategy is proposed to align features while generating pseudo labels to improve learning performance. A relation network is proposed to explore the relation knowledge. As a result, DRML effectively explores the feature-label and label-label relations in both labeled and unlabeled samples. It is an end-to-end model without any extra knowledge. Extensive experiments illustrate the effectiveness and efficiency of our method.

【Keywords】:

763. A Knowledge Transfer Framework for Differentially Private Sparse Learning.

Paper Link】 【Pages】:6235-6242

【Authors】: Lingxiao Wang ; Quanquan Gu

【Abstract】: We study the problem of estimating high dimensional models with underlying sparse structures while preserving the privacy of each training example. We develop a differentially private high-dimensional sparse learning framework using the idea of knowledge transfer. More specifically, we propose to distill the knowledge from a “teacher” estimator trained on a private dataset, by creating a new dataset from auxiliary features, and then train a differentially private “student” estimator using this new dataset. In addition, we establish the linear convergence rate as well as the utility guarantee for our proposed method. For sparse linear regression and sparse logistic regression, our method achieves improved utility guarantees compared with the best known results (Kifer, Smith and Thakurta 2012; Wang and Gu 2019). We further demonstrate the superiority of our framework through both synthetic and real-world data experiments.

【Keywords】:

764. Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling.

Paper Link】 【Pages】:6243-6250

【Authors】: Qian Wang ; Toby P. Breckon

【Abstract】: Unsupervised domain adaptation aims to address the problem of classifying unlabeled samples from the target domain whilst labeled samples are only available from the source domain and the data distributions are different in these two domains. As a result, classifiers trained from labeled samples in the source domain suffer from a significant performance drop when directly applied to the samples from the target domain. To address this issue, different approaches have been proposed to learn domain-invariant features or domain-specific classifiers. In either case, the lack of labeled samples in the target domain can be an issue which is usually overcome by pseudo-labeling. Inaccurate pseudo-labeling, however, could result in catastrophic error accumulation during learning. In this paper, we propose a novel selective pseudo-labeling strategy based on structured prediction. The idea of structured prediction is inspired by the fact that samples in the target domain are well clustered within the deep feature space so that unsupervised clustering analysis can be used to facilitate accurate pseudo-labeling. Experimental results on four datasets (i.e. Office-Caltech, Office31, ImageCLEF-DA and Office-Home) validate that our approach outperforms contemporary state-of-the-art methods.

【Keywords】:

765. Learning from Weak-Label Data: A Deep Forest Expedition.

Paper Link】 【Pages】:6251-6258

【Authors】: Qian-Wei Wang ; Liang Yang ; Yu-Feng Li

【Abstract】: Weak-label learning deals with the problem where each training example is associated with multiple ground-truth labels simultaneously but only some of them are provided. This circumstance is frequently encountered when the number of classes is very large or when there exists a large ambiguity between class labels, and significantly influences the performance of multi-label learning. In this paper, we propose LCForest, which is the first tree ensemble based deep learning method for weak-label learning. Rather than formulating the problem as a regularized framework, we employ the recently proposed cascade forest structure, which processes information layer-by-layer, and endow it with the ability to exploit weak-label data through a concise and highly efficient label complement structure. Specifically, in each layer, the label vector of each instance from the testing-fold is modified with the predictions of random forests trained with the corresponding training-fold. Since the ground-truth label matrix is inaccessible, we cannot estimate the performance via cross-validation directly. In order to control the growth of the cascade forest, we adopt label frequency estimation and the complement flag mechanism. Experiments show that the proposed LCForest method compares favorably against the existing state-of-the-art multi-label and weak-label learning methods.

【Keywords】:

766. Intention Nets: Psychology-Inspired User Choice Behavior Modeling for Next-Basket Prediction.

Paper Link】 【Pages】:6259-6266

【Authors】: Shoujin Wang ; Liang Hu ; Yan Wang ; Quan Z. Sheng ; Mehmet A. Orgun ; Longbing Cao

【Abstract】: Human behaviors are complex and are often observed as a sequence of heterogeneous actions. In this paper, we take user choices for shopping baskets as a typical case to study the complexity of user behaviors. Most existing approaches model user behaviors in a mechanical way, namely treating a user action sequence as homogeneous sequential data, such as hourly temperatures, which fails to consider the complexity of user behaviors. In fact, users' choices are driven by certain underlying intentions (e.g., feeding the baby or relieving pain) according to psychological theories. Moreover, the durations of the intentions that drive user actions are quite different; some of them may be persistent while others may be transient. Following psychological theories, we develop a hierarchical framework to describe the goal, intentions and action sequences, based on which we design Intention Nets (IntNet). In IntNet, multiple Action Chain Nets are constructed to model the user actions driven by different intentions, and a specially designed Persistent-Transient Intention Unit models the different intention durations. We apply IntNet to next-basket prediction, a recent challenging task in recommender systems. Extensive experiments on real-world datasets show the superiority of our psychology-inspired model IntNet over the state-of-the-art approaches.

【Keywords】:

767. Multi-Component Graph Convolutional Collaborative Filtering.

Paper Link】 【Pages】:6267-6274

【Authors】: Xiao Wang ; Ruijia Wang ; Chuan Shi ; Guojie Song ; Qingyong Li

【Abstract】: The interactions of users and items in a recommender system can be naturally modeled as a user-item bipartite graph. In recent years, we have witnessed an emerging research effort in exploring the user-item graph for collaborative filtering methods. Nevertheless, the formation of user-item interactions typically arises from highly complex latent purchasing motivations, such as high cost performance or eye-catching appearance, which are indistinguishably represented by the edges. Existing approaches leave the differences between various purchasing motivations unexplored and are therefore unable to capture fine-grained user preference. Therefore, in this paper we propose a novel Multi-Component graph convolutional Collaborative Filtering (MCCF) approach to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, there are two elaborately designed modules, decomposer and combiner, inside MCCF. The former first decomposes the edges in the user-item graph to identify the latent components that may cause the purchasing relationship; the latter then recombines these latent components automatically to obtain unified embeddings for prediction. Furthermore, a sparse regularizer and a weighted random sample strategy are utilized to alleviate the overfitting problem and accelerate the optimization. Empirical results on three real datasets and a synthetic dataset not only show the significant performance gains of MCCF, but also demonstrate the necessity of considering multiple components.

【Keywords】:

768. Attention-Guide Walk Model in Heterogeneous Information Network for Multi-Style Recommendation Explanation.

Paper Link】 【Pages】:6275-6282

【Authors】: Xin Wang ; Ying Wang ; Yunzhi Ling

【Abstract】: Explainable Recommendation aims at not only providing the recommended items to users, but also making users aware of why these items are recommended. Many interactive factors between users and items can be used to interpret the recommendation in a heterogeneous information network. However, these interactive factors are usually massive, implicit and noisy. The existing recommendation explanation approaches only consider a single explanation style, such as aspect-level or review-level. To address these issues, we propose a framework (MSRE) for generating multi-style recommendation explanations with the attention-guide walk model on affiliation relations and interaction relations in the heterogeneous information network. Inspired by the attention mechanism, we determine the important contexts for recommendation explanation and learn a joint representation of multi-style user-item interactions for enhancing recommendation performance. Extensive experiments on three real-world datasets verify the effectiveness of our framework on both recommendation performance and recommendation explanation.

【Keywords】:

769. Federated Latent Dirichlet Allocation: A Local Differential Privacy Based Framework.

Paper Link】 【Pages】:6283-6290

【Authors】: Yansheng Wang ; Yongxin Tong ; Dingyuan Shi

【Abstract】: Latent Dirichlet Allocation (LDA) is a widely adopted topic model for industrial-grade text mining applications. However, its performance heavily relies on the collection of large amount of text data from users' everyday life for model training. Such data collection risks severe privacy leakage if the data collector is untrustworthy. To protect text data privacy while allowing accurate model training, we investigate federated learning of LDA models. That is, the model is collaboratively trained between an untrustworthy data collector and multiple users, where raw text data of each user are stored locally and not uploaded to the data collector. To this end, we propose FedLDA, a local differential privacy (LDP) based framework for federated learning of LDA models. Central in FedLDA is a novel LDP mechanism called Random Response with Priori (RRP), which provides theoretical guarantees on both data privacy and model accuracy. We also design techniques to reduce the communication cost between the data collector and the users during model training. Extensive experiments on three open datasets verified the effectiveness of our solution.

【Keywords】:
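As background for the local differential privacy setting named in the abstract above, the following is a minimal sketch of classic binary randomized response. It is not the paper's RRP mechanism, whose details the abstract does not give. The function names, the choice of epsilon, and the synthetic bits are illustrative assumptions.

```python
import math
import random

def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomize_bit(bit, epsilon, rng):
    """Each user perturbs a private bit locally before sending it."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit

def estimate_frequency(reports, epsilon):
    """Collector-side unbiased estimate of the fraction of true 1-bits."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Synthetic population (assumption): 30% of users hold bit 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_frequency(reports, 1.0)
```

The collector never sees a raw bit, yet the debiased average approaches the true frequency as the number of users grows; a smaller epsilon gives stronger privacy at the cost of a noisier estimate.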

770. Transductive Ensemble Learning for Neural Machine Translation.

Paper Link】 【Pages】:6291-6298

【Authors】: Yiren Wang ; Lijun Wu ; Yingce Xia ; Tao Qin ; ChengXiang Zhai ; Tie-Yan Liu

【Abstract】: Ensemble learning, which aggregates multiple diverse models for inference, is a common practice to improve the accuracy of machine learning tasks. However, it has been observed that the conventional ensemble methods only bring marginal improvement for neural machine translation (NMT) when individual models are strong or there are a large number of individual models. In this paper, we study how to effectively aggregate multiple NMT models under the transductive setting where the source sentences of the test set are known. We propose a simple yet effective approach named transductive ensemble learning (TEL), in which we use all individual models to translate the source test set into the target language space and then finetune a strong model on the translated synthetic corpus. We conduct extensive experiments on different settings (with/without monolingual data) and different language pairs (English↔{German, Finnish}). The results show that our approach boosts strong individual models with significant improvement and benefits a lot from more individual models. Specifically, we achieve the state-of-the-art performances on the WMT2016-2018 English↔German translations.

【Keywords】:

771. Dynamic Network Pruning with Interpretable Layerwise Channel Selection.

Paper Link】 【Pages】:6299-6306

【Authors】: Yulong Wang ; Xiaolu Zhang ; Xiaolin Hu ; Bo Zhang ; Hang Su

【Abstract】: Dynamic network pruning achieves runtime acceleration by dynamically determining the inference paths based on different inputs. However, previous methods directly generate continuous decision values for each weight channel, which cannot reflect a clear and interpretable pruning process. In this paper, we propose to explicitly model the discrete weight channel selections, which encourages more diverse weights utilization, and achieves more sparse runtime inference paths. Meanwhile, with the help of interpretable layerwise channel selections in the dynamic network, we can visualize the network decision paths explicitly for model interpretability. We observe that there are clear differences in the layerwise decisions between normal and adversarial examples. Therefore, we propose a novel adversarial example detection algorithm by discriminating the runtime decision features. Experiments show that our dynamic network achieves higher prediction accuracy under the similar computing budgets on CIFAR10 and ImageNet datasets compared to traditional static pruning methods and other dynamic pruning approaches. The proposed adversarial detection algorithm can significantly improve the state-of-the-art detection rate across multiple attacks, which provides an opportunity to build an interpretable and robust model.

【Keywords】:

772. An Objective for Hierarchical Clustering in Euclidean Space and Its Connection to Bisecting K-means.

Paper Link】 【Pages】:6307-6314

【Authors】: Yuyan Wang ; Benjamin Moseley

【Abstract】: This paper explores hierarchical clustering in the case where pairs of points have dissimilarity scores (e.g. distances) as a part of the input. The recently introduced objective for points with dissimilarity scores results in every tree being a ½ approximation if the distances form a metric. This shows the objective does not make a significant distinction between a good and poor hierarchical clustering in metric spaces. Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space. The objective captures the criterion that has motivated the use of divisive clustering algorithms: that when a split happens, points in the same cluster should be more similar than points in different clusters. Moreover, this objective gives reasonable results on ground-truth inputs for hierarchical clustering. The paper builds a theoretical connection between this objective and the bisecting k-means algorithm. This paper proves that the optimal 2-means solution results in a constant approximation for the objective. This is the first paper to show the bisecting k-means algorithm optimizes a natural global objective over the entire tree.

【Keywords】:
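For readers unfamiliar with the bisecting k-means algorithm referred to in the abstract above, here is a minimal sketch of the divisive procedure: split the data with 2-means, then recurse on each half. It uses Lloyd's heuristic rather than the optimal 2-means solution analyzed in the paper, and the function names, the tree encoding (lists for internal nodes, tuples for points) and the toy data are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def two_means(points, iters=20, seed=0):
    """Lloyd's heuristic for k=2; returns two non-empty groups."""
    rng = random.Random(seed)
    c0, c1 = rng.sample(points, 2)
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    if not left or not right:
        # Degenerate case (e.g. duplicate points): cut the list in half.
        half = len(points) // 2
        left, right = points[:half], points[half:]
    return left, right

def bisecting_kmeans_tree(points):
    """Recursively bisect until singletons; internal nodes are [left, right]."""
    if len(points) == 1:
        return points[0]
    left, right = two_means(points)
    return [bisecting_kmeans_tree(left), bisecting_kmeans_tree(right)]

def leaves(node):
    """Collect the points stored under a tree node."""
    if isinstance(node, list):
        return leaves(node[0]) + leaves(node[1])
    return [node]

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
tree = bisecting_kmeans_tree(data)
top_split = {frozenset(leaves(tree[0])), frozenset(leaves(tree[1]))}
```

On this toy input the first split separates the two well-separated pairs, which is the behavior the divisive criterion in the abstract describes.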

773. Non-Local U-Nets for Biomedical Image Segmentation.

Paper Link】 【Pages】:6315-6322

【Authors】: Zhengyang Wang ; Na Zou ; Dinggang Shen ; Shuiwang Ji

【Abstract】: Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equipped with flexible global aggregation blocks, for biomedical image segmentation. These blocks can be inserted into U-Net as size-preserving processes, as well as down-sampling and up-sampling layers. We perform thorough experiments on the 3D multimodality isointense infant brain MR image segmentation task to evaluate the non-local U-Nets. Results show that our proposed models achieve top performances with fewer parameters and faster computation.

【Keywords】:

774. Attention-over-Attention Field-Aware Factorization Machine.

Paper Link】 【Pages】:6323-6330

【Authors】: Zhibo Wang ; Jinxin Ma ; Yongquan Zhang ; Qian Wang ; Ju Ren ; Peng Sun

【Abstract】: Factorization Machine (FM) has been a popular approach in supervised predictive tasks, such as click-through rate prediction and recommender systems, due to its great performance and efficiency. Recently, several variants of FM have been proposed to improve its performance. However, most of the state-of-the-art prediction algorithms neglected the field information of features, and they also failed to discriminate the importance of feature interactions due to the problem of redundant features. In this paper, we present a novel algorithm called Attention-over-Attention Field-aware Factorization Machine (AoAFFM) for better capturing the characteristics of feature interactions. Specifically, we propose the field-aware embedding layer to exploit the field information of features, and combine it with the attention-over-attention mechanism to learn both feature-level and interaction-level attention to estimate the weight of feature interactions. Experimental results show that the proposed AoAFFM improves FM and FFM with large margin, and outperforms state-of-the-art algorithms on three public benchmark datasets.

【Keywords】:
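As background for the Factorization Machine baseline mentioned in the abstract above, the sketch below computes the standard second-order FM prediction. It covers plain FM only, not the field-aware or attention-over-attention components of AoAFFM. Variable names and the toy numbers are assumptions.

```python
def fm_pairwise_naive(x, V):
    """Sum over i<j of <V[i], V[j]> * x[i] * x[j], computed directly."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same quantity in O(n*k) using the square-of-sum minus sum-of-squares identity."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total

def fm_predict(x, w0, w, V):
    """Bias + linear term + factorized pairwise interactions."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)

x = [1.0, 0.0, 2.0]                          # feature vector (toy values)
V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]    # one k=2 latent vector per feature
score = fm_predict(x, 0.5, [1.0, 1.0, 1.0], V)
```

In plain FM every pair of features contributes with weight given only by the inner product of their latent vectors; attention-based variants such as the one in the abstract learn additional weights on these interaction terms.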

775. Transparent Classification with Multilayer Logical Perceptrons and Random Binarization.

Paper Link】 【Pages】:6331-6339

【Authors】: Zhuo Wang ; Wei Zhang ; Ning Liu ; Jianyong Wang

【Abstract】: Models with transparent inner structure and high classification performance are required to reduce potential risk and provide trust for users in domains like health care, finance, security, etc. However, existing models are hard to simultaneously satisfy the above two properties. In this paper, we propose a new hierarchical rule-based model for classification tasks, named Concept Rule Sets (CRS), which has both a strong expressive ability and a transparent inner structure. To address the challenge of efficiently learning the non-differentiable CRS model, we propose a novel neural network architecture, Multilayer Logical Perceptron (MLLP), which is a continuous version of CRS. Using MLLP and the Random Binarization (RB) method we proposed, we can search the discrete solution of CRS in continuous space using gradient descent and ensure the discrete CRS acts almost the same as the corresponding continuous MLLP. Experiments on 12 public data sets show that CRS outperforms the state-of-the-art approaches and the complexity of the learned CRS is close to the simple decision tree.

【Keywords】:

776. Less Is Better: Unweighted Data Subsampling via Influence Function.

Paper Link】 【Pages】:6340-6347

【Authors】: Zifeng Wang ; Hong Zhu ; Zhenhua Dong ; Xiuqiang He ; Shao-Lun Huang

【Abstract】: In the time of Big Data, training complex models on large-scale data sets is challenging, making it appealing to reduce data volume for saving computation resources by subsampling. Most previous works in subsampling are weighted methods designed to help the performance of the subset-model approach that of the full-set-model, hence the weighted methods have no chance to acquire a subset-model that is better than the full-set-model. However, we ask: how can we achieve a better model with less data? In this work, we propose a novel Unweighted Influence Data Subsampling (UIDS) method, and prove that the subset-model acquired through our method can outperform the full-set-model. Besides, we show that being overly confident on a given test set for sampling is common in influence-based subsampling methods, which can eventually cause the subset-model's failure in out-of-sample tests. To mitigate this, we develop a probabilistic sampling scheme to control the worst-case risk over all distributions close to the empirical distribution. The experimental results demonstrate our method's superiority over existing subsampling methods in diverse tasks, such as text classification, image classification, click-through prediction, etc.

【Keywords】:

777. Multi-View Multiple Clusterings Using Deep Matrix Factorization.

Paper Link】 【Pages】:6348-6355

【Authors】: Shaowei Wei ; Jun Wang ; Guoxian Yu ; Carlotta Domeniconi ; Xiangliang Zhang

【Abstract】: Multi-view clustering aims at integrating complementary information from multiple heterogeneous views to improve clustering results. Existing multi-view clustering solutions can only output a single clustering of the data. Due to their multiplicity, multi-view data can have different groupings that are reasonable and interesting from different perspectives. However, how to find multiple, meaningful, and diverse clustering results from multi-view data is still a rarely studied and challenging topic in multi-view clustering and multiple clusterings. In this paper, we introduce a deep matrix factorization based solution (DMClusts) to discover multiple clusterings. DMClusts gradually factorizes multi-view data matrices into representational subspaces layer-by-layer and generates one clustering in each layer. To enforce the diversity between generated clusterings, it minimizes a new redundancy quantification term derived from the proximity between samples in these subspaces. We further introduce an iterative optimization procedure to simultaneously seek multiple clusterings with quality and diversity. Experimental results on benchmark datasets confirm that DMClusts outperforms state-of-the-art multiple clustering solutions.

【Keywords】:

778. Towards Certificated Model Robustness Against Weight Perturbations.

Paper Link】 【Pages】:6356-6363

【Authors】: Tsui-Wei Weng ; Pu Zhao ; Sijia Liu ; Pin-Yu Chen ; Xue Lin ; Luca Daniel

【Abstract】: This work studies the sensitivity of neural networks to weight perturbations, firstly corresponding to a newly developed threat model that perturbs the neural network parameters. We propose an efficient approach to compute a certified robustness bound of weight perturbations, within which neural networks will not make erroneous outputs as desired by the adversary. In addition, we identify a useful connection between our developed certification method and the problem of weight quantization, a popular model compression technique in deep neural networks (DNNs) and a ‘must-try’ step in the design of DNN inference engines on resource constrained computing platforms, such as mobiles, FPGA, and ASIC. Specifically, we study the problem of weight quantization – weight perturbations in the non-adversarial setting – through the lens of certificated robustness, and we demonstrate significant improvements on the generalization ability of quantized networks through our robustness-aware quantization scheme.

【Keywords】:

779. ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems.

Paper Link】 【Pages】:6364-6371

【Authors】: Philippe Wenk ; Gabriele Abbati ; Michael A. Osborne ; Bernhard Schölkopf ; Andreas Krause ; Stefan Bauer

【Abstract】: Parameter inference in ordinary differential equations is an important problem in many applied sciences and in engineering, especially in a data-scarce setting. In this work, we introduce a novel generative modeling approach based on constrained Gaussian processes and leverage it to build a computationally and data efficient algorithm for state and parameter inference. In an extensive set of experiments, our approach outperforms the current state of the art for parameter inference both in terms of accuracy and computational cost. It also shows promising results for the much more challenging problem of model selection.

【Keywords】:

780. Characterizing Membership Privacy in Stochastic Gradient Langevin Dynamics.

Paper Link】 【Pages】:6372-6379

【Authors】: Bingzhe Wu ; Chaochao Chen ; Shiwan Zhao ; Cen Chen ; Yuan Yao ; Guangyu Sun ; Li Wang ; Xiaolu Zhang ; Jun Zhou

【Abstract】: Bayesian deep learning is recently regarded as an intrinsic way to characterize the weight uncertainty of deep neural networks (DNNs). Stochastic Gradient Langevin Dynamics (SGLD) is an effective method to enable Bayesian deep learning on large-scale datasets. Previous theoretical studies have shown various appealing properties of SGLD, ranging from the convergence properties to the generalization bounds. In this paper, we study the properties of SGLD from a novel perspective of membership privacy protection (i.e., preventing the membership attack). The membership attack, which aims to determine whether a specific sample is used for training a given DNN model, has emerged as a common threat against deep learning algorithms. To this end, we build a theoretical framework to analyze the information leakage (w.r.t. the training dataset) of a model trained using SGLD. Based on this framework, we demonstrate that SGLD can prevent the information leakage of the training dataset to a certain extent. Moreover, our theoretical analysis can be naturally extended to other types of Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods. Empirical results on different datasets and models verify our theoretical findings and suggest that the SGLD algorithm can not only reduce the information leakage but also improve the generalization ability of the DNN models in real-world applications.

【Keywords】:
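To make the SGLD method named in the abstract above concrete, here is a minimal sketch of the standard SGLD update: a half-step along a minibatch estimate of the log-posterior gradient plus Gaussian noise whose variance equals the step size. The toy model (Gaussian mean with unit observation variance and a broad Gaussian prior), the step size, and all names are assumptions for illustration and are unrelated to the paper's experiments.

```python
import math
import random

def sgld_gaussian_mean(data, step=1e-4, batch=50, iters=2000, prior_std=10.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for _ in range(iters):
        minibatch = rng.sample(data, batch)
        # Gradient of the log prior N(0, prior_std^2).
        grad_prior = -theta / prior_std ** 2
        # Minibatch log-likelihood gradient, rescaled to the full data set.
        grad_lik = (n_total / batch) * sum(x - theta for x in minibatch)
        # Injected noise with variance equal to the step size.
        noise = rng.gauss(0.0, math.sqrt(step))
        theta += 0.5 * step * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples

# Synthetic data (assumption): 1000 draws from N(2, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data)
kept = samples[len(samples) // 2:]          # discard the first half as burn-in
posterior_mean_estimate = sum(kept) / len(kept)
data_mean = sum(data) / len(data)
```

The injected noise is what distinguishes SGLD from plain stochastic gradient descent, and it is this randomness in the trained parameters that a membership-privacy analysis such as the one in the abstract examines.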

781. Vector Quantization-Based Regularization for Autoencoders.

Paper Link】 【Pages】:6380-6387

【Authors】: Hanwei Wu ; Markus Flierl

【Abstract】: Autoencoders and their variations provide unsupervised models for learning low-dimensional representations for downstream tasks. Without proper regularization, autoencoder models are susceptible to the overfitting problem and the so-called posterior collapse phenomenon. In this paper, we introduce a quantization-based regularizer in the bottleneck stage of autoencoder models to learn meaningful latent representations. We combine both perspectives of Vector Quantized-Variational AutoEncoders (VQ-VAE) and classical denoising regularization methods of neural networks. We interpret quantizers as regularizers that constrain latent representations while fostering a similarity-preserving mapping at the encoder. Before quantization, we impose noise on the latent codes and use a Bayesian estimator to optimize the quantizer-based representation. The introduced bottleneck Bayesian estimator outputs the posterior mean of the centroids to the decoder, and thus, is performing soft quantization of the noisy latent codes. We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures.

【Keywords】:
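The soft quantization step described in the abstract above, where the estimator outputs the posterior mean of the centroids given a noisy latent code, can be sketched as follows. This sketch assumes isotropic Gaussian noise with a known standard deviation and a uniform prior over centroids; the paper's exact noise model and estimator may differ, and all names here are hypothetical.

```python
import math

def soft_quantize(z, centroids, sigma):
    """Posterior-mean estimate of the centroid behind a noisy code z.

    Assumes z = c_k + Gaussian noise (std sigma) and a uniform prior over k,
    so P(k | z) is a softmax over -||z - c_k||^2 / (2 * sigma^2).
    """
    logits = [
        -sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / (2.0 * sigma ** 2)
        for c in centroids
    ]
    top = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    norm = sum(weights)
    dim = len(z)
    return tuple(
        sum(w * c[d] for w, c in zip(weights, centroids)) / norm
        for d in range(dim)
    )

centroids = [(0.0, 0.0), (2.0, 0.0)]
midpoint = soft_quantize((1.0, 0.5), centroids, sigma=1.0)    # equidistant code
near_hard = soft_quantize((0.1, 0.0), centroids, sigma=0.05)  # low noise level
```

With a large noise level the output blends several centroids, and as the noise level shrinks it approaches ordinary nearest-centroid (hard) quantization.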

782. Unified Graph and Low-Rank Tensor Learning for Multi-View Clustering.

Paper Link】 【Pages】:6388-6395

【Authors】: Jianlong Wu ; Xingyu Xie ; Liqiang Nie ; Zhouchen Lin ; Hongbin Zha

【Abstract】: Multi-view clustering aims to take advantage of multiple views information to improve the performance of clustering. Many existing methods compute the affinity matrix by low-rank representation (LRR) and pairwise investigate the relationship between views. However, LRR suffers from the high computational cost in self-representation optimization. Besides, compared with pairwise views, tensor form of all views' representation is more suitable for capturing the high-order correlations among all views. Towards these two issues, in this paper, we propose the unified graph and low-rank tensor learning (UGLTL) for multi-view clustering. Specifically, on the one hand, we learn the view-specific affinity matrix based on projected graph learning. On the other hand, we reorganize the affinity matrices into tensor form and learn its intrinsic tensor based on low-rank tensor approximation. Finally, we unify these two terms together and jointly learn the optimal projection matrices, affinity matrices and intrinsic low-rank tensor. We also propose an efficient algorithm to iteratively optimize the proposed model. To evaluate the performance of the proposed method, we conduct extensive experiments on multiple benchmarks across different scenarios and sizes. Compared with the state-of-the-art approaches, our method achieves much better performance.

【Keywords】:

783. Estimating Early Fundraising Performance of Innovations via Graph-Based Market Environment Model.

Paper Link】 【Pages】:6396-6403

【Authors】: Likang Wu ; Zhi Li ; Hongke Zhao ; Zhen Pan ; Qi Liu ; Enhong Chen

【Abstract】: Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is a key concern for both creators and platforms. However, estimating the early fundraising performance before the project is published is very challenging and still under-explored. To that end, in this paper, we present a focused study on this important problem from a market modeling view. Specifically, we propose a Graph-based Market Environment model (GME) for estimating the early fundraising performance of the target project by exploiting the market environment. In addition, we discriminatively model the market competition and market evolution by designing two graph-based neural network architectures and incorporating them into the joint optimization stage. Finally, we conduct extensive experiments on the real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.

【Keywords】:

784. Meta-Amortized Variational Inference and Learning.

Paper Link】 【Pages】:6404-6412

【Authors】: Mike Wu ; Kristy Choi ; Noah D. Goodman ; Stefano Ermon

【Abstract】: Despite the recent success in probabilistic modeling and their applications, generative models trained using traditional inference techniques struggle to adapt to new distributions, even when the target distribution may be closely related to the ones seen during training. In this work, we present a doubly-amortized variational inference procedure as a way to address this challenge. By sharing computation across not only a set of query inputs, but also a set of different, related probabilistic models, we learn transferable latent representations that generalize across several related distributions. In particular, given a set of distributions over images, we find the learned representations to transfer to different data transformations. We empirically demonstrate the effectiveness of our method by introducing the MetaVAE, and show that it significantly outperforms baselines on downstream image classification tasks on MNIST (10-50%) and NORB (10-35%).

【Keywords】:

785. Regional Tree Regularization for Interpretability in Deep Neural Networks.

Paper Link】 【Pages】:6413-6421

【Authors】: Mike Wu ; Sonali Parbhoo ; Michael C. Hughes ; Ryan Kindle ; Leo A. Celi ; Maurizio Zazzi ; Volker Roth ; Finale Doshi-Velez

【Abstract】: The lack of interpretability remains a barrier to adopting deep neural networks across many safety-critical domains. Tree regularization was recently proposed to encourage a deep neural network's decisions to resemble those of a globally compact, axis-aligned decision tree. However, it is often unreasonable to expect a single tree to predict well across all possible inputs. In practice, doing so could lead to neither interpretable nor performant optima. To address this issue, we propose regional tree regularization – a method that encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Across many datasets, including two healthcare applications, we show our approach delivers simpler explanations than other regularization schemes without compromising accuracy. Specifically, our regional regularizer finds many more “desirable” optima compared to global analogues.

【Keywords】:

786. SK-Net: Deep Learning on Point Cloud via End-to-End Discovery of Spatial Keypoints.

Paper Link】 【Pages】:6422-6429

【Authors】: Weikun Wu ; Yan Zhang ; David Wang ; Yunqi Lei

【Abstract】: Since the PointNet was proposed, deep learning on point cloud has been the concentration of intense 3D research. However, existing point-based methods usually are not adequate to extract the local features and the spatial pattern of a point cloud for further shape understanding. This paper presents an end-to-end framework, SK-Net, to jointly optimize the inference of spatial keypoint with the learning of feature representation of a point cloud for a specific point cloud task. One key process of SK-Net is the generation of spatial keypoints (Skeypoints). It is jointly conducted by two proposed regulating losses and a task objective function without knowledge of Skeypoint location annotations and proposals. Specifically, our Skeypoints are not sensitive to the location consistency but are acutely aware of shape. Another key process of SK-Net is the extraction of the local structure of Skeypoints (detail feature) and the local spatial pattern of normalized Skeypoints (pattern feature). This process generates a comprehensive representation, pattern-detail (PD) feature, which comprises the local detail information of a point cloud and reveals its spatial pattern through the part district reconstruction on normalized Skeypoints. Consequently, our network is prompted to effectively understand the correlation between different regions of a point cloud and integrate contextual information of the point cloud. In point cloud tasks, such as classification and segmentation, our proposed method performs better than or comparable with the state-of-the-art approaches. We also present an ablation study to demonstrate the advantages of SK-Net.

【Keywords】:

787. Multi-Label Causal Feature Selection.

Paper Link】 【Pages】:6430-6437

【Authors】: Xingyu Wu ; Bingbing Jiang ; Kui Yu ; Huanhuan Chen ; Chunyan Miao

【Abstract】: Multi-label feature selection has received considerable attention during the past decade. However, existing algorithms do not attempt to uncover the underlying causal mechanism, and individually solve different types of variable relationships, ignoring the mutual effects between them. Furthermore, these algorithms lack interpretability: they can only select features for all labels, but cannot explain the correlation between a selected feature and a certain label. To address these problems, in this paper, we theoretically study the causal relationships in multi-label data, and propose a novel Markov blanket based multi-label causal feature selection (MB-MCF) algorithm. MB-MCF mines the causal mechanism of labels and features first, to obtain a complete representation of information about labels. Based on the causal relationships, MB-MCF then selects predictive features and simultaneously distinguishes common features shared by multiple labels and label-specific features owned by single labels. Experiments on real-world data sets validate that MB-MCF can automatically determine the number of selected features and simultaneously achieve the best performance compared with state-of-the-art methods. An experiment on the Emotions data set further demonstrates the interpretability of MB-MCF.

【Keywords】:

788. Dual Adversarial Co-Learning for Multi-Domain Text Classification.

Paper Link】 【Pages】:6438-6445

【Authors】: Yuan Wu ; Yuhong Guo

【Abstract】: With the advent of deep learning, the performance of text classification models has been improved significantly. Nevertheless, the successful training of a good classification model requires a sufficient amount of labeled data, while it is always expensive and time consuming to annotate data. With the rapid growth of digital data, similar classification tasks can typically occur in multiple domains, while the availability of labeled data can largely vary across domains. Some domains may have abundant labeled data, while in some other domains there may only exist a limited amount (or none) of labeled data. Meanwhile, text classification tasks are highly domain-dependent: a text classifier trained in one domain may not perform well in another domain. In order to address these issues, in this paper we propose a novel dual adversarial co-learning approach for multi-domain text classification (MDTC). The approach learns shared-private networks for feature extraction and deploys dual adversarial regularizations to align features across different domains and between labeled and unlabeled data simultaneously under a discrepancy based co-learning framework, aiming to improve the classifiers' generalization capacity with the learned features. We conduct experiments on multi-domain sentiment classification datasets. The results show the proposed approach achieves the state-of-the-art MDTC performance.

【Keywords】:

789. Efficient Projection-Free Online Methods with Stochastic Recursive Gradient.

Paper Link】 【Pages】:6446-6453

【Authors】: Jiahao Xie ; Zebang Shen ; Chao Zhang ; Boyu Wang ; Hui Qian

【Abstract】: This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or have high per-round computational costs. To fill this gap, two efficient projection-free online methods called ORGFW and MORGFW are proposed for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while possessing low per-round computational costs. Experimental results demonstrate the efficiency of the proposed methods compared to state-of-the-art methods.

【Keywords】:

790. Partial Multi-Label Learning with Noisy Label Identification.

Paper Link】 【Pages】:6454-6461

【Authors】: Ming-Kun Xie ; Sheng-Jun Huang

【Abstract】: Partial multi-label learning (PML) deals with problems where each instance is assigned with a candidate label set, which contains multiple relevant labels and some noisy labels. Recent studies usually solve PML problems with the disambiguation strategy, which recovers ground-truth labels from the candidate label set by simply assuming that the noisy labels are generated randomly. In real applications, however, noisy labels are usually caused by some ambiguous contents of the example. Based on this observation, we propose a partial multi-label learning approach to simultaneously recover the ground-truth information and identify the noisy labels. The two objectives are formalized in a unified framework with trace norm and ℓ1 norm regularizers. Under the supervision of the observed noise-corrupted label matrix, the multi-label classifier and noisy label identifier are jointly optimized by incorporating the label correlation exploitation and feature-induced noise model. Extensive experiments on synthetic as well as real-world data sets validate the effectiveness of the proposed approach.

【Keywords】:

791. Infinite ShapeOdds: Nonparametric Bayesian Models for Shape Representations.

Paper Link】 【Pages】:6462-6469

【Authors】: Wei Xing ; Shireen Y. Elhabian ; Robert Michael Kirby ; Ross T. Whitaker ; Shandian Zhe

【Abstract】: Learning compact representations for shapes (binary images) is important for many applications. Although neural network models are very powerful, they usually involve many parameters, require substantial tuning effort and easily overfit small datasets, which are common in shape-related applications. The state-of-the-art approach, ShapeOdds, as a latent Gaussian model, can effectively prevent overfitting and is more robust. Nonetheless, it relies on a linear projection assumption and is incapable of capturing intrinsic nonlinear shape variations, and hence may lead to inferior representations and structure discovery. To address these issues, we propose Infinite ShapeOdds (InfShapeOdds), a Bayesian nonparametric shape model, which is flexible enough to capture complex shape variations and discover hidden cluster structures, while still avoiding overfitting. Specifically, we use matrix Gaussian priors, nonlinear feature mappings and the kernel trick to generalize ShapeOdds to a shape-variate Gaussian process model, which can grasp various nonlinear correlations among the pixels within and across (different) shapes. To further discover the hidden structures in data, we place a Dirichlet process mixture (DPM) prior over the representations to jointly infer the cluster number and memberships. Finally, we exploit the Kronecker-product structure in our model to develop an efficient, truncated variational expectation-maximization algorithm for model estimation. On synthetic and real-world data, we show the advantage of our method in both representation learning and latent structure discovery.

【Keywords】:

792. Learning Feature Interactions with Lorentzian Factorization Machine.

Paper Link】 【Pages】:6470-6477

【Authors】: Canran Xu ; Ming Wu

【Abstract】: Learning representations for feature interactions to model user behaviors is critical for recommender systems and click-through rate (CTR) prediction. Recent advances in this area are empowered by deep learning methods, which can learn sophisticated feature interactions and achieve state-of-the-art results in an end-to-end manner. These approaches require a large number of training parameters integrated with the low-level representations, and are thus inefficient in both memory and computation. In this paper, we propose a new model named “LorentzFM” that can learn feature interactions embedded in a hyperbolic space, in which Lorentz distances may violate the triangle inequality. The learned representation thus benefits from the peculiar geometric properties of hyperbolic triangles, which results in a significant reduction in the number of parameters (20% to 80%) because none of the top deep learning layers are required. With such a lightweight architecture, LorentzFM achieves comparable and even materially better results than deep learning methods such as DeepFM, xDeepFM and Deep & Cross in both recommendation and CTR prediction tasks.

【Keywords】:

793. Gromov-Wasserstein Factorization Models for Graph Clustering.

Paper Link】 【Pages】:6478-6485

【Authors】: Hongteng Xu

【Abstract】: We propose a new nonlinear factorization model for graphs with topological structures and, optionally, node attributes. This model is based on a pseudometric called Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed from a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimation, we learn the atoms and their weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can take two different architectures, which correspond to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on clustering graphs.

【Keywords】:

794. Federated Patient Hashing.

Paper Link】 【Pages】:6486-6493

【Authors】: Jie Xu ; Zhenxing Xu ; Peter Walker ; Fei Wang

【Abstract】: Privacy concerns about sharing sensitive data across institutions are paramount in the medical domain and hinder the research and development of many applications, such as cohort construction for cross-institution observational studies and disease surveillance. Moreover, the large volume and heterogeneity of patient data pose great challenges for retrieval and analysis. To address these challenges, in this paper, we propose a Federated Patient Hashing (FPH) framework, which collaboratively trains a retrieval model stored in a shared memory while keeping all the patient-level information in local institutions. Specifically, the objective function is constructed by minimizing a similarity-preserving loss and a heterogeneity-digging loss, which preserve both inter-data and intra-data relationships. Then, by leveraging the concept of Bregman divergence, we implement optimization in a federated manner in both centralized and decentralized learning settings, without accessing the raw training data across institutions. We also analyze the convergence rate of the FPH framework. Extensive experiments on a real-world clinical data set from critical care demonstrate the effectiveness of the proposed method on similar patient matching across institutions.

【Keywords】:

795. Deep Embedded Complementary and Interactive Information for Multi-View Classification.

Paper Link】 【Pages】:6494-6501

【Authors】: Jinglin Xu ; Wenbin Li ; Xinwang Liu ; Dingwen Zhang ; Ji Liu ; Junwei Han

【Abstract】: Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most of the existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information and introduces a novel multi-view fusion strategy to make a joint decision during the optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations, and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate multiple neural networks by flexibly tuning the power exponent of weight, which not only avoids the trivial solution of weight but also provides a new approach to fuse outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.

【Keywords】:

796. Adversarial Domain Adaptation with Domain Mixup.

Paper Link】 【Pages】:6502-6509

【Authors】: Minghao Xu ; Jian Zhang ; Bingbing Ni ; Teng Li ; Chengjie Wang ; Qi Tian ; Wenjun Zhang

【Abstract】: Recent works on domain adaptation reveal the effectiveness of adversarial learning in reducing the discrepancy between source and target domains. However, two common limitations exist in current adversarial-learning-based methods. First, samples from two domains alone are not sufficient to ensure domain-invariance in most parts of the latent space. Second, the domain discriminator involved in these methods can only judge real or fake under the guidance of hard labels, while it is more reasonable to use soft scores to evaluate the generated images or features, i.e., to fully utilize the inter-domain information. In this paper, we present adversarial domain adaptation with domain mixup (DM-ADA), which guarantees domain-invariance in a more continuous latent space and guides the domain discriminator in judging samples' difference relative to source and target domains. Domain mixup is jointly conducted on pixel and feature level to improve the robustness of models. Extensive experiments prove that the proposed approach can achieve superior performance on tasks with various degrees of domain shift and data complexity.

【Keywords】:
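
As a rough illustration of the mixup idea mentioned in the abstract above, the sketch below convexly combines one source sample and one target sample and uses the mixing ratio as a soft domain label. This is a minimal sketch and not the DM-ADA implementation: the function names `domain_mixup` and `sample_lambda` are hypothetical, and drawing the ratio from a Beta(alpha, alpha) distribution is an assumption borrowed from the general mixup literature.

```python
import random


def domain_mixup(source, target, lam):
    """Convexly combine a source and a target sample.

    lam is the share of the source sample; it doubles as a soft domain
    label (1.0 = pure source, 0.0 = pure target). Hypothetical helper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(source) != len(target):
        raise ValueError("samples must have the same length")
    mixed = [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]
    return mixed, lam


def sample_lambda(alpha=2.0, rng=random):
    """Draw a mixing ratio from Beta(alpha, alpha) (an assumed choice)."""
    return rng.betavariate(alpha, alpha)


# Usage: mix two toy feature vectors with a randomly drawn ratio.
lam = sample_lambda()
mixed_sample, soft_label = domain_mixup([1.0, 0.0], [0.0, 1.0], lam)
```

A discriminator trained against `soft_label` instead of a hard 0/1 target sees intermediate points between the two domains, which is the intuition behind the "more continuous latent space" described in the abstract.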

797. Partial Multi-Label Learning with Label Distribution.

Paper Link】 【Pages】:6510-6517

【Authors】: Ning Xu ; Yun-Peng Liu ; Xin Geng

【Abstract】: Partial multi-label learning (PML) aims to learn from training examples each associated with a set of candidate labels, among which only a subset are valid for the training example. The common strategy for inducing a predictive model is to disambiguate the candidate label set, for example by identifying the ground-truth label using the confidence of each candidate label or by estimating the noisy labels in the candidate label sets. Nonetheless, these strategies ignore the essential label distribution corresponding to each instance, since the label distribution is not explicitly available in the training set. In this paper, a new partial multi-label learning strategy named Pml-ld is proposed to learn from partial multi-label examples via label enhancement. Specifically, label distributions are recovered by leveraging the topological information of the feature space and the correlations among the labels. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the recovered label distributions. Experimental results on synthetic as well as real-world datasets clearly validate the effectiveness of Pml-ld for solving PML problems.

【Keywords】:

798. Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests.

Paper Link】 【Pages】:6518-6525

【Authors】: Xiao Xu ; Fang Dong ; Yanghua Li ; Shaojian He ; Xin Li

【Abstract】: A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length T is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.

【Keywords】:

799. Generative-Discriminative Complementary Learning.

Paper Link】 【Pages】:6526-6533

【Authors】: Yanwu Xu ; Mingming Gong ; Junxiang Chen ; Tongliang Liu ; Kun Zhang ; Kayhan Batmanghelich

【Abstract】: The majority of state-of-the-art deep learning methods are discriminative approaches, which model the conditional distribution of labels given input features. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, which we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high-quality instances in spite of weak supervision. In addition to the extensive empirical studies, we also theoretically show that our model can retrieve the true conditional distribution from the complementarily-labeled data.

【Keywords】:

800. To Avoid the Pitfall of Missing Labels in Feature Selection: A Generative Model Gives the Answer.

Paper Link】 【Pages】:6534-6541

【Authors】: Yuanyuan Xu ; Jun Wang ; Jinmao Wei

【Abstract】: In multi-label learning, instances have a large number of noisy and irrelevant features, and each instance is associated with a set of class labels wherein label information is generally incomplete. These missing labels possess two sides like a coin; people cannot predict whether their provided information for feature selection is favorable (relevant) or not (irrelevant) during tossing. Existing approaches either superficially consider the missing labels as negative or indiscreetly impute them with some predicted values, which may either overestimate unobserved labels or introduce new noise in selecting discriminative features. To avoid the pitfall of missing labels, a novel unified framework for selecting discriminative features and modeling the incomplete label matrix is proposed from a generative point of view in this paper. Concretely, we relax the Smoothness Assumption to infer the label observability, which can reveal the positions of unobserved labels, and employ the spike-and-slab prior to perform feature selection by excluding unobserved labels. Using a data-augmentation strategy leads to full local conjugacy in our model, facilitating a simple and efficient Expectation Maximization (EM) algorithm for inference. Quantitative and qualitative experimental results demonstrate the superiority of the proposed approach under various evaluation metrics.

【Keywords】:

801. Light Multi-Segment Activation for Model Compression.

Paper Link】 【Pages】:6542-6549

【Authors】: Zhenhui Xu ; Guolin Ke ; Jia Zhang ; Jiang Bian ; Tie-Yan Liu

【Abstract】: Model compression has become necessary when applying neural networks (NN) to many real application tasks that can accept slightly reduced model accuracy but have strict limits on model complexity. Recently, Knowledge Distillation, which distills the knowledge from a well-trained and highly complex teacher model into a compact student model, has been widely used for model compression. However, under strict requirements on the resource cost, it is quite challenging to make the student model achieve performance comparable to the teacher, essentially due to the drastically reduced expressiveness of the compact student model. Inspired by the nature of expressiveness in NN, we propose to use multi-segment activation, which can significantly improve expressiveness at very little cost, in the compact student model. Specifically, we propose a highly efficient multi-segment activation, called Light Multi-segment Activation (LMA), which can rapidly produce multiple linear regions with very few parameters by leveraging the statistical information. With LMA, the compact student model is capable of achieving much better performance, effectively and efficiently, than the ReLU-equipped one with the same model complexity. Furthermore, the proposed method is compatible with other model compression techniques, such as quantization, which means they can be used jointly for better compression performance. Experiments on state-of-the-art NN architectures over real-world tasks demonstrate the effectiveness and extensibility of LMA.

【Keywords】:

802. Not All Attention Is Needed: Gated Attention Network for Sequence Data.

Paper Link】 【Pages】:6550-6557

【Authors】: Lanqing Xue ; Xiaopeng Li ; Nevin L. Zhang

【Abstract】: Although deep neural networks generally have fixed network structures, the concept of dynamic mechanisms has drawn more and more attention in recent years. Attention mechanisms compute input-dependent dynamic attention weights for aggregating a sequence of hidden states. Dynamic network configuration in convolutional neural networks (CNNs) selectively activates only part of the network at a time for different inputs. In this paper, we combine the two dynamic mechanisms for text classification tasks. Traditional attention mechanisms attend to the whole sequence of hidden states for an input sentence, while in most cases not all attention is needed, especially for long sequences. We propose a novel method called Gated Attention Network (GA-Net) to dynamically select a subset of elements to attend to using an auxiliary network, and compute attention weights to aggregate the selected elements. It avoids a significant amount of unnecessary computation on unattended elements, and allows the model to pay attention to important parts of the sequence. Experiments on various datasets show that the proposed method achieves better performance compared with all baseline models with global or local attention while requiring less computation and achieving better interpretability. It is also promising to extend the idea to more complex attention-based models, such as transformers and seq-to-seq models.

【Keywords】:

803. One-Shot Image Classification by Learning to Restore Prototypes.

Paper Link】 【Pages】:6558-6565

【Authors】: Wanqi Xue ; Wei Wang

【Abstract】: One-shot image classification aims to train image classifiers over the dataset with only one image per category. It is challenging for modern deep neural networks that typically require hundreds or thousands of images per class. In this paper, we adopt metric learning for this problem, which has been applied for few- and many-shot image classification by comparing the distance between the test image and the center of each class in the feature space. However, for one-shot learning, the existing metric learning approaches would suffer poor performance because the single training image may not be representative of the class. For example, if the image is far away from the class center in the feature space, the metric-learning based algorithms are unlikely to make correct predictions for the test images because the decision boundary is shifted by this noisy image. To address this issue, we propose a simple yet effective regression model, denoted by RestoreNet, which learns a class agnostic transformation on the image feature to move the image closer to the class center in the feature space. Experiments demonstrate that RestoreNet obtains superior performance over the state-of-the-art methods on a broad range of datasets. Moreover, RestoreNet can be easily combined with other methods to achieve further improvement.

【Keywords】:

804. Effective Data Augmentation with Multi-Domain Learning GANs.

Paper Link】 【Pages】:6566-6574

【Authors】: Shin'ya Yamaguchi ; Sekitoshi Kanai ; Takeharu Eda

【Abstract】: For deep learning applications, the massive data development (e.g., collecting, labeling), which is an essential process in building practical applications, still incurs very high costs. In this work, we propose an effective data augmentation method based on generative adversarial networks (GANs), called Domain Fusion. Our key idea is to import the knowledge contained in an outer dataset to a target model by using a multi-domain learning GAN. The multi-domain learning GAN simultaneously learns the outer and target dataset and generates new samples for the target tasks. The simultaneous learning process makes GANs generate the target samples with high fidelity and variety. As a result, we can obtain accurate models for the target tasks by using these generated samples even if we only have an extremely low volume target dataset. We experimentally evaluate the advantages of Domain Fusion in image classification tasks on 3 target datasets: CIFAR-100, FGVC-Aircraft, and Indoor Scene Recognition. When trained on each target dataset reduced to 5,000 images, Domain Fusion achieves better classification accuracy than data augmentation using fine-tuned GANs. Furthermore, we show that Domain Fusion improves the quality of generated samples, and the improvements can contribute to higher accuracy.

【Keywords】:

805. Partial Label Learning with Batch Label Correction.

Paper Link】 【Pages】:6575-6582

【Authors】: Yan Yan ; Yuhong Guo

【Abstract】: Partial label (PL) learning tackles the problem where each training instance is associated with a set of candidate labels, among which only one is the true label. In this paper, we propose a simple but effective batch-based partial label learning algorithm named PL-BLC, which tackles the partial label learning problem with batch-wise label correction (BLC). PL-BLC dynamically corrects the label confidence matrix of each training batch based on the current prediction network, and adopts a MixUp data augmentation scheme to enhance the underlying true labels against the redundant noisy labels. In addition, it introduces a teacher model through a consistency cost to ensure the stability of the batch-based prediction network update. Extensive experiments are conducted on synthesized and real-world partial label learning datasets, and the proposed approach demonstrates state-of-the-art performance for partial label learning.

【Keywords】:

806. Active Learning with Query Generation for Cost-Effective Text Classification.

Paper Link】 【Pages】:6583-6590

【Authors】: Yifan Yan ; Sheng-Jun Huang ; Shaoyi Chen ; Meng Liao ; Jin Xu

【Abstract】: Labeling a text document is usually time-consuming because it requires the annotator to read the whole document and check its relevance to each possible class label. It thus becomes rather expensive to train an effective model for text classification when it involves a large dataset of long documents. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one for query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows the annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve the classification performance while significantly reducing the annotation cost.

【Keywords】:

807. Towards Accurate Low Bit-Width Quantization with Multiple Phase Adaptations.

Paper Link】 【Pages】:6591-6598

【Authors】: Zhaoyi Yan ; Yemin Shi ; Yaowei Wang ; Mingkui Tan ; Zheyang Li ; Wenming Tan ; Yonghong Tian

【Abstract】: Low bit-width model quantization is highly desirable when deploying a deep neural network on mobile and edge devices. Quantization is an effective way to reduce the model size with low bit-width weight representation. However, the unacceptable accuracy drop hinders the development of this approach. One possible reason for this is that the weights in quantization intervals are directly assigned to the center. At the same time, some quantization applications are limited by the variety of network models. Accordingly, in this paper, we propose Multiple Phase Adaptations (MPA), a framework designed to address these two problems. Firstly, weights in the target interval are assigned to the center by gradually spreading the quantization range. During the MPA process, the accuracy drop can be compensated for by the unquantized parts. Moreover, as MPA does not introduce hyperparameters that depend on different models or bit-widths, the framework can be conveniently applied to various models. Extensive experiments demonstrate that MPA achieves higher accuracy than most existing methods on classification tasks for AlexNet, VGG-16 and ResNet.

【Keywords】:

808. Variational Adversarial Kernel Learned Imitation Learning.

Paper Link】 【Pages】:6599-6606

【Authors】: Fan Yang ; Alina Vereshchaka ; Yufan Zhou ; Changyou Chen ; Wen Dong

【Abstract】: Imitation learning refers to the problem where an agent learns to perform a task through observing and mimicking expert demonstrations, without knowledge of the cost function. State-of-the-art imitation learning algorithms reduce imitation learning to distribution-matching problems by minimizing some distance measures. However, the distance measure may not always provide informative signals for a policy update. To this end, we propose the variational adversarial kernel learned imitation learning (VAKLIL), which measures the distance using the maximum mean discrepancy with variational kernel learning. Our method optimizes over a large cost-function space and is sample efficient and robust to overfitting. We demonstrate the performance of our algorithm through benchmarking with four state-of-the-art imitation learning algorithms over five high-dimensional control tasks, and a complex transportation control task. Experimental results indicate that our algorithm significantly outperforms related algorithms in all scenarios.

【Keywords】:
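
For readers unfamiliar with the maximum mean discrepancy (MMD) mentioned in the abstract above, the following is a minimal sketch of the standard biased estimate of squared MMD between two samples. It assumes a fixed Gaussian kernel with a hand-picked bandwidth, so it does not reproduce the variational kernel learning of VAKLIL; the function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Usage: compare toy "expert" and "agent" state samples.
expert_states = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
agent_states = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8]]
gap = mmd_squared(expert_states, agent_states)
```

With a fixed kernel the estimate can be uninformative when the bandwidth is poorly matched to the data, which is one motivation for learning the kernel as the abstract describes.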

809. Revisiting Online Quantum State Learning.

Paper Link】 【Pages】:6607-6614

【Authors】: Feidiao Yang ; Jiaqing Jiang ; Jialin Zhang ; Xiaoming Sun

【Abstract】: In this paper, we study the online quantum state learning problem, which was recently proposed by Aaronson et al. (2018). In this problem, the learning algorithm sequentially predicts quantum states based on observed measurements and losses, and the goal is to minimize the regret. In the previous work, the existing algorithms may output mixed quantum states. However, in many scenarios, the prediction of a pure quantum state is required. In this paper, we first propose a Follow-the-Perturbed-Leader (FTPL) algorithm that is guaranteed to predict pure quantum states. Theoretical analysis shows that our algorithm can achieve an O(√T) expected regret under some reasonable settings. In the case that the pure state prediction is not mandatory, we propose another deterministic learning algorithm which is simpler and more efficient. The algorithm is based on the online gradient descent (OGD) method and can also achieve an O(√T) regret bound. The main technical contribution of this result is an algorithm for projecting an arbitrary Hermitian matrix onto the set of density matrices with respect to the Frobenius norm. We think this subroutine is of independent interest and can be widely used in many other problems in the quantum computing area. In addition to the theoretical analysis, we evaluate the algorithms with a series of simulation experiments. The experimental results show that our FTPL method and OGD method outperform the existing RFTL approach proposed by Aaronson et al. (2018) in almost all settings. In the implementation of the RFTL approach, we give a closed-form solution to the algorithm. This provides an efficient, accurate, and completely executable solution to the RFTL method.

【Keywords】:
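
The projection step described in the abstract above can be illustrated with a standard construction: because the Frobenius norm is unitarily invariant, projecting a Hermitian matrix onto the density matrices reduces to projecting its eigenvalues onto the probability simplex. The sketch below assumes NumPy is available and uses hypothetical function names; it is a textbook-style illustration and may differ from the subroutine in the paper.

```python
import numpy as np


def project_onto_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in descending order
    cumulative = np.cumsum(u) - 1.0
    ranks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cumulative / ranks > 0)[0][-1]
    theta = cumulative[rho] / (rho + 1)       # shift applied to every entry
    return np.maximum(v - theta, 0.0)


def project_onto_density_matrices(h):
    """Frobenius-norm projection of a Hermitian matrix onto density matrices."""
    h = (h + h.conj().T) / 2.0                # symmetrize against round-off
    eigenvalues, eigenvectors = np.linalg.eigh(h)
    projected = project_onto_simplex(eigenvalues)
    # Rebuild V diag(projected) V^H with the original eigenvectors.
    return (eigenvectors * projected) @ eigenvectors.conj().T


# Usage: project a Hermitian matrix that is not a valid quantum state.
h = np.array([[2.0, 0.5j], [-0.5j, -1.0]])
rho = project_onto_density_matrices(h)
```

The result is positive semidefinite with unit trace; the cost is dominated by the eigendecomposition.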

810. Bi-Directional Generation for Unsupervised Domain Adaptation.

Paper Link】 【Pages】:6615-6622

【Authors】: Guanglei Yang ; Haifeng Xia ; Mingli Ding ; Zhengming Ding

【Abstract】: Unsupervised domain adaptation facilitates learning on the unlabeled target domain by relying on well-established source domain information. Conventional methods that forcefully reduce the domain discrepancy in the latent space destroy the intrinsic data structure. To balance the mitigation of the domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers interpolating two intermediate domains to bridge source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency on target sample prediction. Extensive experiments verify that our proposed model outperforms the state-of-the-art on standard cross-domain visual benchmarks.

【Keywords】:

811. Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks.

Paper Link】 【Pages】:6623-6630

【Authors】: Li Yang ; Zhezhi He ; Deliang Fan

【Abstract】: Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and been widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit their wide deployment on resource-limited embedded systems, such as FPGA and mGPU. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing some weights to zero) and quantizing weights into limited bit-width values, respectively. Although there are works attempting to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification while considering the architecture of the PE. In addition, we integrate it with an optimized weight ternarization approach which quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiment shows that the fusion of our proposed techniques can achieve a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation of ResNet-18 on the ImageNet dataset.

【Keywords】:
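
To make the ternarization step in the abstract above concrete, here is a minimal sketch of threshold-based weight ternarization into {-1, 0, +1} with a single scaling factor, plus the storage arithmetic behind the "at least 16 times" figure (32 bits / 2 bits). The threshold rule (0.7 times the mean absolute weight) is an assumed heuristic from the general ternary-network literature, not the optimized scheme or the Weight Penalty Clipping technique of the paper; the function names are hypothetical.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map real weights to {-1, 0, +1} and return a shared scale factor."""
    if not weights:
        raise ValueError("weights must be non-empty")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs      # assumed heuristic threshold
    ternary = [0 if abs(w) <= threshold else (1 if w > 0 else -1)
               for w in weights]
    # Scale = mean magnitude of the weights that were not zeroed out.
    kept = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return ternary, scale


def compression_ratio(full_bits=32, ternary_bits=2):
    """Storage reduction from full-precision to ternary weights."""
    return full_bits / ternary_bits


# Usage: ternarize a toy weight vector.
codes, scale = ternarize([0.9, -0.8, 0.05, -0.02])
```

Because every nonzero code is ±1, a dot product with ternary weights needs only additions and subtractions followed by one multiplication by `scale`, which is the MAC-to-addition conversion the abstract refers to.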

812. Distributed Primal-Dual Optimization for Online Multi-Task Learning.

Paper Link】 【Pages】:6631-6638

【Authors】: Peng Yang ; Ping Li

【Abstract】: Conventional online multi-task learning algorithms suffer from two critical limitations: 1) heavy communication caused by delivering high-velocity sequential data to a central machine; 2) expensive runtime complexity for building task relatedness. To address these issues, in this paper we consider a setting where multiple tasks are geographically located in different places, where one task can synchronize data with others to leverage knowledge of related tasks. Specifically, we propose an adaptive primal-dual algorithm, which not only captures task-specific noise in adversarial learning but also carries out a projection-free update with runtime efficiency. Moreover, our model is well-suited to decentralized periodic-connected tasks as it allows the energy-starved or bandwidth-constrained tasks to postpone the update. Theoretical results demonstrate the convergence guarantee of our distributed algorithm with an optimal regret. Empirical results confirm that the proposed model is highly effective on various real-world datasets.

【Keywords】:

813. ML-LOO: Detecting Adversarial Examples with Feature Attribution.

Paper Link】 【Pages】:6639-6647

【Authors】: Puyudi Yang ; Jianbo Chen ; Cho-Jui Hsieh ; Jane-Ling Wang ; Michael I. Jordan

【Abstract】: Deep neural networks obtain state-of-the-art performance on a series of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input. The perturbation is often imperceptible to humans on image data. We observe a significant difference in feature attributions between adversarially crafted examples and original examples. Based on this observation, we introduce a new framework to detect adversarial examples through thresholding a scale estimate of feature attribution scores. Furthermore, we extend our method to include multi-layer feature attributions in order to tackle attacks that have mixed confidence levels. As demonstrated in extensive experiments, our method achieves superior performances in distinguishing adversarial examples from popular attack methods on a variety of real data sets compared to state-of-the-art detection methods. In particular, our method is able to detect adversarial examples of mixed confidence levels, and transfer between different attacking methods. We also show that our method achieves competitive performance even when the attacker has complete access to the detector.
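As an illustration of thresholding a scale estimate of feature attributions, here is a minimal sketch under stated assumptions: leave-one-out attribution is computed by replacing one feature at a time with a baseline value, the interquartile range is used as the scale estimate, and a larger spread is treated as suspicious. The function names, the baseline, the toy linear model and the direction of the threshold test are illustrative choices, not details taken from the abstract.

```python
import statistics


def loo_attributions(model, x, baseline=0.0):
    """Leave-one-out attribution: drop in model output when one feature is replaced."""
    full_output = model(x)
    return [full_output - model(x[:i] + [baseline] + x[i + 1:])
            for i in range(len(x))]


def interquartile_range(values):
    """One possible scale estimate of the attribution scores."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def flag_as_adversarial(model, x, threshold):
    """Assumed decision rule: flag inputs whose attribution spread exceeds a threshold."""
    return interquartile_range(loo_attributions(model, x)) > threshold


def toy_model(x):
    # Hypothetical linear scorer standing in for a network's class score.
    return sum(w * v for w, v in zip([1, 2, 3, 4], x))


attributions = loo_attributions(toy_model, [1, 1, 1, 1])
spread = interquartile_range(attributions)
```

In practice the threshold would be tuned on held-out clean and adversarial examples.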

【Keywords】:

814. Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families.

Paper Link】 【Pages】:6648-6655

【Authors】: Yibo Yang ; Jianlong Wu ; Hongyang Li ; Xia Li ; Tiancheng Shen ; Zhouchen Lin

【Abstract】: The correspondence between residual networks and dynamical systems motivates researchers to unravel the physics of ResNets with well-developed tools from numerical methods for ODE systems. The Runge-Kutta-Fehlberg method is an adaptive time-stepping scheme that offers a good trade-off between stability and efficiency. Can we also have adaptive time stepping for ResNets to ensure both stability and performance? In this study, we analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets in terms of step sizes and weight parameters, and point out the effects of step sizes on stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that depends on the parameters of the current step and is aware of previous steps. The controller is jointly optimized with the network training so that variable step sizes and evolution time can be adaptively adjusted. We conduct experiments on ImageNet and CIFAR to demonstrate its effectiveness. Our proposed method improves both stability and accuracy without introducing additional overhead in the inference phase.
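The correspondence mentioned above can be sketched in a few lines: a residual block x + f(x) is one forward Euler step with step size 1, and a step size h generalizes it to x + h·f(x). The example below is a minimal sketch using the classic linear test equation f(x) = -x, which is an assumption for illustration and not the paper's stability condition or controller.

```python
def euler_steps(f, x0, step_sizes):
    """Apply x <- x + h * f(x) once per step size; h = 1 gives a plain residual block."""
    x = x0
    for h in step_sizes:
        x = x + h * f(x)
    return x


def decay(x):
    # Linear test dynamics dx/dt = -x, whose true solution shrinks toward zero.
    return -x


# Small steps follow the decaying solution: each step multiplies x by (1 - h).
stable = euler_steps(decay, 1.0, [0.5, 0.5])

# For this test equation the iteration is stable only when |1 - h| <= 1;
# h = 2.5 multiplies x by -1.5 per step, so the iterates blow up.
unstable = euler_steps(decay, 1.0, [2.5] * 10)
```

A learned controller would choose each h per block instead of using a fixed list.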

【Keywords】:

815. Graph Few-Shot Learning via Knowledge Transfer.

Paper Link】 【Pages】:6656-6663

【Authors】: Huaxiu Yao ; Chuxu Zhang ; Ying Wei ; Meng Jiang ; Suhang Wang ; Junzhou Huang ; Nitesh V. Chawla ; Zhenhui Li

【Abstract】: Towards the challenging problem of semi-supervised node classification, there have been extensive studies. As a frontier, Graph Neural Networks (GNNs) have aroused great interest recently, which update the representation of each node by aggregating information of its neighbors. However, most GNNs have shallow layers with a limited receptive field and may not achieve satisfactory performance especially when the number of labeled nodes is quite small. To address this challenge, we innovatively propose a graph few-shot learning (GFL) algorithm that incorporates prior knowledge learned from auxiliary graphs to improve classification accuracy on the target graph. Specifically, a transferable metric space characterized by a node embedding and a graph-specific prototype embedding function is shared between auxiliary graphs and the target, facilitating the transfer of structural knowledge. Extensive experiments and ablation studies on four real-world graph datasets demonstrate the effectiveness of our proposed model and the contribution of each component.

【Keywords】:

816. Efficient Neural Architecture Search via Proximal Iterations.

Paper Link】 【Pages】:6664-6671

【Authors】: Quanming Yao ; Ju Xu ; Wei-Wei Tu ; Zhanxing Zhu

【Abstract】: Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, and can obtain high-performance architectures in several days. However, they still suffer from huge computation costs and inferior performance due to the construction of the supernet. In this paper, we propose an efficient NAS method based on proximal iterations (denoted as NASP). Different from previous works, NASP reformulates the search process as an optimization problem with a discrete constraint on architectures and a regularizer on model complexity. As the new objective is hard to solve, we further propose an efficient algorithm inspired by proximal iterations for optimization. In this way, NASP is not only much faster than existing differentiable search methods, but can also find better architectures and balance the model complexity. Finally, extensive experiments on various tasks demonstrate that NASP can obtain high-performance architectures with more than 10 times speedup over the state of the art.

【Keywords】:

817. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning.

Paper Link】 【Pages】:6672-6679

【Authors】: Deheng Ye ; Zhao Liu ; Mingfei Sun ; Bei Shi ; Peilin Zhao ; Hao Wu ; Hongsheng Yu ; Shaojie Yang ; Xipeng Wu ; Qingwei Guo ; Qiaobo Chen ; Yinyuting Yin ; Hao Zhang ; Tengfei Shi ; Liang Wang ; Qiang Fu ; Wei Yang ; Lanxiao Huang

【Abstract】: We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games.

【Keywords】:

818. A Novel Model for Imbalanced Data Classification.

Paper Link】 【Pages】:6680-6687

【Authors】: Jian Yin ; Chunjing Gan ; Kaiqi Zhao ; Xuan Lin ; Zhe Quan ; Zhi-Jie Wang

【Abstract】: Recently, imbalanced data classification has received much attention due to its wide applications. In the literature, existing researches have attempted to improve the classification performance by considering various factors such as the imbalanced distribution, cost-sensitive learning, data space improvement, and ensemble learning. Nevertheless, most of the existing methods focus on only part of these main aspects/factors. In this work, we propose a novel imbalanced data classification model that considers all these main aspects. To evaluate the performance of our proposed model, we have conducted experiments based on 14 public datasets. The results show that our model outperforms the state-of-the-art methods in terms of recall, G-mean, F-measure and AUC.

【Keywords】:

819. Shared Generative Latent Representation Learning for Multi-View Clustering.

Paper Link】 【Pages】:6688-6695

【Authors】: Ming Yin ; Weitian Huang ; Junbin Gao

【Abstract】: Clustering multi-view data has been a fundamental research topic in the computer vision community. It has been shown that a better accuracy can be achieved by integrating information of all the views than just using one view individually. However, the existing methods often struggle with the issues of dealing with the large-scale datasets and the poor performance in reconstructing samples. This paper proposes a novel multi-view clustering method by learning a shared generative latent representation that obeys a mixture of Gaussian distributions. The motivation is based on the fact that the multi-view data share a common latent embedding despite the diversity among the various views. Specifically, benefitting from the success of the deep generative learning, the proposed model can not only extract the nonlinear features from the views, but render a powerful ability in capturing the correlations among all the views. The extensive experimental results on several datasets with different scales demonstrate that the proposed method outperforms the state-of-the-art methods under a range of performance criteria.

【Keywords】:

820. Divide-and-Conquer Learning with Nyström: Optimal Rate and Algorithm.

Paper Link】 【Pages】:6696-6703

【Authors】: Rong Yin ; Yong Liu ; Lijing Lu ; Weiping Wang ; Dan Meng

【Abstract】: Kernel Regularized Least Squares (KRLS) is a fundamental learner in machine learning. However, due to its high time and space requirements, it does not scale to large-scale scenarios. Therefore, we propose DC-NY, a novel algorithm that combines the divide-and-conquer method, Nyström, conjugate gradient, and preconditioning to scale up KRLS. DC-NY has the same accuracy as exact KRLS and the minimum time and space complexity compared to the state-of-the-art approximate KRLS estimates. We present a theoretical analysis of DC-NY, including a novel error decomposition with the optimal statistical accuracy guarantees. Extensive experimental results on several real-world large-scale datasets containing up to 1M data points show that DC-NY significantly outperforms the state-of-the-art approximate KRLS estimates.

【Keywords】:

821. Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel.

Paper Link】 【Pages】:6704-6711

【Authors】: Zheng Yu ; Xuhui Fan ; Marcin Pietrasik ; Marek Z. Reformat

【Abstract】: The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the state-of-the-art Bayesian relational methods suitable for learning the complex hidden structure underlying the network data. However, the current formulation of MMSB suffers from the following two issues: (1), the prior information (e.g. entities' community structural information) can not be well embedded in the modelling; (2), community evolution can not be well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously. Besides, the proposed model infers the network structure and models community evolution, manifested by appearances and disappearances of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB. An efficient Gibbs sampling scheme with Polya Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real world data.

【Keywords】:

822. Trading-Off Static and Dynamic Regret in Online Least-Squares and Beyond.

Paper Link】 【Pages】:6712-6719

【Authors】: Jianjun Yuan ; Andrew G. Lamperski

【Abstract】: Recursive least-squares algorithms often use forgetting factors as a heuristic to adapt to non-stationary data streams. The first contribution of this paper rigorously characterizes the effect of forgetting factors for a class of online Newton algorithms. For exp-concave and strongly convex objectives, the algorithms achieve the dynamic regret of max{O(log T), O(√(TV))}, where V is a bound on the path length of the comparison sequence. In particular, we show how classic recursive least-squares with a forgetting factor achieves this dynamic regret bound. By varying V, we obtain a trade-off between static and dynamic regret. In order to obtain more computationally efficient algorithms, our second contribution is a novel gradient descent step size rule for strongly convex functions. Our gradient descent rule recovers the order optimal dynamic regret bounds described above. For smooth problems, we can also obtain static regret of O(T^(1-β)) and dynamic regret of O(T^β V), where β ∈ (0,1) and V is the path length of the sequence of minimizers. By varying β, we obtain a trade-off between static and dynamic regret.
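To illustrate what a forgetting factor does, the following is a minimal scalar sketch of the textbook recursive least-squares update; the function name, the initial covariance and the synthetic data stream are assumptions, and the code does not reproduce the paper's analysis. With a forgetting factor of 1 the estimate averages over the whole stream, while a factor below 1 discounts old samples and tracks a parameter that changes midway.

```python
def recursive_least_squares(samples, forgetting_factor, initial_covariance=1000.0):
    """Scalar RLS for y ~ theta * x with exponential forgetting of old samples."""
    theta = 0.0
    covariance = initial_covariance
    for x, y in samples:
        gain = covariance * x / (forgetting_factor + x * covariance * x)
        theta += gain * (y - x * theta)  # correct the estimate by the prediction error
        covariance = (covariance - gain * x * covariance) / forgetting_factor
    return theta


# Non-stationary stream: the true parameter jumps from 2 to 5 halfway through.
samples = [(1.0, 2.0)] * 50 + [(1.0, 5.0)] * 50

static_estimate = recursive_least_squares(samples, forgetting_factor=1.0)
tracking_estimate = recursive_least_squares(samples, forgetting_factor=0.9)
```

Here the static estimate settles near the average 3.5, whereas the discounted estimate ends close to the current value 5.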

【Keywords】:

823. Apprenticeship Learning via Frank-Wolfe.

Paper Link】 【Pages】:6720-6728

【Authors】: Tom Zahavy ; Alon Cohen ; Haim Kaplan ; Yishay Mansour

【Abstract】: We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as finding the projection of the feature expectations of the expert on the feature expectations polytope – the convex hull of the feature expectations of all the deterministic policies in the MDP. We show that this formulation is equivalent to the AL objective and that solving this problem using the FW algorithm is equivalent to the well-known Projection method of Abbeel and Ng (2004). This insight allows us to analyze AL with tools from the convex optimization literature and derive tighter convergence bounds on AL. Specifically, we show that a variation of the FW method that is based on taking “away steps” achieves a linear rate of convergence when applied to AL and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations. We also experimentally show that this version outperforms the FW baseline. To the best of our knowledge, this is the first work that shows linear convergence rates for AL.
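The projection formulation can be sketched with a plain Frank-Wolfe loop that minimizes half the squared distance to a target point over the convex hull of a finite vertex set. This is a minimal sketch under assumptions: the vertices stand in for feature expectations of deterministic policies, the linear minimization step is a direct search over the list (in AL it would require solving an MDP), and the exact line search is an illustrative choice. It is the basic FW method, not the away-step or stochastic variants.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def frank_wolfe_projection(target, vertices, num_iters=100):
    """Approximate the projection of target onto the convex hull of vertices."""
    x = list(vertices[0])
    for _ in range(num_iters):
        gradient = [xi - ti for xi, ti in zip(x, target)]
        # Linear minimization oracle: the vertex most aligned with the descent direction.
        s = min(vertices, key=lambda v: dot(gradient, v))
        direction = [si - xi for si, xi in zip(s, x)]
        norm_sq = dot(direction, direction)
        if norm_sq == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        step = max(0.0, min(1.0, -dot(gradient, direction) / norm_sq))
        if step == 0.0:
            break  # no vertex improves the objective, so x is optimal
        x = [xi + step * di for xi, di in zip(x, direction)]
    return x


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
projection = frank_wolfe_projection((1.0, 1.0), triangle)
```

For this triangle the closest point to (1, 1) lies at the midpoint of the edge between (1, 0) and (0, 1).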

【Keywords】:

824. Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting.

Paper Link】 【Pages】:6729-6736

【Authors】: Daniel Zeiberg ; Shantanu Jain ; Predrag Radivojac

【Abstract】: Estimating class proportions has emerged as an important direction in positive-unlabeled learning. Well-estimated class priors are key to accurate approximation of posterior distributions and are necessary for the recovery of true classification performance. While significant progress has been made in the past decade, there remains a need for accurate strategies that scale to big data. Motivated by this need, we propose an intuitive and fast nonparametric algorithm to estimate class proportions. Unlike any of the previous methods, our algorithm uses a sampling strategy to repeatedly (1) draw an example from the set of positives, (2) record the minimum distance to any of the unlabeled examples, and (3) remove the nearest unlabeled example. We show that the point of sharp increase in the recorded distances corresponds to the desired proportion of positives in the unlabeled set and train a deep neural network to identify that point. Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets.
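The three-step sampling loop described above can be sketched as follows for one-dimensional data. This is a minimal sketch: the absolute-difference distance, the function names, and the largest-jump heuristic (a crude stand-in for the trained neural network that locates the sharp increase) are all assumptions for illustration.

```python
import random


def distance_curve(positives, unlabeled, num_draws, seed=0):
    """Repeat: draw a positive, record its nearest unlabeled distance, remove that point."""
    rng = random.Random(seed)
    remaining = list(unlabeled)
    distances = []
    for _ in range(min(num_draws, len(remaining))):
        p = rng.choice(positives)
        nearest = min(range(len(remaining)), key=lambda i: abs(remaining[i] - p))
        distances.append(abs(remaining[nearest] - p))
        remaining.pop(nearest)
    return distances


def largest_jump_estimate(distances):
    """Assumed heuristic: the proportion is the fraction of draws before the biggest jump."""
    jumps = [b - a for a, b in zip(distances, distances[1:])]
    return (jumps.index(max(jumps)) + 1) / len(distances)


# Two unlabeled points sit near the positive; the other two are far away.
curve = distance_curve([0.0], [0.1, 0.2, 5.0, 6.0], num_draws=4)
proportion = largest_jump_estimate(curve)
```

Once the nearby unlabeled points are used up, the recorded distances rise sharply, which is the signal the method looks for.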

【Keywords】:

825. Topic Modeling on Document Networks with Adjacent-Encoder.

Paper Link】 【Pages】:6737-6745

【Authors】: Ce Zhang ; Hady W. Lauw

【Abstract】: Oftentimes documents are linked to one another in a network structure, e.g., academic papers cite other papers, and Web pages link to other pages. In this paper we propose a holistic topic model to learn meaningful and unified low-dimensional representations for networked documents that seek to preserve both textual content and network structure. On the basis of reconstructing not only the input document but also its adjacent neighbors, we develop two neural encoder architectures. Adjacent-Encoder, or AdjEnc, induces competition among documents for topic propagation, and reconstruction among neighbors for semantic capture. Adjacent-Encoder-X, or AdjEnc-X, extends this to also encode the network structure in addition to document content. We evaluate our models on real-world document networks quantitatively and qualitatively, outperforming comparable baselines comprehensively.

【Keywords】:

826. Aggregated Gradient Langevin Dynamics.

Paper Link】 【Pages】:6746-6753

【Authors】: Chao Zhang ; Jiahao Xie ; Zebang Shen ; Peilin Zhao ; Tengfei Zhou ; Hui Qian

【Abstract】: In this paper, we explore a general Aggregated Gradient Langevin Dynamics framework (AGLD) for Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data accessing (e.g. random access, cyclic access and random reshuffle) and snapshot updating strategies, under convex and nonconvex settings respectively. It is the first time that bounds for I/O friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretical results also indicate that methods in AGLD possess the merits of both low per-iteration computational complexity and short mixture time. Empirical studies demonstrate that our framework allows us to derive novel schemes to generate high-quality samples for large-scale Bayesian posterior learning tasks.

【Keywords】:

827. CD-UAP: Class Discriminative Universal Adversarial Perturbation.

Paper Link】 【Pages】:6754-6761

【Authors】: Chaoning Zhang ; Philipp Benz ; Tooba Imtiaz ; In-So Kweon

【Abstract】: A single universal adversarial perturbation (UAP) can be added to all natural images to change most of their predicted class labels. It is of high practical relevance for an attacker to have flexible control over the targeted classes to be attacked, however, the existing UAP method attacks samples from all classes. In this work, we propose a new universal attack method to generate a single perturbation that fools a target network to misclassify only a chosen group of classes, while having limited influence on the remaining classes. Since the proposed attack generates a universal adversarial perturbation that is discriminative to targeted and non-targeted classes, we term it class discriminative universal adversarial perturbation (CD-UAP). We propose one simple yet effective algorithm framework, under which we design and compare various loss function configurations tailored for the class discriminative universal attack. The proposed approach has been evaluated with extensive experiments on various benchmark datasets. Additionally, our proposed approach achieves state-of-the-art performance for the original task of UAP attacking all classes, which demonstrates the effectiveness of our approach.

【Keywords】:

828. Learning from Positive and Unlabeled Data without Explicit Estimation of Class Prior.

Paper Link】 【Pages】:6762-6769

【Authors】: Chenguang Zhang ; Yuexian Hou ; Yan Zhang

【Abstract】: Learning a classifier from positive and unlabeled data may occur in various applications. It differs from the standard classification problems by the absence of labeled negative examples in the training set. So far, two main strategies have typically been used for this issue: the likely negative examples-based strategy and the class prior-based strategy, in which the likely negative examples or the class prior is required to be obtained in a preprocessing step. In this paper, a new strategy based on the Bhattacharyya coefficient is put forward, which formalizes this learning problem as an optimization problem and does not need a preprocessing step. We first show that with the given positive class conditional probability density function (PDF) and the mixture PDF of both the positive class and the negative class, the class prior can be estimated by minimizing the Bhattacharyya coefficient of the positive class with respect to the negative class. We then show how to use this result in an implicit mixture model of restricted Boltzmann machines to estimate the positive class conditional PDF and the negative class conditional PDF directly to obtain a classifier without the explicit estimation of the class prior. Experiments on real and synthetic datasets illustrate the superiority of the proposed approach.

【Keywords】:

829. Policy Search by Target Distribution Learning for Continuous Control.

Paper Link】 【Pages】:6770-6777

【Authors】: Chuheng Zhang ; Yuanqi Li ; Jian Li

【Abstract】: It is known that existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic, leading to an unstable training process. We show that such instability can happen even in a very simple environment. To address this issue, we propose a new method, called target distribution learning (TDL), for policy improvement in reinforcement learning. TDL alternates between proposing a target distribution and training the policy network to approach the target distribution. TDL is more effective in constraining the KL divergence between updated policies, and hence leads to more stable policy improvements over iterations. Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training.

【Keywords】:

830. Universal Value Iteration Networks: When Spatially-Invariant Is Not Universal.

Paper Link】 【Pages】:6778-6785

【Authors】: Li Zhang ; Xin Li ; Sen Chen ; Hongyu Zang ; Jie Huang ; Mingzhong Wang

【Abstract】: In this paper, we first formally define the problem set of spatially invariant Markov Decision Processes (MDPs), and show that Value Iteration Networks (VIN) and its extensions are computationally bounded to it due to the use of the convolution kernel. To generalize VIN to spatially variant MDPs, we propose Universal Value Iteration Networks (UVIN). In comparison with VIN, UVIN automatically learns a flexible but compact network structure to encode the transition dynamics of the problems and support the differentiable planning module. We evaluate UVIN with both spatially invariant and spatially variant tasks, including navigation in regular maze, chessboard maze, and Mars, and Minecraft item syntheses. Results show that UVIN can achieve similar performance as VIN and its extensions on spatially invariant tasks, and significantly outperforms other models on more general problems.

【Keywords】:

831. Systematically Exploring Associations among Multivariate Data.

Paper Link】 【Pages】:6786-6794

【Authors】: Lifeng Zhang

【Abstract】: Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets, and has received growing attention for decades from both academic and industrial fields. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea that measures the local continuity of the reordered data points to quantify the strength of the global association between variables. With sufficient sample size, the new method is able to capture a wide range of functional relationships, whether linear or nonlinear, bivariate or multivariate, main effect or interaction. The score of nCor roughly approximates the coefficient of determination (R²) of the data, which indicates the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor-based statistics are also proposed here to further characterize the intra and inter structures of the associations from the aspects of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.

【Keywords】:

832. High Performance Depthwise and Pointwise Convolutions on Mobile Devices.

Paper Link】 【Pages】:6795-6802

【Authors】: Pengfei Zhang ; Eric Lo ; Baotong Lu

【Abstract】: Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Among the various lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are their key operations. In this paper, we observe that the existing implementations of DWConv and PWConv are not well utilizing the ARM processors in the mobile devices, and exhibit lots of cache misses under multi-core and poor data reuse at register level. We propose techniques to re-optimize the implementations of DWConv and PWConv based on ARM architecture. Experimental results show that our implementation can respectively achieve a speedup of up to 5.5× and 2.1× against TVM (Chen et al. 2018) on DWConv and PWConv.
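For readers unfamiliar with the two operations, the following is a naive reference sketch of their semantics only: depthwise convolution filters each channel with its own kernel, and pointwise convolution is a 1×1 convolution that mixes channels at every pixel. The list-based layouts, valid padding and stride 1 are assumptions for illustration; the code says nothing about the ARM-specific cache and register optimizations the paper proposes.

```python
def depthwise_conv(x, kernels):
    """x: [C][H][W], kernels: [C][K][K]; one kernel per channel, valid padding, stride 1."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        out.append([[sum(channel[i + a][j + b] * kernel[a][b]
                         for a in range(k) for b in range(k))
                     for j in range(w_out)]
                    for i in range(h_out)])
    return out


def pointwise_conv(x, weights):
    """x: [C_in][H][W], weights: [C_out][C_in]; a 1x1 convolution across channels."""
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weights))]


image = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]             # 2 channels, 2x2 each
box_kernels = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]       # one 2x2 kernel per channel
depthwise_out = depthwise_conv(image, box_kernels)
pointwise_out = pointwise_conv(depthwise_out, [[1, 1], [1, -1]])
```

Chaining the two, as done here, gives the depthwise separable convolution used in MobileNet-style models.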

【Keywords】:

833. Variational Inference for Sparse Gaussian Process Modulated Hawkes Process.

Paper Link】 【Pages】:6803-6810

【Authors】: Rui Zhang ; Christian J. Walder ; Marian-Andrei Rizoiu

【Abstract】: The Hawkes process (HP) has been widely applied to modeling self-exciting events including neuron spikes, earthquakes and tweets. To avoid designing parametric triggering kernel and to be able to quantify the prediction confidence, the non-parametric Bayesian HP has been proposed. However, the inference of such models suffers from unscalability or slow convergence. In this paper, we aim to solve both problems. Specifically, first, we propose a new non-parametric Bayesian HP in which the triggering kernel is modeled as a squared sparse Gaussian process. Then, we propose a novel variational inference schema for model optimization. We employ the branching structure of the HP so that maximization of evidence lower bound (ELBO) is tractable by the expectation-maximization algorithm. We propose a tighter ELBO which improves the fitting performance. Further, we accelerate the novel variational inference schema to linear time complexity by leveraging the stationarity of the triggering kernel. Different from prior acceleration methods, ours enjoys higher efficiency. Finally, we exploit synthetic data and two large social media datasets to evaluate our method. We show that our approach outperforms state-of-the-art non-parametric frequentist and Bayesian methods. We validate the efficiency of our accelerated variational inference schema and practical utility of our tighter ELBO for model selection. We observe that the tighter ELBO exceeds the common one in model selection.
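As background for the self-exciting behavior, the conditional intensity of a Hawkes process is a base rate plus a triggering-kernel contribution from every past event. The sketch below uses a parametric exponential kernel purely for illustration; that kernel and the parameter names are assumptions, whereas the paper models the triggering kernel non-parametrically.

```python
import math


def hawkes_intensity(t, event_times, base_rate, alpha, beta):
    """Intensity at time t: base_rate + sum of alpha * exp(-beta * (t - t_i)) over past events."""
    excitation = sum(alpha * math.exp(-beta * (t - t_i))
                     for t_i in event_times if t_i < t)
    return base_rate + excitation


events = [1.0, 1.5]
before = hawkes_intensity(0.5, events, base_rate=0.5, alpha=1.0, beta=1.0)
after = hawkes_intensity(2.0, events, base_rate=0.5, alpha=1.0, beta=1.0)
```

Before any event the intensity equals the base rate; after the two events it is raised by their decaying contributions.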

【Keywords】:

834. Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset.

Paper Link】 【Pages】:6811-6820

【Authors】: Ruohan Zhang ; Calen Walshe ; Zhuode Liu ; Lin Guan ; Karl S. Muller ; Jake A. Whritner ; Luxin Zhang ; Mary M. Hayhoe ; Dana H. Ballard

【Abstract】: Large-scale public datasets have been shown to benefit research in multiple areas of modern artificial intelligence. For decision-making research that requires human data, high-quality datasets serve as important benchmarks to facilitate the development of new methods by providing a common reproducible standard. Many human decision-making tasks require visual attention to obtain high levels of performance. Therefore, measuring eye movements can provide a rich source of information about the strategies that humans use to solve decision-making tasks. Here, we provide a large-scale, high-quality dataset of human actions with simultaneously recorded eye movements while humans play Atari video games. The dataset consists of 117 hours of gameplay data from a diverse set of 20 games, with 8 million action demonstrations and 328 million gaze samples. We introduce a novel form of gameplay, in which the human plays in a semi-frame-by-frame manner. This leads to near-optimal game decisions and game scores that are comparable to or better than known human records. We demonstrate the usefulness of the dataset through two simple applications: predicting human gaze and imitating human demonstrated actions. The quality of the data leads to promising results in both tasks. Moreover, using a learned human gaze model to inform imitation learning leads to a 115% increase in game performance. We interpret these results as highlighting the importance of incorporating human visual attention in models of decision making and demonstrating the value of the current dataset to the research community. We hope that the scale and quality of this dataset can provide more opportunities to researchers in the areas of visual attention, imitation learning, and reinforcement learning.

【Keywords】:

835. Optimal Margin Distribution Learning in Dynamic Environments.

Paper Link】 【Pages】:6821-6828

【Authors】: Teng Zhang ; Peng Zhao ; Hai Jin

【Abstract】: Recently a promising research direction of statistical learning has been advocated, i.e., the optimal margin distribution learning with the central idea that instead of the minimal margin, the margin distribution is more crucial to the generalization performance. Although the superiority of this new learning paradigm has been verified under batch learning settings, it remains open for online learning settings, in particular, the dynamic environments in which the underlying decision function varies over time. In this paper, we propose the dynamic optimal margin distribution machine and theoretically analyze its regret. Although the obtained bound has the same order with the best known one, our method can significantly relax the restrictive assumption that the function variation should be given ahead of time, resulting in better applicability in practical scenarios. We also derive an excess risk bound for the special case when the underlying decision function only evolves several discrete changes rather than varying continuously. Extensive experiments on both synthetic and real data sets demonstrate the superiority of our method.

【Keywords】:

836. AutoShrink: A Topology-Aware NAS for Discovering Efficient Neural Architecture.

Paper Link】 【Pages】:6829-6836

【Authors】: Tunhou Zhang ; Hsin-Pai Cheng ; Zhenwen Li ; Feng Yan ; Chengyu Huang ; Hai Helen Li ; Yiran Chen

【Abstract】: Resource is an important constraint when deploying Deep Neural Networks (DNNs) on mobile and edge devices. Existing works commonly adopt the cell-based search approach, which limits the flexibility of network patterns in learned cell structures. Moreover, due to the topology-agnostic nature of existing works, including both cell-based and node-based approaches, the search process is time consuming and the performance of found architecture may be sub-optimal. To address these problems, we propose AutoShrink, a topology-aware Neural Architecture Search (NAS) for searching efficient building blocks of neural architectures. Our method is node-based and thus can learn flexible network patterns in cell structures within a topological search space. Directed Acyclic Graphs (DAGs) are used to abstract DNN architectures and progressively optimize the cell structure through edge shrinking. As the search space intrinsically reduces as the edges are progressively shrunk, AutoShrink explores more flexible search space with even less search time. We evaluate AutoShrink on image classification and language tasks by crafting ShrinkCNN and ShrinkRNN models. ShrinkCNN is able to achieve up to 48% parameter reduction and save 34% Multiply-Accumulates (MACs) on ImageNet-1K with comparable accuracy of state-of-the-art (SOTA) models. Specifically, both ShrinkCNN and ShrinkRNN are crafted within 1.5 GPU hours, which is 7.2× and 6.7× faster than the crafting time of SOTA CNN and RNN models, respectively.

【Keywords】:

837. Adaptive Double-Exploration Tradeoff for Outlier Detection.

Paper Link】 【Pages】:6837-6844

【Authors】: Xiaojin Zhang ; Honglei Zhuang ; Shengyu Zhang ; Yuan Zhou

【Abstract】: We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.

【Keywords】:

838. TapNet: Multivariate Time Series Classification with Attentional Prototypical Network.

Paper Link】 【Pages】:6845-6852

【Authors】: Xuchao Zhang ; Yifeng Gao ; Jessica Lin ; Chang-Tien Lu

【Abstract】: With the advance of sensor technologies, the Multivariate Time Series classification (MTSC) problem, perhaps one of the most essential problems in the time series data mining domain, has continuously received a significant amount of attention in recent decades. Traditional time series classification approaches based on Bag-of-Patterns or Time Series Shapelet have difficulty dealing with the huge number of feature candidates generated in high-dimensional multivariate data but have promising performance even when the training set is small. In contrast, deep learning based methods can learn low-dimensional features efficiently but suffer from a shortage of labelled data. In this paper, we propose a novel MTSC model with an attentional prototype network to combine the strengths of both traditional and deep learning based approaches. Specifically, we design a random group permutation method combined with multi-layer convolutional networks to learn the low-dimensional features from multivariate time series data. To handle the issue of limited training labels, we propose a novel attentional prototype network to train the feature representation based on their distance to class prototypes with inadequate data labels. In addition, we extend our model into its semi-supervised setting by utilizing the unlabeled data. Extensive experiments on 18 datasets in a public UEA Multivariate time series archive with eight state-of-the-art baseline methods exhibit the effectiveness of the proposed model.

【Keywords】:

839. Self-Paced Robust Learning for Leveraging Clean Labels in Noisy Data.

Paper Link】 【Pages】:6853-6860

【Authors】: Xuchao Zhang ; Xian Wu ; Fanglan Chen ; Liang Zhao ; Chang-Tien Lu

【Abstract】: The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model in a process from more reliable (clean) data instances to less reliable (noisy) ones under the supervision of well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses on the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed approach can achieve a considerable improvement in effectiveness and robustness over existing methods.

【Keywords】:

840. Local Regularizer Improves Generalization.

Paper Link】 【Pages】:6861-6868

【Authors】: Yikai Zhang ; Hui Qu ; Dimitris N. Metaxas ; Chao Chen

【Abstract】: Regularization plays an important role in the generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on training methods called Locally Regularized Stochastic Gradient Descent (LRSGD). An LRSGD leverages a proximal type penalty in gradient descent steps to regularize SGD in training. We show that by carefully choosing relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.

【Keywords】:
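
The abstract of entry 840 mentions a "proximal type penalty in gradient descent steps" without giving the update rule. As a rough illustration only, the sketch below shows one generic way such a penalty can enter a gradient step: the term (lam/2)·||w − anchor||² is added to the loss, which pulls the iterate toward a reference point. The function name, the `anchor` and `lam` parameters, and the choice of penalty are assumptions made for this example; this is not claimed to be the LRSGD update from the paper.

```python
from typing import List


def proximal_penalized_step(
    w: List[float],
    grad: List[float],
    anchor: List[float],
    lr: float,
    lam: float,
) -> List[float]:
    """One gradient step on loss(w) + (lam / 2) * ||w - anchor||^2.

    Hypothetical illustration of a proximal-type penalty; `anchor` and
    `lam` are assumed names, not taken from the paper.
    """
    # The gradient of the penalty term is lam * (w - anchor), added to
    # the (stochastic) gradient of the loss before taking the step.
    return [
        wi - lr * (gi + lam * (wi - ai))
        for wi, gi, ai in zip(w, grad, anchor)
    ]


# With lam = 0 the update reduces to a plain gradient step.
plain = proximal_penalized_step([1.0, -2.0], [0.5, 0.5], [0.0, 0.0], lr=0.1, lam=0.0)
pulled = proximal_penalized_step([1.0, -2.0], [0.5, 0.5], [0.0, 0.0], lr=0.1, lam=1.0)
```

In this toy call, the penalized step moves each coordinate closer to the anchor than the plain step does.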

841. An Ordinal Data Clustering Algorithm with Automated Distance Learning.

Paper Link】 【Pages】:6869-6876

【Authors】: Yiqun Zhang ; Yiu-ming Cheung

【Abstract】: Clustering ordinal data is a common task in data mining and machine learning fields. As a major type of categorical data, ordinal data is composed of attributes with naturally ordered possible values (also called categories interchangeably in this paper). However, due to the lack of a dedicated distance metric, ordinal categories are usually treated as nominal ones, or coded as consecutive integers and treated as numerical ones. Both of these common approaches define the distances between ordinal categories only roughly: the former ignores the order relationship, and the latter simply assigns identical distances to different pairs of adjacent categories that may have intrinsically unequal distances. As a result, they may produce unsatisfactory ordinal data clustering results. This paper, therefore, proposes a novel ordinal data clustering algorithm, which iteratively learns: 1) the partition of the ordinal dataset, and 2) the inter-category distances. To the best of our knowledge, this is the first attempt to dynamically adjust inter-category distances during the clustering process to search for a better partition of ordinal data. The proposed algorithm features superior clustering accuracy, low time complexity, and fast convergence, and is parameter-free. Extensive experiments show its efficacy.

【Keywords】:
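
To make the contrast drawn in the abstract of entry 841 concrete, the sketch below compares the two common treatments of ordinal categories (nominal, and coded as consecutive integers) with a distance built from unequal gaps between adjacent categories. The category names and the gap values are made up for illustration; the paper's procedure for learning inter-category distances is not reproduced here.

```python
from typing import List

# Hypothetical ordinal attribute, listed from lowest to highest.
LEVELS: List[str] = ["low", "medium", "high", "very high"]


def nominal_distance(a: str, b: str) -> float:
    """Treats categories as unordered: every pair of distinct values is 1 apart."""
    return 0.0 if a == b else 1.0


def integer_coded_distance(a: str, b: str) -> float:
    """Codes categories as 0, 1, 2, ...: adjacent categories are always 1 apart."""
    return float(abs(LEVELS.index(a) - LEVELS.index(b)))


def gap_based_distance(a: str, b: str, gaps: List[float]) -> float:
    """Sums per-step gaps, where gaps[i] is the distance between level i and i + 1.

    The gaps stand in for learned inter-category distances; here they are
    assumed values, not the output of the paper's algorithm.
    """
    i, j = sorted((LEVELS.index(a), LEVELS.index(b)))
    return float(sum(gaps[i:j]))


# Assumed gaps: "medium" and "high" are close, "high" and "very high" are far.
example_gaps = [1.0, 0.5, 2.0]
```

Under the nominal view, "low" is as far from "medium" as from "very high"; under integer coding, "medium" to "high" costs the same as "high" to "very high"; the gap-based distance keeps the order while allowing those steps to differ.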

842. Joint Adversarial Learning for Domain Adaptation in Semantic Segmentation.

Paper Link】 【Pages】:6877-6884

【Authors】: Yixin Zhang ; Zilei Wang

【Abstract】: Unsupervised domain adaptation in semantic segmentation aims to exploit the pixel-level annotated samples in the source domain to aid the segmentation of unlabeled samples in the target domain. For such a task, the key point is to learn domain-invariant representations, and adversarial learning is usually used, in which the discriminator distinguishes which domain the input comes from and the segmentation model aims to deceive the domain discriminator. In this work, we first propose a novel joint adversarial learning (JAL) to boost the domain discriminator in output space by introducing the information of the domain discriminator from low-level features. Consequently, the training of the high-level decoder would be enhanced. Then we propose a weight transfer module (WTM) to alleviate the inherent bias of the trained decoder towards the source domain. Specifically, WTM changes the original decoder into a new decoder, which is learned only under the supervision of adversarial loss and thus mainly focuses on reducing domain divergence. Extensive experiments on two widely used benchmarks show that our method can bring considerable performance improvement over different baseline methods, which demonstrates the effectiveness of our method in output space adaptation.

【Keywords】:

843. Hypergraph Label Propagation Network.

Paper Link】 【Pages】:6885-6892

【Authors】: Yubo Zhang ; Nan Wang ; Yufeng Chen ; Changqing Zou ; Hai Wan ; Xibin Zhao ; Yue Gao

【Abstract】: In recent years, with the explosion of information on the Internet, there has been a large amount of data produced, and analyzing these data is useful and has been widely employed in real world applications. Since data labeling is costly, much research has focused on how to efficiently label data through semi-supervised learning. Among these methods, graph and hypergraph based label propagation algorithms are widely used. However, traditional hypergraph learning methods may suffer from their high computational cost. In this paper, we propose a Hypergraph Label Propagation Network (HLPN) which combines hypergraph-based label propagation and deep neural networks in order to optimize the feature embedding for optimal hypergraph learning through an end-to-end architecture. The proposed method is more effective and also more efficient for data labeling compared with traditional hypergraph learning methods. We verify the effectiveness of our proposed HLPN method on a real-world microblog dataset gathered from Sina Weibo. Experiments demonstrate that the proposed method can significantly outperform the state-of-the-art methods and alternative approaches.

【Keywords】:

844. Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting.

Paper Link】 【Pages】:6893-6900

【Authors】: Haoyu Zhao ; Wei Chen

【Abstract】: In this paper, we study the non-stationary online second price auction problem. We assume that the seller is selling the same type of items in T rounds by the second price auction, and she can set the reserve price in each round. In each round, the bidders draw their private values from a joint distribution unknown to the seller. Then, the seller announces the reserve price for this round. Next, bidders with private values higher than the announced reserve price in that round report their values to the seller as their bids. The bidder with the highest bid larger than the reserve price wins the item and pays the seller a price equal to the second-highest bid or the reserve price, whichever is larger. The seller wants to maximize her total revenue during the time horizon T while learning the distribution of private values over time. The problem is more challenging than the standard online learning scenario since the private value distribution is non-stationary, meaning that the distribution of bidders' private values may change over time, and we need to use the non-stationary regret to measure the performance of our algorithm. To our knowledge, this paper is the first to study the repeated auction in the non-stationary setting theoretically. Our algorithm achieves the non-stationary regret upper bound $\tilde{O}(\min\{\sqrt{ST}, \bar{V}^{1/3} T^{2/3}\})$, where S is the number of switches in the distribution and $\bar{V}$ is the sum of total variation; neither S nor $\bar{V}$ needs to be known by the algorithm. We also prove regret lower bounds $\Omega(\sqrt{ST})$ in the switching case and $\Omega(\bar{V}^{1/3} T^{2/3})$ in the dynamic case, showing that our algorithm has nearly optimal non-stationary regret.

【Keywords】:
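
The single-round mechanism described in the abstract of entry 844 can be sketched as follows: only bidders whose value exceeds the reserve price bid, the highest bidder wins, and the payment is the larger of the second-highest bid and the reserve price. This is a minimal sketch of that one-round rule only, not of the paper's learning algorithm; the function name and the tie-breaking behavior (ties go to the higher bidder index) are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def second_price_with_reserve(
    values: Sequence[float], reserve: float
) -> Tuple[Optional[int], float]:
    """Run one round and return (winner index or None, seller revenue)."""
    # Only bidders whose private value is higher than the reserve submit a bid.
    bids = [(v, i) for i, v in enumerate(values) if v > reserve]
    if not bids:
        # Nobody clears the reserve: the item is not sold.
        return None, 0.0
    # Highest bid first; ties are broken by bidder index (an assumption).
    bids.sort(reverse=True)
    winner = bids[0][1]
    # With a single bid there is no second-highest bid, so the reserve applies.
    second_highest = bids[1][0] if len(bids) > 1 else reserve
    return winner, max(second_highest, reserve)


# Same private values, three different reserve prices.
sold_at_second_bid = second_price_with_reserve([0.3, 0.9, 0.6], reserve=0.5)
sold_at_reserve = second_price_with_reserve([0.3, 0.9, 0.6], reserve=0.7)
unsold = second_price_with_reserve([0.3, 0.9, 0.6], reserve=1.0)
```

The three calls show why the reserve price matters to the seller: raising it from 0.5 to 0.7 increases revenue for these values, while raising it to 1.0 leaves the item unsold.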

845. Bridging Maximum Likelihood and Adversarial Learning via α-Divergence.

Paper Link】 【Pages】:6901-6908

【Authors】: Miaoyun Zhao ; Yulai Cong ; Shuyang Dai ; Lawrence Carin

【Abstract】: Maximum likelihood (ML) and adversarial learning are two popular approaches for training generative models, and from many perspectives these techniques are complementary. ML learning encourages the capture of all data modes, and it is typically characterized by stable training. However, ML learning tends to distribute probability mass diffusely over the data space, e.g., yielding blurry synthetic images. Adversarial learning is well known to synthesize highly realistic natural images, despite practical challenges like mode dropping and delicate training. We propose an α-Bridge to unify the advantages of ML and adversarial learning, enabling the smooth transfer from one to the other via the α-divergence. We reveal that generalizations of the α-Bridge are closely related to approaches developed recently to regularize adversarial learning, providing insights into that prior work, and further understanding of why the α-Bridge performs well in practice.

【Keywords】:

846. Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent.

Paper Link】 【Pages】:6909-6916

【Authors】: Pu Zhao ; Pin-Yu Chen ; Siyue Wang ; Xue Lin

【Abstract】: Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, the black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most of the current black-box attack methods adopt the first-order gradient descent method, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design the adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and the second-order natural gradient descent to achieve higher query efficiency. The empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.

【Keywords】:

847. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers.

Paper Link】 【Pages】:6917-6924

【Authors】: Ya Zhao ; Rui Xu ; Xinchao Wang ; Peng Hou ; Haihong Tang ; Mingli Song

【Abstract】: Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to that of its counterpart, speech recognition, because the ambiguous nature of lip actuations makes it challenging to extract discriminant features from lip movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues that are difficult to obtain from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audio and video, as well as an innovative filtering strategy to refine the speech recognizer's prediction. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by a margin of 7.66% and 2.75% in character error rate, respectively.

【Keywords】:

848. An Annotation Sparsification Strategy for 3D Medical Image Segmentation via Representative Selection and Self-Training.

Paper Link】 【Pages】:6925-6932

【Authors】: Hao Zheng ; Yizhe Zhang ; Lin Yang ; Chaoli Wang ; Danny Z. Chen

【Abstract】: Image segmentation is critical to many medical applications. While deep learning (DL) methods continue to improve performance for many medical image segmentation tasks, data annotation is a big bottleneck to DL-based segmentation because (1) DL models tend to need a large amount of labeled data to train, and (2) it is highly time-consuming and labor-intensive to label 3D medical images voxel-wise. Significantly reducing annotation effort while attaining good performance of DL segmentation models remains a major challenge. In our preliminary experiments, we observe that, using partially labeled datasets, there is indeed a large performance gap with respect to using fully annotated training datasets. In this paper, we propose a new DL framework for reducing annotation effort and bridging the gap between full annotation and sparse annotation in 3D medical image segmentation. We achieve this by (i) selecting representative slices in 3D images that minimize data redundancy and save annotation effort, and (ii) self-training with pseudo-labels automatically generated from the base-models trained using the selected annotated slices. Extensive experiments using two public datasets (the HVSMR 2016 Challenge dataset and mouse piriform cortex dataset) show that our framework yields competitive segmentation results compared with state-of-the-art DL methods using less than ∼20% of annotated data.

【Keywords】:

849. A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits.

Paper Link】 【Pages】:6933-6940

【Authors】: Huozhi Zhou ; Lingda Wang ; Lav R. Varshney ; Ee-Peng Lim

【Abstract】: We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by $O(\sqrt{NKT \log T})$, where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NKT})$, for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is $\Omega(\sqrt{T})$. Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.

【Keywords】:

850. Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization.

Paper Link】 【Pages】:6941-6948

【Authors】: Qi Zhou ; Houqiang Li ; Jie Wang

【Abstract】: Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of policy to inaccurate models. Experiments show POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.

【Keywords】:

851. DGE: Deep Generative Network Embedding Based on Commonality and Individuality.

Paper Link】 【Pages】:6949-6956

【Authors】: Sheng Zhou ; Xin Wang ; Jiajun Bu ; Martin Ester ; Pinggang Yu ; Jiawei Chen ; Qihao Shi ; Can Wang

【Abstract】: Network embedding plays a crucial role in network analysis to provide effective representations for a variety of learning tasks. Existing attributed network embedding methods mainly focus on preserving the observed node attributes and network topology in the latent embedding space, with the assumption that nodes connected through edges will share similar attributes. However, our empirical analysis of real-world datasets shows that there exist both commonality and individuality between node attributes and network topology. On the one hand, similar nodes are expected to share similar attributes and have edges connecting them (commonality). On the other hand, each information source may maintain individual differences as well (individuality). Simultaneously capturing commonality and individuality is very challenging due to their exclusive nature, and existing work fails to do so. In this paper, we propose a deep generative embedding (DGE) framework which simultaneously captures commonality and individuality between network topology and node attributes in a generative process. Stochastic gradient variational Bayesian (SGVB) optimization is employed to infer model parameters as well as the node embeddings. Extensive experiments on four real-world datasets show the superiority of our proposed DGE framework in various tasks including node classification and link prediction.

【Keywords】:

852. Side Information Dependence as a Regularizer for Analyzing Human Brain Conditions across Cognitive Experiments.

Paper Link】 【Pages】:6957-6964

【Authors】: Shuo Zhou ; Wenwen Li ; Christopher R. Cox ; Haiping Lu

【Abstract】: The increasing number of public neuroimaging datasets opens the door to analyzing homogeneous human brain conditions across datasets by transfer learning (TL). However, neuroimaging data are high-dimensional and noisy, with small sample sizes. It is challenging to learn a robust model for data across different cognitive experiments and subjects. A recent TL approach minimizes domain dependence to learn common cross-domain features, via the Hilbert-Schmidt Independence Criterion (HSIC). Inspired by this approach and the multi-source TL theory, we propose a Side Information Dependence Regularization (SIDeR) learning framework for TL in brain condition decoding. Specifically, SIDeR simultaneously minimizes the empirical risk and the statistical dependence on the domain side information, to reduce the theoretical generalization error bound. We construct 17 brain decoding TL tasks using public neuroimaging data for evaluation. Comprehensive experiments validate the superiority of SIDeR over ten competing methods, particularly an average improvement of 15.6% on the TL tasks with multi-source experiments.

【Keywords】:

853. Multi-View Spectral Clustering with Optimal Neighborhood Laplacian Matrix.

Paper Link】 【Pages】:6965-6972

【Authors】: Sihang Zhou ; Xinwang Liu ; Jiyuan Liu ; Xifeng Guo ; Yawei Zhao ; En Zhu ; Yongping Zhai ; Jianping Yin ; Wen Gao

【Abstract】: Multi-view spectral clustering aims to group data into different categories by optimally exploring complementary information from multiple Laplacian matrices. However, existing methods usually linearly combine a group of pre-specified first-order Laplacian matrices to construct an optimal Laplacian matrix, which may result in limited representation capability and insufficient information exploitation. In this paper, we propose a novel optimal neighborhood multi-view spectral clustering (ONMSC) algorithm to address these issues. Specifically, the proposed algorithm generates an optimal Laplacian matrix by searching the neighborhood of both the linear combination of the first-order and high-order base Laplacian matrices simultaneously. This design enhances the representative capacity of the optimal Laplacian and better utilizes the hidden high-order connection information, leading to improved clustering performance. An efficient algorithm with proved convergence is designed to solve the resultant optimization problem. Extensive experimental results on 9 datasets demonstrate the superiority of our algorithm against state-of-the-art methods, which verifies the effectiveness and advantages of the proposed ONMSC.

【Keywords】:

854. Posterior-Guided Neural Architecture Search.

Paper Link】 【Pages】:6973-6980

【Authors】: Yizhou Zhou ; Xiaoyan Sun ; Chong Luo ; Zheng-Jun Zha ; Wenjun Zeng

【Abstract】: The emergence of neural architecture search (NAS) has greatly advanced the research on network design. Recent proposals such as gradient-based methods or one-shot approaches significantly boost the efficiency of NAS. In this paper, we formulate the NAS problem from a Bayesian perspective. We propose explicitly estimating the joint posterior distribution over pairs of network architecture and weights. Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient. A posterior-guided sampling method is then presented to sample architecture candidates and directly make evaluations. As a Bayesian approach, our posterior-guided NAS (PGNAS) avoids tuning a number of hyper-parameters and enables a very effective architecture sampling in posterior probability space. Interestingly, it also leads to a deeper insight into the weight sharing used in the one-shot NAS and naturally alleviates the mismatch between the sampled architecture and weights caused by the weight sharing. We validate our PGNAS method on the fundamental image classification task. Results on Cifar-10, Cifar-100 and ImageNet show that PGNAS achieves a good trade-off between precision and speed of search among NAS methods. For example, it takes 11 GPU days to search a very competitive architecture with 1.98% and 14.28% test errors on Cifar-10 and Cifar-100, respectively.

【Keywords】:

855. Safe Sample Screening for Robust Support Vector Machine.

Paper Link】 【Pages】:6981-6988

【Authors】: Zhou Zhai ; Bin Gu ; Xiang Li ; Heng Huang

【Abstract】: Robust support vector machine (RSVM) has been shown to perform remarkably well in improving the generalization performance of support vector machines in noisy environments. Unfortunately, in order to handle the non-convexity induced by ramp loss in RSVM, existing RSVM solvers often adopt the DC programming framework, which is computationally inefficient because it runs multiple outer loops. This hinders the application of RSVM to large-scale problems. Safe sample screening, which allows for the exclusion of training samples prior to or early in the training process, is an effective method to greatly reduce computational time. However, existing safe sample screening algorithms are limited to convex optimization problems while RSVM is a non-convex problem. To address this challenge, in this paper, we propose two safe sample screening rules for RSVM based on the framework of concave-convex procedure (CCCP). Specifically, we provide a screening rule for the inner solver of CCCP and another rule for propagating screened samples between two successive solvers of CCCP. To the best of our knowledge, this is the first work on safe sample screening for a non-convex optimization problem. More importantly, we provide a security guarantee for our sample screening rules for RSVM. Experimental results on a variety of benchmark datasets verify that our safe sample screening rules can significantly reduce the computational time.

【Keywords】:

856. Object-Oriented Dynamics Learning through Multi-Level Abstraction.

Paper Link】 【Pages】:6989-6998

【Authors】: Guangxiang Zhu ; Jianhao Wang ; Zhizhou Ren ; Zichuan Lin ; Chongjie Zhang

【Abstract】: Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties for common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization over novel environments for learning environment models. We also demonstrate that learned dynamics models enable efficient planning in unseen environments, comparable to true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.

【Keywords】:

857. A Knowledge-Aware Attentional Reasoning Network for Recommendation.

Paper Link】 【Pages】:6999-7006

【Authors】: Qiannan Zhu ; Xiaofei Zhou ; Jia Wu ; Jianlong Tan ; Li Guo

【Abstract】: Knowledge-graph-aware recommendation systems have recently attracted increasing attention in both industry and academia. Many existing knowledge-aware recommendation methods, which usually perform recommendation by reasoning on the paths between users and items in knowledge graphs, have achieved better performance. However, they ignore the users' personal clicked history sequences that can better reflect users' preferences within a period of time for recommendation. In this paper, we propose a knowledge-aware attentional reasoning network (KARN) that incorporates the users' clicked history sequences and path connectivity between users and items for recommendation. The proposed KARN develops not only an attention-based RNN to capture the user's history interests from the user's clicked history sequences, but also a hierarchical attentional neural network to reason on paths between users and items for inferring the potential user intents on items. Based on both the user's history interest and potential intent, KARN can predict the clicking probability of the user with respect to a candidate item. We conduct experiments on the Amazon review dataset, and the experimental results demonstrate the superiority and effectiveness of our proposed KARN model.

【Keywords】:

858. GSSNN: Graph Smoothing Splines Neural Networks.

Paper Link】 【Pages】:7007-7014

【Authors】: Shichao Zhu ; Lewei Zhou ; Shirui Pan ; Chuan Zhou ; Guiying Yan ; Bin Wang

【Abstract】: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in many graph data analysis tasks. However, they still suffer from two limitations for graph representation learning. First, they exploit non-smoothing node features which may result in suboptimal embedding and degenerated performance for graph classification. Second, they only exploit neighbor information but ignore global topological knowledge. Aiming to overcome these limitations simultaneously, in this paper, we propose a novel, flexible, and end-to-end framework, Graph Smoothing Splines Neural Networks (GSSNN), for graph classification. By exploiting the smoothing splines, which are widely used to learn smooth fitting functions in regression, we develop an effective feature smoothing and enhancement module, Scaled Smoothing Splines (S³), to learn graph embedding. To integrate global topological information, we design a novel scoring module, which exploits closeness, degree, as well as self-attention values, to select important node features as knots for smoothing splines. These knots can be potentially used for interpreting classification results. In extensive experiments on biological and social datasets, we demonstrate that our model achieves state-of-the-art performance and that GSSNN is superior in learning more robust graph representations. Furthermore, we show that the S³ module is easily plugged into existing GNNs to improve their performance.

【Keywords】:

859. Semi-Supervised Streaming Learning with Emerging New Labels.

Paper Link】 【Pages】:7015-7022

【Authors】: Yong-Nan Zhu ; Yu-Feng Li

【Abstract】: In many real-world applications, the modeling environment is usually dynamic and evolutionary, especially in a data stream where new classes often emerge. Great efforts have recently been devoted to learning with novel concepts, typically in a supervised setting with completely supervised initialization. However, the data collected in the stream are often only partially labeled, which means only a few of them are labeled while the great majority miss ground-truth labels. Besides, new classes hidden in unlabeled instances bring more challenges for the learning task. In this paper, we tackle these issues with a new approach called SEEN, which consists of three major components: an effective novel class detector based on clustering random trees, a robust classifier for predictions on the known classes, and an efficient updating process that ensures the whole framework adapts to the changing environment automatically. The classifier produces known labels via label propagation that utilizes all labeled and part of the unlabeled data in the past, which naturally describe the entire stream seen so far. Empirical studies on several datasets validate that the algorithm can accurately classify points on a dynamic stream with a small number of labeled examples and emerging new classes.

【Keywords】:

860. Observe Before Play: Multi-Armed Bandit with Pre-Observations.

Paper Link】 【Pages】:7023-7030

【Authors】: Jinhang Zuo ; Xiaoxi Zhang ; Carlee Joe-Wong

【Abstract】: We consider the stochastic multi-armed bandit (MAB) problem in a setting where a player can pay to pre-observe arm rewards before playing an arm in each round. Apart from the usual trade-off between exploring new arms to find the best one and exploiting the arm believed to offer the highest reward, we encounter an additional dilemma: pre-observing more arms gives a higher chance to play the best one, but incurs a larger cost. For the single-player setting, we design an Observe-Before-Play Upper Confidence Bound (OBP-UCB) algorithm for K arms with Bernoulli rewards, and prove a T-round regret upper bound O(K^2 log T). In the multi-player setting, collisions will occur when players select the same arm to play in the same round. We design a centralized algorithm, C-MP-OBP, and prove its T-round regret relative to an offline greedy strategy is upper bounded in O(K^4/M^2 log T) for K arms and M players. We also propose distributed versions of the C-MP-OBP policy, called D-MP-OBP and D-MP-Adapt-OBP, achieving logarithmic regret with respect to collision-free target policies. Experiments on synthetic data and wireless channel traces show that C-MP-OBP and D-MP-OBP outperform random heuristics and offline optimal policies that do not allow pre-observations.

【Keywords】:
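
As background for the exploration/exploitation trade-off mentioned in the abstract above, here is a minimal sketch of the standard UCB1 index policy for Bernoulli arms. It is not the OBP-UCB algorithm from the paper: pre-observations and their cost are left out entirely, and the function name `ucb1` and its parameters are hypothetical.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms and return the pull count per arm.

    `means` holds the true success probability of each arm. They are used
    only to simulate rewards; the policy itself never reads them.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm was played
    sums = [0.0] * k   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            # Index = empirical mean + confidence radius.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `ucb1([0.1, 0.9], 2000)` returns how often each arm was played; the arm with the higher mean should receive the large majority of the pulls.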

AAAI Technical Track: Multiagent Systems 38

861. Subsidy Allocations in the Presence of Income Shocks.

Paper Link】 【Pages】:7032-7039

【Authors】: Rediet Abebe ; Jon M. Kleinberg ; S. Matthew Weinberg

【Abstract】: Poverty and economic hardship are understood to be highly complex and dynamic phenomena. Due to the multi-faceted nature of welfare, assistance programs targeted at alleviating hardship can face challenges, as they often rely on simpler welfare measurements, such as income or wealth, that fail to capture the full complexity of each family's state. Here, we explore one important dimension – susceptibility to income shocks. We introduce a model of welfare that incorporates income, wealth, and income shocks and analyze this model to show that it can vary, at times substantially, from measures of welfare that only use income or wealth. We then study the algorithmic problem of optimally allocating subsidies in the presence of income shocks. We consider two well-studied objectives: the first aims to minimize the expected number of agents that fall below a given welfare threshold (a min-sum objective) and the second aims to minimize the likelihood that the most vulnerable agent falls below this threshold (a min-max objective). We present optimal and near-optimal algorithms for various general settings. We close with a discussion on future directions on allocating societal resources and ethical implications of related approaches.

【Keywords】:

862. Parameterised Resource-Bounded ATL.

Paper Link】 【Pages】:7040-7046

【Authors】: Natasha Alechina ; Stéphane Demri ; Brian Logan

【Abstract】: It is often advantageous to be able to extract resource requirements in resource logics of strategic ability, rather than to verify whether a fixed resource requirement is sufficient for achieving a goal. We study Parameterised Resource-Bounded Alternating Time Temporal Logic where parameter extraction is possible. We give a parameter extraction algorithm and prove that the model-checking problem is 2EXPTIME-complete.

【Keywords】:

863. Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning.

Paper Link】 【Pages】:7047-7054

【Authors】: Nicolas Anastassacos ; Stephen Hailes ; Mirco Musolesi

【Abstract】: Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society.

【Keywords】:

864. Incentive-Compatible Classification.

Paper Link】 【Pages】:7055-7062

【Authors】: Yakov Babichenko ; Oren Dean ; Moshe Tennenholtz

【Abstract】: We investigate the possibility of an incentive-compatible (IC, a.k.a. strategy-proof) mechanism for the classification of agents in a network according to their reviews of each other. In the α-classification problem we are interested in selecting the top α fraction of users. We give upper bounds (impossibilities) and lower bounds (mechanisms) on the worst-case coincidence between the classification of an IC mechanism and the ideal α-classification. We prove bounds which depend on α and on the maximal number of reviews given by a single agent, Δ. Our results show that it is harder to find a good mechanism when α is smaller and Δ is larger. In particular, if Δ is unbounded, then the best mechanism is trivial (that is, it does not take into account the reviews). On the other hand, when Δ is sublinear in the number of agents, we give a simple, natural mechanism, with a coincidence ratio of α.

【Keywords】:

865. Learning the Value of Teamwork to Form Efficient Teams.

Paper Link】 【Pages】:7063-7070

【Authors】: Ryan Beal ; Narayan Changder ; Timothy D. Norman ; Sarvapali D. Ramchurn

【Abstract】: In this paper we describe a novel approach to team formation based on the value of inter-agent interactions. Specifically, we propose a model of teamwork that considers outcomes from chains of interactions between agents. Based on our model, we devise a number of network metrics to capture the contribution of interactions between agents. This is then used to learn the value of teamwork from historical team performance data. We apply our model to predict team performance and validate our approach using real-world team performance data from the 2018 FIFA World Cup. Our model is shown to better predict the real-world performance of teams by up to 46% compared to models that ignore inter-agent interactions.

【Keywords】:

866. Model Checking Temporal Epistemic Logic under Bounded Recall.

Paper Link】 【Pages】:7071-7078

【Authors】: Francesco Belardinelli ; Alessio Lomuscio ; Emily Yu

【Abstract】: We study the problem of verifying multi-agent systems under the assumption of bounded recall. We introduce the logic CTLKBR, a bounded-recall variant of the temporal-epistemic logic CTLK. We define and study the model checking problem against CTLK specifications under incomplete information and bounded recall and present complexity upper bounds. We present an extension of the BDD-based model checker MCMAS implementing model checking under bounded recall semantics and discuss the experimental results obtained.

【Keywords】:

867. ODSS: Efficient Hybridization for Optimal Coalition Structure Generation.

Paper Link】 【Pages】:7079-7086

【Authors】: Narayan Changder ; Samir Aknine ; Sarvapali D. Ramchurn ; Animesh Dutta

【Abstract】: Coalition Structure Generation (CSG) is an NP-complete problem that remains difficult to solve on account of its complexity. In this paper, we propose an efficient hybrid algorithm for optimal coalition structure generation called ODSS. ODSS is a hybrid version of two previously established algorithms IDP (Rahwan and Jennings 2008) and IP (Rahwan et al. 2009). ODSS minimizes the overlapping between IDP and IP by dividing the whole search space of CSG into two disjoint sets of subspaces and proposes a novel subspace shrinking technique to reduce the size of the subspace searched by IP with the help of IDP. When compared to the state-of-the-art against a wide variety of value distributions, ODSS is shown to perform better by up to 54.15% on benchmark inputs.

【Keywords】:
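
To make the coalition structure generation problem in the abstract above concrete, here is a minimal sketch of the classic dynamic-programming recursion over subsets of agents. It is not ODSS, IDP, or IP. The bitmask encoding, the function name `optimal_structure_value`, and the `values` table are assumptions made for illustration.

```python
from functools import lru_cache


def optimal_structure_value(n, values):
    """Return the best total value over all partitions of n agents.

    `values` maps a coalition, encoded as a bitmask over agents 0..n-1,
    to its value. The recursion is
        f(C) = max(v(C), max over splits C = A + B of f(A) + f(B)).
    """

    @lru_cache(maxsize=None)
    def f(mask):
        best = values[mask]          # option 1: keep the coalition whole
        low = mask & -mask           # lowest agent stays in the first part
        rest = mask ^ low
        sub = rest
        while sub:                   # every non-empty subset of `rest`
            best = max(best, f(mask ^ sub) + f(sub))
            sub = (sub - 1) & rest
        return best

    return f((1 << n) - 1)


# Hypothetical three-agent game: agents 0 and 1 work well together.
example_values = {
    0b001: 1, 0b010: 1, 0b100: 1,
    0b011: 3, 0b101: 1, 0b110: 1,
    0b111: 3.5,
}
```

For the table above, the best structure pairs agents 0 and 1 and leaves agent 2 alone, for a total value of 4. The number of splits examined grows exponentially with the number of agents, which is why practical algorithms prune the search space.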

Paper Link】 【Pages】:7087-7094

【Authors】: Dingding Chen ; Yanchen Deng ; Ziyu Chen ; Wenxin Zhang ; Zhongshi He

【Abstract】: Search and inference are two main strategies for optimally solving Distributed Constraint Optimization Problems (DCOPs). Recently, several algorithms were proposed to combine their advantages. Unfortunately, such algorithms only use an approximated inference as a one-shot preprocessing phase to construct the initial lower bounds, which leads to inefficient pruning under the limited memory budget. On the other hand, iterative inference algorithms (e.g., MB-DPOP) perform a context-based complete inference for all possible contexts but suffer from tremendous traffic overheads. In this paper, (i) hybridizing search with context-based inference, we propose a complete algorithm for DCOPs, named HS-CAI, where the inference utilizes the contexts derived from the search process to establish tight lower bounds while the search uses such bounds for efficient pruning and thereby reduces contexts for the inference. Furthermore, (ii) we introduce a context evaluation mechanism to select the context patterns for the inference to further reduce the overheads incurred by iterative inferences. Finally, (iii) we prove the correctness of our algorithm and the experimental results demonstrate its superiority over the state-of-the-art.

【Keywords】:

869. AATEAM: Achieving the Ad Hoc Teamwork by Employing the Attention Mechanism.

Paper Link】 【Pages】:7095-7102

【Authors】: Shuo Chen ; Ewa Andrejczuk ; Zhiguang Cao ; Jie Zhang

【Abstract】: In the ad hoc teamwork setting, a team of agents needs to perform a task without prior coordination. The most advanced approach learns policies based on previous experiences and reuses one of the policies to interact with new teammates. However, the selected policy in many cases is sub-optimal. Switching between policies to adapt to new teammates' behaviour takes time, which threatens the successful performance of a task. In this paper, we propose AATEAM – a method that uses the attention-based neural networks to cope with new teammates' behaviour in real-time. We train one attention network per teammate type. The attention networks learn both to extract the temporal correlations from the sequence of states (i.e. contexts) and the mapping from contexts to actions. Each attention network also learns to predict a future state given the current context and its output action. The prediction accuracies help to determine which actions the ad hoc agent should take. We perform extensive experiments to show the effectiveness of our method.

【Keywords】:

870. Convergence of Opinion Diffusion is PSPACE-Complete.

Paper Link】 【Pages】:7103-7110

【Authors】: Dmitry Chistikov ; Grzegorz Lisowski ; Mike Paterson ; Paolo Turrini

【Abstract】: We analyse opinion diffusion in social networks, where a finite set of individuals is connected in a directed graph and each simultaneously changes their opinion to that of the majority of their influencers. We study the algorithmic properties of the fixed-point behaviour of such networks, showing that the problem of establishing whether individuals converge to stable opinions is PSPACE-complete.

【Keywords】:
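
The synchronous majority update described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: opinions are binary (0 or 1), an agent keeps its current opinion on a tie, and the names `step` and `converges` are hypothetical. The brute-force convergence check may visit exponentially many states, which is consistent with the hardness result.

```python
def step(opinions, influencers):
    """Apply one synchronous majority update to binary opinions."""
    new = {}
    for agent, current in opinions.items():
        votes = [opinions[j] for j in influencers.get(agent, [])]
        ones = sum(votes)
        zeros = len(votes) - ones
        if ones > zeros:
            new[agent] = 1
        elif zeros > ones:
            new[agent] = 0
        else:
            new[agent] = current  # assumption: ties keep the current opinion
    return new


def converges(opinions, influencers):
    """Return (True, fixed_point), or (False, None) if the run enters a cycle."""
    seen = set()
    state = dict(opinions)
    while True:
        key = tuple(sorted(state.items()))
        if key in seen:
            return False, None  # revisited a non-fixed state: a cycle
        seen.add(key)
        nxt = step(state, influencers)
        if nxt == state:
            return True, state
        state = nxt
```

Two agents that influence only each other and start with opposite opinions swap opinions forever, so `converges({"a": 0, "b": 1}, {"a": ["b"], "b": ["a"]})` reports a cycle rather than a stable state.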

871. A Particle Swarm Based Algorithm for Functional Distributed Constraint Optimization Problems.

Paper Link】 【Pages】:7111-7118

【Authors】: Moumita Choudhury ; Saaduddin Mahmud ; Md. Mosaddek Khan

【Abstract】: Distributed Constraint Optimization Problems (DCOPs) are a widely studied constraint handling framework. The objective of a DCOP algorithm is to optimize a global objective function that can be described as the aggregation of several distributed constraint cost functions. In a DCOP, each of these functions is defined by a set of discrete variables. However, in many applications, such as target tracking or sleep scheduling in sensor networks, continuous valued variables are more suited than the discrete ones. Considering this, Functional DCOPs (F-DCOPs) have been proposed that can explicitly model a problem containing continuous variables. Nevertheless, state-of-the-art F-DCOPs approaches experience onerous memory or computation overhead. To address this issue, we propose a new F-DCOP algorithm, namely Particle Swarm based F-DCOP (PFD), which is inspired by a meta-heuristic, Particle Swarm Optimization (PSO). Although it has been successfully applied to many continuous optimization problems, the potential of PSO has not been utilized in F-DCOPs. To be exact, PFD devises a distributed method of solution construction while significantly reducing the computation and memory requirements. Moreover, we theoretically prove that PFD is an anytime algorithm. Finally, our empirical results indicate that PFD outperforms the state-of-the-art approaches in terms of solution quality and computation overhead.

【Keywords】:

872. An Operational Semantics for True Concurrency in BDI Agent Systems.

Paper Link】 【Pages】:7119-7126

【Authors】: Lavindra de Silva

【Abstract】: Agent programming languages have proved useful for formally modelling implemented systems such as PRS and JACK, and for reasoning about their behaviour. Over the past decades, many agent programming languages and extensions have been developed. A key feature in some of them is their support for the specification of ‘concurrent’ actions and programs. However, their notion of concurrency is still limited, as it amounts to a nondeterministic choice between (sequential) action interleavings. Thus, the notion does not represent ‘true concurrency’, which can more naturally exploit multi-core computers and multi-robot manufacturing cells. This paper provides a true concurrency operational semantics for a BDI agent programming language, allowing actions to overlap in execution. We prove key properties of the semantics, relating to true concurrency and to its link with interleaving.

【Keywords】:

873. Scalable Decision-Theoretic Planning in Open and Typed Multiagent Systems.

Paper Link】 【Pages】:7127-7134

【Authors】: Adam Eck ; Maulik Shah ; Prashant Doshi ; Leen-Kiat Soh

【Abstract】: In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers, but, before that, reason whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which becomes computationally intractable with high numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others' presence in its shared environment and their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system, and combines it with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst case bounds on extrapolation error, as well as regret bounds on the agent's utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach's efficacy compared with alternative baselines.

【Keywords】:

874. Parameterized Complexity of Envy-Free Resource Allocation in Social Networks.

Paper Link】 【Pages】:7135-7142

【Authors】: Eduard Eiben ; Robert Ganian ; Thekla Hamm ; Sebastian Ordyniak

【Abstract】: We consider the classical problem of allocating resources among agents in an envy-free (and, where applicable, proportional) way. Recently, the basic model was enriched by introducing the concept of a social network, which makes it possible to capture situations where agents might not have full information about the allocation of all resources. We initiate the study of the parameterized complexity of these resource allocation problems by considering natural parameters which capture structural properties of the network and similarities between agents and items. In particular, we show that even very general fragments of the considered problems become tractable as long as the social network has bounded treewidth or bounded clique-width. We complement our results with matching lower bounds which show that our algorithms cannot be substantially improved.

【Keywords】:

875. On the Convergence of Model Free Learning in Mean Field Games.

Paper Link】 【Pages】:7143-7150

【Authors】: Romuald Elie ; Julien Pérolat ; Mathieu Laurière ; Matthieu Geist ; Olivier Pietquin

【Abstract】: Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolve as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g., swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active and burgeoning field has studied the effects of diverse reinforcement learning algorithms for agents that have no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high-level perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.

【Keywords】:

876. Implicit Coordination Using FOND Planning.

Paper Link】 【Pages】:7151-7159

【Authors】: Thorsten Engesser ; Tim Miller

【Abstract】: Epistemic planning can be used to achieve implicit coordination in cooperative multi-agent settings where knowledge and capabilities are distributed between the agents. In these scenarios, agents plan and act on their own without having to agree on a common plan or protocol beforehand. However, epistemic planning is undecidable in general. In this paper, we show how implicit coordination can be achieved in a simpler, propositional setting by using nondeterminism as a means to allow the agents to take the other agents' perspectives. We identify a decidable fragment of epistemic planning that allows for arbitrary initial state uncertainty and non-determinism, but where actions can never increase the uncertainty of the agents. We show that in this fragment, planning for implicit coordination can be reduced to a version of fully observable nondeterministic (FOND) planning and that it thus has the same computational complexity as FOND planning. We provide a small case study, modeling the problem of multi-agent path finding with destination uncertainty in FOND, to show that our approach can be successfully applied in practice.

【Keywords】:

877. Communication Learning via Backpropagation in Discrete Channels with Unknown Noise.

Paper Link】 【Pages】:7160-7168

【Authors】: Benjamin Freed ; Guillaume Sartoretti ; Jiaheng Hu ; Howie Choset

【Abstract】: This work focuses on multi-agent reinforcement learning (RL) with inter-agent communication, in which communication is differentiable and optimized through backpropagation. Such differentiable approaches tend to converge more quickly to higher-quality policies compared to techniques that treat communication as actions in a traditional RL framework. However, modern communication networks (e.g., Wi-Fi or Bluetooth) rely on discrete communication channels, for which existing differentiable approaches that consider real-valued messages cannot be directly applied, or require biased gradient estimators. Some works have overcome this problem by treating the message space as an extension of the action space, and use standard RL to optimize message selection, but these methods tend to converge slower and to inferior policies. In this paper, we propose a stochastic message encoding/decoding procedure that makes a discrete communication channel mathematically equivalent to an analog channel with additive noise, through which gradients can be backpropagated. Additionally, we introduce an encryption step for use in noisy channels that forces channel noise to be message-independent, allowing us to compute unbiased derivative estimates even in the presence of unknown channel noise. To the best of our knowledge, this work presents the first differentiable communication learning approach that can compute unbiased derivatives through channels with unknown noise. We demonstrate the effectiveness of our approach in two example multi-robot tasks: a path finding and a collaborative search problem. There, we show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints. Content Areas: Multi-Agent Communication, Reinforcement Learning.

【Keywords】:

878. Distributed Stochastic Gradient Descent with Event-Triggered Communication.

Paper Link】 【Pages】:7169-7178

【Authors】: Jemin George ; Prudhvi Gurram

【Abstract】: We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication triggering mechanism that would allow the networked agents to update their model parameters aperiodically and provide sufficient conditions on the algorithm step-sizes that guarantee the asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem, in which a set of networked agents collaboratively train their individual neural networks to perform image classification, while aperiodically sharing the model parameters with their one-hop neighbors. Results indicate that all agents report similar performance that is also comparable to the performance of a centrally trained neural network, while the event-triggered communication provides significant reduction in inter-agent communication. Results also show that the proposed algorithm allows the individual agents to classify the images even though the training data corresponding to all the classes are not locally available to each agent.

【Keywords】:

879. Distributed Machine Learning through Heterogeneous Edge Systems.

Paper Link】 【Pages】:7179-7186

【Authors】: Hanpeng Hu ; Dan Wang ; Chuan Wu

【Abstract】: Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization model for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and experiments show that ADSP outperforms existing parameter synchronization models significantly in terms of ML model convergence time, scalability and adaptability to large heterogeneity.

【Keywords】:

Paper Link】 【Pages】:7187-7194

【Authors】: Adam Lerer ; Hengyuan Hu ; Jakob N. Foerster ; Noam Brown

【Abstract】: Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants to both interpret the actions of others and to act in a way that is informative when being interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25.

【Keywords】:

881. Generative Attention Networks for Multi-Agent Behavioral Modeling.

Paper Link】 【Pages】:7195-7202

【Authors】: Max Guangyu Li ; Bo Jiang ; Hao Zhu ; Zhengping Che ; Yan Liu

【Abstract】: Understanding and modeling the behavior of multi-agent systems is a central step for artificial intelligence. Here we present a deep generative model which captures the behavior-generating process of multi-agent systems, supports accurate predictions and inference, infers how agents interact in a complex system, and identifies agent groups and interaction types. Built upon advances in deep generative models and a novel attention mechanism, our model can learn interactions in highly heterogeneous systems with linear complexity in the number of agents. We apply this model to three multi-agent systems in different domains and evaluate performance on a diverse set of tasks including behavior prediction, interaction analysis and system identification. Experimental results demonstrate its ability to model multi-agent systems, yielding improved performance over competitive baselines. We also show the model can successfully identify agent groups and interaction types in these systems. Our model offers new opportunities to predict complex multi-agent behaviors and takes a step forward in understanding interactions in multi-agent systems.

【Keywords】:

882. A Variational Perturbative Approach to Planning in Graph-Based Markov Decision Processes.

Paper Link】 【Pages】:7203-7210

【Authors】: Dominik Linzner ; Heinz Koeppl

【Abstract】: Coordinating multiple interacting agents to achieve a common goal is a difficult task with huge applicability. This problem remains hard to solve, even when limiting interactions to be mediated via a static interaction-graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method to different standard GMDP problems. We show that in cases where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of a random guess. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.

【Keywords】:

883. Multi-Agent Game Abstraction via Graph Attention Neural Network.

Paper Link】 【Pages】:7211-7218

【Authors】: Yong Liu ; Weixun Wang ; Yujing Hu ; Jianye Hao ; Xingguo Chen ; Yang Gao

【Abstract】: In large-scale multi-agent systems, the large number of agents and complex game relationships cause great difficulty for policy learning. Therefore, simplifying the learning process is an important research issue. In many multi-agent systems, the interactions between agents often happen locally, which means that agents neither need to coordinate with all other agents nor need to coordinate with others all the time. Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents. However, these methods cannot be directly used in a large-scale environment due to the difficulty of transforming the complex interactions between agents into rules. In this paper, we model the relationship between agents by a complete graph and propose a novel game abstraction mechanism based on a two-stage attention network (G2ANet), which can indicate whether there is an interaction between two agents and the importance of the interaction. We integrate this detection mechanism into graph neural network-based multi-agent reinforcement learning for conducting game abstraction and propose two novel learning algorithms, GA-Comm and GA-AC. We conduct experiments in Traffic Junction and Predator-Prey. The results indicate that the proposed methods can simplify the learning process and at the same time achieve better asymptotic performance compared with state-of-the-art algorithms.

【Keywords】:

884. Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning.

Paper Link】 【Pages】:7219-7226

【Authors】: Hangyu Mao ; Wulong Liu ; Jianye Hao ; Jun Luo ; Dong Li ; Zhengchao Zhang ; Jun Wang ; Zhen Xiao

【Abstract】: Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.

【Keywords】:

885. Multi-Objective Multi-Agent Planning for Jointly Discovering and Tracking Mobile Objects.

Paper Link】 【Pages】:7227-7235

【Authors】: Hoa Van Nguyen ; Hamid Rezatofighi ; Ba-Ngu Vo ; Damith Chinthana Ranasinghe

【Abstract】: We consider the challenging problem of online planning for a team of agents to autonomously search for and track a time-varying number of mobile objects under the practical constraint of detection-range-limited onboard sensors. A standard POMDP with a value function that either encourages discovery or accurate tracking of mobile objects is inadequate to simultaneously meet the conflicting goals of searching for undiscovered mobile objects whilst keeping track of discovered objects. The planning problem is further complicated by misdetections or false detections of objects caused by range-limited sensors and noise inherent to sensor measurements. We formulate a novel multi-objective POMDP based on information theoretic criteria, and an online multi-object tracking filter for the problem. Since controlling multiple agents is a well-known combinatorial optimization problem, assigning control actions to agents necessitates a greedy algorithm. We prove that our proposed multi-objective value function is a monotone submodular set function; consequently, the greedy algorithm can achieve a (1-1/e) approximation for maximizing the submodular multi-objective function.

【Keywords】:
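
To illustrate the greedy guarantee cited in the abstract of paper 885, here is a minimal Python sketch of greedy selection for a monotone submodular set function. The coverage objective, the action names (`a1` to `a4`) and the simple cardinality budget are hypothetical stand-ins chosen for illustration; the paper's information-theoretic value function and its assignment of actions to individual agents are not reproduced. Under a cardinality budget, this greedy rule is known to reach at least a (1-1/e) fraction of the optimal value.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def greedy_maximize(
    ground_set: Iterable[Hashable],
    value: Callable[[FrozenSet], float],
    budget: int,
) -> Set:
    """Pick up to `budget` elements, each time adding the largest marginal gain."""
    chosen: Set = set()
    remaining = set(ground_set)
    for _ in range(budget):
        base = value(frozenset(chosen))
        best, best_gain = None, 0.0
        for item in sorted(remaining, key=repr):  # deterministic tie-breaking
            gain = value(frozenset(chosen | {item})) - base
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # no remaining element improves the value
            break
        chosen.add(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy objective: each sensing action covers a set of grid cells.
COVERAGE = {
    "a1": {1, 2, 3},
    "a2": {3, 4},
    "a3": {4, 5, 6, 7},
    "a4": {1, 7},
}


def covered_cells(actions: FrozenSet) -> float:
    """Number of covered cells: a standard monotone submodular function."""
    cells = set()
    for action in actions:
        cells |= COVERAGE[action]
    return float(len(cells))


selected = greedy_maximize(COVERAGE, covered_cells, budget=2)
```

In this toy instance the greedy rule first takes `a3` (four new cells) and then `a1` (three new cells), which happens to cover all seven cells.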

886. Multi-Agent Actor-Critic with Hierarchical Graph Attention Network.

Paper Link】 【Pages】:7236-7243

【Authors】: Heechang Ryu ; Hayong Shin ; Jinkyoo Park

【Abstract】: Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward and rarely consider the transferability of trained policies to new tasks. This prevents such policies from being applied to more complex multi-agent tasks. To resolve these limitations, we propose a model that conducts both representation learning for multiple agents using hierarchical graph attention network and policy learning using multi-agent actor-critic. The hierarchical graph attention network is specially designed to model the hierarchical relationships among multiple agents that either cooperate or compete with each other to derive more advanced strategic policies. Two attention networks, the inter-agent and inter-group attention layers, are used to effectively model individual and group level interactions, respectively. The two attention networks have been proven to facilitate the transfer of learned policies to new tasks with different agent compositions and allow one to interpret the learned strategies. Empirically, we demonstrate that the proposed model outperforms existing methods in several mixed cooperative and competitive tasks.

【Keywords】:

887. Clouseau: Generating Communication Protocols from Commitments.

Paper Link】 【Pages】:7244-7252

【Authors】: Munindar P. Singh ; Amit K. Chopra

【Abstract】: Engineering a decentralized multiagent system (MAS) requires realizing interactions modeled as a communication protocol between autonomous agents. We contribute Clouseau, an approach that takes a commitment-based specification of an interaction and generates a communication protocol amenable to decentralized enactment. We show that the generated protocol is (1) correct—realizes all and only the computations that satisfy the input specification; (2) safe—ensures the agents' local views remain consistent; and (3) live—ensures the agents can proceed to completion.

【Keywords】:

888. Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence.

Paper Link】 【Pages】:7253-7260

【Authors】: Yuhang Song ; Andrzej Wojcicki ; Thomas Lukasiewicz ; Jianyi Wang ; Abi Aryan ; Zhenghua Xu ; Mai Xu ; Zihan Ding ; Lianlong Wu

【Abstract】: Learning agents that are not only capable of taking tests, but also innovating is becoming a hot topic in AI. One of the most promising paths towards this vision is multi-agent learning, where agents act as the environment for each other, and improving each agent means proposing new problems for others. However, existing evaluation platforms are either not compatible with multi-agent settings, or limited to a specific game. That is, there is not yet a general evaluation platform for research on multi-agent intelligence. To this end, we introduce Arena, a general evaluation platform for multi-agent intelligence with 35 games of diverse logics and representations. Furthermore, multi-agent intelligence is still at the stage where many problems remain unexplored. Therefore, we provide a building toolkit for researchers to easily invent and build novel multi-agent problems from the provided game set based on a GUI-configurable social tree and five basic multi-agent reward schemes. Finally, we provide Python implementations of five state-of-the-art deep multi-agent reinforcement learning baselines. Along with the baseline implementations, we release a set of 100 best agents/teams that we can train with different training schemes for each game, as the base for evaluating agents with population performance. As such, the research community can perform comparisons under a stable and uniform standard. All the implementations and accompanied tutorials have been open-sourced for the community at https://sites.google.com/view/arena-unity/.

【Keywords】:

889. Learning to Communicate Implicitly by Actions.

Paper Link】 【Pages】:7261-7268

【Authors】: Zheng Tian ; Shihao Zou ; Ian Davies ; Tim Warr ; Lisheng Wu ; Haitham Bou-Ammar ; Jun Wang

【Abstract】: In situations where explicit communication is limited, human collaborators act by learning to: (i) infer meaning behind their partner's actions, and (ii) convey private information about the state to their partner implicitly through actions. The first component of this learning process has been well-studied in multi-agent systems, whereas the second — which is equally crucial for successful collaboration — has not. To mimic both components mentioned above, thereby completing the learning process, we introduce a novel algorithm: Policy Belief Learning (PBL). PBL uses a belief module to model the other agent's private information and a policy module to form a distribution over actions informed by the belief module. Furthermore, to encourage communication by actions, we propose a novel auxiliary reward which incentivizes one agent to help its partner to make correct inferences about its private information. The auxiliary reward for communication is integrated into the learning of the policy module. We evaluate our approach on a set of environments including a matrix game, particle environment and the non-competitive bidding problem from contract bridge. We show empirically that this auxiliary reward is effective and easy to generalize. These results demonstrate that our PBL algorithm can produce strong pairs of agents in collaborative games where explicit communication is disabled.

【Keywords】:

890. Fair Procedures for Fair Stable Marriage Outcomes.

Paper Link】 【Pages】:7269-7276

【Authors】: Nikolaos Tziavelis ; Ioannis Giannakopoulos ; Rune Quist Johansen ; Katerina Doka ; Nectarios Koziris ; Panagiotis Karras

【Abstract】: Given a two-sided market where each agent ranks those on the other side by preference, the stable marriage problem calls for finding a perfect matching such that no pair of agents prefer each other to their matches. Recent studies show that the number of stable solutions can be large in practice. Yet the classical solution to the problem, the Gale-Shapley (GS) algorithm, assigns an optimal match to each agent on one side, and a pessimal one to each on the other side; such a solution may fare well in terms of equity only in highly asymmetric markets. Finding a stable matching that minimizes the sex equality cost, an equity measure expressing the discrepancy of mean happiness among the two sides, is strongly NP-hard. Extant heuristics either (a) oblige some agents to involuntarily abandon their matches, or (b) bias the outcome in favor of some agents, or (c) need high-polynomial or unbounded time. We provide the first procedurally fair algorithms that output equitable stable marriages and are guaranteed to terminate in at most cubic time; the key to this breakthrough is the monitoring of a monotonic state function and the use of a selective criterion for accepting proposals. Our experiments with diverse simulated markets show that: (a) extant heuristics fail to yield high equity; (b) the best solution found by the GS algorithm can be very far from optimal equity; and (c) our procedures stand out in both efficiency and equity, even when compared to a non-procedurally fair approximation scheme.

【Keywords】:
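
The one-sidedness of the classical Gale-Shapley algorithm described in the abstract of paper 890 can be seen in a small Python sketch. This is the textbook deferred-acceptance procedure, not the procedurally fair algorithms the paper proposes. The agent names and preference lists are hypothetical, and `sex_equality_cost` assumes one common formulation of the measure (the absolute gap between the two sides' summed partner ranks); the paper's exact definition may differ.

```python
from typing import Dict, List


def gale_shapley(
    proposer_prefs: Dict[str, List[str]],
    receiver_prefs: Dict[str, List[str]],
) -> Dict[str, str]:
    """Return a stable matching {proposer: receiver}, optimal for the proposers."""
    # rank[r][p] is the position of proposer p in receiver r's list (0 = best).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}
    engaged_to: Dict[str, str] = {}  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p  # receiver trades up
            free.append(current)
        else:
            free.append(p)  # proposal rejected
    return {p: r for r, p in engaged_to.items()}


def sex_equality_cost(matching, proposer_prefs, receiver_prefs) -> int:
    """Assumed definition: gap between the sides' summed partner ranks (1 = best)."""
    proposer_total = sum(proposer_prefs[p].index(r) + 1 for p, r in matching.items())
    receiver_total = sum(receiver_prefs[r].index(p) + 1 for p, r in matching.items())
    return abs(proposer_total - receiver_total)


# Hypothetical 2x2 market with opposed preferences.
men = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}

men_propose = gale_shapley(men, women)
women_propose = gale_shapley(women, men)
```

Here each proposing side receives its first choices while the other side receives its last, so both runs give a cost of 2 under the assumed measure.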

891. Generalized and Sub-Optimal Bipartite Constraints for Conflict-Based Search.

Paper Link】 【Pages】:7277-7284

【Authors】: Thayne T. Walker ; Nathan R. Sturtevant ; Ariel Felner

【Abstract】: The main idea of conflict-based search (CBS), a popular, state-of-the-art algorithm for multi-agent pathfinding, is to resolve conflicts between agents by systematically adding constraints to agents. Recently, CBS has been adapted for new domains and variants, including non-unit costs and continuous time settings. These adaptations require new types of constraints. This paper introduces a new automatic constraint generation technique called bipartite reduction (BR). BR converts the constraint generation step of CBS to a surrogate bipartite graph problem. The properties of BR guarantee completeness and optimality for CBS. Also, BR's properties may be relaxed to obtain suboptimal solutions. Empirical results show that BR yields significant speedups in 2^k connected grids over the previous state-of-the-art for both optimal and suboptimal search.

【Keywords】:

892. Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games.

Paper Link】 【Pages】:7285-7292

【Authors】: Jianhong Wang ; Yuan Zhang ; Tae-Kyun Kim ; Yunjie Gu

【Abstract】: Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games, where all agents aim to maximize the global reward. Credit assignment is an important problem studied in the global reward game. Most previous works adopted the view of a non-cooperative game theoretical framework with the shared reward approach, i.e., each agent being assigned a shared global reward directly. This, however, may give each agent an inaccurate reward on its contribution to the group, which could cause inefficient learning. To deal with this problem, we i) introduce a cooperative-game theoretical framework called extended convex game (ECG) that is a superset of the global reward game, and ii) propose a local reward approach called Shapley Q-value. Shapley Q-value is able to distribute the global reward, reflecting each agent's own contribution in contrast to the shared reward approach. Moreover, we derive an MARL algorithm called Shapley Q-value deep deterministic policy gradient (SQDDPG), using Shapley Q-value as the critic for each agent. We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator and Traffic Junction, compared with the state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG and Independent A2C. In the experiments, SQDDPG shows a significant improvement in the convergence rate. Finally, we plot Shapley Q-value and validate the property of fair credit assignment.

【Keywords】:
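
The credit-assignment idea behind paper 892 rests on the classical Shapley value, which averages each player's marginal contribution over all orders in which players can join a coalition. The sketch below computes it exactly for a tiny game. The three-player "glove" game is a hypothetical example, and the code illustrates only the classical definition, not the Shapley Q-value or the SQDDPG algorithm.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    characteristic: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every join order (small games only)."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += characteristic(with_p) - characteristic(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def glove_game(coalition: FrozenSet[str]) -> float:
    """Hypothetical game: A holds a left glove, B and C each hold a right glove."""
    has_pair = "A" in coalition and ("B" in coalition or "C" in coalition)
    return 1.0 if has_pair else 0.0


credit = shapley_values(["A", "B", "C"], glove_game)
```

In this game the scarce player A receives 2/3 of the reward and B and C receive 1/6 each, whereas a shared-reward scheme would hand every player the same global reward of 1.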

893. From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning.

Paper Link】 【Pages】:7293-7300

【Authors】: Weixun Wang ; Tianpei Yang ; Yong Liu ; Jianye Hao ; Xiaotian Hao ; Yujing Hu ; Yingfeng Chen ; Changjie Fan ; Yang Gao

【Abstract】: A lot of effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, it is still challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their network input sizes are fixed. Therefore, we design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of three transfer mechanisms across curricula through extensive simulations.

【Keywords】:

894. SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning.

Paper Link】 【Pages】:7301-7308

【Authors】: Chao Wen ; Xinghu Yao ; Yuhui Wang ; Xiaoyang Tan

【Abstract】: This work presents a sample efficient and effective value-based method, named SMIX(λ), for reinforcement learning in multi-agent environments (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method carefully combines different elements, including 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markovian property, and 3) adopting an experience-replay style off-policy training. Interestingly, it is revealed that there exists an inherent connection between SMIX(λ) and the previous off-policy Q(λ) approach for single-agent learning. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that the proposed SMIX(λ) algorithm outperforms several state-of-the-art MARL methods by a large margin, and that it can be used as a general tool to improve the overall performance of a CTDE-type method by enhancing the evaluation quality of its CVF. We open-source our code at: https://github.com/chaovven/SMIX.

【Keywords】:
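
The λ-return mentioned in the abstract of paper 894 blends one-step bootstrapped targets with longer multi-step returns. The following minimal Python sketch uses the textbook backward recursion G_t = r_t + γ[(1-λ)V(s_{t+1}) + λG_{t+1}]. The function name, argument layout and toy numbers are assumptions made for illustration; the sketch does not reproduce the SMIX(λ) training target or the released code.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
    lam: float,
) -> List[float]:
    """Compute lambda-returns for one trajectory.

    rewards[t] is the reward at step t and next_values[t] is the value
    estimate of the state reached after step t (0.0 if that state is terminal).
    """
    returns = [0.0] * len(rewards)
    next_return = next_values[-1]  # bootstrap beyond the final step
    for t in reversed(range(len(rewards))):
        # Mix the one-step bootstrap with the longer return from step t+1.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns


# Hypothetical three-step episode ending in a terminal state.
rewards = [1.0, 1.0, 1.0]
next_values = [0.5, 0.5, 0.0]

td_targets = lambda_returns(rewards, next_values, gamma=1.0, lam=0.0)
mc_returns = lambda_returns(rewards, next_values, gamma=1.0, lam=1.0)
```

Setting `lam=0.0` recovers one-step TD targets (more bias, less variance), while `lam=1.0` recovers the full Monte Carlo return (less bias, more variance); intermediate values trade the two off.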

895. Optimal Common Contract with Heterogeneous Agents.

Paper Link】 【Pages】:7309-7316

【Authors】: Shenke Xiao ; Zihe Wang ; Mengjing Chen ; Pingzhong Tang ; Xiwang Yang

【Abstract】: We consider the principal-agent problem with heterogeneous agents. Previous works assume that the principal signs independent incentive contracts with every agent to make them invest more effort in the tasks. However, in many circumstances, these contracts need to be identical for the sake of fairness. We investigate the optimal common contract problem. To our knowledge, this is the first attempt to consider this natural and important generalization. We first show this problem is NP-complete. Then we provide a dynamic programming algorithm to compute the optimal contract in O(n^2 m) time, where n and m are the numbers of agents and actions, under the assumption that the agents' cost functions obey the increasing difference property. Finally, we generalize the setting such that each agent can choose to directly produce a reward in [0,1]. We provide an O(log n)-approximate algorithm for this generalization.

【Keywords】:

896. COBRA: Context-Aware Bernoulli Neural Networks for Reputation Assessment.

Paper Link】 【Pages】:7317-7324

【Authors】: Leonid Zeynalvand ; Tie Luo ; Jie Zhang

【Abstract】: Trust and reputation management (TRM) plays an increasingly important role in large-scale online environments such as multi-agent systems (MAS) and the Internet of Things (IoT). One main objective of TRM is to achieve accurate trust assessment of entities such as agents or IoT service providers. However, this encounters an accuracy-privacy dilemma as we identify in this paper, and we propose a framework called Context-aware Bernoulli Neural Network based

【Keywords】:

897. Bi-Level Actor-Critic for Multi-Agent Coordination.

Paper Link】 【Pages】:7325-7332

【Authors】: Haifeng Zhang ; Weizhe Chen ; Zeren Huang ; Minne Li ; Yaodong Yang ; Weinan Zhang ; Jun Wang

【Abstract】: Coordination is one of the essential problems in multi-agent systems. Typically multi-agent reinforcement learning (MARL) methods treat agents equally and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider Stackelberg equilibrium as a potentially better convergence point than Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem in finding Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (thus intelligent), while their actions still can be executed simultaneously and distributedly. The convergence proof is given, while the resulting learning algorithm is tested against the state of the art. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment.

【Keywords】:
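
The equilibrium-selection point made in the abstract of paper 897 can be illustrated on a small matrix game. In the sketch below, a leader commits to an action and a follower best-responds, which yields a pure Stackelberg outcome; the pure Nash equilibria are enumerated for comparison. The payoff matrices, function names and first-index tie-breaking are hypothetical choices for illustration, and the code is a brute-force enumeration, not the bi-level actor-critic method.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_nash(leader: Matrix, follower: Matrix) -> List[Tuple[int, int]]:
    """All pure action pairs where neither player gains by deviating alone."""
    rows, cols = len(leader), len(leader[0])
    equilibria = []
    for a in range(rows):
        for b in range(cols):
            leader_ok = all(leader[a][b] >= leader[x][b] for x in range(rows))
            follower_ok = all(follower[a][b] >= follower[a][y] for y in range(cols))
            if leader_ok and follower_ok:
                equilibria.append((a, b))
    return equilibria


def pure_stackelberg(leader: Matrix, follower: Matrix) -> Tuple[int, int]:
    """Leader picks the action whose follower best response pays the leader most."""
    best_pair = None
    for a, follower_row in enumerate(follower):
        # Follower best response to a; ties go to the lowest index (assumption).
        b = max(range(len(follower_row)), key=lambda j: follower_row[j])
        if best_pair is None or leader[a][b] > leader[best_pair[0]][best_pair[1]]:
            best_pair = (a, b)
    return best_pair


# Hypothetical cooperative coordination game: both players share the payoffs.
shared = [[2.0, 0.0],
          [0.0, 1.0]]

nash_points = pure_nash(shared, shared)
stackelberg_point = pure_stackelberg(shared, shared)
```

The game has two pure Nash equilibria, (0, 0) and (1, 1), and the leader-follower ordering selects the Pareto-superior one, (0, 0).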

898. Beyond Trees: Analysis and Convergence of Belief Propagation in Graphs with Multiple Cycles.

Paper Link】 【Pages】:7333-7340

【Authors】: Roie Zivan ; Omer Lev ; Rotem Galiki

【Abstract】: Belief propagation, an algorithm for solving problems represented by graphical models, has long been known to converge to the optimal solution when the graph is a tree. When the graph representing the problem includes a single cycle, the algorithm either converges to the optimal solution or performs periodic oscillations. While the conditions that trigger these two behaviors have been established, the question regarding the convergence and divergence of the algorithm on graphs that include more than one cycle is still open. Focusing on Max-sum, the version of belief propagation for solving distributed constraint optimization problems (DCOPs), we extend the theory on the behavior of belief propagation in general – and Max-sum specifically – when solving problems represented by graphs with multiple cycles. This includes: 1) Generalizing the results obtained for graphs with a single cycle to graphs with multiple cycles, by using backtrack cost trees (BCT). 2) Proving that when the algorithm is applied to adjacent symmetric cycles, the use of a large enough damping factor guarantees convergence to the optimal solution.

【Keywords】:

AAAI Technical Track: Natural Language Processing 299

899. LeDeepChef Deep Reinforcement Learning Agent for Families of Text-Based Games.

Paper Link】 【Pages】:7342-7349

【Authors】: Leonard Adolphs ; Thomas Hofmann

【Abstract】: While Reinforcement Learning (RL) approaches have led to significant achievements in a variety of areas in recent history, natural language tasks have remained mostly unaffected, due to the compositional and combinatorial nature that makes them notoriously hard to optimize. With the emerging field of Text-Based Games (TBGs), researchers try to bridge this gap. Inspired by the success of RL algorithms on Atari games, the idea is to develop new methods in a restricted game world and then gradually move to more complex environments. Previous work in the area of TBGs has mainly focused on solving individual games. We, however, consider the task of designing an agent that not just succeeds in a single game, but performs well across a whole family of games, sharing the same theme. In this work, we present our deep RL agent, LeDeepChef, that shows generalization capabilities to never-before-seen games of the same family with different environments and task descriptions. The agent participated in Microsoft Research's First TextWorld Problems: A Language and Reinforcement Learning Challenge and outperformed all but one competitor on the final test set. The games from the challenge all share the same theme, namely cooking in a modern house environment, but differ significantly in the arrangement of the rooms, the presented objects, and the specific goal (recipe to cook). To build an agent that achieves high scores across a whole family of games, we use an actor-critic framework and prune the action-space by using ideas from hierarchical reinforcement learning and a specialized module trained on a recipe database.

【Keywords】:

900. Knowledge Distillation from Internal Representations.

Paper Link】 【Pages】:7350-7357

【Authors】: Gustavo Aguilar ; Yuan Ling ; Yu Zhang ; Benjamin Yao ; Xing Fan ; Chenlei Guo

【Abstract】: Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as soft-labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that the internal knowledge of the teacher will be transferred into the student; even if the student closely matches the soft-labels, its internal representations may be considerably different. This internal mismatch can undermine the generalization capabilities originally intended to be transferred from the teacher to the student. In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it. We formulate two ways to distill such representations and various algorithms to conduct the distillation. We experiment with datasets from the GLUE benchmark and consistently show that adding knowledge distillation from internal representations is a more powerful method than only using soft-label distillation.

【Keywords】:
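
The soft-label objective described in the abstract of paper 900 can be sketched in a few lines of plain Python: the student is trained to match the teacher's temperature-softened output distribution. The function names and the temperature value are assumptions. The `internal_mse` helper is only a hypothetical stand-in for an internal-representation term (a mean squared error between hidden vectors of equal size); the abstract does not specify the paper's two formulations, and they are not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Cross-entropy of the student's distribution against teacher soft labels."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def internal_mse(student_hidden: Sequence[float], teacher_hidden: Sequence[float]) -> float:
    """Hypothetical internal term: mean squared error between hidden vectors."""
    diffs = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(diffs) / len(diffs)


teacher_logits = [2.0, 0.5, -1.0]
matched = soft_label_loss([2.0, 0.5, -1.0], teacher_logits)
mismatched = soft_label_loss([0.0, 0.0, 0.0], teacher_logits)
```

The loss is smallest when the student reproduces the teacher's distribution. As the abstract notes, a low soft-label loss alone says nothing about how close the student's hidden representations are to the teacher's, which is what an additional internal term is meant to address.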

901. Modelling Sentence Pairs via Reinforcement Learning: An Actor-Critic Approach to Learn the Irrelevant Words.

Paper Link】 【Pages】:7358-7366

【Authors】: Mahtab Ahmed ; Robert E. Mercer

【Abstract】: Learning sentence representation is a fundamental task in Natural Language Processing. Most of the existing sentence pair modelling architectures focus only on extracting and using the rich sentence pair features. The drawback is that utilizing all of these features makes the learning process much harder. In this study, we propose a reinforcement learning (RL) method to learn a sentence pair representation when performing tasks like semantic similarity, paraphrase identification, and question-answer pair modelling. We formulate this learning problem as a sequential decision making task where the decision made in the current state will have a strong impact on the following decisions. We address this decision making with a policy gradient RL method which chooses the irrelevant words to delete by looking at the sub-optimal representation of the sentences being compared. With this policy, extensive experiments show that our model achieves on-par performance when learning task-specific representations of sentence pairs without needing any further knowledge like parse trees. We suggest that the simplicity of each task inference provided by our RL model makes it easier to explain.

【Keywords】:

902. End-to-End Argumentation Knowledge Graph Construction.

Paper Link】 【Pages】:7367-7374

【Authors】: Khalid Al Khatib ; Yufang Hou ; Henning Wachsmuth ; Charles Jochim ; Francesca Bonin ; Benno Stein

【Abstract】: This paper studies the end-to-end construction of an argumentation knowledge graph that is intended to support argument synthesis, argumentative question answering, or fake news detection, among others. The study is motivated by the proven effectiveness of knowledge graphs for interpretable and controllable text generation and exploratory search. Original in our work is that we propose a model of the knowledge encapsulated in arguments. Based on this model, we build a new corpus that comprises about 16k manual annotations of 4740 claims with instances of the model's elements, and we develop an end-to-end framework that automatically identifies all modeled types of instances. The results of experiments show the potential of the framework for building a web-based argumentation graph that is of high quality and large scale.

【Keywords】:

903. Story Realization: Expanding Plot Events into Sentences.

Paper Link】 【Pages】:7375-7382

【Authors】: Prithviraj Ammanabrolu ; Ethan Tien ; Wesley Cheung ; Zhaochen Luo ; William Ma ; Lara J. Martin ; Mark O. Riedl

【Abstract】: Neural network based approaches to automated story plot generation attempt to learn how to generate novel plots from a corpus of natural language plot summaries. Prior work has shown that a semantic abstraction of sentences called events improves neural plot generation and allows one to decompose the problem into: (1) the generation of a sequence of events (event-to-event) and (2) the transformation of these events into natural language sentences (event-to-sentence). However, typical neural language generation approaches to event-to-sentence can ignore the event details and produce grammatically-correct but semantically-unrelated sentences. We present an ensemble-based model that generates natural language guided by events. We provide results, including a human subjects study, for a full end-to-end automated story generation system showing that our method generates more coherent and plausible stories than baseline approaches.

【Keywords】:

904. Do Not Have Enough Data? Deep Learning to the Rescue!

Paper Link】 【Pages】:7383-7390

【Authors】: Ateret Anaby-Tavor ; Boaz Carmeli ; Esther Goldbraich ; Amir Kantor ; George Kour ; Segev Shlomov ; Naama Tepper ; Naama Zwerdling

【Abstract】: Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.

【Keywords】:

905. Fine-Grained Named Entity Typing over Distantly Supervised Data Based on Refined Representations.

Paper Link】 【Pages】:7391-7398

【Authors】: Muhammad Asif Ali ; Yifang Sun ; Bing Li ; Wei Wang

【Abstract】: Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP). It aims at classifying an entity mention into a wide range of entity types. Due to a large number of entity types, distant supervision is used to collect training data for this task, which noisily assigns type labels to entity mentions irrespective of the context. In order to alleviate the noisy labels, existing approaches on FG-NET analyze the entity mentions entirely independently of each other and assign type labels solely based on the mention's sentence-specific context. This is inadequate for highly overlapping and/or noisy type labels as it hinders information passing across sentence boundaries. For this, we propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification. Experimental evaluation shows that the proposed model outperforms the existing research by a relative score of up to 10.2% and 8.3% for macro-f1 and micro-f1 respectively.

【Keywords】:

906. Understanding the Semantic Content of Sparse Word Embeddings Using a Commonsense Knowledge Base.

Paper Link】 【Pages】:7399-7406

【Authors】: Vanda Balogh ; Gábor Berend ; Dimitrios I. Diochnos ; György Turán

【Abstract】: Word embeddings have developed into a major NLP tool with broad applicability. Understanding the semantic content of word embeddings remains an important challenge for additional applications. One aspect of this issue is to explore the interpretability of word embeddings. Sparse word embeddings have been proposed as models with improved interpretability. Continuing this line of research, we investigate the extent to which human interpretable semantic concepts emerge along the bases of sparse word representations. In order to have a broad framework for evaluation, we consider three general approaches for constructing sparse word representations, which are then evaluated in multiple ways. We propose a novel methodology to evaluate the semantic content of word embeddings using a commonsense knowledge base, applied here to the sparse case. This methodology is illustrated by two techniques using the ConceptNet knowledge base. The first approach assigns a commonsense concept label to the individual dimensions of the embedding space. The second approach uses a metric, derived by spreading activation, to quantify the coherence of coordinates along the individual axes. We also provide results on the relationship between the two approaches. The results show, for example, that in the individual dimensions of sparse word embeddings, words having high coefficients are more semantically related in terms of path lengths in the knowledge base than the ones having zero coefficients.

【Keywords】:

907. Simultaneously Linking Entities and Extracting Relations from Biomedical Text without Mention-Level Supervision.

Paper Link】 【Pages】:7407-7414

【Authors】: Trapit Bansal ; Patrick Verga ; Neha Choudhary ; Andrew McCallum

【Abstract】: Understanding the meaning of text often involves reasoning about entities and their relationships. This requires identifying textual mentions of entities, linking them to a canonical concept, and discerning their relationships. These tasks are nearly always viewed as separate components within a pipeline, each requiring a distinct model and training data. While relation extraction can often be trained with readily available weak or distant supervision, entity linkers typically require expensive mention-level supervision – which is not available in many domains. Instead, we propose a model which is trained to simultaneously produce entity linking and relation decisions while requiring no mention-level annotations. This approach avoids cascading errors that arise from pipelined methods and more accurately predicts entity relationships from text. We show that our model outperforms a state-of-the-art entity linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.

【Keywords】:

908. Zero-Resource Cross-Lingual Named Entity Recognition.

【Paper Link】 【Pages】:7415-7423

【Authors】: M. Saiful Bari ; Shafiq R. Joty ; Prathyusha Jwalapuram

【Abstract】: Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair.

【Keywords】:

909. Generating Well-Formed Answers by Machine Reading with Stochastic Selector Networks.

【Paper Link】 【Pages】:7424-7431

【Authors】: Bin Bi ; Chen Wu ; Ming Yan ; Wei Wang ; Jiangnan Xia ; Chenliang Li

【Abstract】: Question answering (QA) based on machine reading comprehension has seen a recent surge in popularity, yet most work has focused on extractive methods. We instead address a more challenging QA problem of generating a well-formed answer by reading and summarizing the paragraph for a given question. For the generative QA task, we introduce a new neural architecture, LatentQA, in which a novel stochastic selector network composes a well-formed answer with words selected from the question, the paragraph and the global vocabulary, based on a sequence of discrete latent variables. Bayesian inference for the latent variables is performed to train the LatentQA model. The experiments on public datasets of natural answer generation confirm the effectiveness of LatentQA in generating high-quality well-formed answers.

【Keywords】:

910. PIQA: Reasoning about Physical Commonsense in Natural Language.

【Paper Link】 【Pages】:7432-7439

【Authors】: Yonatan Bisk ; Rowan Zellers ; Ronan LeBras ; Jianfeng Gao ; Yejin Choi

【Abstract】: To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains – such as news articles and encyclopedia entries, where text is plentiful – in more physical domains, text is inherently limited due to reporting bias. Can AI systems learn to reliably answer physical commonsense questions without experiencing the physical world? In this paper, we introduce the task of physical commonsense reasoning and a corresponding benchmark dataset Physical Interaction: Question Answering or PIQA. Though humans find the dataset easy (95% accuracy), large pretrained models struggle (∼75%). We provide analysis about the dimensions of knowledge that existing models lack, which offers significant opportunities for future research.

【Keywords】:

911. Back to the Future - Temporal Adaptation of Text Representations.

【Paper Link】 【Pages】:7440-7447

【Authors】: Johannes Bjerva ; Wouter Kouw ; Isabelle Augenstein

【Abstract】: Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens 'BERT' and 'ELMO' in publications refer to neural network architectures rather than persons. This type of temporal signal is typically overlooked, but is important if one aims to deploy a machine learning model over an extended period of time. In particular, language evolution causes data drift between time-steps in sequential decision-making tasks. Examples of such tasks include prediction of paper acceptance for yearly conferences (regular intervals) or author stance prediction for rumours on Twitter (irregular intervals). Inspired by successes in computer vision, we tackle data drift by sequentially aligning learned representations. We evaluate on three challenging tasks varying in terms of time-scales, linguistic units, and domains. These tasks show our method outperforming several strong baselines, including using all available data. We argue that, due to its low computational expense, sequential alignment is a practical solution to dealing with language evolution.

【Keywords】:

912. Modelling Semantic Categories Using Conceptual Neighborhood.

【Paper Link】 【Pages】:7448-7455

【Authors】: Zied Bouraoui ; José Camacho-Collados ; Luis Espinosa Anke ; Steven Schockaert

【Abstract】: While many methods for learning vector space embeddings have been proposed in the field of Natural Language Processing, these methods typically do not distinguish between categories and individuals. Intuitively, if individuals are represented as vectors, we can think of categories as (soft) regions in the embedding space. Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category. To address this issue, we rely on the fact that different categories are often highly interdependent. In particular, categories often have conceptual neighbors, which are disjoint from but closely related to the given category (e.g. fruit and vegetable). Our hypothesis is that more accurate category representations can be learned by relying on the assumption that the regions representing such conceptual neighbors should be adjacent in the embedding space. We propose a simple method for identifying conceptual neighbors and then show that incorporating these conceptual neighbors indeed leads to more accurate region based representations.

【Keywords】:

913. Inducing Relational Knowledge from BERT.

【Paper Link】 【Pages】:7456-7463

【Authors】: Zied Bouraoui ; José Camacho-Collados ; Steven Schockaert

【Abstract】: One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input.

【Keywords】:

914. Graph Transformer for Graph-to-Sequence Learning.

【Paper Link】 【Pages】:7464-7471

【Authors】: Deng Cai ; Wai Lam

【Abstract】: The dominant graph-to-sequence transduction models employ graph neural networks for graph representation learning, where the structural information is reflected by the receptive field of neurons. Unlike graph neural networks, which restrict information exchange to immediate neighborhoods, we propose a new model, known as Graph Transformer, that uses explicit relation encoding and allows direct communication between two distant nodes. It provides a more efficient way for global graph structure modeling. Experiments on the applications of text generation from Abstract Meaning Representation (AMR) and syntax-based neural machine translation show the superiority of our proposed model. Specifically, our model achieves 27.4 BLEU on LDC2015E86 and 29.7 BLEU on LDC2017T10 for AMR-to-text generation, outperforming the state-of-the-art results by up to 2.2 points. On the syntax-based translation tasks, our model establishes new single-model state-of-the-art BLEU scores, 21.3 for English-to-German and 14.1 for English-to-Czech, improving over the existing best results, including ensembles, by over 1 BLEU.

【Keywords】:

915. Learning from Easy to Complex: Adaptive Multi-Curricula Learning for Neural Dialogue Generation.

【Paper Link】 【Pages】:7472-7479

【Authors】: Hengyi Cai ; Hongshen Chen ; Cheng Zhang ; Yonghao Song ; Xiaofang Zhao ; Yangxi Li ; Dongsheng Duan ; Dawei Yin

【Abstract】: Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effects of the neural dialogue generation models. What is more, so far, there are no unified dialogue complexity measurements, and the dialogue complexity embodies multiple aspects of attributes—specificity, repetitiveness, relevance, etc. Inspired by human behaviors of learning to converse, where children learn from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper, we first analyze five dialogue attributes to measure the dialogue complexity in multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is established upon the reinforcement learning paradigm, which automatically chooses different curricula at the evolving learning process according to the learning status of the neural dialogue generation model. Extensive experiments conducted on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.

【Keywords】:

916. Unsupervised Domain Adaptation on Reading Comprehension.

【Paper Link】 【Pages】:7480-7487

【Authors】: Yu Cao ; Meng Fang ; Baosheng Yu ; Joey Tianyi Zhou

【Abstract】: Reading comprehension (RC) has been studied on a variety of datasets, with performance boosted by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate the problem, we investigate unsupervised domain adaptation on RC, wherein a model is trained on the labeled source domain and is then applied to the target domain with only unlabeled samples. We first show that even with the powerful BERT contextual representation, a model cannot generalize well from one domain to another. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset along with confidence filtering to generate reliable pseudo-labeled samples in the target domain for self-training. In addition, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves comparable performance to supervised models on multiple large-scale benchmark datasets.

【Keywords】:
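
The confidence filtering step mentioned in the abstract of entry 916 can be pictured with a minimal sketch. It is not the authors' CASe implementation: the function name `filter_pseudo_labels`, the tuple layout and the threshold value 0.9 are all hypothetical, chosen only to show the general idea of keeping target-domain predictions that a source-trained model is confident about.

```python
def filter_pseudo_labels(predictions, threshold=0.9):
    """Keep (sample, label) pairs whose confidence reaches the threshold.

    `predictions` is a list of (sample, predicted_label, confidence) tuples,
    e.g. produced by a model fine-tuned on the source domain (assumed layout).
    """
    return [
        (sample, label)
        for sample, label, confidence in predictions
        if confidence >= threshold
    ]


# Hypothetical target-domain predictions: (question id, answer, confidence).
target_predictions = [
    ("q1", "answer a", 0.95),
    ("q2", "answer b", 0.50),
    ("q3", "answer c", 0.90),
]

# Only the confident predictions would be reused as self-training data.
pseudo_labeled = filter_pseudo_labels(target_predictions, threshold=0.9)
```

A higher threshold yields fewer but cleaner pseudo-labels; the trade-off the paper actually uses is not stated in the abstract.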

917. Zero-Shot Text-to-SQL Learning with Auxiliary Task.

【Paper Link】 【Pages】:7488-7495

【Authors】: Shuaichen Chang ; Pengfei Liu ; Yun Tang ; Jing Huang ; Xiaodong He ; Bowen Zhou

【Abstract】: Recent years have seen great success in the use of neural seq2seq models on the text-to-SQL task. However, little work has paid attention to how these models generalize to realistic unseen data, which naturally raises a question: does this impressive performance signify a perfect generalization model, or are there still some limitations? In this paper, we first diagnose the bottleneck of the text-to-SQL task by providing a new testbed, in which we observe that existing models present poor generalization ability on rarely-seen data. The above analysis encourages us to design a simple but effective auxiliary task, which serves as a supportive model as well as a regularization term to the generation task to increase the models' generalization. Experimentally, we evaluate our models on a large text-to-SQL dataset, WikiSQL. Compared to a strong baseline coarse-to-fine model, our models improve over the baseline by more than 3% absolute in accuracy on the whole dataset. More interestingly, on a zero-shot subset test of WikiSQL, our models achieve 5% absolute accuracy gain over the baseline, clearly demonstrating their superior generalizability.

【Keywords】:

918. Hyperbolic Interaction Model for Hierarchical Multi-Label Classification.

【Paper Link】 【Pages】:7496-7503

【Authors】: Boli Chen ; Xin Huang ; Lin Xiao ; Zixin Cai ; Liping Jing

【Abstract】: Unlike traditional classification tasks, which assume mutual exclusion of labels, hierarchical multi-label classification (HMLC) aims to assign multiple labels to every instance with the labels organized under hierarchical relations. Besides the labels, since linguistic ontologies are intrinsic hierarchies, the conceptual relations between words can also form hierarchical structures. Thus it can be a challenge to learn mappings from word hierarchies to label hierarchies. We propose to model the word and label hierarchies by embedding them jointly in the hyperbolic space. The main reason is that the tree-likeness of the hyperbolic space matches the complexity of symbolic data with hierarchical structures. A new Hyperbolic Interaction Model (HyperIM) is designed to learn the label-aware document representations and make predictions for HMLC. Extensive experiments are conducted on three benchmark datasets. The results demonstrate that the new model can realistically capture the complex data structures and further improve the performance for HMLC compared with the state-of-the-art methods. To facilitate future research, our code is publicly available.

【Keywords】:

919. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog.

【Paper Link】 【Pages】:7504-7511

【Authors】: Feilong Chen ; Fandong Meng ; Jiaming Xu ; Peng Li ; Bo Xu ; Jie Zhou

【Abstract】: Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate response not only from the textual dialog history, but also from the visually-grounded information. Previous models typically leverage single-hop reasoning or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms compared models by a significant margin.

【Keywords】:

920. Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning.

【Paper Link】 【Pages】:7512-7520

【Authors】: Liqun Chen ; Ke Bai ; Chenyang Tao ; Yizhe Zhang ; Guoyin Wang ; Wenlin Wang ; Ricardo Henao ; Lawrence Carin

【Abstract】: Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization and image caption, with consistent improvements over competing solutions.

【Keywords】:

921. Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks.

【Paper Link】 【Pages】:7521-7528

【Authors】: Lu Chen ; Boer Lv ; Chi Wang ; Su Zhu ; Bowen Tan ; Kai Yu

【Abstract】: Dialogue state tracking (DST) aims at estimating the current dialogue state given all the preceding conversation. For multi-domain DST, the data sparsity problem is also a major obstacle due to the increased number of state candidates. Existing approaches generally predict the value for each slot independently and do not consider slot relations, which may aggravate the data sparsity problem. In this paper, we propose a Schema-guided multi-domain dialogue State Tracker with graph attention networks (SST) that predicts dialogue states from dialogue utterances and schema graphs which contain slot relations in edges. We also introduce a graph attention matching network to fuse information from utterances and graphs, and a recurrent graph attention network to control state updating. Experiment results show that our approach obtains new state-of-the-art performance on both MultiWOZ 2.0 and MultiWOZ 2.1 benchmarks.

【Keywords】:

922. Improving Entity Linking by Modeling Latent Entity Type Information.

【Paper Link】 【Pages】:7529-7537

【Authors】: Shuang Chen ; Jinpeng Wang ; Feng Jiang ; Chin-Yew Lin

【Abstract】: Existing state-of-the-art neural entity linking models employ an attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic-level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which often causes the models to link mentions to incorrect entities of the wrong type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on the standard benchmark (AIDA-CoNLL). Detailed experiment analysis demonstrates that our model corrects most of the type errors produced by the direct baseline.

【Keywords】:

923. TemPEST: Soft Template-Based Personalized EDM Subject Generation through Collaborative Summarization.

【Paper Link】 【Pages】:7538-7545

【Authors】: Yu-Hsiu Chen ; Pin-Yu Chen ; Hong-Han Shuai ; Wen-Chih Peng

【Abstract】: We address personalized Electronic Direct Mail (EDM) subject generation, which generates an attractive subject line for a product description according to the user's preference for different contents or writing styles. Generating personalized EDM subjects has a few notable differences from generating text summaries. The subject has to be not only faithful to the description itself but also attractive to increase the click-through rate. Moreover, different users may have different preferences over the styles of topics. We propose a novel personalized EDM subject generation model named Soft Template-based Personalized EDM Subject Generator (TemPEST) to consider the aforementioned users' characteristics when generating subjects, which contains a soft template-based selective encoder network, a user rating encoder network, a summary decoder network and a rating decoder. Experimental results indicate that TemPEST is able to generate personalized topics and also to effectively perform recommendation rating reconstruction.

【Keywords】:

924. Learning to Map Frequent Phrases to Sub-Structures of Meaning Representation for Neural Semantic Parsing.

【Paper Link】 【Pages】:7546-7553

【Authors】: Bo Chen ; Xianpei Han ; Ben He ; Le Sun

【Abstract】: Neural semantic parsers usually generate meaning representation tokens from natural language tokens via an encoder-decoder model. However, there is often a vocabulary-mismatch problem between natural language utterances and logical forms. That is, one word maps to several atomic logical tokens, which need to be handled as a whole, rather than individual logical tokens at multiple steps. In this paper, we propose that the vocabulary-mismatch problem can be effectively resolved by leveraging appropriate logical tokens. Specifically, we exploit macro actions, which are of the same granularity of words/phrases, and allow the model to learn mappings from frequent phrases to corresponding sub-structures of meaning representation. Furthermore, macro actions are compact, and therefore utilizing them can significantly reduce the search space, which brings a great benefit to weakly supervised semantic parsing. Experiments show that our method leads to substantial performance improvement on three benchmarks, in both supervised and weakly supervised settings.

【Keywords】:

925. Attending to Entities for Better Text Understanding.

【Paper Link】 【Pages】:7554-7561

【Authors】: Pengxiang Cheng ; Katrin Erk

【Abstract】: Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular, coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state-of-the-art, while only containing a tiny fraction of parameters compared to GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions on applying similar techniques to other problems.

【Keywords】:

926. Dynamic Embedding on Textual Networks via a Gaussian Process.

【Paper Link】 【Pages】:7562-7569

【Authors】: Pengyu Cheng ; Yitong Li ; Xinyuan Zhang ; Liqun Chen ; David E. Carlson ; Lawrence Carin

【Abstract】: Textual network embedding aims to learn low-dimensional representations of text-annotated nodes in a graph. Prior work in this area has typically focused on fixed graph structures; however, real-world networks are often dynamic. We address this challenge with a novel end-to-end node-embedding model, called Dynamic Embedding for Textual Networks with a Gaussian Process (DetGP). After training, DetGP can be applied efficiently to dynamic graphs without re-training or backpropagation. The learned representation of each node is a combination of textual and structural embeddings. Because the structure is allowed to be dynamic, our method uses the Gaussian process to take advantage of its non-parametric properties. To use both local and global graph structures, diffusion is used to model multiple hops between neighbors. The relative importance of global versus local structure for the embeddings is learned automatically. With the non-parametric nature of the Gaussian process, updating the embeddings for a changed graph structure requires only a forward pass through the learned model. Considering link prediction and node classification, experiments demonstrate the empirical effectiveness of our method compared to baseline approaches. We further show that DetGP can be straightforwardly and efficiently applied to dynamic textual networks.

【Keywords】:

927. Cross-Lingual Natural Language Generation via Pre-Training.

【Paper Link】 【Pages】:7570-7577

【Authors】: Zewen Chi ; Li Dong ; Furu Wei ; Wenhui Wang ; Xian-Ling Mao ; Heyan Huang

【Abstract】: In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages. We propose to pretrain the encoder and the decoder of a sequence-to-sequence model under both monolingual and cross-lingual settings. The pre-training objective encourages the model to represent different languages in the shared space, so that we can conduct zero-shot cross-lingual transfer. After the pre-training procedure, we use monolingual data to fine-tune the pre-trained model on downstream NLG tasks. Then the sequence-to-sequence model trained in a single language can be directly evaluated beyond that language (i.e., accepting multi-lingual input and producing multi-lingual output). Experimental results on question generation and abstractive summarization show that our model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation. Moreover, cross-lingual transfer improves NLG performance of low-resource languages by leveraging rich-resource language data. Our implementation and data are available at https://github.com/CZWin32768/xnlg.

【Keywords】:

928. An Empirical Study of Content Understanding in Conversational Question Answering.

【Paper Link】 【Pages】:7578-7585

【Authors】: Ting-Rui Chiang ; Hao-Tong Ye ; Yun-Nung Chen

【Abstract】: Alongside the large body of work on context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to the recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to the best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark dataset reflect models' content understanding? 2) Do the models make good use of the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, as well as an attack to verify the models' capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models. With deep investigation of the task, it is believed that this work can benefit the future progress of conversation comprehension. The source code is available at https://github.com/MiuLab/CQA-Study.

【Keywords】:

929. How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions.

【Paper Link】 【Pages】:7586-7593

【Authors】: Zewei Chu ; Mingda Chen ; Jing Chen ; Miaosen Wang ; Kevin Gimpel ; Manaal Faruqui ; Xiance Si

【Abstract】: We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.

【Keywords】:

930. Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction.

【Paper Link】 【Pages】:7594-7601

【Authors】: Pierre Colombo ; Emile Chapuis ; Matteo Manica ; Emmanuel Vignon ; Giovanna Varni ; Chloé Clavel

【Abstract】: The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires a precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism and beam search applied to both training and inference. Compared to the state of the art our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA, and state-of-the-art accuracy score of 91.6% on MRDA.

【Keywords】:

931. Discriminative Sentence Modeling for Story Ending Prediction.

【Paper Link】 【Pages】:7602-7609

【Authors】: Yiming Cui ; Wanxiang Che ; Wei-Nan Zhang ; Ting Liu ; Shijin Wang ; Guoping Hu

【Abstract】: Story Ending Prediction is a task that needs to select an appropriate ending for the given story, which requires the machine to understand the story and sometimes needs commonsense knowledge. To tackle this task, we propose a new neural network called Diff-Net for better modeling the differences of each ending in this task. The proposed model could discriminate two endings in three semantic levels: contextual representation, story-aware representation, and discriminative representation. Experimental results on the Story Cloze Test dataset show that the proposed model significantly outperforms various systems by a large margin, and detailed ablation studies are given for better understanding our model. We also carefully examine the traditional and BERT-based models on both SCT v1.0 and v1.5 with interesting findings that may potentially help future studies.

【Keywords】:

932. Multiple Positional Self-Attention Network for Text Classification.

【Paper Link】 【Pages】:7610-7617

【Authors】: Biyun Dai ; Jinlong Li ; Ruoyi Xu

【Abstract】: Self-attention mechanisms have recently attracted much attention in Natural Language Processing (NLP) tasks. Relative positional information is important to self-attention mechanisms. We propose a Faraway Mask, which focuses on the (2m + 1)-gram words, and a Scaled-Distance Mask, which applies a logarithmic distance penalty, to avoid and weaken, respectively, the self-attention of distant words. To exploit different masks, we present a Positional Self-Attention Layer for generating different Masked-Self-Attentions and a following Position-Fusion Layer in which fused positional information multiplies the Masked-Self-Attentions for generating sentence embeddings. To evaluate our sentence embedding approach, the Multiple Positional Self-Attention Network (MPSAN), we perform comparison experiments on sentiment analysis, semantic relatedness and sentence classification tasks. The results show that our MPSAN outperforms state-of-the-art methods on five datasets and that the test accuracy is improved by 0.81% and 0.6% on the SST and CR datasets, respectively. In addition, we reduce training parameters and improve the time efficiency of MPSAN by lowering the dimension number of self-attention and simplifying the fusion mechanism.

【Keywords】:
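
The two masks named in the abstract of entry 932 can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption for illustration: the masks are taken to be additive on the attention logits, the Faraway Mask is taken to block every token outside a window of m positions on each side, and the Scaled-Distance Mask is taken to be `-alpha * log(1 + |i - j|)`. The function names and the `alpha` parameter are hypothetical and are not taken from the paper.

```python
import math


def faraway_mask(n, m):
    """n x n additive mask: 0 inside the (2m + 1)-token window, -inf outside.

    Assumed form; the paper's exact definition is not given in the abstract.
    """
    return [[0.0 if abs(i - j) <= m else -math.inf for j in range(n)]
            for i in range(n)]


def scaled_distance_mask(n, alpha=1.0):
    """n x n additive mask with a logarithmic penalty on token distance.

    The form -alpha * log(1 + |i - j|) is an assumption for illustration.
    """
    return [[-alpha * math.log(1 + abs(i - j)) for j in range(n)]
            for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax of (scores + mask)."""
    result = []
    for score_row, mask_row in zip(scores, mask):
        logits = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(logits)  # finite, because the diagonal is never blocked
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result
```

Under these assumptions the first mask removes distant tokens from the attention distribution entirely, while the second only down-weights them, which matches the "avoid" and "weaken" wording of the abstract.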

933. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis.

Paper Link】 【Pages】:7618-7625

【Authors】: Yong Dai ; Jian Liu ; Xiancong Ren ; Zenglin Xu

【Abstract】: Multi-source unsupervised domain adaptation (MS-UDA) for sentiment analysis (SA) aims to leverage useful information in multiple source domains to help do SA in an unlabeled target domain that has no supervised information. Existing algorithms of MS-UDA either only exploit the shared features, i.e., the domain-invariant information, or are based on some weak assumption in NLP, e.g., the smoothness assumption. To avoid these problems, we propose two transfer learning frameworks based on the multi-source domain adaptation methodology for SA by combining the source hypotheses to derive a good target hypothesis. The first is a novel Weighting Scheme based Unsupervised Domain Adaptation framework (WS-UDA), which combines the source classifiers to acquire pseudo labels for target instances directly. The second is a Two-Stage Training based Unsupervised Domain Adaptation framework (2ST-UDA), which further exploits these pseudo labels to train a target private extractor. Importantly, the weights assigned to each source classifier are based on the relations between target instances and source domains, which are measured by a discriminator through adversarial training. Furthermore, through the same discriminator, we also fulfill the separation of shared features and private features. Experimental results on two SA datasets demonstrate the promising performance of our frameworks, which outperform unsupervised state-of-the-art competitors.

【Keywords】:
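
The idea in entry 933 of combining source classifiers into a pseudo label can be sketched as a weighted vote. This is only an illustration under stated assumptions: the abstract does not say how the discriminator-derived weights are computed or normalized, so the weights are passed in as plain numbers, and the function name `pseudo_label` is hypothetical.

```python
def pseudo_label(source_probs, domain_weights):
    """Combine per-source class probabilities into one pseudo label.

    source_probs:   one probability vector per source classifier.
    domain_weights: one non-negative weight per source domain (assumed to come
                    from a domain discriminator; how is not modeled here).
    Returns (label index, combined probability vector).
    """
    total = sum(domain_weights)
    n_classes = len(source_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(domain_weights, source_probs)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined
```

For example, with two source classifiers that disagree, the pseudo label follows whichever source domain receives the larger weight for that target instance.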

934. Hypernym Detection Using Strict Partial Order Networks.

Paper Link】 【Pages】:7626-7633

【Authors】: Sarthak Dash ; Md. Faisal Mahbub Chowdhury ; Alfio Gliozzo ; Nandana Mihindukulasooriya ; Nicolas Rodolfo Fauceglia

【Abstract】: This paper introduces Strict Partial Order Networks (SPON), a novel neural network architecture designed to enforce asymmetry and transitive properties as soft constraints. We apply it to induce hypernymy relations by training with is-a pairs. We also present an augmented variant of SPON that can generalize type information learned for in-vocabulary terms to previously unseen ones. An extensive evaluation over eleven benchmarks across different tasks shows that SPON consistently either outperforms or attains the state of the art on all but one of these benchmarks.

【Keywords】:
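
The properties that entry 934 enforces as soft constraints are those of a strict partial order: irreflexivity, asymmetry, and transitivity. The sketch below is not the SPON network; it is a plain check of those properties on a discrete set of is-a pairs, included only to make them concrete. The function name is hypothetical.

```python
def strict_partial_order_violations(pairs):
    """List violations of a strict partial order in (hyponym, hypernym) pairs.

    Each violation is a tuple (kind, x, y), where kind is one of
    "irreflexivity", "asymmetry", or "transitivity".
    """
    rel = set(pairs)
    violations = []
    for a, b in rel:
        if a == b:
            violations.append(("irreflexivity", a, b))  # x is-a x
        elif (b, a) in rel:
            violations.append(("asymmetry", a, b))      # x is-a y and y is-a x
    for a, b in rel:
        for c, d in rel:
            if b == c and (a, d) not in rel:
                violations.append(("transitivity", a, d))  # missing implied pair
    return violations
```

Given only `("dog", "mammal")` and `("mammal", "animal")`, the check reports that the implied pair `("dog", "animal")` is missing; adding it makes the relation a valid strict partial order.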

935. Just Add Functions: A Neural-Symbolic Language Model.

Paper Link】 【Pages】:7634-7642

【Authors】: David Demeter ; Doug Downey

【Abstract】: Neural network language models (NNLMs) have achieved ever-improving accuracy due to more sophisticated architectures and increasing amounts of training data. However, the inductive bias of these models (formed by the distributional hypothesis of language), while ideally suited to modeling most running text, results in key limitations for today's models. In particular, the models often struggle to learn certain spatial, temporal, or quantitative relationships, which are commonplace in text and are second-nature for human readers. Yet, in many cases, these relationships can be encoded with simple mathematical or logical expressions. How can we augment today's neural models with such encodings? In this paper, we propose a general methodology to enhance the inductive bias of NNLMs by incorporating simple functions into a neural architecture to form a hierarchical neural-symbolic language model (NSLM). These functions explicitly encode symbolic deterministic relationships to form probability distributions over words. We explore the effectiveness of this approach on numbers and geographic locations, and show that NSLMs significantly reduce perplexity in small-corpus language modeling, and that the performance improvement persists for rare tokens even on much larger corpora. The approach is simple and general, and we discuss how it can be applied to other word classes beyond numbers and geography.

【Keywords】:

936. An Iterative Polishing Framework Based on Quality Aware Masked Language Model for Chinese Poetry Generation.

Paper Link】 【Pages】:7643-7650

【Authors】: Liming Deng ; Jie Wang ; Hang-Ming Liang ; Hui Chen ; Zhiqiang Xie ; Bojin Zhuang ; Shaojun Wang ; Jing Xiao

【Abstract】: Owing to its unique literal and aesthetical characteristics, automatic generation of Chinese poetry is still challenging in Artificial Intelligence and can hardly be realized straightforwardly by end-to-end methods. In this paper, we propose a novel iterative polishing framework for high-quality Chinese poetry generation. In the first stage, an encoder-decoder structure is utilized to generate a poem draft. Afterwards, our proposed Quality-Aware Masked Language Model (QA-MLM) is employed to polish the draft towards higher quality in terms of linguistics and literalness. Based on a multi-task learning scheme, QA-MLM is able to determine whether polishing is needed based on the poem draft. Furthermore, QA-MLM is able to localize improper characters of the poem draft and substitute them with newly predicted ones accordingly. Benefiting from the masked language model structure, QA-MLM incorporates global context information into the polishing process, which can obtain more appropriate polishing results than unidirectional sequential decoding. Moreover, the iterative polishing process is terminated automatically when QA-MLM regards the processed poem as a qualified one. Both human and automatic evaluations have been conducted, and the results demonstrate that our approach is effective in improving the performance of the encoder-decoder structure.

【Keywords】:

937. Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering.

Paper Link】 【Pages】:7651-7658

【Authors】: Yang Deng ; Wai Lam ; Yuexiang Xie ; Daoyuan Chen ; Yaliang Li ; Min Yang ; Ying Shen

【Abstract】: Community question answering (CQA) has recently gained increasing popularity in both academia and industry. However, the redundancy and lengthiness issues of crowdsourced answers limit the performance of answer selection and lead to reading difficulties and misunderstandings for community users. To solve these problems, we tackle the tasks of answer selection and answer summary generation in CQA with a novel joint learning model. Specifically, we design a question-driven pointer-generator network, which exploits the correlation information between question-answer pairs to aid in attending to the essential information when generating answer summaries. Meanwhile, we leverage the answer summaries to alleviate noise in original lengthy answers when ranking the relevancy degrees of question-answer pairs. In addition, we construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method can effectively address the answer redundancy issue in CQA and achieves state-of-the-art results on both answer selection and text summarization tasks. Furthermore, the proposed model is shown to have great transferring ability and applicability for resource-poor CQA tasks, which lack reference answer summaries.

【Keywords】:

938. On Measuring and Mitigating Biased Inferences of Word Embeddings.

Paper Link】 【Pages】:7659-7666

【Authors】: Sunipa Dev ; Tao Li ; Jeff M. Phillips ; Vivek Srikumar

【Abstract】: Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings (ELMo, BERT).

【Keywords】:

939. Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation.

Paper Link】 【Pages】:7667-7674

【Authors】: Xin Dong ; Jingchao Ni ; Wei Cheng ; Zhengzhang Chen ; Bo Zong ; Dongjin Song ; Yanchi Liu ; Haifeng Chen ; Gerard de Melo

【Abstract】: Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user (item) into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

【Keywords】:

940. Detecting Asks in Social Engineering Attacks: Impact of Linguistic and Structural Knowledge.

Paper Link】 【Pages】:7675-7682

【Authors】: Bonnie J. Dorr ; Archna Bhatia ; Adam Dalton ; Brodie Mather ; Bryanna Hebenstreit ; Sashank Santhanam ; Zhuo Cheng ; Samira Shaikh ; Alan Zemel ; Tomek Strzalkowski

【Abstract】: Social engineers attempt to manipulate users into undertaking actions such as downloading malware by clicking links or providing access to money or sensitive information. Natural language processing, computational sociolinguistics, and media-specific structural clues provide a means for detecting both the ask (e.g., buy gift card) and the risk/reward implied by the ask, which we call framing (e.g., lose your job, get a raise). We apply linguistic resources such as Lexical Conceptual Structure to tackle ask detection and also leverage structural clues such as links and their proximity to identified asks to improve confidence in our results. Our experiments indicate that the performance of ask detection, framing detection, and identification of the top ask is improved by linguistically motivated classes coupled with structural clues such as links. Our approach is implemented in a system that informs users about social engineering risk situations.

【Keywords】:

941. Corpus Wide Argument Mining - A Working Solution.

Paper Link】 【Pages】:7683-7691

【Authors】: Liat Ein-Dor ; Eyal Shnarch ; Lena Dankin ; Alon Halfon ; Benjamin Sznajder ; Ariel Gera ; Carlos Alzate ; Martin Gleize ; Leshem Choshen ; Yufang Hou ; Yonatan Bilu ; Ranit Aharonov ; Noam Slonim

【Abstract】: One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehensive set of relevant arguments, over a wide range of topics, it requires leveraging a large and diverse corpus in an appropriate manner. Here we present a first end-to-end high-precision, corpus-wide argument mining system. This is made possible by combining sentence-level queries over an appropriate indexing of a very large corpus of newspaper articles, with an iterative annotation scheme. This scheme addresses the inherent label bias in the data and pinpoints the regions of the sample space whose manual labeling is required to obtain high-precision among top-ranked candidates.

【Keywords】:

942. Latent Emotion Memory for Multi-Label Emotion Classification.

Paper Link】 【Pages】:7692-7699

【Authors】: Hao Fei ; Yue Zhang ; Yafeng Ren ; Donghong Ji

【Abstract】: Identifying multiple emotions in a sentence is an important research topic. Existing methods usually model the problem as a multi-label classification task. However, previous methods have two issues that limit their performance on the task. First, these models do not consider the prior emotion distribution in a sentence. Second, they fail to effectively capture the context information closely related to the corresponding emotion. In this paper, we propose a Latent Emotion Memory network (LEM) for multi-label emotion classification. The proposed model can learn the latent emotion distribution without external knowledge, and can effectively leverage it in the classification network. Experimental results on two benchmark datasets show that the proposed model outperforms strong baselines, achieving state-of-the-art performance.

【Keywords】:

943. Translucent Answer Predictions in Multi-Hop Reading Comprehension.

Paper Link】 【Pages】:7700-7707

【Authors】: G. P. Shrivatsa Bhargav ; Michael R. Glass ; Dinesh Garg ; Shirish K. Shevade ; Saswati Dana ; Dinesh Khandelwal ; L. Venkata Subramaniam ; Alfio Gliozzo

【Abstract】: Research on the task of Reading Comprehension style Question Answering (RCQA) has gained momentum in recent years due to the emergence of human annotated datasets and associated leaderboards, for example CoQA, HotpotQA, SQuAD, TriviaQA, etc. While the state of the art has advanced considerably, there is still ample opportunity to advance it further on some important variants of the RCQA task. In this paper, we propose a novel deep neural architecture, called TAP (Translucent Answer Prediction), to identify answers and evidence (in the form of supporting facts) in an RCQA task requiring multi-hop reasoning. TAP comprises two loosely coupled networks – Local and Global Interaction eXtractor (LoGIX) and Answer Predictor (AP). LoGIX predicts supporting facts, whereas AP consumes these predicted supporting facts to predict the answer span. The novel design of LoGIX is inspired by two key design desiderata – local context and global interaction – that we identified by analyzing examples of the multi-hop RCQA task. The loose coupling between LoGIX and the AP reveals the set of sentences used by the AP in predicting an answer. Therefore, answer predictions of TAP can be interpreted in a translucent manner. TAP offers state-of-the-art performance on the HotpotQA (Yang et al. 2018) dataset – an apt dataset for the multi-hop RCQA task – as it occupies Rank-1 on its leaderboard (https://hotpotqa.github.io/) at the time of submission.

【Keywords】:

944. Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network.

Paper Link】 【Pages】:7708-7715

【Authors】: Shaoxiong Feng ; Hongshen Chen ; Kan Li ; Dawei Yin

【Abstract】: Neural conversational models learn to generate responses by taking into account the dialog history. These models are typically optimized over the query-response pairs with a maximum likelihood estimation objective. However, the query-response tuples are naturally loosely coupled, and there exist multiple responses that can respond to a given query, which makes learning burdensome for the conversational model. Besides, the general dull response problem is even worsened when the model is confronted with meaningless response training instances. Intuitively, a high-quality response not only responds to the given query but also links up to the future conversations. In this paper, we leverage the query-response-future turn triples to induce generated responses that consider both the given context and the future conversations. To facilitate the modeling of these triples, we further propose a novel encoder-decoder based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN), which consists of a forward and a backward generative discriminator to cooperatively encourage the generated response to be informative and coherent from two complementary assessment perspectives. Experimental results demonstrate that our method effectively boosts the informativeness and coherence of the generated response on both automatic and human evaluation, which verifies the advantages of considering two assessment perspectives.

【Keywords】:

945. Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation.

Paper Link】 【Pages】:7716-7723

【Authors】: Xiaocheng Feng ; Yawei Sun ; Bing Qin ; Heng Gong ; Yibo Sun ; Wei Bi ; Xiaojiang Liu ; Ting Liu

【Abstract】: In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer and aims to preserve text styles while altering the content. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference. The task is unsupervised due to lack of parallel data, and is challenging to select suitable records and style words from bi-aspect inputs respectively and generate a high-fidelity long document. To tackle those problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with interactive attention mechanism, which is used for learning the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we also explore the effectiveness of the back-translation in our task for constructing some pseudo-training pairs. Empirical results show superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.

【Keywords】:

946. Discontinuous Constituent Parsing with Pointer Networks.

Paper Link】 【Pages】:7724-7731

【Authors】: Daniel Fernández-González ; Carlos Gómez-Rodríguez

【Abstract】: Discontinuous constituent trees are among the most complex syntactic representations used in computational linguistics and NLP, and are crucial for representing all grammatical phenomena of languages such as German. Recent advances in dependency parsing have shown that Pointer Networks excel in efficiently parsing syntactic relations between words in a sentence. This kind of sequence-to-sequence model achieves outstanding accuracy in building non-projective dependency trees, but its potential has not yet been proven on a more difficult task. We propose a novel neural network architecture that, by means of Pointer Networks, is able to generate the most accurate discontinuous constituent representations to date, even without the need for Part-of-Speech tagging information. To do so, we internally model discontinuous constituent structures as augmented non-projective dependency structures. The proposed approach achieves state-of-the-art results on the two widely-used NEGRA and TIGER benchmarks, outperforming previous work by a wide margin.

【Keywords】:

947. Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study.

Paper Link】 【Pages】:7732-7739

【Authors】: Jinlan Fu ; Pengfei Liu ; Qi Zhang

【Abstract】: While neural network-based models have achieved impressive performance on a large body of NLP tasks, the generalization behavior of different models remains poorly understood: Does this excellent performance imply a perfect generalization model, or are there still some limitations? In this paper, we take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives and characterize the differences of their generalization abilities through the lens of our proposed measures, which guides us to better design models and training methods. Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, which suggest directions for improvement. We have released the datasets: (ReCoNLL, PLONER) for the future research at our project page: http://pfliu.com/InterpretNER/.

【Keywords】:

948. Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism.

Paper Link】 【Pages】:7740-7747

【Authors】: Xiyan Fu ; Jun Wang ; Jinghan Zhang ; Jinmao Wei ; Zhenglu Yang

【Abstract】: Automatic text summarization focuses on distilling summary information from texts. This research field has been considerably explored over the past decades because of its significant role in many natural language processing tasks; however, two challenging issues block its further development: (1) how to yield a summarization model embedding topic inference rather than extending with a pre-trained one and (2) how to merge the latent topics into diverse granularity levels. In this study, we propose a variational hierarchical model, dubbed VHTM, to holistically address both issues. Different from previous work assisted by a pre-trained single-grained topic model, VHTM is the first attempt to jointly accomplish summarization with topic inference via a variational encoder-decoder and merge topics into multi-grained levels through topic embedding and attention. Comprehensive experiments validate the superior performance of VHTM compared with the baselines, accompanied by semantically consistent topics.

【Keywords】:

949. Open Domain Event Text Generation.

Paper Link】 【Pages】:7748-7755

【Authors】: Zihao Fu ; Lidong Bing ; Wai Lam

【Abstract】: Text generation tasks aim at generating human-readable text from different kinds of data. Normally, the generated text only contains the information included in the data and its application is thus restricted to some limited scenarios. In this paper, we extend the task to an open domain event text generation scenario with an entity chain as its skeleton. Specifically, given an entity chain containing several related event entities, the model should retrieve from a trustworthy repository (e.g. Wikipedia) the detailed information of these entities and generate a description text based on the retrieved sentences. We build a new dataset called WikiEvent that provides 34K pairs of entity chain and its corresponding description sentences. To solve the problem, we propose a wiki augmented generator framework that contains an encoder, a retriever, and a decoder. The encoder encodes the entity chain into a hidden space while the decoder decodes from the hidden space and generates description text. The retriever retrieves relevant text from a trustworthy repository which provides more information for generation. To alleviate the overfitting problem, we propose a novel random drop component that randomly deletes words from the retrieved sentences making our model more robust for handling long input sentences. We apply the proposed model on the WikiEvent dataset and compare it with a few baselines. The experimental results show that our carefully-designed architecture does help generate better event text, and extensive analysis further uncovers the characteristics of the proposed task.

【Keywords】:
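
The random drop component mentioned in entry 949 (randomly deleting words from retrieved sentences) can be sketched as independent per-token deletion. The abstract does not state the drop rule, so the fixed per-token probability `drop_prob`, the function name, and the optional `rng` argument are assumptions for illustration.

```python
import random


def random_drop(tokens, drop_prob, rng=None):
    """Return tokens with each one independently deleted with prob. drop_prob.

    The order of the surviving tokens is preserved. Pass a seeded
    random.Random instance as rng for reproducible results.
    """
    if rng is None:
        rng = random.Random()
    # rng.random() lies in [0, 1), so drop_prob=0 keeps all, drop_prob=1 drops all.
    return [tok for tok in tokens if rng.random() >= drop_prob]
```

Applied at training time only, such a step exposes the generator to a different subset of the retrieved words in every epoch, which is one common way to reduce overfitting to specific retrieved sentences.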

950. ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs.

Paper Link】 【Pages】:7756-7763

【Authors】: Zuohui Fu ; Yikun Xian ; Shijie Geng ; Yingqiang Ge ; Yuting Wang ; Xin Dong ; Guang Wang ; Gerard de Melo

【Abstract】: A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.

【Keywords】:

951. Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection in Task Oriented Dialog.

Paper Link】 【Pages】:7764-7771

【Authors】: Varun Gangal ; Abhinav Arora ; Arash Einolghozati ; Sonal Gupta

【Abstract】: The task of identifying out-of-domain (OOD) input examples directly at test-time has seen renewed interest recently due to increased real world deployment of models. In this work, we focus on OOD detection for natural language sentence inputs to task-based dialog systems. Our findings are three-fold: First, we curate and release ROSTD (Real Out-of-Domain Sentences From Task-oriented Dialog) - a dataset of 4K OOD examples for the publicly available dataset from (Schuster et al. 2019). In contrast to existing settings which synthesize OOD examples by holding out a subset of classes, our examples were authored by annotators with a priori instructions to be out-of-domain with respect to the sentences in an existing dataset. Second, we explore likelihood ratio based approaches as an alternative to currently prevalent paradigms. Specifically, we reformulate and apply these approaches to natural language inputs. We find that they match or outperform the latter on all datasets, with larger improvements on non-artificial OOD benchmarks such as our dataset. Our ablations validate that specifically using likelihood ratios rather than plain likelihood is necessary to discriminate well between OOD and in-domain data. Third, we propose learning a generative classifier and computing a marginal likelihood (ratio) for OOD detection. This allows us to use a principled likelihood while at the same time exploiting training-time labels. We find that this approach outperforms both simple likelihood (ratio) based and other prior approaches. We are hitherto the first to investigate the use of generative classifiers for OOD detection at test-time.

【Keywords】:
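
The likelihood-ratio idea in entry 951 can be illustrated with a toy scorer that subtracts a background log-likelihood from an in-domain log-likelihood. The abstract does not say which language models are used, so the add-one-smoothed unigram models, the choice of background corpus, and all names below are assumptions made purely for illustration.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram model over a fixed vocabulary plus <unk>."""

    def __init__(self, sentences, vocab):
        self.vocab = set(vocab)
        self.counts = Counter(self._map(t) for s in sentences for t in s.split())
        self.total = sum(self.counts.values())
        self.size = len(self.vocab) + 1  # +1 for the <unk> slot

    def _map(self, token):
        return token if token in self.vocab else "<unk>"

    def log_prob(self, sentence):
        return sum(
            math.log((self.counts[self._map(t)] + 1) / (self.total + self.size))
            for t in sentence.split()
        )


def likelihood_ratio_score(sentence, in_domain_lm, background_lm):
    """log p_in(x) - log p_background(x); lower values suggest OOD input."""
    return in_domain_lm.log_prob(sentence) - background_lm.log_prob(sentence)
```

The ratio cancels out how probable a sentence is in general (for instance, because it is short or uses frequent words), which plain likelihood cannot do; a threshold on the score would then separate in-domain from OOD inputs.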

952. Neural Snowball for Few-Shot Relation Learning.

Paper Link】 【Pages】:7772-7779

【Authors】: Tianyu Gao ; Xu Han ; Ruobing Xie ; Zhiyuan Liu ; Fen Lin ; Leyu Lin ; Maosong Sun

【Abstract】: Knowledge graphs typically undergo open-ended growth of new relations. This cannot be well handled by relation extraction that focuses on pre-defined relations with sufficient training data. To address new relations with few-shot instances, we propose a novel bootstrapping approach, Neural Snowball, to learn new relations by transferring semantic knowledge about existing relations. More specifically, we use Relational Siamese Networks (RSN) to learn the metric of relational similarities between instances based on existing relations and their labeled data. Afterwards, given a new relation and its few-shot instances, we use RSN to accumulate reliable instances from unlabeled corpora; these instances are used to train a relation classifier, which can further identify new facts of the new relation. The process is conducted iteratively like a snowball. Experiments show that our model can gather high-quality instances for better few-shot relation learning and achieves significant improvement compared to baselines. Codes and datasets are released on https://github.com/thunlp/Neural-Snowball.

【Keywords】:

953. TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection.

Paper Link】 【Pages】:7780-7788

【Authors】: Siddhant Garg ; Thuy Vu ; Alessandro Moschitti

【Abstract】: We propose TandA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving the impressive MAP scores of 92% and 94.3%, respectively, which largely outperform the highest scores of 83.4% and 87.5% of previous work. We empirically show that TandA generates more stable and robust models reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TandA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TandA in an industrial setting, using domain specific datasets subject to different types of noise.

【Keywords】:
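
Entry 953 reports its results as MAP (mean average precision) scores. As a reminder of what that metric measures for answer sentence selection, here is a minimal sketch of the standard definition. It is not the paper's evaluation script; in particular, scoring a question with no correct candidate as 0.0 is an assumption, and benchmark-specific conventions may differ.

```python
def average_precision(ranked_labels):
    """AP for one question; ranked_labels are 1/0 relevance flags, best first."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct answer
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(questions):
    """Mean of the per-question AP values."""
    return sum(average_precision(labels) for labels in questions) / len(questions)
```

For instance, a question whose correct answers are ranked first and third has an AP of (1/1 + 2/3) / 2, roughly 0.83.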

954. Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems.

Paper Link】 【Pages】:7789-7796

【Authors】: Sarik Ghazarian ; Ralph M. Weischedel ; Aram Galstyan ; Nanyun Peng

【Abstract】: User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement on assessing utterance-level engagement scores; (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that the utterance-level engagement scores can be learned from data. These scores can be incorporated into automatic evaluation metrics for open-domain dialogue systems to improve the correlation with human judgements. This suggests that predictive engagement can be used as a real-time feedback for training better dialogue models.

【Keywords】:

955. Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation.

Paper Link】 【Pages】:7797-7804

【Authors】: Goran Glavas ; Swapna Somasundaran

【Abstract】: Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model – a neural architecture consisting of two hierarchically connected Transformer networks – is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.

【Keywords】:

956. A Large-Scale Dataset for Argument Quality Ranking: Construction and Analysis.

Paper Link】 【Pages】:7805-7813

【Authors】: Shai Gretz ; Roni Friedman ; Edo Cohen-Karlik ; Assaf Toledo ; Dan Lahav ; Ranit Aharonov ; Noam Slonim

【Abstract】: Identifying the quality of free-text arguments has become an important task in the rapidly expanding field of computational argumentation. In this work, we explore the challenging task of argument quality ranking. To this end, we created a corpus of 30,497 arguments carefully annotated for point-wise quality, released as part of this work. To the best of our knowledge, this is the largest dataset annotated for point-wise argument quality, larger by a factor of five than previously released datasets. Moreover, we address the core issue of inducing a labeled score from crowd annotations by performing a comprehensive evaluation of different approaches to this problem. In addition, we analyze the quality dimensions that characterize this dataset. Finally, we present a neural method for argument quality ranking, which outperforms several baselines on our own dataset, as well as previous methods published for another dataset.

【Keywords】:

957. Two Birds with One Stone: Investigating Invertible Neural Networks for Inverse Problems in Morphology.

Paper Link】 【Pages】:7814-7821

【Authors】: Gözde Gül Sahin ; Iryna Gurevych

【Abstract】: Most problems in natural language processing can be approximated as inverse problems such as analysis and generation at a variety of levels, from morphological (e.g., cat+Plural↔cats) to semantic (e.g., (call + 1 2)↔“Calculate one plus two.”). Although the tasks in both directions are closely related, the general approach in the field has been to design separate models specific to each task. However, having one shared model for both tasks would help researchers exploit the common knowledge among these problems with reduced time and memory requirements. We investigate a specific class of neural networks, called Invertible Neural Networks (INNs) (Ardizzone et al. 2019), that enable simultaneous optimization in both directions and hence allow inverse problems to be addressed via a single model. In this study, we investigate INNs on morphological problems cast as inverse problems. We apply INNs to various morphological tasks with varying ambiguity and show that they provide competitive performance in both directions. We show that they are able to recover the morphological input parameters, i.e., predicting the lemma (e.g., cat) or the morphological tags (e.g., Plural) when run in the reverse direction, without any significant performance drop in the forward direction, i.e., predicting the surface form (e.g., cats).

【Keywords】:

958. Working Memory-Driven Neural Networks with a Novel Knowledge Enhancement Paradigm for Implicit Discourse Relation Recognition.

Paper Link】 【Pages】:7822-7829

【Authors】: Fengyu Guo ; Ruifang He ; Jianwu Dang ; Jian Wang

【Abstract】: Recognizing implicit discourse relations is a challenging task in discourse analysis, which aims to understand and infer the latent relations between two discourse arguments, such as temporal and comparison relations. Most present models focus on learning-based methods that use only intra-sentence textual information to identify discourse relations, ignoring the wider context beyond the discourse. Moreover, people comprehend the meanings and the relations of discourses by relying heavily on their interconnected working memories (e.g., instant memory, long-term memory). Inspired by this, we propose a Knowledge-Enhanced Attentive Neural Network (KANN) framework to address these issues. Specifically, it establishes a mutual attention matrix to capture the reciprocal information between two arguments, which serves as instant memory. Implicitly stated knowledge in the arguments, meanwhile, is retrieved from an external knowledge source and encoded as inter-word semantic connection embeddings to construct a knowledge matrix, which serves as long-term memory. We devise a novel paradigm in which the two memories collaborate in two ways to enrich the argument representation: 1) integrating the knowledge matrix into the mutual attention matrix, which implicitly maps knowledge into the process of capturing asymmetric interactions between two discourse arguments; 2) directly concatenating the argument representations and the semantic connection embeddings, which explicitly supplements knowledge to help discourse understanding. The experimental results on the PDTB also show that our KANN model is effective.

【Keywords】:

959. Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits.

Paper Link】 【Pages】:7830-7838

【Authors】: Han Guo ; Ramakanth Pasunuru ; Mohit Bansal

【Abstract】: Domain adaptation performance of a learning algorithm on a target domain is a function of its source domain error and a divergence measure between the data distribution of these two domains. We present a study of various distance-based measures in the context of NLP tasks, that characterize the dissimilarity between domains based on sample estimates. We first conduct analysis experiments to show which of these distance measures can best differentiate samples from same versus different domains, and are correlated with empirical results. Next, we develop a DistanceNet model which uses these distance measures, or a mixture of these distance measures, as an additional loss function to be minimized jointly with the task's loss function, so as to achieve better unsupervised domain adaptation. Finally, we extend this model to a novel DistanceNet-Bandit model, which employs a multi-armed bandit controller to dynamically switch between multiple source domains and allow the model to learn an optimal trajectory and mixture of domains for transfer to the low-resource target domain. We conduct experiments on popular sentiment analysis datasets with several diverse domains and show that our DistanceNet model, as well as its dynamic bandit variant, can outperform competitive baselines in the context of unsupervised domain adaptation.
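
To make the idea of a bandit controller that switches between source domains concrete, here is a minimal sketch. It uses an epsilon-greedy policy, which is a swapped-in, simpler technique and not necessarily the controller used in the paper. The class name, the domain names, and the reward values are hypothetical; the reward could be, for instance, validation accuracy on the target domain.

```python
import random


class EpsilonGreedyBandit:
    """Picks one source domain (arm) per training round.

    Assumption: epsilon-greedy with running-mean value estimates; this is an
    illustrative stand-in, not the paper's controller.
    """

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    bandit = EpsilonGreedyBandit(["books", "dvd", "kitchen"], epsilon=0.0)
    bandit.update("books", 0.2)  # hypothetical reward after training on "books"
    bandit.update("dvd", 0.9)    # hypothetical reward after training on "dvd"
    print(bandit.select())
```

With `epsilon=0.0` the controller always exploits, so the usage example picks the domain with the highest observed reward; a nonzero epsilon keeps occasionally revisiting the other domains.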

【Keywords】:

960. Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation.

Paper Link】 【Pages】:7839-7846

【Authors】: Junliang Guo ; Xu Tan ; Linli Xu ; Tao Qin ; Enhong Chen ; Tie-Yan Liu

【Abstract】: Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process over AT baselines.
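
The progressive switch from autoregressive to non-autoregressive training described above can be pictured as a pacing schedule. The sketch below assumes a linear schedule and a per-batch random choice of training mode; both are illustrative assumptions, and the names `nat_probability` and `pick_mode` are hypothetical.

```python
import random


def nat_probability(step, total_steps):
    """Probability of training in NAT mode at a given fine-tuning step.

    Assumption: a linear ramp from 0 to 1; the paper's pacing may differ.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def pick_mode(step, total_steps, rng):
    """Sample the training mode for one batch according to the schedule."""
    return "NAT" if rng.random() < nat_probability(step, total_steps) else "AT"


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 25, 50, 75, 100):
        print(step, nat_probability(step, 100), pick_mode(step, 100, rng))
```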

【Keywords】:

961. Multi-Scale Self-Attention for Text Classification.

Paper Link】 【Pages】:7847-7854

【Authors】: Qipeng Guo ; Xipeng Qiu ; Pengfei Liu ; Xiangyang Xue ; Zheng Zhang

【Abstract】: In this paper, we introduce the prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results of three different kinds of tasks (21 datasets) show our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate size datasets.

【Keywords】:

962. Fact-Aware Sentence Split and Rephrase with Permutation Invariant Training.

Paper Link】 【Pages】:7855-7862

【Authors】: Yinuo Guo ; Tao Ge ; Furu Wei

【Abstract】: Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved. Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs, which takes a complex sentence as input and sequentially generates a series of simple sentences. However, the conventional seq2seq learning has two limitations for this task: (1) it does not take into account the facts stated in the long sentence; as a result, the generated simple sentences may miss or inaccurately state the facts in the original sentence. (2) The order variance of the simple sentences to be generated may confuse the seq2seq model during training because the simple sentences derived from the long source sentence could be in any order. To overcome the challenges, we first propose the Fact-aware Sentence Encoding, which enables the model to learn facts from the long sentence and thus improves the precision of sentence split; then we introduce Permutation Invariant Training to alleviate the effects of order variance in seq2seq learning for this task. Experiments on the WebSplit-v1.0 benchmark dataset show that our approaches can largely improve the performance over the previous seq2seq learning approaches. Moreover, an extrinsic evaluation on oie-benchmark verifies the effectiveness of our approaches by an observation that splitting long sentences with our state-of-the-art model as preprocessing is helpful for improving OpenIE performance.
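
The core idea of Permutation Invariant Training, scoring the model against the best-matching ordering of the reference sentences instead of one fixed order, can be sketched as follows. The per-sentence loss here is a toy token-overlap measure chosen only to keep the example self-contained; a real seq2seq setup would presumably use a likelihood-based loss. The function names are hypothetical.

```python
from itertools import permutations


def sentence_loss(pred, ref):
    """Toy loss: fraction of reference tokens missing from the prediction."""
    pred_tokens = set(pred.split())
    ref_tokens = ref.split()
    if not ref_tokens:
        return 0.0
    missing = sum(1 for tok in ref_tokens if tok not in pred_tokens)
    return missing / len(ref_tokens)


def pit_loss(preds, refs):
    """Return (minimum total loss, best reference ordering) over all permutations."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    best = None
    for order in permutations(refs):
        total = sum(sentence_loss(p, r) for p, r in zip(preds, order))
        if best is None or total < best[0]:
            best = (total, order)
    return best


if __name__ == "__main__":
    preds = ["the cat sat", "a dog ran"]
    refs = ["a dog ran", "the cat sat"]  # same content, different order
    print(pit_loss(preds, refs))
```

Enumerating all permutations is factorial in the number of simple sentences, which is acceptable here only because a split typically yields a handful of sentences.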

【Keywords】:

963. P-SIF: Document Embeddings Using Partition Averaging.

Paper Link】 【Pages】:7863-7870

【Authors】: Vivek Gupta ; Ankit Saw ; Pegah Nokhiz ; Praneeth Netrapalli ; Piyush Rai ; Partha P. Talukdar

【Abstract】: Simple weighted averaging of word vectors often yields effective representations for sentences which outperform sophisticated seq2seq neural models in many tasks. While it is desirable to use the same method to represent documents as well, unfortunately, the effectiveness is lost when representing long documents involving multiple sentences. One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all the topical structure is unlikely to yield an effective document representation. This problem is less acute in single sentences and other short text fragments where the presence of a single topic is most likely. To alleviate this problem, we present P-SIF, a partitioned word averaging model to represent long documents. P-SIF retains the simplicity of simple weighted word averaging while taking a document's topical structure into account. In particular, P-SIF learns topic-specific vectors from a document and finally concatenates them all to represent the overall document. We provide theoretical justifications on the correctness of P-SIF. Through a comprehensive set of experiments, we demonstrate P-SIF's effectiveness compared to simple weighted averaging and many other baselines.
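
A rough sketch of partitioned averaging, as opposed to a single averaged vector, is given below. It assumes that each word already comes with a vector and a soft assignment over topics; how P-SIF actually learns those assignments and word weights is not stated in the abstract, so `topic_weights`, `word_weight`, and the function name are hypothetical inputs for illustration.

```python
def partition_average(tokens, word_vecs, topic_weights, num_topics, word_weight=None):
    """Build one averaged vector per topic and concatenate them.

    word_vecs:     token -> list of floats (embedding)
    topic_weights: token -> list of num_topics floats (soft topic assignment)
    word_weight:   optional token -> float (e.g. a frequency-based weight)
    Assumption: all of these are given; the paper learns them from data.
    """
    dim = len(next(iter(word_vecs.values())))
    parts = [[0.0] * dim for _ in range(num_topics)]
    known = [t for t in tokens if t in word_vecs and t in topic_weights]
    for tok in known:
        w = 1.0 if word_weight is None else word_weight.get(tok, 1.0)
        for k in range(num_topics):
            scale = w * topic_weights[tok][k]
            for d in range(dim):
                parts[k][d] += scale * word_vecs[tok][d]
    n = max(len(known), 1)
    # Concatenate the per-topic averages into one document vector.
    return [x / n for part in parts for x in part]


if __name__ == "__main__":
    word_vecs = {"goal": [1.0, 0.0], "vote": [0.0, 1.0]}
    topic_weights = {"goal": [1.0, 0.0], "vote": [0.0, 1.0]}  # sports vs. politics
    print(partition_average(["goal", "vote"], word_vecs, topic_weights, num_topics=2))
```

In the usage example a plain average would give `[0.5, 0.5]` and blur the two topics together, while the partitioned vector keeps the sports and politics contributions in separate slots.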

【Keywords】:

964. CASE: Context-Aware Semantic Expansion.

Paper Link】 【Pages】:7871-7878

【Authors】: Jialong Han ; Aixin Sun ; Haisong Zhang ; Chenliang Li ; Shuming Shi

【Abstract】: In this paper, we define and study a new task called Context-Aware Semantic Expansion (CASE). Given a seed term in a sentential context, we aim to suggest other terms that well fit the context as the seed. CASE has many interesting applications such as query suggestion, computer-assisted writing, and word sense disambiguation, to name a few. Previous explorations, if any, only involve some similar tasks, and all require human annotations for evaluation. In this study, we demonstrate that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner. On a dataset of 1.8 million sentences thus derived, we propose a network architecture that encodes the context and seed term separately before suggesting alternative terms. The context encoder in this architecture can be easily extended by incorporating seed-aware attention. Our experiments demonstrate that competitive results are achieved with appropriate choices of context encoder and attention scoring function.

【Keywords】:

965. ManyModalQA: Modality Disambiguation and QA over Diverse Inputs.

Paper Link】 【Pages】:7879-7886

【Authors】: Darryl Hannan ; Akshay Jain ; Mohit Bansal

【Abstract】: We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities: text, images, and tables. We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs. Our questions are ambiguous, in that the modality that contains the answer is not easily determined based solely upon the question. To demonstrate this ambiguity, we construct a modality selector (or disambiguator) network, and this model gets substantially lower accuracy on our challenge set, compared to existing datasets, indicating that our questions are more ambiguous. By analyzing this model, we investigate which words in the question are indicative of the modality. Next, we construct a simple baseline ManyModalQA model, which, based on the prediction from the modality selector, fires a corresponding pre-trained state-of-the-art unimodal QA model. We focus on providing the community with a new manymodal evaluation set and only provide a fine-tuning set, with the expectation that existing datasets and approaches will be transferred for most of the training, to encourage low-resource generalization without large, monolithic training sets for each new task. There is a significant gap between our baseline models and human performance; therefore, we hope that this challenge encourages research in end-to-end modality disambiguation and multimodal QA models, as well as transfer learning.

【Keywords】:

966. What Do You Mean 'Why?': Resolving Sluices in Conversations.

Paper Link】 【Pages】:7887-7894

【Authors】: Victor Petrén Bach Hansen ; Anders Søgaard

【Abstract】: In conversation, we often ask one-word questions such as ‘Why?’ or ‘Who?’. Such questions are typically easy for humans to answer, but can be hard for computers, because their resolution requires retrieving both the right semantic frames and the right arguments from context. This paper introduces the novel ellipsis resolution task of resolving such one-word questions, referred to as sluices in linguistics. We present a crowd-sourced dataset containing annotations of sluices from over 4,000 dialogues collected from conversational QA datasets, as well as a series of strong baseline architectures.

【Keywords】:

967. One Homonym per Translation.

Paper Link】 【Pages】:7895-7902

【Authors】: Bradley Hauer ; Grzegorz Kondrak

【Abstract】: The study of homonymy is vital to resolving fundamental problems in lexical semantics. In this paper, we propose four hypotheses that characterize the unique behavior of homonyms in the context of translations, discourses, collocations, and sense clusters. We present a new annotated homonym resource that allows us to test our hypotheses on existing WSD resources. The results of the experiments provide strong empirical evidence for the hypotheses. This study represents a step towards a computational method for distinguishing between homonymy and polysemy, and constructing a definitive inventory of coarse-grained senses.

【Keywords】:

968. Interactive Fiction Games: A Colossal Adventure.

Paper Link】 【Pages】:7903-7910

【Authors】: Matthew J. Hausknecht ; Prithviraj Ammanabrolu ; Marc-Alexandre Côté ; Xingdi Yuan

【Abstract】: A hallmark of human intelligence is the ability to understand and communicate with language. Interactive Fiction games are fully text-based simulation environments where a player issues text commands to effect change in the environment and progress through the story. We argue that IF games are an excellent testbed for studying language-based autonomous agents. In particular, IF games combine challenges of combinatorial action spaces, language understanding, and commonsense reasoning. To facilitate rapid development of language-based agents, we introduce Jericho, a learning environment for man-made IF games and conduct a comprehensive study of text-agents across a rich set of games, highlighting directions in which agents can improve.

【Keywords】:

969. Latent Relation Language Models.

Paper Link】 【Pages】:7911-7918

【Authors】: Hiroaki Hayashi ; Zecong Hu ; Chenyan Xiong ; Graham Neubig

【Abstract】: In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This model has a number of attractive properties: it not only improves language modeling performance, but is also able to annotate the posterior probability of entity spans for a given text through relations. Experiments demonstrate empirical improvements over both word-based language models and a previous approach that incorporates knowledge graph information. Qualitative analysis further demonstrates the proposed model's ability to learn to predict appropriate relations in context.

【Keywords】:

970. Knowledge-Graph Augmented Word Representations for Named Entity Recognition.

Paper Link】 【Pages】:7919-7926

【Authors】: Qizhen He ; Liang Wu ; Yida Yin ; Heming Cai

【Abstract】: By modeling the context information, ELMo and BERT have successfully improved the state-of-the-art of word representation, and demonstrated their effectiveness on the Named Entity Recognition task. In this paper, in addition to such context modeling, we propose to encode the prior knowledge of entities from an external knowledge base into the representation, and introduce a Knowledge-Graph Augmented Word Representation or KAWR for named entity recognition. Basically, KAWR provides a kind of knowledge-aware representation for words by 1) encoding entity information from a pre-trained KG embedding model with a new recurrent unit (GERU), and 2) strengthening context modeling from knowledge wise by providing a relation attention scheme based on the entity relations defined in KG. We demonstrate that KAWR, as an augmented version of the existing linguistic word representations, promotes F1 scores on 5 datasets in various domains by +0.46∼+2.07. Better generalization is also observed for KAWR on new entities that cannot be found in the training sets.

【Keywords】:

971. Improving Neural Relation Extraction with Positive and Unlabeled Learning.

Paper Link】 【Pages】:7927-7934

【Authors】: Zhengqiu He ; Wenliang Chen ; Yuyi Wang ; Wei Zhang ; Guanchun Wang ; Min Zhang

【Abstract】: We present a novel approach to improve the performance of distant supervision relation extraction with Positive and Unlabeled (PU) Learning. This approach first applies reinforcement learning to decide whether a sentence is positive to a given relation, and then positive and unlabeled bags are constructed. In contrast to most previous studies, which mainly use selected positive instances only, we make full use of unlabeled instances and propose two new representations for positive and unlabeled bags. These two representations are then combined in an appropriate way to make bag-level prediction. Experimental results on a widely used real-world dataset demonstrate that this new approach indeed achieves significant and consistent improvements as compared to several competitive baselines.

【Keywords】:

972. Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization.

Paper Link】 【Pages】:7935-7943

【Authors】: Wataru Hirota ; Yoshihiko Suhara ; Behzad Golshan ; Wang-Chiew Tan

【Abstract】: We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.

【Keywords】:

973. Unsupervised Interlingual Semantic Representations from Sentence Embeddings for Zero-Shot Cross-Lingual Transfer.

Paper Link】 【Pages】:7944-7951

【Authors】: Channy Hong ; Jaeyeon Lee ; Jungkwon Lee

【Abstract】: As numerous modern NLP models demonstrate high performance in various tasks when trained with resource-rich language data sets such as those of English, there has been a shift in attention to the idea of applying such learning to low-resource languages via zero-shot or few-shot cross-lingual transfer. While the most prominent previous efforts toward this goal entail the use of parallel corpora for sentence alignment training, we seek to generalize further by assuming plausible scenarios in which such parallel data sets are unavailable. In this work, we present a novel architecture for training interlingual semantic representations on top of sentence embeddings in a completely unsupervised manner, and demonstrate its effectiveness in zero-shot cross-lingual transfer on a natural language inference task. Furthermore, we showcase a method of leveraging this framework in a few-shot scenario, and finally analyze the distributional and permutational alignment across languages of these interlingual semantic representations.

【Keywords】:

974. Knowledge-Enriched Visual Storytelling.

Paper Link】 【Pages】:7952-7960

【Authors】: Chao-Chun Hsu ; Zi-Yuan Chen ; Chi-Yang Hsu ; Chih-Chia Li ; Tzu-Yuan Lin ; Ting-Hao Kenneth Huang ; Lun-Wei Ku

【Abstract】: Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories. KG-Story distills a set of representative words from the input prompts, enriches the word set by using external knowledge graphs, and finally generates stories based on the enriched word set. This distill-enrich-generate framework allows the use of external resources not only for the enrichment phase, but also for the distillation and generation phases. In this paper, we show the superiority of KG-Story for visual storytelling, where the input prompt is a sequence of five photos and the output is a short story. Per the human ranking evaluation, stories generated by KG-Story are on average ranked better than those of the state-of-the-art systems. Our code and output stories are available at https://github.com/zychen423/KE-VIST.

【Keywords】:

975. Leveraging Multi-Token Entities in Document-Level Named Entity Recognition.

Paper Link】 【Pages】:7961-7968

【Authors】: Anwen Hu ; Zhicheng Dou ; Jian-Yun Nie ; Ji-Rong Wen

【Abstract】: Most state-of-the-art named entity recognition systems are designed to process each sentence within a document independently. These systems easily confuse entity types when the context information in a sentence is insufficient. To utilize the context information within the whole document, most document-level work lets neural networks learn the relations across sentences on their own, which is not intuitive for humans. In this paper, we divide entities into multi-token entities that contain multiple tokens and single-token entities that are composed of a single token. We propose that the context information of multi-token entities should be more reliable in document-level NER for news articles. We design a fusion attention mechanism which not only learns the semantic relevance between occurrences of the same token, but also focuses more on occurrences belonging to multi-token entities. To identify multi-token entities, we design an auxiliary task, namely ‘Multi-token Entity Classification’, and perform this task simultaneously with document-level NER. This auxiliary task is simplified from NER and doesn't require extra annotation. Experimental results on the CoNLL-2003 dataset and OntoNotesnbm dataset show that our model outperforms state-of-the-art sentence-level and document-level NER methods.

【Keywords】:

976. What Makes A Good Story? Designing Composite Rewards for Visual Storytelling.

Paper Link】 【Pages】:7969-7976

【Authors】: Junjie Hu ; Yu Cheng ; Zhe Gan ; Jingjing Liu ; Jianfeng Gao ; Graham Neubig

【Abstract】: Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr. In this paper, we re-examine this problem from a different angle, by looking deep into what defines a natural and topically-coherent story. To this end, we propose three assessment criteria: relevance, coherence and expressiveness, which we observe through empirical analysis could constitute a “high-quality” story to the human eye. We further propose a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of these quality criteria. Experiments on the Visual Storytelling Dataset (VIST) with both automatic and human evaluation demonstrate that our ReCo-RL model achieves better performance than state-of-the-art baselines on both traditional metrics and the proposed new criteria.

【Keywords】:

977. MALA: Cross-Domain Dialogue Generation with Action Learning.

Paper Link】 【Pages】:7977-7984

【Authors】: Xinting Huang ; Jianzhong Qi ; Yu Sun ; Rui Zhang

【Abstract】: Response generation for task-oriented dialogues involves two basic components: dialogue planning and surface realization. These two components, however, have a discrepancy in their objectives, i.e., task completion and language quality. To deal with such discrepancy, conditioned response generation has been introduced where the generation process is factorized into action decision and language generation via explicit action representations. To obtain action representations, recent studies learn latent actions in an unsupervised manner based on the utterance lexical similarity. Such an action learning approach is prone to diversities of language surfaces, which may impinge on task completion and language quality. To address this issue, we propose multi-stage adaptive latent action learning (MALA) that learns semantic latent actions by distinguishing the effects of utterances on dialogue progress. We model the utterance effect using the transition of dialogue states caused by the utterance and develop a semantic similarity measurement that estimates whether utterances have similar effects. For learning semantic actions on domains without dialogue states, MALA extends the semantic similarity measurement across domains progressively, i.e., from aligning shared actions to learning domain-specific actions. Experiments using multi-domain datasets, SMD and MultiWOZ, show that our proposed model achieves consistent improvements over the baseline models in terms of both task completion and language quality.

【Keywords】:

978. Privacy Enhanced Multimodal Neural Representations for Emotion Recognition.

Paper Link】 【Pages】:7985-7993

【Authors】: Mimansa Jaiswal ; Emily Mower Provost

【Abstract】: Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. To enable this, data are transmitted from users' devices and stored on central servers. Yet, these data contain sensitive information that could be used by mobile applications without user's consent or, maliciously, by an eavesdropping adversary. In this work, we show how multimodal representations trained for a primary task, here emotion recognition, can unintentionally leak demographic information, which could override a selected opt-out option by the user. We analyze how this leakage differs in representations obtained from textual, acoustic, and multimodal data. We use an adversarial learning paradigm to unlearn the private information present in a representation and investigate the effect of varying the strength of the adversarial component on the primary task and on the privacy metric, defined here as the inability of an attacker to predict specific demographic information. We evaluate this paradigm on multiple datasets and show that we can improve the privacy metric while not significantly impacting the performance on the primary task. To the best of our knowledge, this is the first work to analyze how the privacy metric differs across modalities and how multiple privacy concerns can be tackled while still maintaining performance on emotion recognition.

【Keywords】:

979. Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues.

Paper Link】 【Pages】:7994-8001

【Authors】: Youngsoo Jang ; Jongmin Lee ; Kee-Eung Kim

【Abstract】: We consider a strategic dialogue task, where the ability to infer the other agent's goal is critical to the success of the conversational agent. While this problem can be naturally formulated as Bayesian planning, it is known to be a very difficult problem due to its enormous search space consisting of all possible utterances. In this paper, we introduce an efficient Bayes-adaptive planning algorithm for goal-oriented dialogues, which combines RNN-based dialogue generation and MCTS-based Bayesian planning in a novel way, leading to robust decision-making under the uncertainty of the other agent's goal. We then introduce reinforcement learning for the dialogue agent that uses MCTS as a strong policy improvement operator, casting reinforcement learning as iterative alternation of planning and supervised-learning of self-generated dialogues. In the experiments, we demonstrate that our Bayes-adaptive dialogue planning agent significantly outperforms the state-of-the-art in a negotiation dialogue domain. We also show that reinforcement learning via MCTS further improves end-task performance without diverging from human language.

【Keywords】:

980. Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network.

Paper Link】 【Pages】:8002-8009

【Authors】: Wenxiang Jiao ; Michael R. Lyu ; Irwin King

【Abstract】: Real-time emotion recognition (RTER) in conversations is significant for developing emotionally intelligent chatting machines. Without the future context in RTER, it becomes critical to build the memory bank carefully for capturing historical context and summarize the memories appropriately to retrieve relevant information. We propose an Attention Gated Hierarchical Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used convolutional neural networks (CNNs) for utterance feature extraction are less compatible in the memory modules; (2) Unidirectional gated recurrent units (GRUs) only allow each historical utterance to have context before it, preventing information propagation in the opposite direction; (3) The Soft Attention for summarizing loses the positional and ordering information of memories, regardless of how the memory bank is built. Particularly, we propose a Hierarchical Memory Network (HMN) with a bidirectional GRU (BiGRU) as the utterance reader and a BiGRU fusion layer for the interaction between historical utterances. For memory summarizing, we propose an Attention GRU (AGRU) where we utilize the attention weights to update the internal state of GRU. We further promote the AGRU to a bidirectional variant (BiAGRU) to balance the contextual information from recent memories and that from distant memories. We conduct experiments on two emotion conversation datasets with extensive analysis, demonstrating the efficacy of our AGHMN models.

【Keywords】:

981. MMM: Multi-Stage Multi-Task Learning for Multi-Choice Reading Comprehension.

【Pages】:8010-8017

【Authors】: Di Jin ; Shuyang Gao ; Jiun-Yu Kao ; Tagyoung Chung ; Dilek Hakkani-Tür

【Abstract】: Machine Reading Comprehension (MRC) for question answering (QA), which aims to answer a question given the relevant context passages, is an important way to test the ability of intelligence systems to understand human language. Multiple-Choice QA (MCQA) is one of the most difficult tasks in MRC because it often requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations, compared to the extractive counterpart where answers are usually spans of text within given passages. Moreover, most existing MCQA datasets are small in size, making the task even harder. We introduce MMM, a Multi-stage Multi-task learning framework for Multi-choice reading comprehension. Our method involves two sequential stages: a coarse-tuning stage using out-of-domain datasets and a multi-task learning stage using a larger in-domain dataset to help the model generalize better with limited data. Furthermore, we propose a novel multi-step attention network (MAN) as the top-level classifier for this task. We demonstrate that MMM significantly advances the state-of-the-art on four representative MCQA datasets.

【Keywords】:

982. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.

【Pages】:8018-8025

【Authors】: Di Jin ; Zhijing Jin ; Joey Tianyi Zhou ; Peter Szolovits

【Abstract】: Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate three advantages of this framework: (1) effective: it outperforms previous attacks by success rate and perturbation rate, (2) utility-preserving: it preserves semantic content, grammaticality, and correct types classified by humans, and (3) efficient: it generates adversarial text with computational complexity linear to the text length.

【Keywords】:

983. SemSUM: Semantic Dependency Guided Neural Abstractive Summarization.

【Pages】:8026-8033

【Authors】: Hanqi Jin ; Tianming Wang ; Xiaojun Wan

【Abstract】: In neural abstractive summarization, the generated summaries often suffer from semantic irrelevance and content deviation from the input sentences. In this work, we address this problem by incorporating semantic dependency graphs, which capture the predicate-argument structure of input sentences, into neural abstractive summarization. We propose a novel semantic dependency guided summarization model (SemSUM), which can leverage the information of original input texts and the corresponding semantic dependency graphs in a complementary way to guide the summarization process. We evaluate our model on the English Gigaword, DUC 2004 and MSR abstractive sentence summarization datasets. Experiments show that the proposed model improves semantic relevance and reduces content deviation, and also brings significant improvements on automatic evaluation ROUGE metrics.

【Keywords】:

984. Relation Extraction Exploiting Full Dependency Forests.

【Pages】:8034-8041

【Authors】: Lifeng Jin ; Linfeng Song ; Yue Zhang ; Kun Xu ; Wei-Yun Ma ; Dong Yu

【Abstract】: Dependency syntax has long been recognized as a crucial source of features for relation extraction. Previous work considers 1-best trees produced by a parser during preprocessing. However, error propagation from the out-of-domain parser may impact the relation extraction performance. We propose to leverage full dependency forests for this task, where a full dependency forest encodes all possible trees. Such representations of full dependency forests provide a differentiable connection between a parser and a relation extraction model, and thus we are also able to study adjusting the parser parameters based on end-task loss. Experiments on three datasets show that full dependency forests and parser adjustment give significant improvements over carefully designed baselines, showing state-of-the-art or competitive performances on biomedical or newswire benchmarks.

【Keywords】:

985. Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation.

【Pages】:8042-8049

【Authors】: Tomoyuki Kajiwara ; Biwa Miura ; Yuki Arase

【Abstract】: We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a semantically equivalent sentence with an input assured grammaticality, and fine-tuning learns to add a desired style. Pre-training has two options: auto-encoding and machine translation based methods. Pre-training based on AutoEncoder is a simple way to learn these from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After these, fine-tuning achieves high-quality paraphrase generation even in situations where only 1k sentence pairs of the parallel corpus for style transfer are available. Experimental results on formality style transfer indicate the effectiveness of both pre-training methods, and the method based on roundtrip translation achieves state-of-the-art performance.

【Keywords】:

986. Syntactically Look-Ahead Attention Network for Sentence Compression.

【Pages】:8050-8057

【Authors】: Hidetaka Kamigaito ; Manabu Okumura

【Abstract】: Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words. In sequence-to-sequence (Seq2Seq) based models, the decoder unidirectionally decides to retain or delete words. Thus, it cannot usually explicitly capture the relationships between decoded words and unseen words that will be decoded in the future time steps. Therefore, to avoid generating ungrammatical sentences, the decoder sometimes drops important words in compressing sentences. To solve this problem, we propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries by explicitly tracking both dependency parent and child words during decoding and capturing important words that will be decoded in the future. The results of the automatic evaluation on the Google sentence compression dataset showed that SLAHAN achieved the best kept-token-based-F1, ROUGE-1, ROUGE-2 and ROUGE-L scores of 85.5, 79.3, 71.3 and 79.1, respectively. SLAHAN also improved the summarization performance on longer sentences. Furthermore, in the human evaluation, SLAHAN improved informativeness without losing readability.

【Keywords】:

987. Learning to Learn Morphological Inflection for Resource-Poor Languages.

【Pages】:8058-8065

【Authors】: Katharina Kann ; Samuel R. Bowman ; Kyunghyun Cho

【Abstract】: We propose to cast the task of morphological inflection—mapping a lemma to an indicated inflected form—for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines. In particular, it obtains a 31.7% higher absolute accuracy than a previously proposed cross-lingual transfer model and outperforms the previous state of the art by 1.7% absolute accuracy on average over languages.

【Keywords】:

988. Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages.

【Pages】:8066-8073

【Authors】: Katharina Kann ; Ophélie Lacroix ; Anders Søgaard

【Abstract】: Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets fewer than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.

【Keywords】:

989. Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks.

【Pages】:8074-8081

【Authors】: Pavan Kapanipathi ; Veronika Thost ; Siva Sankalp Patel ; Spencer Whitehead ; Ibrahim Abdelaziz ; Avinash Balakrishnan ; Maria Chang ; Kshitij P. Fadnis ; R. Chulaka Gunasekara ; Bassem Makni ; Nicholas Mattei ; Kartik Talamadupula ; Achille Fokoue

【Abstract】: Textual entailment is a fundamental task in natural language processing. Most approaches for solving this problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However, the proposed models do not fully exploit the information in the usually large and noisy KGs, and it is not clear how it can be effectively encoded to be useful for entailment. We present an approach that complements text-based entailment models with information from KGs by (1) using Personalized PageRank to generate contextual subgraphs with reduced noise and (2) encoding these subgraphs using graph convolutional networks to capture the structural and semantic information in KGs. We evaluate our approach on multiple textual entailment datasets and show that the use of external knowledge helps the model to be robust and improves prediction accuracy. This is particularly evident in the challenging BreakingNLI dataset, where we see an absolute improvement of 5-20% over multiple text-based entailment models.

【Keywords】:

990. QASC: A Dataset for Question Answering via Sentence Composition.

【Pages】:8082-8090

【Authors】: Tushar Khot ; Peter Clark ; Michal Guerquin ; Peter Jansen ; Ashish Sabharwal

【Abstract】: Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags by 20% behind human performance.

【Keywords】:

991. Modality-Balanced Models for Visual Dialogue.

【Pages】:8091-8098

【Authors】: Hyounghun Kim ; Hao Tan ; Mohit Bansal

【Abstract】: The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue. However, via manual analysis, we find that a large number of conversational questions can be answered by only looking at the image without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that due to this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns in the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better at the primary normalized discounted cumulative gain (NDCG) task metric which allows multiple correct answers. Hence, this observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and combine their complementary abilities for a more balanced multimodal model. We present multiple methods for this integration of the two models, via ensemble and consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics), and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics.

【Keywords】:

992. Top-Down RST Parsing Utilizing Granularity Levels in Documents.

【Pages】:8099-8106

【Authors】: Naoki Kobayashi ; Tsutomu Hirao ; Hidetaka Kamigaito ; Manabu Okumura ; Masaaki Nagata

【Abstract】: Some downstream NLP tasks exploit discourse dependency trees converted from RST trees. To obtain better discourse dependency trees, we need to improve the accuracy of RST trees at the upper parts of the structures. Thus, we propose a novel neural top-down RST parsing method. Then, we exploit three levels of granularity in a document, paragraphs, sentences and Elementary Discourse Units (EDUs), to parse a document accurately and efficiently. The parsing is done in a top-down manner for each granularity level, by recursively splitting a larger text span into two smaller ones while predicting nuclearity and relation labels for the divided spans. The results on the RST-DT corpus show that our method achieved the state-of-the-art results, 87.0 unlabeled span score, 74.6 nuclearity labeled span score, and the comparable result with the state-of-the-art, 60.0 relation labeled span score. Furthermore, discourse dependency trees converted from our RST trees also achieved the state-of-the-art results, 64.9 unlabeled attachment score and 48.5 labeled attachment score.

【Keywords】:

993. MA-DST: Multi-Attention-Based Scalable Dialog State Tracking.

【Pages】:8107-8114

【Authors】: Adarsh Kumar ; Peter Ku ; Anuj Kumar Goyal ; Angeliki Metallinou ; Dilek Hakkani-Tür

【Abstract】: Task oriented dialog agents provide a natural language interface for users to complete their goal. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system's understanding of the user's goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task to encode the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain coreferences. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state-of-the-art on the MultiWoZ 2.1 dataset.

【Keywords】:

994. Deep Attentive Ranking Networks for Learning to Order Sentences.

【Pages】:8115-8122

【Authors】: Pawan Kumar ; Dhanajit Brahma ; Harish Karnick ; Piyush Rai

【Abstract】: We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention based transformer network to obtain an input order invariant representation of paragraphs. Moreover, it allows seamless training using a variety of ranking based loss functions, such as pointwise, pairwise, and listwise ranking. We apply our framework on two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks on a variety of evaluation metrics. We also show that it achieves better results when using pairwise and listwise ranking losses, rather than the pointwise ranking loss, which suggests that incorporating relative positions of two or more sentences in the loss function contributes to better learning.

【Keywords】:

995. CSI: A Coarse Sense Inventory for 85% Word Sense Disambiguation.

【Pages】:8123-8130

【Authors】: Caterina Lacerra ; Michele Bevilacqua ; Tommaso Pasini ; Roberto Navigli

【Abstract】: Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate between, even for an experienced human annotator. In this paper we cope with this long-standing problem by introducing Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining a high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks which we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.

【Keywords】:

996. A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces.

【Pages】:8131-8138

【Authors】: Anne Lauscher ; Goran Glavas ; Simone Paolo Ponzetto ; Ivan Vulic

【Abstract】: Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones. Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly without degradation of semantic information encoded in any of the input distributional spaces. Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.

【Keywords】:

997. Multi-Task Learning for Metaphor Detection with Graph Convolutional Neural Networks and Word Sense Disambiguation.

【Pages】:8139-8146

【Authors】: Duong Le ; My Thai ; Thien Nguyen

【Abstract】: The current deep learning works on metaphor detection have only considered this task independently, ignoring the useful knowledge from the related tasks and knowledge resources. In this work, we introduce two novel mechanisms to improve the performance of the deep learning models for metaphor detection. The first mechanism employs graph convolutional neural networks (GCN) with dependency parse trees to directly connect the words of interest with their important context words for metaphor detection. The GCN networks in this work also present a novel control mechanism to filter the learned representation vectors to retain the most important information for metaphor detection. The second mechanism, on the other hand, features a multi-task learning framework that exploits the similarity between word sense disambiguation and metaphor detection to transfer the knowledge between the two tasks. The extensive experiments demonstrate the effectiveness of the proposed techniques, yielding the state-of-the-art performance over several datasets.

【Keywords】:

998. Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos.

【Pages】:8147-8154

【Authors】: Kyungjae Lee ; Nan Duan ; Lei Ji ; Jason Li ; Seung-won Hwang

【Abstract】: We study the problem of non-factoid QA on instructional videos. Existing work focuses on either the visual or the textual modality of video content to find matching answers to the question. However, neither is flexible enough for our problem setting of non-factoid answers with varying lengths. Motivated by this, we propose a two-stage model: (a) multimodal segmentation of video into span candidates and (b) length-adaptive ranking of the candidates to the question. First, for segmentation, we propose Segmenter for generating span candidates of diverse length, considering both textual and visual modality. Second, for ranking, we propose Ranker to score the candidates, dynamically combining the two models with complementary strength for both short and long spans respectively. Experimental results demonstrate that our model achieves state-of-the-art performance.

【Keywords】:

999. ALOHA: Artificial Learning of Human Attributes for Dialogue Agents.

【Pages】:8155-8163

【Authors】: Aaron W. Li ; Veronica Jiang ; Steven Y. Feng ; Julia Sprague ; Wei Zhou ; Jesse Hoey

【Abstract】: For conversational AI and virtual assistants to communicate with humans in a realistic way, they must exhibit human characteristics such as expression of emotion and personality. Current attempts toward constructing human-like dialogue agents have presented significant difficulties. We propose Human Level Attributes (HLAs) based on tropes as the basis of a method for learning dialogue agents that can imitate the personalities of fictional characters. Tropes are characteristics of fictional personalities that are observed recurrently and determined by viewers' impressions. By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters' language styles through their HLAs. We then introduce a three-component system, ALOHA (which stands for Artificial Learning of Human Attributes), that combines character space mapping, character community detection, and language style retrieval to build a character (or personality) specific language model. Our preliminary experiments demonstrate that two variations of ALOHA, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and are stable regardless of the character's identity, the genre of the show, and the context of the dialogue.

【Keywords】:

1000. Recursively Binary Modification Model for Nested Named Entity Recognition.

【Pages】:8164-8171

【Authors】: Bing Li ; Shifeng Liu ; Yifang Sun ; Wei Wang ; Xiang Zhao

【Abstract】: Recently, there has been an increasing interest in identifying named entities with nested structures. Existing models only make independent typing decisions on the entire entity span while ignoring strong modification relations between sub-entity types. In this paper, we present a novel Recursively Binary Modification model for nested named entity recognition. Our model utilizes the modification relations among sub-entity types to infer the head component on top of a Bayesian framework and uses the entity head as strong evidence to determine the type of the entity span. The process is recursive, allowing lower-level entities to help better model those on the outer level. To the best of our knowledge, our work is the first effort that uses modification relations in the nested NER task. Extensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art models in nested NER tasks, and delivers competitive results with state-of-the-art models in the flat NER task, without relying on any extra annotations or NLP tools.

【Keywords】:

1001. GraphER: Token-Centric Entity Resolution with Graph Convolutional Neural Networks.

【Pages】:8172-8179

【Authors】: Bing Li ; Wei Wang ; Yifang Sun ; Linhan Zhang ; Muhammad Asif Ali ; Yi Wang

【Abstract】: Entity resolution (ER) aims to identify entity records that refer to the same real-world entity, which is a critical problem in data cleaning and integration. Most of the existing models are attribute-centric, that is, they match entity pairs by comparing similarities of pre-aligned attributes, which requires the schemas of records to be identical and is too coarse-grained to capture subtle key information within a single attribute. In this paper, we propose a novel graph-based ER model, GraphER. Our model is token-centric: the final matching results are generated by directly aggregating token-level comparison features, in which both the semantic and structural information has been softly embedded into token embeddings by training an Entity Record Graph Convolutional Network (ER-GCN). To the best of our knowledge, our work is the first effort to do token-centric entity resolution with the help of GCN. Extensive experiments on two real-world datasets demonstrate that our model stably outperforms state-of-the-art models.

【Keywords】:

1002. ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.

【Pages】:8180-8187

【Authors】: Fei Li ; Hong Yu

【Abstract】: Automated ICD coding, which assigns the International Classification of Disease codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of text fragments, which are closely related to ICD coding, vary a lot in different documents. Therefore, a flat and fixed-length convolutional architecture may not be capable of learning good document representations. In this paper, we propose a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding. The innovations of our model are twofold: it utilizes a multi-filter convolutional layer to capture various text patterns with different lengths and a residual convolutional layer to enlarge the receptive field. We evaluated the effectiveness of our model on the widely-used MIMIC dataset. On the full code set of MIMIC-III, our model outperformed the state-of-the-art model in 4 out of 6 evaluation metrics. On the top-50 code set of MIMIC-III and the full code set of MIMIC-II, our model outperformed all the existing and state-of-the-art models in all evaluation metrics. The code is available at https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network.

【Keywords】:

1003. Aspect-Aware Multimodal Summarization for Chinese E-Commerce Products.

【Pages】:8188-8195

【Authors】: Haoran Li ; Peng Yuan ; Song Xu ; Youzheng Wu ; Xiaodong He ; Bowen Zhou

【Abstract】: We present an abstractive summarization system that produces summaries for Chinese e-commerce products. This task is more challenging than general text summarization. First, the appearance of a product typically plays a significant role in customers' decisions to buy the product or not, which requires that the summarization model effectively use the visual information of the product. Furthermore, different products have remarkable features in various aspects, such as “energy efficiency” and “large capacity” for refrigerators. Meanwhile, different customers may care about different aspects. Thus, the summarizer needs to capture the most attractive aspects of a product that resonate with potential purchasers. We propose an aspect-aware multimodal summarization model that can effectively incorporate the visual information and also determine the most salient aspects of a product. We construct a large-scale Chinese e-commerce product summarization dataset that contains approximately 1.4 million manually created product summaries that are paired with detailed product information, including an image, a title, and other textual descriptions for each product. The experimental results on this dataset demonstrate that our models significantly outperform the comparative methods in terms of both the ROUGE score and manual evaluations.

【Keywords】:

1004. Keywords-Guided Abstractive Sentence Summarization.

【Pages】:8196-8203

【Authors】: Haoran Li ; Junnan Zhu ; Jiajun Zhang ; Chengqing Zong ; Xiaodong He

【Abstract】: We study the problem of generating a summary for a given sentence. Existing research on abstractive sentence summarization ignores that keywords in the input sentence provide significant clues for valuable content, and that humans tend to write summaries covering these keywords. In this paper, we propose an abstractive sentence summarization method that applies guidance signals of keywords to both the encoder and the decoder in the sequence-to-sequence model. A multi-task learning framework is adopted to jointly learn to extract keywords and generate a summary for the input sentence. We apply keywords-guided selective encoding strategies to filter source information by investigating the interactions between the input sentence and the keywords. We extend the pointer-generator network with a dual-attention and a dual-copy mechanism, which can integrate the semantics of the input sentence and the keywords, and copy words from both the input sentence and the keywords. We demonstrate that multi-task learning and keywords-oriented guidance facilitate the sentence summarization task, achieving better performance than the competitive models on the English Gigaword sentence summarization dataset.

【Keywords】:

1005. Neuron Interaction Based Representation Composition for Neural Machine Translation.

【Paper Link】 【Pages】:8204-8211

【Authors】: Jian Li ; Xing Wang ; Baosong Yang ; Shuming Shi ; Michael R. Lyu ; Zhaopeng Tu

【Abstract】: Recent NLP studies reveal that substantial linguistic information can be attributed to single neurons, i.e., individual dimensions of the representation vectors. We hypothesize that modeling strong interactions among neurons helps to better capture complex information by composing the linguistic properties embedded in individual neurons. Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors. Specifically, we leverage bilinear pooling to model pairwise multiplicative interactions among individual neurons, and a low-rank approximation to make the model computationally feasible. We further propose extended bilinear pooling to incorporate first-order representations. Experiments on WMT14 English⇒German and English⇒French translation tasks show that our model consistently improves performance over the state-of-the-art Transformer baseline. Further analyses demonstrate that our approach indeed captures more syntactic and semantic information, as expected.

【Keywords】:
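
The low-rank bilinear pooling mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common factorization z = Pᵀ((Uᵀx) ∘ (Vᵀy)); the names `U`, `V`, `P` and both function names are hypothetical and not taken from the paper. The full bilinear form is included only to show that the factorized version computes the same pairwise multiplicative interactions with far fewer parameters.

```python
def matvec_t(mat, vec):
    """Return mat^T @ vec for a matrix stored as a list of rows."""
    n_cols = len(mat[0])
    return [sum(mat[i][j] * vec[i] for i in range(len(vec))) for j in range(n_cols)]


def low_rank_bilinear_pool(x, y, U, V, P):
    """Low-rank bilinear pooling: z = P^T ((U^T x) * (V^T y)).

    U and V project the inputs to a shared rank-r space, the element-wise
    product models multiplicative interactions, and P maps to the output.
    """
    hx = matvec_t(U, x)
    hy = matvec_t(V, y)
    joint = [a * b for a, b in zip(hx, hy)]
    return matvec_t(P, joint)


def full_bilinear_pool(x, y, U, V, P):
    """Reference: z_k = x^T W_k y with W_k = sum_r P[r][k] * U[:, r] V[:, r]^T."""
    rank, n_out = len(P), len(P[0])
    out = []
    for k in range(n_out):
        total = 0
        for i, xi in enumerate(x):
            for j, yj in enumerate(y):
                w_kij = sum(P[r][k] * U[i][r] * V[j][r] for r in range(rank))
                total += xi * w_kij * yj
        out.append(total)
    return out


# Toy example: 3-dimensional inputs, rank 2, 2 output dimensions.
x = [1, 2, 3]
y = [0, 1, -1]
U = [[1, 0], [0, 1], [1, 1]]
V = [[1, 2], [0, 1], [1, 0]]
P = [[1, 0], [1, 1]]
z = low_rank_bilinear_pool(x, y, U, V, P)
```

The factorized form needs only the three small matrices rather than one d×d weight matrix per output dimension, which is the sense in which a low-rank approximation keeps bilinear pooling computationally feasible.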

1006. Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce.

【Paper Link】 【Pages】:8212-8219

【Authors】: Juntao Li ; Chang Liu ; Jian Wang ; Lidong Bing ; Hongsong Li ; Xiaozhong Liu ; Dongyan Zhao ; Rui Yan

【Abstract】: With the prosperity of cross-border e-commerce, there is an urgent demand for intelligent approaches that assist e-commerce sellers in offering local products to consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) enhanced with a context-dependent cross-lingual mapping on top of the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and that the context-dependent cross-lingual mapping on BERT yields a noticeable improvement over the pre-trained multi-lingual BERT model.

【Keywords】:

1007. Simultaneous Learning of Pivots and Representations for Cross-Domain Sentiment Classification.

【Paper Link】 【Pages】:8220-8227

【Authors】: Liang Li ; Weirui Ye ; Mingsheng Long ; Yateng Tang ; Jin Xu ; Jianmin Wang

【Abstract】: Cross-domain sentiment classification aims to leverage useful knowledge from a source domain to mitigate the supervision sparsity in a target domain. A series of approaches depend on pivot features that behave similarly for polarity prediction in both domains. However, the engineering of such pivot features remains cumbersome and prevents us from learning disentangled and transferable representations from rich semantic and syntactic information. Towards learning the pivots and representations simultaneously, we propose a new Transferable Pivot Transformer (TPT). Our model consists of two networks: a Pivot Selector that learns to detect transferable n-gram pivots from contexts, and a Transferable Transformer that learns to generate domain-invariant representations by modeling the correlation between pivot and non-pivot words. The Pivot Selector and Transferable Transformer are jointly optimized through end-to-end back-propagation. We experiment with real tasks of cross-domain sentiment classification over 20 domain pairs, where our model outperforms prior art.

【Keywords】:

1008. RobuTrans: A Robust Transformer-Based Text-to-Speech Model.

【Paper Link】 【Pages】:8228-8235

【Authors】: Naihan Li ; Yanqing Liu ; Yu Wu ; Shujie Liu ; Sheng Zhao ; Ming Liu

【Abstract】: Recently, neural network based speech synthesis has achieved outstanding results, producing synthesized audio of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue, which results in abnormal audio (bad cases), especially for unusual text (unseen context). To build a neural model that can synthesize both natural and stable audio, in this paper we analyze in depth why previous neural TTS models are not robust, and based on this analysis we propose RobuTrans (Robust Transformer), a robust neural TTS model based on Transformer. Compared to TransformerTTS, our model first converts input texts to linguistic features, including phonemic features and prosodic features, then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. Besides, the position embedding is replaced with a 1-D CNN, since the position embedding constrains the maximum length of synthesized audio. With these modifications, our model not only fixes the robustness problem, but also achieves a MOS (4.36) on par with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.

【Keywords】:

1009. Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER.

【Paper Link】 【Pages】:8236-8244

【Authors】: Peng-Hsuan Li ; Tsu-Jui Fu ; Wei-Yun Ma

【Abstract】: BiLSTM has been prevalently used as a core module for NER in a sequence-labeling setup. State-of-the-art approaches use BiLSTM with additional resources such as gazetteers, language-modeling, or multi-task supervision to further improve NER. This paper instead takes a step back and focuses on analyzing problems of BiLSTM itself and how exactly self-attention can bring improvements. We formally show the limitation of (CRF-)BiLSTM in modeling cross-context patterns for each word – the XOR limitation. Then, we show that two types of simple cross-structures – self-attention and Cross-BiLSTM – can effectively remedy the problem. We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions. We give in-depth analyses of the improvements across several aspects of NER, especially the identification of multi-token mentions. This study should lay a sound foundation for future improvements on sequence-labeling NER.

【Keywords】:

1010. MetaMT, a Meta Learning Method Leveraging Multiple Domain Data for Low Resource Machine Translation.

【Paper Link】 【Pages】:8245-8252

【Authors】: Rumeng Li ; Xun Wang ; Hong Yu

【Abstract】: Neural machine translation (NMT) models have achieved state-of-the-art translation quality when a large quantity of parallel corpora is available. However, their performance suffers significantly when it comes to domain-specific translations, in which training data are usually scarce. In this paper, we present a novel NMT model with a new word embedding transition technique for fast domain adaptation. We propose to split parameters in the model into two groups: model parameters and meta parameters. The former are used to model the translation while the latter are used to adjust the representational space to generalize the model to different domains. We mimic the domain adaptation of the machine translation model to low-resource domains using multiple translation tasks on different domains. A new training strategy based on meta-learning is developed along with the proposed model to update the model parameters and meta parameters alternately. Experiments on datasets of different domains showed substantial improvements in NMT performance on a limited amount of data.

【Keywords】:

1011. Relevance-Promoting Language Model for Short-Text Conversation.

【Paper Link】 【Pages】:8253-8260

【Authors】: Xin Li ; Piji Li ; Wei Bi ; Xiaojiang Liu ; Wai Lam

【Abstract】: Despite the effectiveness of the sequence-to-sequence framework on the task of Short-Text Conversation (STC), the issue of under-exploitation of training data (i.e., the supervision signals from query text are ignored) still remains unresolved. Also, the adopted maximization-based decoding strategies, which are inclined to generate generic responses or responses with repetition, are unsuited to the STC task. In this paper, we propose to formulate the STC task as a language modeling problem and tailor-make a training strategy to adapt a language model for response generation. To enhance generation performance, we design a relevance-promoting transformer language model, which performs additional supervised source attention after the self-attention to increase the importance of informative query tokens in calculating the token-level representation. The model further refines the query representation with relevance clues inferred from its multiple references during training. In testing, we adopt a randomization-over-maximization strategy to reduce the generation of generic responses. Experimental results on a large Chinese STC dataset demonstrate the superiority of the proposed model on relevance metrics and diversity metrics.

【Keywords】:

1012. Towards Zero-Shot Learning for Automatic Phonemic Transcription.

【Paper Link】 【Pages】:8261-8268

【Authors】: Xinjian Li ; Siddharth Dalmia ; David R. Mortensen ; Juncheng Li ; Alan W. Black ; Florian Metze

【Abstract】: Automatic phonemic transcription tools are useful for low-resource language documentation. However, due to the lack of training sets, only a tiny fraction of languages have phonemic transcription tools. Fortunately, multilingual acoustic modeling provides a solution given limited audio training data. A more challenging problem is to build phonemic transcribers for languages with zero training data. The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes. In this work, we address this problem by adopting the idea of zero-shot learning. Our model is able to recognize unseen phonemes in the target language without any training data. In our model, we decompose phonemes into corresponding articulatory attributes such as vowel and consonant. Instead of predicting phonemes directly, we first predict distributions over articulatory attributes, and then compute phoneme distributions with a customized acoustic model. We evaluate our model by training it using 13 languages and testing it using 7 unseen languages. We find that it achieves 7.7% better phoneme error rate on average over a standard multilingual model.

【Keywords】:
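
The attribute-then-phoneme idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the tiny attribute inventory, the phoneme signatures, and the scoring rule (a product of independent attribute probabilities, renormalized over the inventory) are hypothetical and are not the paper's customized acoustic model. The point is that a phoneme never seen in training can still be scored, as long as its articulatory attributes are known.

```python
import math

# Hypothetical phoneme inventory: each phoneme is described by the set of
# articulatory attributes it carries. A target language can add new entries
# here without any audio training data for them.
PHONEMES = {
    "a": {"vowel", "voiced"},
    "m": {"consonant", "voiced", "nasal"},
    "s": {"consonant"},
}


def phoneme_distribution(attr_probs, phonemes=PHONEMES, eps=1e-9):
    """Turn per-attribute probabilities into a distribution over phonemes.

    attr_probs maps an attribute name to the predicted probability that the
    attribute is present in the current frame. Each phoneme is scored by the
    log-probability of its whole attribute signature, then the scores are
    normalized with a softmax over the inventory.
    """
    scores = {}
    for phoneme, signature in phonemes.items():
        log_p = 0.0
        for name, p in attr_probs.items():
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            log_p += math.log(p if name in signature else 1.0 - p)
        scores[phoneme] = log_p
    top = max(scores.values())
    exp_scores = {ph: math.exp(s - top) for ph, s in scores.items()}
    total = sum(exp_scores.values())
    return {ph: v / total for ph, v in exp_scores.items()}


# A frame that looks like a voiced nasal consonant.
frame = {"vowel": 0.1, "consonant": 0.9, "voiced": 0.9, "nasal": 0.8}
dist = phoneme_distribution(frame)
best = max(dist, key=dist.get)
```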

1013. Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction.

【Paper Link】 【Pages】:8269-8276

【Authors】: Yang Li ; Guodong Long ; Tao Shen ; Tianyi Zhou ; Lina Yao ; Huan Huo ; Jing Jiang

【Abstract】: Distantly supervised relation extraction intrinsically suffers from noisy labels due to the strong assumption of distant supervision. Most prior works adopt a selective attention mechanism over sentences in a bag to denoise wrongly labeled data, which however can be ineffective when there is only one sentence in a bag. In this paper, we propose a brand-new light-weight neural framework to address the distantly supervised relation extraction problem and alleviate the defects in the previous selective attention framework. Specifically, in the proposed framework, 1) we use an entity-aware word embedding method to integrate both relative position information and head/tail entity embeddings, aiming to highlight the essence of entities for this task; 2) we develop a self-attention mechanism to capture the rich contextual dependencies as a complement to the local dependencies captured by piecewise CNN; and 3) instead of using selective attention, we design a pooling-equipped gate, which is based on rich contextual representations, as an aggregator to generate the bag-level representation for final relation classification. Compared to selective attention, one major advantage of the proposed gating mechanism is that it performs stably and promisingly even if only one sentence appears in a bag, and thus keeps the consistency across all training examples. The experiments on the NYT dataset demonstrate that our approach achieves new state-of-the-art performance in terms of both AUC and top-n precision metrics.

【Keywords】:

1014. Span-Based Neural Buffer: Towards Efficient and Effective Utilization of Long-Distance Context for Neural Sequence Models.

【Paper Link】 【Pages】:8277-8284

【Authors】: Yangming Li ; Kaisheng Yao ; Libo Qin ; Shuang Peng ; Yijia Liu ; Xiaolong Li

【Abstract】: Neural sequence models, though widely used for modeling sequential data such as in language modeling, have a sequential recency bias (Kuncoro et al. 2018) towards the local context, limiting their full potential to capture long-distance context. To address this problem, this paper proposes augmenting sequence models with a span-based neural buffer that efficiently represents long-distance context, allowing a gate policy network to make interpolated predictions from both the neural buffer and the underlying sequence model. Training this policy network to utilize long-distance context is however challenging due to the simple sentence dominance problem (Marvin and Linzen 2018). To alleviate this problem, we propose a novel training algorithm that combines an annealed maximum likelihood estimation with an intrinsic reward-driven reinforcement learning. Sequence models with the proposed span-based neural buffer significantly improve the state-of-the-art perplexities on the benchmark Penn Treebank and WikiText-2 datasets to 43.9 and 35.2 respectively. We conduct extensive analysis and confirm that the proposed architecture and the training algorithm both contribute to the improvements.

【Keywords】:
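
The "interpolated predictions" described in the abstract above can be pictured with a minimal sketch. It assumes the simplest possible form, a scalar gate that mixes two next-token distributions; the function name and the toy distributions are hypothetical, and the paper's gate policy network and buffer representation are not modeled here.

```python
def interpolate(p_model, p_buffer, gate):
    """Mix two next-token distributions with a gate value in [0, 1].

    p_model:  distribution from the underlying sequence model (dict token -> prob)
    p_buffer: distribution derived from the long-distance buffer
    gate:     weight given to the buffer; in a real system a learned policy
              would produce this value per time step (assumption)
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    vocab = set(p_model) | set(p_buffer)
    return {
        tok: gate * p_buffer.get(tok, 0.0) + (1.0 - gate) * p_model.get(tok, 0.0)
        for tok in vocab
    }


# The local model prefers "the"; the buffer recalls an earlier mention of "cat".
p_model = {"the": 0.6, "cat": 0.4}
p_buffer = {"cat": 1.0}
mixed = interpolate(p_model, p_buffer, gate=0.25)
```

Because both inputs sum to one and the weights sum to one, the mixture is again a valid distribution, so it can be plugged directly into a perplexity computation.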

1015. Neural Machine Translation with Joint Representation.

【Paper Link】 【Pages】:8285-8292

【Authors】: Yanyang Li ; Qiang Wang ; Tong Xiao ; Tongran Liu ; Jingbo Zhu

【Abstract】: Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction, for efficiency. In this paper, we employ Joint Representation that fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm besides the Encoder-Decoder framework and outperform the Transformer baseline on both the small-scale IWSLT14 German-English, English-German and IWSLT15 Vietnamese-English tasks and the large-scale NIST12 Chinese-English translation task by about 1 BLEU point. We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer in IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.

【Keywords】:

1016. End-to-End Trainable Non-Collaborative Dialog System.

【Paper Link】 【Pages】:8293-8302

【Authors】: Yu Li ; Kun Qian ; Weiyan Shi ; Zhou Yu

【Abstract】: End-to-end task-oriented dialog models have achieved promising performance on collaborative tasks where users willingly coordinate with the system to complete a given task. In non-collaborative settings such as negotiation and persuasion, however, users and systems do not share a common goal. As a result, compared to collaborative tasks, people use social content to build rapport and trust in these non-collaborative settings in order to advance their goals. To handle social content, we introduce a hierarchical intent annotation scheme, which can be generalized to different non-collaborative dialog tasks. Building upon TransferTransfo (Wolf et al. 2019), we propose an end-to-end neural network model to generate diverse, coherent responses. Our model utilizes intent and semantic slots as the intermediate sentence representation to guide the generation process. In addition, we design a filter to select appropriate responses based on whether these intermediate representations fit the designed task and conversation constraints. Our non-collaborative dialog model guides users to complete the task while simultaneously keeping them engaged. We test our approach on our newly proposed AntiScam dataset and an existing PersuasionForGood dataset. Both automatic and human evaluations suggest that our model outperforms multiple baselines in these two non-collaborative tasks.

【Keywords】:

1017. Complementary Auxiliary Classifiers for Label-Conditional Text Generation.

【Paper Link】 【Pages】:8303-8310

【Authors】: Yuan Li ; Chunyuan Li ; Yizhe Zhang ; Xiujun Li ; Guoqing Zheng ; Lawrence Carin ; Jianfeng Gao

【Abstract】: Learning to generate text with a given label is a challenging task because natural language sentences are highly variable and ambiguous. This makes the trade-off between sentence quality and label fidelity difficult. In this paper, we present CARA to alleviate the issue, where two auxiliary classifiers work simultaneously to ensure that (1) the encoder learns disentangled features and (2) the generator produces label-related sentences. Two practical techniques are further proposed to improve the performance: annealing the learning signal from the auxiliary classifier, and enhancing the encoder with pre-trained language models. To establish a comprehensive benchmark fostering future research, we consider a suite of four datasets, and systematically reproduce three representative methods. CARA shows consistent improvement over the previous methods on the task of label-conditional text generation, and achieves state-of-the-art results on the task of attribute transfer.

【Keywords】:

1018. Explicit Sentence Compression for Neural Machine Translation.

【Paper Link】 【Pages】:8311-8318

【Authors】: Zuchao Li ; Rui Wang ; Kehai Chen ; Masao Utiyama ; Eiichiro Sumita ; Zhuosheng Zhang ; Hai Zhao

【Abstract】: State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways, including backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves the translation performance over strong baselines.

【Keywords】:

1019. Global Greedy Dependency Parsing.

【Paper Link】 【Pages】:8319-8326

【Authors】: Zuchao Li ; Hai Zhao ; Kevin Parnow

【Abstract】: Most syntactic dependency parsing models may fall into one of two categories: transition- and graph-based models. The former models enjoy high inference efficiency with linear time complexity, but they rely on the stacking or re-ranking of partially-built parse trees to build a complete parse tree and are stuck with slower training for the necessity of dynamic oracle training. The latter, graph-based models, may boast better performance but are unfortunately marred by polynomial time inference. In this paper, we propose a novel parsing order objective, resulting in a novel dependency parsing model capable of both global (in sentence scope) feature extraction as in graph models and linear time inference as in transitional models. The proposed global greedy parser only uses two arc-building actions, left and right arcs, for projective parsing. When equipped with two extra non-projective arc-building actions, the proposed parser may also smoothly support non-projective parsing. Using multiple benchmark treebanks, including the Penn Treebank (PTB), the CoNLL-X treebanks, and the Universal Dependency Treebanks, we evaluate our parser and demonstrate that the proposed novel parser achieves good performance with faster training and decoding.

【Keywords】:

1020. MOSS: End-to-End Dialog System Framework with Modular Supervision.

【Paper Link】 【Pages】:8327-8335

【Authors】: Weixin Liang ; Youzhi Tian ; Chengcai Chen ; Zhou Yu

【Abstract】: A major bottleneck in training end-to-end task-oriented dialog systems is the lack of data. To utilize limited training data more efficiently, we propose Modular Supervision Network (MOSS), an encoder-decoder training framework that can incorporate supervision from various intermediate dialog system modules including natural language understanding, dialog state tracking, dialog policy learning and natural language generation. With only 60% of the training data, MOSS-all (i.e., MOSS with supervision from all four dialog modules) outperforms state-of-the-art models on CamRest676. Moreover, introducing modular supervision has even bigger benefits when the dialog task has a more complex dialog state and action space. With only 40% of the training data, MOSS-all outperforms the state-of-the-art model on a complex laptop network troubleshooting dataset, LaptopNetwork, that we introduced. LaptopNetwork consists of conversations between real customers and customer service agents in Chinese. Moreover, the MOSS framework can accommodate dialogs that have supervision from different dialog modules at both the framework level and the model level. Therefore, MOSS is extremely flexible to update in real-world deployment.

【Keywords】:

1021. Embedding Compression with Isotropic Iterative Quantization.

【Paper Link】 【Pages】:8336-8343

【Authors】: Siyu Liao ; Jie Chen ; Yanzhi Wang ; Qinru Qiu ; Bo Yuan

【Abstract】: Continuous representation of words is a standard component in deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable and sometimes even improved performance over the original real-valued embedding vectors.

【Keywords】:
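
To make the compression figure in the abstract above concrete, here is a minimal sketch of the final binarization step only. It assumes plain sign thresholding and 32-bit floats in the original embeddings; it does not implement the learned rotation of iterative quantization or the isotropy step that IIQ adds, and all function names are hypothetical.

```python
def binarize(vec):
    """Pack the signs of a real-valued vector into one integer bit code.

    Bit i is set when vec[i] >= 0. This is plain sign thresholding; the
    rotation that iterative quantization learns beforehand is omitted.
    """
    code = 0
    for i, value in enumerate(vec):
        if value >= 0:
            code |= 1 << i
    return code


def hamming_similarity(code_a, code_b, dim):
    """Fraction of matching bits between two dim-bit codes."""
    differing = bin(code_a ^ code_b).count("1")
    return 1.0 - differing / dim


def compression_ratio(bits_per_float=32, bits_per_binary_dim=1):
    """Storage ratio when each float dimension becomes a single bit."""
    return bits_per_float / bits_per_binary_dim


code_a = binarize([0.5, -1.0, 0.0, 2.0])
code_b = binarize([0.1, 0.2, -0.3, 0.4])
similarity = hamming_similarity(code_a, code_b, dim=4)
ratio = compression_ratio()
```

Under these assumptions, replacing each 32-bit float with one bit at the same dimensionality gives a 32-fold reduction, which is consistent with the "more than thirty-fold" figure quoted in the abstract.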

1022. Semi-Supervised Learning on Meta Structure: Multi-Task Tagging and Parsing in Low-Resource Scenarios.

【Paper Link】 【Pages】:8344-8351

【Authors】: KyungTae Lim ; Jay Yoon Lee ; Jaime G. Carbonell ; Thierry Poibeau

【Abstract】: Multi-view learning makes use of diverse models arising from multiple sources of input or different feature subsets for the same task. For example, a given natural language processing task can combine evidence from models arising from character, morpheme, lexical, or phrasal views. The most common strategy with multi-view learning, especially popular in the neural network community, is to unify multiple representations into one unified vector through concatenation, averaging, or pooling, and then build a single-view model on top of the unified representation. As an alternative, we examine whether building one model per view and then unifying the different models can lead to improvements, especially in low-resource scenarios. More specifically, taking inspiration from co-training methods, we propose a semi-supervised learning approach based on multi-view models through consensus promotion, and investigate whether this improves overall performance. To test the multi-view hypothesis, we use moderately low-resource scenarios for nine languages and test the performance of the joint model for part-of-speech tagging and dependency parsing. The proposed model shows significant improvements across the test cases, with average gains of -0.9 ∼ +9.3 labeled attachment score (LAS) points. We also investigate the effect of unlabeled data on the proposed model by varying the amount of training data and by using different domains of unlabeled data.

【Keywords】:

1023. Hierarchical Attention Network with Pairwise Loss for Chinese Zero Pronoun Resolution.

【Paper Link】 【Pages】:8352-8359

【Authors】: Peiqin Lin ; Meng Yang

【Abstract】: Recent neural network methods for Chinese zero pronoun resolution do not take bidirectional attention between zero pronouns and candidate antecedents into consideration, and simply treat the task as a classification task, ignoring the relationship between different candidates of a zero pronoun. To solve these problems, we propose a Hierarchical Attention Network with Pairwise Loss (HAN-PL) for Chinese zero pronoun resolution. In the proposed HAN-PL, we design a two-layer attention model to generate more powerful representations for zero pronouns and candidate antecedents. Furthermore, we propose a novel pairwise loss by introducing the correct-antecedent similarity constraint and the pairwise-margin loss, making the learned model more discriminative. Extensive experiments have been conducted on the OntoNotes 5.0 dataset, and our model achieves state-of-the-art performance in the task of Chinese zero pronoun resolution.

【Keywords】:
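
The pairwise-margin idea named in the abstract above can be illustrated with a generic margin ranking loss. This is a sketch under assumptions, not the HAN-PL loss itself: the correct-antecedent similarity constraint is left out, and the function name, the scores, and the margin value are hypothetical.

```python
def pairwise_margin_loss(correct_score, wrong_scores, margin=0.2):
    """Average hinge loss over (correct, wrong) candidate pairs.

    Each wrong candidate contributes max(0, margin - correct + wrong), so the
    loss is zero only when the correct antecedent outscores every wrong
    candidate by at least the margin.
    """
    if not wrong_scores:
        return 0.0
    losses = [max(0.0, margin - correct_score + wrong) for wrong in wrong_scores]
    return sum(losses) / len(losses)


# One zero pronoun with a correct antecedent scored 0.8 and two wrong candidates.
loss = pairwise_margin_loss(0.8, [0.5, 0.9], margin=0.2)
```

Unlike a per-candidate classification loss, this objective compares candidates of the same zero pronoun against each other, which is the relationship the abstract says earlier methods ignored.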

1024. Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement.

【Paper Link】 【Pages】:8360-8367

【Authors】: Ting-En Lin ; Hua Xu ; Hanlei Zhang

【Abstract】: Identifying new user intents is an essential task in dialogue systems. However, it is hard to get satisfying clustering results since the definition of intents is strongly guided by prior knowledge. Existing methods incorporate prior knowledge through intensive feature engineering, which not only leads to overfitting but also makes them sensitive to the number of clusters. In this paper, we propose constrained deep adaptive clustering with cluster refinement (CDAC+), an end-to-end clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process. Moreover, we refine the clusters by forcing the model to learn from the high confidence assignments. After eliminating low confidence assignments, our approach is surprisingly insensitive to the number of clusters. Experimental results on the three benchmark datasets show that our method can yield significant improvements over strong baselines.

【Keywords】:

1025. Integrating Linguistic Knowledge to Sentence Paraphrase Generation.

【Paper Link】 【Pages】:8368-8375

【Authors】: Zibo Lin ; Ziran Li ; Ning Ding ; Hai-Tao Zheng ; Ying Shen ; Wei Wang ; Cong-Zhi Zhao

【Abstract】: Paraphrase generation aims to rewrite a text with different words while keeping the same meaning. Previous work performs the task based solely on the given dataset while ignoring the availability of external linguistic knowledge. However, it is intuitive that a model can generate more expressive and diverse paraphrases with the help of such knowledge. To fill this gap, we propose Knowledge-Enhanced Paraphrase Network (KEPN), a transformer-based framework that can leverage external linguistic knowledge to facilitate paraphrase generation. (1) The model integrates synonym information from the external linguistic knowledge into the paraphrase generator, which is used to guide the decision on whether to generate a new word or replace it with a synonym. (2) To locate the synonym pairs more accurately, we adopt an incremental encoding scheme to incorporate position information of each synonym. Besides, a multi-task architecture is designed to help the framework jointly learn the selection of synonym pairs and the generation of expressive paraphrases. Experimental results on both English and Chinese datasets show that our method significantly outperforms the state-of-the-art approaches in terms of both automatic and human evaluation.

【Keywords】:

1026. Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning.

【Paper Link】 【Pages】:8376-8383

【Authors】: Dayiheng Liu ; Jie Fu ; Yidan Zhang ; Chris Pal ; Jiancheng Lv

【Abstract】: Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods.

【Keywords】:

1027. Joint Character-Level Word Embedding and Adversarial Stability Training to Defend Adversarial Text.

【Paper Link】 【Pages】:8384-8391

【Authors】: Hui Liu ; Yongzheng Zhang ; Yipeng Wang ; Zheng Lin ; Yige Chen

【Abstract】: Text classification is a basic task in natural language processing, but small character perturbations in words can greatly decrease the effectiveness of text classification models; this is called a character-level adversarial example attack. There are two main challenges in defending against character-level adversarial examples: out-of-vocabulary words in the word embedding model and the distribution difference between training and inference. Both challenges make character-level adversarial examples difficult to defend against. In this paper, we propose a framework which jointly uses character embedding and adversarial stability training to overcome these two challenges. Our experimental results on five text classification data sets show that the models based on our framework can effectively defend against character-level adversarial examples: they defend against 93.19% of gradient-based adversarial examples and 94.83% of natural adversarial examples, outperforming the state-of-the-art defense models.

【Keywords】:

1028. A Robust Adversarial Training Approach to Machine Reading Comprehension.

Paper Link】 【Pages】:8392-8400

【Authors】: Kai Liu ; Xin Liu ; An Yang ; Jing Liu ; Jinsong Su ; Sujian Li ; Qiaoqiao She

【Abstract】: Lacking robustness is a serious problem for Machine Reading Comprehension (MRC) models. To alleviate this problem, one of the most promising ways is to augment the training dataset with sophisticated designed adversarial examples. Generally, those examples are created by rules according to the observed patterns of successful adversarial attacks. Since the types of adversarial examples are innumerable, it is not adequate to manually design and enrich training data to defend against all types of adversarial attacks. In this paper, we propose a novel robust adversarial training approach to improve the robustness of MRC models in a more generic way. Given an MRC model well-trained on the original dataset, our approach dynamically generates adversarial examples based on the parameters of current model and further trains the model by using the generated examples in an iterative schedule. When applied to the state-of-the-art MRC models, including QANET, BERT and ERNIE2.0, our approach obtains significant and comprehensive improvements on 5 adversarial datasets constructed in different ways, without sacrificing the performance on the original SQuAD development set. Moreover, when coupled with other data augmentation strategy, our approach further boosts the overall performance on adversarial datasets and outperforms the state-of-the-art methods.

【Keywords】:

1029. HAMNER: Headword Amplified Multi-Span Distantly Supervised Method for Domain Specific Named Entity Recognition.

Paper Link】 【Pages】:8401-8408

【Authors】: Shifeng Liu ; Yifang Sun ; Bing Li ; Wei Wang ; Xiang Zhao

【Abstract】: To tackle Named Entity Recognition (NER) tasks, supervised methods need to obtain sufficient cleanly annotated data, which is labor and time consuming. On the contrary, distantly supervised methods acquire automatically annotated data using dictionaries to alleviate this requirement. Unfortunately, dictionaries hinder the effectiveness of distantly supervised methods for NER due to its limited coverage, especially in specific domains. In this paper, we aim at the limitations of the dictionary usage and mention boundary detection. We generalize the distant supervision by extending the dictionary with headword based non-exact matching. We apply a function to better weight the matched entity mentions. We propose a span-level model, which classifies all the possible spans then infers the selected spans with a proposed dynamic programming algorithm. Experiments on all three benchmark datasets demonstrate that our method outperforms previous state-of-the-art distantly supervised methods.

【Keywords】:
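
The abstract above describes scoring all candidate spans and then inferring the selected spans with a dynamic programming algorithm, but it does not spell that algorithm out. As a rough illustration only, the sketch below assumes the goal is to pick a set of non-overlapping spans with maximum total score (classic weighted interval scheduling); the function name `select_spans`, the score dictionary, and the objective itself are assumptions, not the authors' method.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def select_spans(n_tokens: int, span_scores: Dict[Span, float]) -> Tuple[List[Span], float]:
    """Pick non-overlapping spans with maximum total score (assumed objective)."""
    # Group candidate spans by their end position.
    by_end: Dict[int, List[Tuple[int, float]]] = {}
    for (start, end), score in span_scores.items():
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span {(start, end)}")
        by_end.setdefault(end, []).append((start, score))

    # best[e] = best total score using only tokens [0, e).
    best = [0.0] * (n_tokens + 1)
    # choice[e] = start of the span ending at e that was selected, or None.
    choice: List[Optional[int]] = [None] * (n_tokens + 1)
    for end in range(1, n_tokens + 1):
        best[end] = best[end - 1]  # option: token end-1 is outside any span
        for start, score in by_end.get(end, []):
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                choice[end] = start

    # Backtrack from the end of the sentence to recover the chosen spans.
    spans: List[Span] = []
    end = n_tokens
    while end > 0:
        start = choice[end]
        if start is None:
            end -= 1
        else:
            spans.append((start, end))
            end = start
    spans.reverse()
    return spans, best[n_tokens]
```

With hypothetical scores such as `{(0, 2): 0.9, (1, 3): 1.2, (3, 5): 0.8, (2, 5): 0.5}` over five tokens, the overlapping candidates `(0, 2)` and `(1, 3)` cannot both be kept, and the recurrence chooses whichever combination gives the larger total.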

1030. Tensor Graph Convolutional Networks for Text Classification.

Paper Link】 【Pages】:8409-8416

【Authors】: Xien Liu ; Xinxin You ; Xiao Zhang ; Ji Wu ; Ping Lv

【Abstract】: Compared to sequential learning models, graph-based neural networks exhibit some excellent properties, such as the ability to capture global information. In this paper, we investigate graph-based neural networks for the text classification problem. A new framework, TensorGCN (tensor graph convolutional networks), is presented for this task. A text graph tensor is first constructed to describe semantic, syntactic, and sequential contextual information. Then, two kinds of propagation learning are performed on the text graph tensor. The first is intra-graph propagation, used for aggregating information from neighborhood nodes in a single graph. The second is inter-graph propagation, used for harmonizing heterogeneous information between graphs. Extensive experiments are conducted on benchmark datasets, and the results illustrate the effectiveness of our proposed framework. Our proposed TensorGCN presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.

【Keywords】:

1031. Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding.

Paper Link】 【Pages】:8417-8424

【Authors】: Yuchen Liu ; Jiajun Zhang ; Hao Xiong ; Long Zhou ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Chengqing Zong

【Abstract】: Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as intermediate. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model where the second model can use the hidden state from the first one, but its cascade manner greatly affects the efficiency of training and inference process. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations not only relies on its previous outputs but also the outputs predicted in the other task. Experiments on TED speech translation corpora have shown that our proposed model can outperform strong baselines on the quality of speech translation and achieve better speech recognition performances as well.

【Keywords】:

1032. CatGAN: Category-Aware Generative Adversarial Networks with Hierarchical Evolutionary Learning for Category Text Generation.

Paper Link】 【Pages】:8425-8432

【Authors】: Zhiyue Liu ; Jiahai Wang ; Zhiwei Liang

【Abstract】: Generating multiple categories of texts is a challenging task and draws more and more attention. Since generative adversarial nets (GANs) have shown competitive results on general text generation, they are extended for category text generation in some previous works. However, the complicated model structures and learning strategies limit their performance and exacerbate the training instability. This paper proposes a category-aware GAN (CatGAN) which consists of an efficient category-aware model for category text generation and a hierarchical evolutionary learning algorithm for training our model. The category-aware model directly measures the gap between real samples and generated samples on each category; reducing this gap then guides the model to generate high-quality category samples. The Gumbel-Softmax relaxation further frees our model from complicated learning strategies for updating CatGAN on discrete data. Moreover, focusing only on sample quality normally leads to the mode collapse problem, so a hierarchical evolutionary learning algorithm is introduced to stabilize the training procedure and obtain the trade-off between quality and diversity while training CatGAN. Experimental results demonstrate that CatGAN outperforms most of the existing state-of-the-art methods.

【Keywords】:

1033. Attention-Informed Mixed-Language Training for Zero-Shot Cross-Lingual Task-Oriented Dialogue Systems.

Paper Link】 【Pages】:8433-8440

【Authors】: Zihan Liu ; Genta Indra Winata ; Zhaojiang Lin ; Peng Xu ; Pascale Fung

【Abstract】: Recently, data-driven task-oriented dialogue systems have achieved promising performance in English. However, developing dialogue systems that support low-resource languages remains a long-standing challenge due to the absence of high-quality data. In order to circumvent the expensive and time-consuming data collection, we introduce Attention-Informed Mixed-Language Training (MLT), a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems. It leverages very few task-related parallel word pairs to generate code-switching sentences for learning the inter-lingual semantics across languages. Instead of manually selecting the word pairs, we propose to extract source words based on the scores computed by the attention layer of a trained English task-related model and then generate word pairs using existing bilingual dictionaries. Furthermore, intensive experiments with different cross-lingual embeddings demonstrate the effectiveness of our approach. Finally, with very few word pairs, our model achieves significant zero-shot adaptation performance improvements in both cross-lingual dialogue state tracking and natural language understanding (i.e., intent detection and slot filling) tasks compared to the current state-of-the-art approaches, which utilize a much larger amount of bilingual data.

【Keywords】:
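
The pipeline in the abstract above (pick source words by attention score, look them up in a bilingual dictionary, then substitute them to form code-switching sentences) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the attention scores, the tiny English-Spanish dictionary, and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_source_words(attention_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k words with the highest (hypothetical) attention scores."""
    return sorted(attention_scores, key=attention_scores.get, reverse=True)[:k]


def build_word_pairs(words: List[str], dictionary: Dict[str, str]) -> Dict[str, str]:
    """Keep only the selected words that the bilingual dictionary covers."""
    return {w: dictionary[w] for w in words if w in dictionary}


def code_switch(tokens: List[str], word_pairs: Dict[str, str]) -> List[str]:
    """Replace each token that has a word pair with its target-language word."""
    return [word_pairs.get(tok.lower(), tok) for tok in tokens]


# Hypothetical inputs for illustration only.
scores = {"set": 0.1, "alarm": 0.6, "tomorrow": 0.3}
dictionary = {"alarm": "alarma", "tomorrow": "mañana", "weather": "clima"}

pairs = build_word_pairs(select_source_words(scores, k=2), dictionary)
mixed = code_switch("set an alarm for tomorrow".split(), pairs)
print(" ".join(mixed))
```

Only the task-relevant words are swapped, so the rest of the sentence stays in the source language; that mixed sentence is what the abstract calls a code-switching sentence.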

1034. Hierarchical Contextualized Representation for Named Entity Recognition.

Paper Link】 【Pages】:8441-8448

【Authors】: Ying Luo ; Fengshun Xiao ; Hai Zhao

【Abstract】: Named entity recognition (NER) models are typically based on the architecture of Bi-directional LSTM (BiLSTM). The constraints of sequential nature and the modeling of single input prevent the full utilization of global information from larger scope, not only in the entire sentence, but also in the entire document (dataset). In this paper, we address these two deficiencies and propose a model augmented with hierarchical contextualized representation: sentence-level representation and document-level representation. In sentence-level, we take different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via label embedding attention mechanism. In document-level, the key-value memory network is adopted to record the document-aware information for each unique word which is sensitive to similarity of context information. Our two-level hierarchical contextualized representations are fused with each input token embedding and corresponding hidden state of BiLSTM, respectively. The experimental results on three benchmark NER datasets (CoNLL-2003 and Ontonotes 5.0 English datasets, CoNLL-2002 Spanish dataset) show that we establish new state-of-the-art results.

【Keywords】:

1035. Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering.

Paper Link】 【Pages】:8449-8456

【Authors】: Shangwen Lv ; Daya Guo ; Jingjing Xu ; Duyu Tang ; Nan Duan ; Ming Gong ; Linjun Shou ; Daxin Jiang ; Guihong Cao ; Songlin Hu

【Abstract】: Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent studies either learn to generate evidence from human-annotated evidence, which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases, which fails to take advantage of both sources simultaneously. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both a structured knowledge base (i.e. ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts a graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with a graph attention mechanism for predicting the final answer. Experimental results on the CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA dataset.

【Keywords】:

1036. FPETS: Fully Parallel End-to-End Text-to-Speech System.

Paper Link】 【Pages】:8457-8463

【Authors】: Dabiao Ma ; Zhiba Su ; Wenxuan Wang ; Yuhao Lu

【Abstract】: End-to-end text-to-speech (TTS) systems can greatly improve the quality of synthesised speech, but they usually suffer from high time latency due to their auto-regressive structure. The synthesised speech may also suffer from some error modes, e.g. repeated words, mispronunciations, and skipped words. In this paper, we propose a novel non-autoregressive, fully parallel end-to-end TTS system (FPETS). It utilizes a new alignment model and the recently proposed U-shape convolutional structure, UFANS. Different from RNN, UFANS can capture long term information in a fully parallel manner. Trainable position encoding and a two-step training strategy are used for learning better alignments. Experimental results show FPETS utilizes the power of parallel computation and reaches a significant speed up of inference compared with state-of-the-art end-to-end TTS systems. More specifically, FPETS is 600X faster than Tacotron2, 50X faster than DCTTS and 10X faster than Deep Voice3. FPETS can also generate audio with equal or better quality and fewer errors compared with other systems. As far as we know, FPETS is the first end-to-end TTS system which is fully parallel.

【Keywords】:

1037. Improving Question Generation with Sentence-Level Semantic Matching and Answer Position Inferring.

Paper Link】 【Pages】:8464-8471

【Authors】: Xiyao Ma ; Qile Zhu ; Yanlin Zhou ; Xiaolin Li

【Abstract】: Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe that the lack of global question semantics and the poor exploitation of answer position-awareness are the key root causes. In this paper, we propose a neural question generation model with two general modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging the answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly.

【Keywords】:

1038. CAWA: An Attention-Network for Credit Attribution.

Paper Link】 【Pages】:8472-8479

【Authors】: Saurav Manchanda ; George Karypis

【Abstract】: Credit attribution is the task of associating individual parts in a document with their most appropriate class labels. It is an important task with applications to information retrieval and text summarization. When labeled training data is available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consuming. In this paper, we present Credit Attribution With Attention (CAWA), a neural-network-based approach that, instead of using sentence-level labeled data, uses the set of class labels that are associated with an entire document as a source of distant-supervision. CAWA combines an attention mechanism with a multilabel classifier into an end-to-end learning framework to perform credit attribution. CAWA labels the individual sentences from the input document using the resultant attention-weights. CAWA improves upon the state-of-the-art credit attribution approach by not constraining a sentence to belong to just one class, but modeling each sentence as a distribution over all classes, leading to better modeling of semantically-similar classes. Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform the competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches.

【Keywords】:

1039. Robust Named Entity Recognition with Truecasing Pretraining.

Paper Link】 【Pages】:8480-8487

【Authors】: Stephen Mayhew ; Nitish Gupta ; Dan Roth

【Abstract】: Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state of the art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the problem of robustness of NER systems in data with noisy or uncertain casing, using a pretraining objective that predicts casing in text, or a truecaser, leveraging unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance in uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.

【Keywords】:

1040. Simplify-Then-Translate: Automatic Preprocessing for Black-Box Translation.

Paper Link】 【Pages】:8488-8495

【Authors】: Sneha Mehta ; Bahareh Azarnoush ; Boris Chen ; Avneesh Saluja ; Vinith Misra ; Ballav Bihani ; Ritwik Kumar

【Abstract】: Black-box machine translation systems have proven incredibly useful for a variety of applications yet by design are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system, which is used to train a paraphrase model that “simplifies” the original sentence to be more conducive for translation. The model is used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance as compared to non-preprocessed source sentences. We further perform side-by-side human evaluation to verify that translations of the simplified sentences are better than the original ones. Finally, we provide some guidance on recommended language pairs for generating the simplification model corpora by investigating the relationship between ease of translation of a language pair (as measured by BLEU) and quality of the resulting simplification model from back-translations of this language pair (as measured by SARI), and tie this into the downstream task of low-resource translation.

【Keywords】:

1041. RefNet: A Reference-Aware Network for Background Based Conversation.

Paper Link】 【Pages】:8496-8503

【Authors】: Chuan Meng ; Pengjie Ren ; Zhumin Chen ; Christof Monz ; Jun Ma ; Maarten de Rijke

【Abstract】: Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue. Here, the generated responses are grounded in some background information. The proposed methods for BBCs are able to generate more informative responses; however, they either cannot generate natural responses or have difficulties in locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.

【Keywords】:

1042. Enhancing Natural Language Inference Using New and Expanded Training Data Sets and New Learning Models.

Paper Link】 【Pages】:8504-8511

【Authors】: Arindam Mitra ; Ishan Shrivastava ; Chitta Baral

【Abstract】: Natural Language Inference (NLI) plays an important role in many natural language processing tasks such as question answering. However, existing NLI modules that are trained on existing NLI datasets have several drawbacks. For example, they do not capture the notion of entity and role well and often end up making mistakes, such as concluding that “Peter signed a deal” can be inferred from “John signed a deal”. As part of this work, we have developed two datasets that help mitigate such issues and make the systems better at understanding the notion of “entities” and “roles”. After training the existing models on the new datasets, we observe that the existing models do not perform well on one of the new benchmarks. We then propose a modification to the “word-to-word” attention function which has been uniformly reused across several popular NLI architectures. The resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform significantly well on the new benchmarks that emphasize “roles” and “entities”.

【Keywords】:

1043. TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain.

Paper Link】 【Pages】:8512-8519

【Authors】: Alena Moiseeva ; Hinrich Schütze

【Abstract】: Computational analysis and modeling of the evolution of trends is an important area of research in Natural Language Processing (NLP) because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends and downtrends and document labels that is available as an unrestricted download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

【Keywords】:
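
Mean Average Precision, the evaluation measure proposed in the abstract above, has a standard definition in information retrieval: average the precision at each rank where a relevant item appears, divide by the number of relevant items, then take the mean over all ranked lists. The sketch below shows that standard computation; how TRENDNERT builds its ranked lists and gold sets is not described in the abstract, so the inputs and function names here are illustrative assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of gold items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the average precision over several (ranking, gold set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Hypothetical rankings: gold items found at ranks 1 and 3, then at rank 2.
example_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
print(round(mean_average_precision(example_runs), 4))
```

Gold items that never appear in a ranking still count in the denominator, so a method is penalized for missing them as well as for ranking them low.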

1044. Conclusion-Supplement Answer Generation for Non-Factoid Questions.

Paper Link】 【Pages】:8520-8527

【Authors】: Makoto Nakatsuji ; Sohei Okui

【Abstract】: This paper tackles the goal of conclusion-supplement answer generation for non-factoid questions, which is a critical issue in the field of Natural Language Processing (NLP) and Artificial Intelligence (AI), as users often require supplementary information before accepting a conclusion. The current encoder-decoder framework, however, has difficulty generating such answers, since it may become confused when it tries to learn several different long answers to the same non-factoid question. Our solution, called an ensemble network, goes beyond single short sentences and fuses logically connected conclusion statements and supplementary statements. It extracts the context from the conclusion decoder's output sequence and uses it to create supplementary decoder states on the basis of an attention mechanism. It also assesses the closeness of the question encoder's output sequence and the separate outputs of the conclusion and supplement decoders as well as their combination. As a result, it generates answers that match the questions and have natural-sounding supplementary sequences in line with the context expressed by the conclusion sequence. Evaluations conducted on datasets including “Love Advice” and “Arts & Humanities” categories indicate that our model outputs much more accurate results than the tested baseline models do.

【Keywords】:

1045. Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction.

Paper Link】 【Pages】:8528-8535

【Authors】: Tapas Nayak ; Hwee Tou Ng

【Abstract】: A relation tuple consists of two entities and the relation between them, and often such tuples are found in unstructured text. There may be multiple relation tuples present in a text and they may share one or both entities among them. Extracting such relation tuples from a sentence is a difficult task and sharing of entities or overlapping entities among the tuples makes it more challenging. Most prior work adopted a pipeline approach where entities were identified first followed by finding the relations among them, thus missing the interaction among the relation tuples in a sentence. In this paper, we propose two approaches to use encoder-decoder architecture for jointly extracting entities and relations. In the first approach, we propose a representation scheme for relation tuples which enables the decoder to generate one word at a time like machine translation models and still finds all the tuples present in a sentence with full entity names of different length and with overlapping entities. Next, we propose a pointer network-based decoding approach where an entire tuple is generated at every time step. Experiments on the publicly available New York Times corpus show that our proposed approaches outperform previous work and achieve significantly higher F1 scores.

【Keywords】:

1046. Merging Weak and Active Supervision for Semantic Parsing.

Paper Link】 【Pages】:8536-8543

【Authors】: Ansong Ni ; Pengcheng Yin ; Graham Neubig

【Abstract】: A semantic parser maps natural language commands (NLs) from the users to executable meaning representations (MRs), which are later executed in a certain environment to obtain user-desired results. The fully-supervised training of such a parser requires NL/MR pairs, annotated by domain experts, which makes them expensive to collect. However, weakly-supervised semantic parsers are learnt only from pairs of NL and expected execution results, leaving the MRs latent. While weak supervision is cheaper to acquire, learning from this input poses difficulties. It demands that parsers search a large space with a very weak learning signal and it is hard to avoid spurious MRs that achieve the correct answer in the wrong way. These factors lead to a performance gap between parsers trained in weakly- and fully-supervised settings. To bridge this gap, we examine the intersection between weak supervision and active learning, which allows the learner to actively select examples and query for manual annotations as extra supervision to improve the model trained under weak supervision. We study different active learning heuristics for selecting examples to query, and various forms of extra supervision for such queries. We evaluate the effectiveness of our method on two different datasets. Experiments on WikiSQL show that by annotating only 1.8% of examples, we improve over a state-of-the-art weakly-supervised baseline by 6.4%, achieving an accuracy of 79.0%, which is only 1.3% away from the model trained with full supervision. Experiments on WikiTableQuestions with human annotators show that our method can improve the performance with only 100 active queries, especially for weakly-supervised parsers learnt from a cold start.

【Keywords】:

1047. Message Passing Attention Networks for Document Understanding.

Paper Link】 【Pages】:8544-8551

【Authors】: Giannis Nikolentzos ; Antoine J.-P. Tixier ; Michalis Vazirgiannis

【Abstract】: Graph neural networks have recently emerged as a very effective framework for processing graph-structured data. These models have achieved state-of-the-art performance in many tasks. Most graph neural networks can be described in terms of message passing, vertex update, and readout functions. In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD). We also propose several hierarchical variants of MPAD. Experiments conducted on 10 standard text classification datasets show that our architectures are competitive with the state-of-the-art. Ablation studies reveal further insights about the impact of the different components on performance. Code is publicly available at: https://github.com/giannisnik/mpad.

【Keywords】:

1048. Deep Residual-Dense Lattice Network for Speech Enhancement.

Paper Link】 【Pages】:8552-8559

【Authors】: Mohammad Nikzad ; Aaron Nicolson ; Yongsheng Gao ; Jun Zhou ; Kuldip K. Paliwal ; Fanhua Shang

【Abstract】: Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement. Availability: https://github.com/nick-nikzad/RDL-SE.

【Keywords】:

1049. AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses.

Paper Link】 【Pages】:8560-8567

【Authors】: Tong Niu ; Mohit Bansal

【Abstract】: Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts on trying to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution of all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, thus making it a faithful evaluation of the model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model by combining the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging better dialogue models in the future.

【Keywords】:
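
The core of AvgOut, as the abstract above describes it, is averaging the decoder's output probability distributions over all time steps. The sketch below shows only that averaging step on plain Python lists; how the paper turns the averaged distribution into a diversity score is not stated in the abstract and is deliberately left out. The function name and the toy three-word vocabulary are assumptions for illustration.

```python
from typing import List, Sequence


def average_output_distribution(step_distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-time-step probability distributions over the vocabulary."""
    if not step_distributions:
        raise ValueError("need at least one time step")
    vocab_size = len(step_distributions[0])
    totals = [0.0] * vocab_size
    for dist in step_distributions:
        if len(dist) != vocab_size:
            raise ValueError("all distributions must share one vocabulary size")
        for index, prob in enumerate(dist):
            totals[index] += prob
    steps = len(step_distributions)
    return [total / steps for total in totals]


# Toy decoder outputs over a hypothetical 3-token vocabulary, 2 time steps.
toy_steps = [
    [0.5, 0.5, 0.0],
    [0.1, 0.3, 0.6],
]
print(average_output_distribution(toy_steps))
```

Because each input row sums to one, the average is itself a probability distribution; mass concentrated on a few tokens in that average is the signal the abstract associates with dull output.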

1050. Controlling Neural Machine Translation Formality with Synthetic Supervision.

Paper Link】 【Pages】:8568-8575

【Authors】: Xing Niu ; Marine Carpuat

【Abstract】: This work aims to produce translations that convey source language content at a formality level that is appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target language formality. However, in practice, available training examples are limited to English sentence pairs of different styles, and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.

【Keywords】:

1051. Fine-Grained Entity Typing for Domain Independent Entity Linking.

Paper Link】 【Pages】:8576-8583

【Authors】: Yasumasa Onoe ; Greg Durrett

【Abstract】: Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a “domain” is characterized not just by genre of text but even by factors as specific as the particular distribution of entities, as neural models tend to overfit by memorizing properties of frequent entities in a dataset. We tackle the problem of building robust entity linking models that generalize effectively and do not rely on labeled entity linking data with a specific entity distribution. Rather than predicting entities directly, our approach models fine-grained entity properties, which can help disambiguate between even closely related entities. We derive a large inventory of types (tens of thousands) from Wikipedia categories, and use hyperlinked mentions in Wikipedia to distantly label data and train an entity typing model. At test time, we classify a mention with this typing model and use soft type predictions to link the mention to the most similar candidate entity. We evaluate our entity linking system on the CoNLL-YAGO dataset (Hoffart et al. 2011) and show that our approach outperforms prior domain-independent entity linking systems. We also test our approach in a harder setting derived from the WikilinksNED dataset (Eshel et al. 2017) where all the mention-entity pairs are unseen during test time. Results indicate that our approach generalizes better than a state-of-the-art neural model on the dataset.

【Keywords】:

1052. Mask & Focus: Conversation Modelling by Learning Concepts.

Paper Link】 【Pages】:8584-8591

【Authors】: Gaurav Pandey ; Dinesh Raghu ; Sachindra Joshi

【Abstract】: Sequence to sequence models attempt to capture the correlation between all the words in the input and output sequences. While this is quite useful for machine translation where the correlation among the words is indeed quite strong, it becomes problematic for conversation modelling where the correlation is often at a much more abstract level. In contrast, humans tend to focus on the essential concepts discussed in the conversation context and generate responses accordingly. In this paper, we attempt to mimic this response generating mechanism by learning the essential concepts in the context and response in an unsupervised manner. The proposed model, referred to as Mask & Focus, maps the input context to a sequence of concepts which are then used to generate the response concepts. Together, the context and the response concepts generate the final response. In order to learn context concepts from the training data automatically, we mask words in the input and observe the effect of masking on response generation. We train our model to learn those response concepts that have high mutual information with respect to the context concepts, thereby guiding the model to focus on the context concepts. Mask & Focus achieves significant improvement over the existing baselines in several established metrics for dialogues.

【Keywords】:

1053. Associating Natural Language Comment and Source Code Entities.

Paper Link】 【Pages】:8592-8599

【Authors】: Sheena Panthaplackel ; Milos Gligoric ; Raymond J. Mooney ; Junyi Jessy Li

【Abstract】: Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

【Keywords】:

1054. Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis.

Paper Link】 【Pages】:8600-8607

【Authors】: Haiyun Peng ; Lu Xu ; Lidong Bing ; Fei Huang ; Wei Lu ; Luo Si

【Abstract】: Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e. the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e. opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.

【Keywords】:

1055. MTSS: Learn from Multiple Domain Teachers and Become a Multi-Domain Dialogue Expert.

Paper Link】 【Pages】:8608-8615

【Authors】: Shuke Peng ; Feng Ji ; Zehao Lin ; Shaobo Cui ; Haiqing Chen ; Yin Zhang

【Abstract】: Building a high-quality multi-domain dialogue system is challenging because of the complicated and entangled dialogue state space across domains, which seriously limits the quality of the dialogue policy and in turn affects the generated response. In this paper, we propose a novel method to acquire a satisfying policy and subtly circumvent the knotty dialogue state representation problem in the multi-domain setting. Inspired by real school teaching scenarios, our method is composed of multiple domain-specific teachers and a universal student. Each individual teacher only focuses on one specific domain and learns its corresponding domain knowledge and dialogue policy based on a precisely extracted single domain dialogue state representation. Then, these domain-specific teachers impart their domain knowledge and policies to a universal student model and collectively make this student model a multi-domain dialogue expert. Experimental results show that our method is competitive with state-of-the-art methods in both multi-domain and single-domain settings.

【Keywords】:

1056. Verb Class Induction with Partial Supervision.

Paper Link】 【Pages】:8616-8623

【Authors】: Daniel W. Peterson ; Susan Windisch Brown ; Martha Palmer

【Abstract】: Dirichlet-multinomial (D-M) mixtures like latent Dirichlet allocation (LDA) are widely used for both topic modeling and clustering. Prior work on constructing Levin-style semantic verb clusters achieves state-of-the-art results using D-M mixtures for verb sense induction and clustering. We add a bias toward known clusters by explicitly labeling a small number of observations with their correct VerbNet class. We demonstrate that this partial supervision guides the resulting clusters effectively, improving the recovery of both labeled and unlabeled classes by 16%, for a joint 12% absolute improvement in F1 score compared to clustering without supervision. The resulting clusters are also more semantically coherent. Although the technical change is minor, it produces a large effect, with important practical consequences for supervised topic modeling in general.

【Keywords】:

1057. Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets.

Paper Link】 【Pages】:8624-8631

【Authors】: Fanchao Qi ; Liang Chang ; Maosong Sun ; Sicong Ouyang ; Zhiyuan Liu

【Abstract】: A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It manually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to explore important factors and difficulties in the task. All the source code and data of this work can be obtained on https://github.com/thunlp/BabelNet-Sememe-Prediction.

【Keywords】:

1058. Translation-Based Matching Adversarial Network for Cross-Lingual Natural Language Inference.

Paper Link】 【Pages】:8632-8639

【Authors】: Kunxun Qi ; Jianfeng Du

【Abstract】: Cross-lingual natural language inference is a fundamental task in cross-lingual natural language understanding, widely addressed by neural models recently. Existing neural model based methods either align sentence embeddings between source and target languages, heavily relying on annotated parallel corpora, or exploit pre-trained cross-lingual language models that are fine-tuned on a single language and hard to transfer knowledge to another language. To resolve these limitations in existing methods, this paper proposes an adversarial training framework to enhance both pre-trained models and classical neural models for cross-lingual natural language inference. It trains on the union of data in the source language and data in the target language, learning language-invariant features to improve the inference performance. Experimental results on the XNLI benchmark demonstrate that three popular neural models enhanced by the proposed framework significantly outperform the original models.

【Keywords】:

1059. Solving Sequential Text Classification as Board-Game Playing.

Paper Link】 【Pages】:8640-8648

【Authors】: Chen Qian ; Fuli Feng ; Lijie Wen ; Zhenpeng Chen ; Li Lin ; Yanan Zheng ; Tat-Seng Chua

【Abstract】: Sequential Text Classification (STC) aims to classify a sequence of text fragments (e.g., words in a sentence or sentences in a document) into a sequence of labels. In addition to the intra-fragment text contents, considering the inter-fragment context dependencies is also important for STC. Previous sequence labeling approaches largely generate a sequence of labels in left-to-right reading order. However, the need for context information in making decisions varies across different fragments and is not strictly organized in a left-to-right order. Therefore, it is appealing to label the fragments that need less consideration of context information first before labeling the fragments that need more. In this paper, we propose a novel model that labels a sequence of fragments in jumping order. Specifically, we devise a dedicated board-game to develop a correspondence between solving STC and board-game playing. By defining proper game rules and devising a game state evaluator in which context clues are injected, at each round, each player is effectively pushed to find the optimal move without position restrictions via considering the current game state, which corresponds to producing a label for an unlabeled fragment jumpily with the consideration of the contexts clues. The final game-end state is viewed as the optimal label sequence. Extensive results on three representative datasets show that the proposed approach outperforms the state-of-the-art methods with statistical significance.

【Keywords】:

1060. Lexical Simplification with Pretrained Encoders.

Paper Link】 【Pages】:8649-8656

【Authors】: Jipeng Qiang ; Yun Li ; Yi Zhu ; Yunhao Yuan ; Xindong Wu

【Abstract】: Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning. Recent unsupervised lexical simplification approaches rely only on the complex word itself, regardless of the given sentence, to generate candidate substitutions, which inevitably produces a large number of spurious candidates. We present a simple LS approach that makes use of the Bidirectional Encoder Representations from Transformers (BERT), which can consider both the given sentence and the complex word when generating candidate substitutions for the complex word. Specifically, we mask the complex word of the original sentence and feed the sentence into BERT to predict the masked token. The predicted results are used as candidate substitutions. Despite being entirely unsupervised, experimental results show that our approach obtains clear improvements over baselines that leverage linguistic databases and parallel corpora, outperforming the state-of-the-art by more than 12 Accuracy points on three well-known benchmarks.

【Keywords】:

1061. Dynamic Knowledge Routing Network for Target-Guided Open-Domain Conversation.

Paper Link】 【Pages】:8657-8664

【Authors】: Jinghui Qin ; Zheng Ye ; Jianheng Tang ; Xiaodan Liang

【Abstract】: Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations. Existing methods mainly rely on single-turn data-driven learning and a simple target-guided strategy without considering semantic or factual knowledge relations among candidate topics/keywords. This results in poor transition smoothness and low success rate. In this work, we adopt a structured approach that controls the intended content of system responses by introducing coarse-grained keywords, attains smooth conversation transition through turn-level supervised learning and knowledge relations between candidate keywords, and drives a conversation towards a specified target with a discourse-level guiding strategy. Specifically, we propose a novel dynamic knowledge routing network (DKRN) which considers semantic knowledge relations among candidate keywords for accurate next topic prediction of next discourse. With the help of more accurate keyword prediction, our keyword-augmented response retrieval module can achieve better retrieval performance and more meaningful conversations. Besides, we also propose a novel dual discourse-level target-guided strategy to guide conversations to reach their goals smoothly with higher success rate. Furthermore, to push the research boundary of target-guided open-domain conversation to match real-world scenarios better, we introduce a new large-scale Chinese target-guided open-domain conversation dataset (more than 900K conversations) crawled from Sina Weibo. Quantitative and human evaluations show our method can produce meaningful and effective target-guided conversations, significantly improving over other state-of-the-art methods by more than 20% in success rate and more than 0.6 in average smoothness score.

【Keywords】:

1062. DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification.

Paper Link】 【Pages】:8665-8672

【Authors】: Libo Qin ; Wanxiang Che ; Yangming Li ; Minheng Ni ; Ting Liu

【Abstract】: In dialog systems, dialog act recognition and sentiment classification are two correlative tasks to capture speakers' intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions separately (Kim and Kim 2018). Most of the existing systems either treat them as separate tasks or just jointly model the two tasks by sharing parameters in an implicit way without explicitly modeling mutual interaction and relation. To address this problem, we propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks by introducing a co-interactive relation layer. In addition, the proposed relation layer can be stacked to gradually capture mutual knowledge with multiple steps of interaction. In particular, we thoroughly study different relation layers and their effects. Experimental results on two public datasets (Mastodon and Dailydialog) show that our model outperforms the state-of-the-art joint model by 4.3% and 3.4% in terms of F1 score on the dialog act recognition task, and by 5.7% and 12.4% on sentiment classification, respectively. Comprehensive analysis empirically verifies the effectiveness of explicitly modeling the relation between the two tasks and the multi-step interaction mechanism. Finally, we employ the Bidirectional Encoder Representation from Transformer (BERT) in our framework, which can further boost our performance in both tasks.

【Keywords】:

1063. Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs.

Paper Link】 【Pages】:8673-8680

【Authors】: Pengda Qin ; Xin Wang ; Wenhu Chen ; Chunyun Zhang ; Weiran Xu ; William Yang Wang

【Abstract】: Large-scale knowledge graphs (KGs) are shown to become more important in current information systems. To expand the coverage of KGs, previous studies on knowledge graph completion need to collect adequate training instances for newly-added relations. In this paper, we consider a novel formulation, zero-shot learning, to avoid this cumbersome curation. For newly-added relations, we attempt to learn their semantic features from their text descriptions and hence recognize the facts of unseen relations with no examples being seen. For this purpose, we leverage Generative Adversarial Networks (GANs) to establish the connection between the text and knowledge graph domains: the generator learns to generate reasonable relation embeddings merely from noisy text descriptions. Under this setting, zero-shot learning is naturally converted to a traditional supervised classification task. Empirically, our method is model-agnostic and could potentially be applied to any version of KG embeddings, and it consistently yields performance improvements on the NELL and Wiki datasets.

【Keywords】:

1064. Entrainment2Vec: Embedding Entrainment for Multi-Party Dialogues.

Paper Link】 【Pages】:8681-8688

【Authors】: Zahra Rahimi ; Diane J. Litman

【Abstract】: Entrainment is the propensity of speakers to begin behaving like one another in conversation. While most entrainment studies have focused on dyadic interactions, researchers have also started to investigate multi-party conversations. In these studies, multi-party entrainment has typically been estimated by averaging the pairs' entrainment values or by averaging individuals' entrainment to the group. While such multi-party measures utilize the strength of dyadic entrainment, they have not yet exploited different aspects of the dynamics of entrainment relations in multi-party groups. In this paper, utilizing an existing pairwise asymmetric entrainment measure, we propose a novel graph-based vector representation of multi-party entrainment that incorporates both strength and dynamics of pairwise entrainment relations. The proposed kernel approach and weakly-supervised representation learning method show promising results at the downstream task of predicting team outcomes. Also, examining the embedding, we found interesting information about the dynamics of the entrainment relations. For example, teams with more influential members have more process conflict.

【Keywords】:

1065. Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset.

Paper Link】 【Pages】:8689-8696

【Authors】: Abhinav Rastogi ; Xiaoxue Zang ; Srinivas Sunkara ; Raghav Gupta ; Pranav Khaitan

【Abstract】: Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.

【Keywords】:

1066. Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation.

Paper Link】 【Pages】:8697-8704

【Authors】: Pengjie Ren ; Zhumin Chen ; Christof Monz ; Jun Ma ; Maarten de Rijke

【Abstract】: Background Based Conversations (BBCs) have been introduced to help conversational systems avoid generating overly generic responses. In a BBC, the conversation is grounded in a knowledge source. A key challenge in BBCs is Knowledge Selection (KS): given a conversational context, try to find the appropriate background knowledge (a text fragment containing related facts or comments, etc.) based on which to generate the next response. Previous work addresses KS by employing attention and/or pointer mechanisms. These mechanisms use a local perspective, i.e., they select a token at a time based solely on the current decoding state. We argue for the adoption of a global perspective, i.e., pre-selecting some text fragments from the background knowledge that could help determine the topic of the next response. We enhance KS in BBCs by introducing a Global-to-Local Knowledge Selection (GLKS) mechanism. Given a conversational context and background knowledge, we first learn a topic transition vector to encode the most likely text fragments to be used in the next response, which is then used to guide the local KS at each decoding timestamp. In order to effectively learn the topic transition vector, we propose a distantly supervised learning schema. Experimental results show that the GLKS model significantly outperforms state-of-the-art methods in terms of both automatic and human evaluation. More importantly, GLKS achieves this without requiring any extra annotations, which demonstrates its high degree of scalability.

【Keywords】:

1067. Multi-Task Learning with Generative Adversarial Training for Multi-Passage Machine Reading Comprehension.

Paper Link】 【Pages】:8705-8712

【Authors】: Qiyu Ren ; Xiang Cheng ; Sen Su

【Abstract】: Multi-passage machine reading comprehension (MRC) aims to answer a question using multiple passages. Existing multi-passage MRC approaches have shown that employing passages with and without golden answers (i.e. labeled and unlabeled passages) for model training can improve prediction accuracy. In this paper, we present MG-MRC, a novel approach for multi-passage MRC via multi-task learning with generative adversarial training. MG-MRC adopts the extract-then-select framework, where an extractor is first used to predict answer candidates, then a selector is used to choose the final answer. In MG-MRC, we adopt multi-task learning to train the extractor by using both labeled and unlabeled passages. In particular, we use labeled passages to train the extractor by supervised learning, while using unlabeled passages to train the extractor by generative adversarial training, where the extractor is regarded as the generator and a discriminator is introduced to evaluate the generated answer candidates. Moreover, to train the extractor by backpropagation in the generative adversarial training process, we propose a hybrid method which combines boundary-based and content-based extracting methods to produce the answer candidate set and its representation. The experimental results on three open-domain QA datasets confirm the effectiveness of our approach.

【Keywords】:

1068. Probing Natural Language Inference Models through Semantic Fragments.

Paper Link】 【Pages】:8713-8721

【Authors】: Kyle Richardson ; Hai Hu ; Lawrence S. Moss ; Ashish Sabharwal

【Abstract】: Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclear the extent to which they are captured in existing NLI benchmarks and effectively learned by models. To investigate this, we propose the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models. This approach to creating challenge datasets allows direct control over the semantic diversity and complexity of the targeted linguistic phenomena, and results in a more precise characterization of a model's linguistic behavior. Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task; (b) On the other hand, with only a few minutes of additional fine-tuning—with a carefully selected learning rate and a novel variation of “inoculation”—a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks.

【Keywords】:

1069. Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks.

Paper Link】 【Pages】:8722-8731

【Authors】: Anna Rogers ; Olga Kovaleva ; Matthew Downey ; Anna Rumshisky

【Abstract】: The recent explosion in question answering research produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that would enable diagnostics of the reasoning strategies by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains. Crucially, it offers both general and text-specific questions, unlikely to be found in pretraining data. We show that QuAIL poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.

【Keywords】:

1070. WinoGrande: An Adversarial Winograd Schema Challenge at Scale.

Paper Link】 【Pages】:8732-8740

【Authors】: Keisuke Sakaguchi ; Ronan Le Bras ; Chandra Bhagavatula ; Yejin Choi

【Abstract】: The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises an important question whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction consist of (1) a carefully designed crowdsourcing procedure, followed by (2) systematic bias reduction using a novel AfLite algorithm that generalizes human-detectable word associations to machine-detectable embedding associations. The best state-of-the-art methods on WinoGrande achieve 59.4 – 79.1%, which are ∼15-35% (absolute) below human performance of 94.0%, depending on the amount of the training data allowed (2% – 100% respectively). Furthermore, we establish new state-of-the-art results on five related benchmarks: WSC (→ 90.1%), DPR (→ 93.1%), COPA (→ 90.6%), KnowRef (→ 85.6%), and Winogender (→ 97.1%). These results have dual implications: on one hand, they demonstrate the effectiveness of WinoGrande when used as a resource for transfer learning. On the other hand, they raise a concern that we are likely to be overestimating the true capabilities of machine commonsense across all these benchmarks. We emphasize the importance of algorithmic bias reduction in existing and future benchmarks to mitigate such overestimation.

【Keywords】:

1071. Hierarchical Reinforcement Learning for Open-Domain Dialog.

Paper Link】 【Pages】:8741-8748

【Authors】: Abdelrhman Saleh ; Natasha Jaques ; Asma Ghandeharioun ; Judy Hanwen Shen ; Rosalind W. Picard

【Abstract】: Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.

【Keywords】:

1072. CASIE: Extracting Cybersecurity Event Information from Text.

Paper Link】 【Pages】:8749-8757

【Authors】: Taneeya Satyapanich ; Francis Ferraro ; Tim Finin

【Abstract】: We present CASIE, a system that extracts information about cybersecurity events from text and populates a semantic model, with the ultimate goal of integration into a knowledge graph of cybersecurity data. It was trained on a new corpus of 1,000 English news articles from 2017–2019 that are labeled with rich, event-based annotations and that cover both cyberattack and vulnerability-related events. Our model defines five event subtypes along with their semantic roles and 20 event-relevant argument types (e.g., file, device, software, money). CASIE uses different deep neural network approaches with attention and can incorporate rich linguistic features and word embeddings. We have conducted experiments on each component in the event detection pipeline and the results show that each subsystem performs well.

【Keywords】:

1073. SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation.

Paper Link】 【Pages】:8758-8765

【Authors】: Bianca Scarlini ; Tommaso Pasini ; Roberto Navigli

【Abstract】: Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SensEmBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent semantic representations of word meanings in multiple languages. Our vectors lie in a space comparable with that of contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach. We show that, whilst not relying on manual semantic annotations, SensEmBERT is able to either achieve or surpass state-of-the-art results attained by most of the supervised neural approaches on the English Word Sense Disambiguation task. When scaling to other languages, our representations prove to be equally effective as their English counterpart and outperform the existing state of the art on all the Word Sense Disambiguation multilingual datasets. The embeddings are released in five different languages at http://sensembert.org.
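
The nearest neighbour linking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense keys, the three-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration, and real sense embeddings would have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_sense(context_vec, sense_vecs):
    """Return the sense key whose embedding is most similar to the context vector."""
    return max(sense_vecs, key=lambda sense: cosine(context_vec, sense_vecs[sense]))


# Hypothetical sense inventory with toy vectors (invented for this example).
senses = {
    "bank%finance": [0.9, 0.1, 0.0],
    "bank%river": [0.1, 0.9, 0.2],
}
# Hypothetical contextual embedding of one occurrence of "bank".
occurrence = [0.2, 0.8, 0.1]

print(nearest_sense(occurrence, senses))
```

Because the sense vectors and the contextual vector are assumed to live in the same space, disambiguation reduces to a similarity lookup with no trained classifier on top.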

【Keywords】:

1074. Rare Words: A Major Problem for Contextualized Embeddings and How to Fix it by Attentive Mimicking.

Paper Link】 【Pages】:8766-8774

【Authors】: Timo Schick ; Hinrich Schütze

【Abstract】: Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Exemplified by BERT, a recently proposed such architecture, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT does substantially improve its understanding of rare words.

【Keywords】:

1075. Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!

Paper Link】 【Pages】:8775-8782

【Authors】: Claudia Schulz ; Damir Juric

【Abstract】: A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology, in particular whether the close relationship of semantically similar medical terms is encoded in these embeddings. To date, only small datasets for testing medical term similarity are available, not allowing to draw conclusions about the generalisability of embeddings to the enormous amount of medical terms used by doctors. We present multiple automatically created large-scale medical term similarity datasets and confirm their high quality in an annotation study with doctors. We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques. Our results show that current embeddings are limited in their ability to adequately encode medical terms. The novel datasets thus form a challenging new benchmark for the development of medical embeddings able to accurately represent the whole medical terminology.

【Keywords】:

1076. Interpretable Rumor Detection in Microblogs by Attending to User Interactions.

Paper Link】 【Pages】:8783-8790

【Authors】: Ling Min Serena Khoo ; Hai Leong Chieu ; Zhong Qian ; Jing Jiang

【Abstract】: We address rumor detection by learning to differentiate between the community's response to real and fake claims in microblogs. Existing state-of-the-art models are based on tree models that model conversational trees. However, in social media, a user posting a reply might be replying to the entire thread rather than to a specific user. We propose a post-level attention model (PLAN) to model long distance interactions between tweets with the multi-head attention mechanism in a transformer network. We investigated variants of this model: (1) a structure aware self-attention model (StA-PLAN) that incorporates tree structure information in the transformer network, and (2) a hierarchical token and post-level attention model (StA-HiTPLAN) that learns a sentence representation with token-level self-attention. To the best of our knowledge, we are the first to evaluate our models on two rumor detection data sets: the PHEME data set as well as the Twitter15 and Twitter16 data sets. We show that our best models outperform current state-of-the-art models for both data sets. Moreover, the attention mechanism allows us to explain rumor detection predictions at both token-level and post-level.

【Keywords】:

1077. Automatic Fact-Guided Sentence Modification.

Paper Link】 【Pages】:8791-8798

【Authors】: Darsh J. Shah ; Tal Schuster ; Regina Barzilay

【Abstract】: Online encyclopediae like Wikipedia contain large amounts of text that need frequent corrections and updates. The new information may contradict existing content in encyclopediae. In this paper, we focus on rewriting such dynamically changing articles. This is a challenging constrained generation task, as the output must be consistent with the new information and fit into the rest of the existing document. To this end, we propose a two-step solution: (1) We identify and remove the contradicting components in a target text for a given claim, using a neutralizing stance model; (2) We expand the remaining text to be consistent with the given claim, using a novel two-encoder sequence-to-sequence model with copy attention. Applied to a Wikipedia fact update dataset, our method successfully generates updated sentences for new claims, achieving the highest SARI score. Furthermore, we demonstrate that generating synthetic data through such rewritten sentences can successfully augment the FEVER fact-checking training dataset, leading to a relative error reduction of 13%.

【Keywords】:

1078. Are Noisy Sentences Useless for Distant Supervised Relation Extraction?

Paper Link】 【Pages】:8799-8806

【Authors】: Yuming Shang ; He Yan Huang ; Xianling Mao ; Xin Sun ; Wei Wei

【Abstract】: The noisy labeling problem has been one of the major obstacles for distant supervised relation extraction. Existing approaches usually consider that the noisy sentences are useless and will harm the model's performance. Therefore, they mainly alleviate this problem by reducing the influence of noisy sentences, such as applying bag-level selective attention or removing noisy sentences from sentence-bags. However, the underlying cause of the noisy labeling problem is not the lack of useful information, but the missing relation labels. Intuitively, if we can allocate credible labels for noisy sentences, they will be transformed into useful training data and benefit the model's performance. Thus, in this paper, we propose a novel method for distant supervised relation extraction, which employs unsupervised deep clustering to generate reliable labels for noisy sentences. Specifically, our model contains three modules: a sentence encoder, a noise detector and a label generator. The sentence encoder is used to obtain feature representations. The noise detector detects noisy sentences from sentence-bags, and the label generator produces high-confidence relation labels for noisy sentences. Extensive experimental results demonstrate that our model outperforms the state-of-the-art baselines on a popular benchmark dataset, and can indeed alleviate the noisy labeling problem.

【Keywords】:

1079. Graph-Based Transformer with Cross-Candidate Verification for Semantic Parsing.

Paper Link】 【Pages】:8807-8814

【Authors】: Bo Shao ; Yeyun Gong ; Weizhen Qi ; Guihong Cao ; Jianshu Ji ; Xiaola Lin

【Abstract】: In this paper, we present a graph-based Transformer for semantic parsing. We separate the semantic parsing task into two steps: 1) Use a sequence-to-sequence model to generate the logical form candidates. 2) Design a graph-based Transformer to rerank the candidates. To handle the structure of logical forms, we incorporate graph information into the Transformer, and design a cross-candidate verification mechanism to consider all the candidates in the ranking process. Furthermore, we integrate BERT into our model and jointly train the graph-based Transformer and BERT. We conduct experiments on 3 semantic parsing benchmarks: ATIS, JOBS, and the Task Oriented semantic Parsing dataset (TOP). Experiments show that our graph-based reranking model achieves results comparable to state-of-the-art models on the ATIS and JOBS datasets. On the TOP dataset, our model achieves a new state-of-the-art result.

【Keywords】:

1080. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.

Paper Link】 【Pages】:8815-8821

【Authors】: Sheng Shen ; Zhen Dong ; Jiayu Ye ; Linjian Ma ; Zhewei Yao ; Amir Gholami ; Michael W. Mahoney ; Kurt Keutzer

【Abstract】: Transformer based architectures have become the de facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT based models have a prohibitive memory footprint and latency. As a result, deploying BERT based models in resource constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further. We extensively test our proposed method on BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at most 2.3% performance degradation, even with ultra-low precision quantization down to 2 bits, corresponding to up to 13× compression of the model parameters, and up to 4× compression of the embedding table as well as activations. Among all tasks, we observed the highest performance loss for BERT fine-tuned on SQuAD. By probing into the Hessian based analysis as well as visualization, we show that this is related to the fact that current training/fine-tuning strategy of BERT does not converge for SQuAD.
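
To give a feel for why quantizing weights in groups can help at very low precision, here is a minimal sketch of symmetric uniform quantization with one scale per group. It is an illustration under stated assumptions, not the Q-BERT scheme itself: the grouping of consecutive weights, the symmetric rounding rule, and all function names are hypothetical, and the Hessian-based bit allocation described in the abstract is not modelled.

```python
def quantize_group(weights, bits):
    """Map one group of floats to signed integer codes plus a shared scale."""
    levels = 2 ** (bits - 1) - 1  # e.g. 1 level per side for 2 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs else 1.0
    codes = [max(-levels, min(levels, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Reconstruct approximate floats from integer codes and the group scale."""
    return [c * scale for c in codes]


def groupwise_quantize(weights, group_size, bits):
    """Quantize consecutive groups of weights, each with its own scale,
    and return the reconstructed (dequantized) values."""
    reconstructed = []
    for start in range(0, len(weights), group_size):
        codes, scale = quantize_group(weights[start:start + group_size], bits)
        reconstructed.extend(dequantize_group(codes, scale))
    return reconstructed


# Toy weights (invented): a small-magnitude group followed by a large-magnitude group.
w = [0.5, -0.25, 0.1, 8.0, -4.0, 1.0]
per_group = groupwise_quantize(w, group_size=3, bits=4)
single_scale = groupwise_quantize(w, group_size=6, bits=4)
print(per_group[:3], single_scale[:3])
```

With a single scale, the large weights dictate the step size and the small weights collapse to zero; giving each group its own scale keeps the step size proportional to the local magnitude.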

【Keywords】:

1081. On the Generation of Medical Question-Answer Pairs.

Paper Link】 【Pages】:8822-8829

【Authors】: Sheng Shen ; Yaliang Li ; Nan Du ; Xian Wu ; Yusheng Xie ; Shen Ge ; Tao Yang ; Kai Wang ; Xingzheng Liang ; Wei Fan

【Abstract】: Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In the light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.

【Keywords】:

1082. IntroVNMT: An Introspective Model for Variational Neural Machine Translation.

Paper Link】 【Pages】:8830-8837

【Authors】: Xin Sheng ; Linli Xu ; Junliang Guo ; Jingchang Liu ; Ruoyu Zhao ; Yinlong Xu

【Abstract】: We propose a novel introspective model for variational neural machine translation (IntroVNMT) in this paper, inspired by the recent successful application of introspective variational autoencoder (IntroVAE) in high quality image synthesis. Different from the vanilla variational NMT model, IntroVNMT is capable of improving itself introspectively by evaluating the quality of the generated target sentences according to the high-level latent variables of the real and generated target sentences. As a consequence of introspective training, the proposed model is able to discriminate between the generated and real sentences of the target language via the latent variables generated by the encoder of the model. In this way, IntroVNMT is able to generate more realistic target sentences in practice. In the meantime, IntroVNMT inherits the advantages of the variational autoencoders (VAEs), and the model training process is more stable than the generative adversarial network (GAN) based models. Experimental results on different translation tasks demonstrate that the proposed model can achieve significant improvements over the vanilla variational NMT model.

【Keywords】:

1083. Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses.

Paper Link】 【Pages】:8838-8845

【Authors】: Xiaoming Shi ; Haifeng Hu ; Wanxiang Che ; Zhongqian Sun ; Ting Liu ; Junzhou Huang

【Abstract】: In this work, we consider the medical slot filling problem, i.e., the challenging problem of converting medical queries into structured representations. We analyze the effectiveness of two points: scattered keywords in user utterances and weak supervision with responses. We approach medical slot filling as a multi-label classification problem with a label-embedding attentive model that pays more attention to scattered medical keywords, and we learn the classification models by weak supervision from responses. To evaluate the approaches, we annotate a medical slot filling dataset and collect a large-scale unlabeled dataset. The experiments demonstrate that these two points are promising for improving the task.

【Keywords】:

1084. Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior.

Paper Link】 【Pages】:8846-8853

【Authors】: Raphael Shu ; Jason Lee ; Hideki Nakayama ; Kyunghyun Cho

【Abstract】: Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lower bound to the log-probability. During inference, the length of translation automatically adapts itself. Our experiments show that the lower bound can be greatly increased by running the inference algorithm, resulting in significantly improved translation quality. Our proposed model closes the performance gap between non-autoregressive and autoregressive approaches on the ASPEC Ja-En dataset with 8.6x faster decoding. On the WMT'14 En-De dataset, our model narrows the gap with the autoregressive baseline to 2.0 BLEU points with 12.5x speedup. By decoding multiple initial latent variables in parallel and rescoring using a teacher model, the proposed model further brings the gap down to 1.0 BLEU point on the WMT'14 En-De task with 6.8x speedup.

【Keywords】:

1085. Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation.

Paper Link】 【Pages】:8854-8861

【Authors】: Aditya Siddhant ; Melvin Johnson ; Henry Tsai ; Naveen Ari ; Jason Riesa ; Ankur Bapna ; Orhan Firat ; Karthik Raman

【Abstract】: The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model (Aharoni, Johnson, and Firat 2019). Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT) (Devlin et al. 2018), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.

【Keywords】:

1086. Low Resource Sequence Tagging with Weak Labels.

Paper Link】 【Pages】:8862-8869

【Authors】: Edwin Simpson ; Jonas Pfeiffer ; Iryna Gurevych

【Abstract】: Current methods for sequence tagging depend on large quantities of domain-specific training data, limiting their use in new, user-defined tasks with few or no annotations. While crowdsourcing can be a cheap source of labels, it often introduces errors that degrade the performance of models trained on such crowdsourced data. Another solution is to use transfer learning to tackle low resource sequence labelling, but current approaches rely heavily on similar high resource datasets in different languages. In this paper, we propose a domain adaptation method using Bayesian sequence combination to exploit pre-trained models and unreliable crowdsourced data that does not require high resource data in a different language. Our method boosts performance by learning the relationship between each labeller and the target task and trains a sequence labeller on the target domain with little or no gold-standard data. We apply our approach to labelling diagnostic classes in medical and educational case studies, showing that the model achieves strong performance through zero-shot transfer learning and is more effective than alternative ensemble methods. Using NER and information extraction tasks, we show how our approach can train a model directly from crowdsourced labels, outperforming pipeline approaches that first aggregate the crowdsourced data, then train on the aggregated labels.

【Keywords】:

1087. Modelling Form-Meaning Systematicity with Linguistic and Visual Features.

Paper Link】 【Pages】:8870-8877

【Authors】: Arie Soeteman ; E. Dario Gutiérrez ; Elia Bruni ; Ekaterina Shutova

【Abstract】: Several studies in linguistics and natural language processing (NLP) pointed out systematic correspondences between word form and meaning in language. A prominent example of such systematicity is iconicity, which occurs when the form of a word is motivated by some perceptual (e.g. visual) aspect of its referent. However, the existing data-driven approaches to form-meaning systematicity modelled word meanings relying on information extracted from textual data alone. In this paper, we investigate to what extent our visual experience explains some of the form-meaning systematicity found in language. We construct word meaning representations from linguistic as well as visual data and analyze the structure and significance of form-meaning systematicity found in English using these models. Our findings corroborate the existence of form-meaning systematicity and show that this systematicity is concentrated in localized clusters. Furthermore, applying a multimodal approach allows us to identify new patterns of systematicity that have not been previously identified with the text-based models.

【Keywords】:

1088. Generating Persona Consistent Dialogues by Exploiting Natural Language Inference.

Paper Link】 【Pages】:8878-8885

【Authors】: Haoyu Song ; Wei-Nan Zhang ; Jingwen Hu ; Ting Liu

【Abstract】: Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona consistent dialogues. Different from existing work that re-ranks the retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona-consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona-consistency of generated responses.

【Keywords】:

1089. Alignment-Enhanced Transformer for Constraining NMT with Pre-Specified Translations.

Paper Link】 【Pages】:8886-8893

【Authors】: Kai Song ; Kun Wang ; Heng Yu ; Yue Zhang ; Zhongqiang Huang ; Weihua Luo ; Xiangyu Duan ; Min Zhang

【Abstract】: We investigate the task of constraining NMT with pre-specified translations, which has practical significance for a number of research and industrial applications. Existing works impose pre-specified translations as lexical constraints during decoding, which are based on word alignments derived from target-to-source attention weights. However, multiple recent studies have found that word alignment derived from generic attention heads in the Transformer is unreliable. We address this problem by introducing a dedicated head in the multi-head Transformer architecture to capture external supervision signals. Results on five language pairs show that our method is highly effective in constraining NMT with pre-specified translations, consistently outperforming previous methods in translation quality.

【Keywords】:

1090. Joint Parsing and Generation for Abstractive Summarization.

Paper Link】 【Pages】:8894-8901

【Authors】: Kaiqiang Song ; Logan Lebanoff ; Qipeng Guo ; Xipeng Qiu ; Xiangyang Xue ; Chen Li ; Dong Yu ; Fei Liu

【Abstract】: Sentences produced by abstractive summarization systems can be ungrammatical and fail to preserve the original meanings, despite being locally fluent. In this paper we propose to remedy this problem by jointly generating a sentence and its syntactic dependency parse while performing abstraction. If generating a word can introduce an erroneous relation to the summary, the behavior must be discouraged. The proposed method thus holds promise for producing grammatical sentences and encouraging the summary to stay true-to-original. Our contributions of this work are twofold. First, we present a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder in a synchronized manner to generate a summary sentence and its syntactic parse. Secondly, we describe a novel human evaluation protocol to assess if, and to what extent, a summary remains true to its original meanings. We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines.

【Keywords】:

1091. Controlling the Amount of Verbatim Copying in Abstractive Summarization.

Paper Link】 【Pages】:8902-8909

【Authors】: Kaiqiang Song ; Bingqing Wang ; Zhe Feng ; Ren Liu ; Fei Liu

【Abstract】: An abstract must not change the meaning of the original text. The single most effective way to achieve that is to increase the amount of copying while still allowing for text abstraction. Human editors can usually exercise control over copying, resulting in summaries that are more extractive than abstractive, or vice versa. However, it remains poorly understood whether modern neural abstractive summarizers can provide the same flexibility, i.e., learning from single reference summaries to generate multiple summary hypotheses with varying degrees of copying. In this paper, we present a neural summarization model that, by learning from single human abstracts, can produce a broad spectrum of summaries ranging from purely extractive to highly generative ones. We frame the task of summarization as language modeling and exploit alternative mechanisms to generate summary hypotheses. Our method allows for control over copying during both training and decoding stages of a neural summarization model. Through extensive experiments we illustrate the significance of our proposed method on controlling the amount of verbatim copying and achieve competitive results over strong baselines. Our analysis further reveals interesting and unobvious facts.

【Keywords】:

1092. Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline Generation.

Paper Link】 【Pages】:8910-8917

【Authors】: Yun-Zhu Song ; Hong-Han Shuai ; Sung-Lin Yeh ; Yi-Lun Wu ; Lun-Wei Ku ; Wen-Chih Peng

【Abstract】: With the rapid proliferation of online media sources and published news, headlines have become increasingly important for attracting readers to news articles, since users may be overwhelmed with the massive information. In this paper, we generate inspired headlines that preserve the nature of news articles and catch the eye of the reader simultaneously. The task of inspired headline generation can be viewed as a specific form of Headline Generation (HG) task, with the emphasis on creating an attractive headline from a given news article. To generate inspired headlines, we propose a novel framework called POpularity-Reinforced Learning for inspired Headline Generation (PORL-HG). PORL-HG exploits the extractive-abstractive architecture with 1) Popular Topic Attention (PTA) for guiding the extractor to select the attractive sentence from the article and 2) a popularity predictor for guiding the abstractor to rewrite the attractive sentence. Moreover, since the sentence selection of the extractor is not differentiable, techniques of reinforcement learning (RL) are utilized to bridge the gap with rewards obtained from a popularity score predictor. Through quantitative and qualitative experiments, we show that the proposed PORL-HG significantly outperforms the state-of-the-art headline generation models in terms of attractiveness evaluated by both human (71.03%) and the predictor (at least 27.60%), while the faithfulness of PORL-HG is also comparable to the state-of-the-art generation model.

【Keywords】:

1093. Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets.

Paper Link】 【Pages】:8918-8927

【Authors】: Saku Sugawara ; Pontus Stenetorp ; Kentaro Inui ; Akiko Aizawa

【Abstract】: Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems. However, the capabilities of datasets are not assessed for benchmarking language understanding precisely. We propose a semi-automated, ablation-based methodology for this challenge: by checking whether questions can be solved even after removing features associated with a skill requisite for language understanding, we evaluate to what degree the questions do not require the skill. Experiments on 10 datasets (e.g., CoQA, SQuAD v2.0, and RACE) with a strong baseline model show that, for example, the relative scores of the baseline model provided with content words only and with shuffled sentence words in the context are on average 89.2% and 78.5% of the original scores, respectively. These results suggest that most of the questions already answered correctly by the model do not necessarily require grammatical and complex reasoning. For precise benchmarking, MRC datasets will need to take extra care in their design to ensure that questions can correctly evaluate the intended skills.
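
The "relative score" reported in this abstract is the ablated-input score expressed as a percentage of the original score, averaged over datasets. The following minimal sketch shows that arithmetic. The function names are hypothetical and the score pairs are invented for illustration; they are not results from the paper.

```python
def relative_score(ablated_score, original_score):
    """Ablated score as a percentage of the score on the unmodified input."""
    if original_score <= 0:
        raise ValueError("original score must be positive")
    return 100.0 * ablated_score / original_score


def average_relative_score(pairs):
    """Mean relative score over (ablated, original) pairs, one pair per dataset."""
    scores = [relative_score(ablated, original) for ablated, original in pairs]
    return sum(scores) / len(scores)


# Invented (ablated, original) scores for two hypothetical datasets.
toy_results = [(72.0, 80.0), (45.0, 60.0)]
print(average_relative_score(toy_results))
```

A relative score close to 100% after an ablation indicates that the removed feature was not needed to answer most of the questions the model already solved.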

【Keywords】:

1094. Relation Extraction with Convolutional Network over Learnable Syntax-Transport Graph.

Paper Link】 【Pages】:8928-8935

【Authors】: Kai Sun ; Richong Zhang ; Yongyi Mao ; Samuel Mensah ; Xudong Liu

【Abstract】: Many approaches have been proposed to leverage the dependency tree in the relation classification task. Recent works have focused on pruning irrelevant information from the dependency tree. The state-of-the-art Attention Guided Graph Convolutional Networks (AGGCNs) transform the dependency tree into a weighted graph to distinguish the relevance of nodes and edges for relation classification. However, in their approach, the graph is fully connected, which destroys the structure information of the original dependency tree. How to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenge in the relation classification task. In this work, we learn to transform the dependency tree into a weighted graph by considering the syntax dependencies of the connected nodes and preserving the structure of the original dependency tree. We refer to this graph as a syntax-transport graph. We further propose a learnable syntax-transport attention graph convolutional network (LST-AGCN) which operates on the syntax-transport graph directly to distill the final representation which is sufficient for classification. Experiments on SemEval-2010 Task 8 and TACRED show that our approach outperforms previous methods.

【Keywords】:

1095. Learning Sparse Sharing Architectures for Multiple Tasks.

Paper Link】 【Pages】:8936-8943

【Authors】: Tianxiang Sun ; Yunfan Shao ; Xiaonan Li ; Pengfei Liu ; Hang Yan ; Xipeng Qiu ; Xuanjing Huang

【Abstract】: Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. Choosing a suitable sharing mechanism depends on the relations among the tasks, which is not easy since it is difficult to understand the underlying shared factors among these tasks. In this paper, we propose a novel parameter sharing mechanism, named Sparse Sharing. Given multiple tasks, our approach automatically finds a sparse sharing structure. We start with an over-parameterized base network, from which each task extracts a subnetwork. The subnetworks of multiple tasks are partially overlapped and trained in parallel. We show that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework. We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters.
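
One way to picture task-specific subnetworks extracted from a shared base network is as binary masks over the base parameters. The sketch below is only an illustration of that idea under stated assumptions: the flat parameter list, the task names, the mask values, and the overlap measure are all hypothetical, and how the masks are found is not modelled here.

```python
def subnetwork(base_params, mask):
    """Keep the base parameters selected by a 0/1 mask and zero out the rest."""
    return [w * m for w, m in zip(base_params, mask)]


def overlap(mask_a, mask_b):
    """Fraction of parameters used by either task that are used by both."""
    shared = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return shared / union if union else 0.0


# Toy over-parameterized base network, flattened to a list (values invented).
base = [0.4, -1.2, 0.7, 0.05, -0.3, 0.9]

# Hypothetical per-task masks: each task keeps its own subset of parameters.
masks = {
    "pos": [1, 1, 0, 1, 0, 1],
    "ner": [1, 0, 1, 1, 1, 0],
}

print(subnetwork(base, masks["pos"]))
print(overlap(masks["pos"], masks["ner"]))

# Under this reading, identical masks for every task give full overlap,
# which corresponds to all tasks sharing the same parameters.
print(overlap(masks["pos"], masks["pos"]))
```

Partially overlapping masks mean that some parameters receive updates from several tasks while others stay task-specific, which is the intuition behind sharing that is neither all-or-nothing nor fixed in advance.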

【Keywords】:
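
The idea of each task extracting a partially overlapping subnetwork from one shared base network can be pictured with binary masks. The following is a minimal sketch, not the authors' method: the random mask generation, the task names `pos` and `ner`, the keep ratio, and all helper functions are assumptions made for illustration only (the abstract does not say how subnetworks are found).

```python
import random

def make_mask(size, keep_ratio, seed):
    """Return a binary mask that keeps int(size * keep_ratio) parameters.

    Random selection is a placeholder assumption; the abstract does not
    describe how each task's subnetwork is actually extracted.
    """
    rng = random.Random(seed)
    kept = set(rng.sample(range(size), int(size * keep_ratio)))
    return [1 if i in kept else 0 for i in range(size)]

def apply_mask(weights, mask):
    """Task-specific subnetwork: base weights with masked-out entries zeroed."""
    return [w * m for w, m in zip(weights, mask)]

def overlap(mask_a, mask_b):
    """Number of base parameters used by both task subnetworks."""
    return sum(a & b for a, b in zip(mask_a, mask_b))

# Toy shared (over-parameterized) base network, flattened to a list.
base = [0.5, -1.2, 0.3, 0.8, -0.7, 1.1, 0.0, 0.4]

# One mask per task (hypothetical task names).
masks = {
    "pos": make_mask(len(base), 0.5, seed=0),
    "ner": make_mask(len(base), 0.5, seed=1),
}
subnets = {task: apply_mask(base, mask) for task, mask in masks.items()}
shared = overlap(masks["pos"], masks["ner"])
print(f"parameters shared by both tasks: {shared}")
```

Under this view, giving every task the same all-ones mask would correspond to hard sharing, which is one way to read the abstract's claim that hard sharing is a particular instance of the framework.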

1096. History-Adaption Knowledge Incorporation Mechanism for Multi-Turn Dialogue System.

Paper Link】 【Pages】:8944-8951

【Authors】: Yajing Sun ; Yue Hu ; Luxi Xing ; Jing Yu ; Yuqiang Xie

【Abstract】: Keeping the conversation consistent and avoiding its repetition are two key factors to construct an intelligent multi-turn knowledge-grounded dialogue system. Although some works tend to combine history with external knowledge such as personal background information to boost dialogue quality, they are prone to ignore the fact that incorporating the same knowledge multiple times into the conversation leads to repetition. The main reason is the lack of effective control over the use of knowledge on the conversation level. We therefore design a history-adaption knowledge incorporation mechanism to build an effective multi-turn dialogue model. Our proposed model addresses repetition by recurrently updating the knowledge from the conversation level and progressively incorporating it into the history step-by-step. The knowledge-grounded history representation also enhances the conversation consistency. Experimental results show that our proposed model significantly outperforms several retrieval-based models on some benchmark datasets. The human evaluation demonstrates that our model can keep the conversation consistent and reduce conversation repetition.

【Keywords】:

1097. SPARQA: Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases.

Paper Link】 【Pages】:8952-8959

【Authors】: Yawei Sun ; Lingling Zhang ; Gong Cheng ; Yuzhong Qu

【Abstract】: Semantic parsing transforms a natural language question into a formal query over a knowledge base. Many existing methods rely on syntactic parsing like dependencies. However, the accuracy of producing such expressive formalisms is not satisfying on long complex questions. In this paper, we propose a novel skeleton grammar to represent the high-level structure of a complex question. This dedicated coarse-grained formalism with a BERT-based parsing algorithm helps to improve the accuracy of the downstream fine-grained semantic parsing. Besides, to align the structure of a question with the structure of a knowledge base, our multi-strategy method combines sentence-level and word-level semantics. Our approach shows promising performance on several datasets.

【Keywords】:

1098. Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning.

Paper Link】 【Pages】:8960-8967

【Authors】: Yibo Sun ; Duyu Tang ; Nan Duan ; Yeyun Gong ; Xiaocheng Feng ; Bing Qin ; Daxin Jiang

【Abstract】: Neural semantic parsing has achieved impressive results in recent years, yet its success relies on the availability of large amounts of supervised data. Our goal is to learn a neural semantic parser when only prior knowledge about a limited number of simple rules is available, without access to either annotated programs or execution results. Our approach is initialized by rules, and improved in a back-translation paradigm using generated question-program pairs from the semantic parser and the question generator. A phrase table with frequent mapping patterns is automatically derived, also updated as training progresses, to measure the quality of generated instances. We train the model with model-agnostic meta-learning to guarantee the accuracy and stability on examples covered by rules, and meanwhile acquire the versatility to generalize well on examples uncovered by rules. Results on three benchmark datasets with different domains and programs show that our approach incrementally improves the accuracy. On WikiSQL, our best model is comparable to the state-of-the-art system learned from denotations.

【Keywords】:

1099. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding.

Paper Link】 【Pages】:8968-8975

【Authors】: Yu Sun ; Shuohuan Wang ; Yu-Kun Li ; Shikun Feng ; Hao Tian ; Hua Wu ; Haifeng Wang

【Abstract】: Recently pre-trained models have achieved state-of-the-art results in various language understanding tasks. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring information, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entities, semantic closeness and discourse relations. In order to extract the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning. Based on this framework, we construct several tasks and train the ERNIE 2.0 model to capture lexical, syntactic and semantic aspects of information in the training data. Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several similar tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

【Keywords】:

1100. Generating Diverse Translation by Manipulating Multi-Head Attention.

Paper Link】 【Pages】:8976-8983

【Authors】: Zewei Sun ; Shujian Huang ; Hao-Ran Wei ; Xinyu Dai ; Jiajun Chen

【Abstract】: Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experiment results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations could bring a significant improvement in performance on translation tasks. An auxiliary experiment of conversation response generation task proves the effect of diversity as well.

【Keywords】:

1101. TreeGen: A Tree-Based Transformer Architecture for Code Generation.

Paper Link】 【Pages】:8984-8991

【Authors】: Zeyu Sun ; Qihao Zhu ; Yingfei Xiong ; Yican Sun ; Lili Mou ; Lu Zhang

【Abstract】: A code generation system generates programming language code based on an input natural language description. State-of-the-art approaches rely on neural networks for code generation. However, these code generators suffer from two problems. One is the long dependency problem, where a code element often depends on another far-away code element. A variable reference, for example, depends on its definition, which may appear quite a few lines before. The other problem is structure modeling, as programs contain rich structural information. In this paper, we propose a novel tree-based neural architecture, TreeGen, for code generation. TreeGen uses the attention mechanism of Transformers to alleviate the long-dependency problem, and introduces a novel AST reader (encoder) to incorporate grammar rules and AST structures into the network. We evaluated TreeGen on a Python benchmark, HearthStone, and two semantic parsing benchmarks, ATIS and GEO. TreeGen outperformed the previous state-of-the-art approach by 4.5 percentage points on HearthStone, and achieved the best accuracy among neural network-based approaches on ATIS (89.1%) and GEO (89.6%). We also conducted an ablation test to better understand each component of our model.

【Keywords】:

1102. Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis.

Paper Link】 【Pages】:8992-8999

【Authors】: Zhongkai Sun ; Prathusha Kameswara Sarma ; William A. Sethares ; Yingyu Liang

【Abstract】: Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment analysis and emotion recognition can be improved by learning (hidden) correlations between features extracted from the outer product of text and audio (we call this text-based audio) and analogous text-based video. This paper proposes a novel model, the Interaction Canonical Correlation Network (ICCN), to learn such multimodal embeddings. ICCN learns correlations between all three modes via deep canonical correlation analysis (DCCA) and the proposed embeddings are then tested on several benchmark datasets and against other state-of-the-art multimodal embedding algorithms. Empirical results and ablation studies confirm the effectiveness of ICCN in capturing useful information from all three views.

【Keywords】:

1103. Distributed Representations for Arithmetic Word Problems.

Paper Link】 【Pages】:9000-9007

【Authors】: Sowmya S. Sundaram ; Deepak P ; Savitha Sam Abraham

【Abstract】: We consider the task of learning distributed representations for arithmetic word problems. We outline the characteristics of the domain of arithmetic word problems that make generic text embedding methods inadequate, necessitating a specialized representation learning method to facilitate the task of retrieval across a wide range of use cases within online learning platforms. Our contribution is two-fold; first, we propose several 'operators' that distil knowledge of the domain of arithmetic word problems and schemas into word problem transformations. Second, we propose a novel neural architecture that combines LSTMs with graph convolutional networks to leverage word problems and their operator-transformed versions to learn distributed representations for word problems. While our target is to ensure that the distributed representations are schema-aligned, we do not make use of schema labels in the learning process, thus yielding an unsupervised representation learning method. Through an evaluation on retrieval over a publicly available corpus of word problems, we illustrate that our framework is able to consistently improve upon contemporary generic text embeddings in terms of schema-alignment.

【Keywords】:

1104. Adapting Language Models for Non-Parallel Author-Stylized Rewriting.

Paper Link】 【Pages】:9008-9015

【Authors】: Bakhtiyar Syed ; Gaurav Verma ; Balaji Vasan Srinivasan ; Anandhavelu Natarajan ; Vasudeva Varma

【Abstract】: Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-specific corpus using a denoising autoencoder (DAE) loss in a cascaded encoder-decoder framework. Optimizing over DAE loss allows our model to learn the nuances of an author's style without relying on parallel data, which has been a severe limitation of the previous related works in this space. To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. The evaluation framework is both interpretable as it leads to several insights about the model, and self-contained as it does not rely on external classifiers, e.g. sentiment or formality classifiers. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines.

【Keywords】:

1105. Boundary Enhanced Neural Span Classification for Nested Named Entity Recognition.

Paper Link】 【Pages】:9016-9023

【Authors】: Chuanqi Tan ; Wei Qiu ; Mosha Chen ; Rui Wang ; Fei Huang

【Abstract】: Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely-used sequence labeling framework usually has difficulty detecting entities with nested structures. The span-based method that can easily detect nested entities in different subsequences is naturally suitable for the nested NER problem. However, previous span-based methods have two main issues. First, classifying all subsequences is computationally expensive and very inefficient at inference. Second, the span-based methods mainly focus on learning span representations but lack explicit boundary supervision. To tackle the above two issues, we propose a boundary enhanced neural span classification model. In addition to classifying the span, we propose incorporating an additional boundary detection task to predict those words that are boundaries of entities. The two tasks are jointly trained under a multitask learning framework, which enhances the span representation with additional boundary supervision. In addition, the boundary detection model has the ability to generate high-quality candidate spans, which greatly reduces the time complexity during inference. Experiments show that our approach outperforms all existing methods and achieves 85.3, 83.9, and 78.3 scores in terms of F1 on the ACE2004, ACE2005, and GENIA datasets, respectively.

【Keywords】:
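
To illustrate why boundary detection can cut inference cost, the sketch below compares enumerating every subsequence with pairing only predicted start and end boundaries. It is a toy illustration under stated assumptions: the boundary flags are hand-written rather than produced by a model, and the function names and the optional `max_len` cap are hypothetical, not taken from the paper.

```python
def all_spans(n):
    """Every contiguous span (start, end) over n tokens: n * (n + 1) / 2 spans."""
    return [(i, j) for i in range(n) for j in range(i, n)]

def candidate_spans(start_flags, end_flags, max_len=None):
    """Pair each predicted start boundary with every predicted end at or after it.

    start_flags / end_flags are 0/1 lists, one entry per token. In a real
    system they would come from a boundary detector; here they are given.
    """
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if max_len is not None and j - i + 1 > max_len:
                break
            if end_flags[j]:
                spans.append((i, j))
    return spans

tokens = ["The", "Bank", "of", "China", "branch", "opened"]
# Hand-written boundaries for the nested mentions "Bank of China" and "China".
starts = [0, 1, 0, 1, 0, 0]
ends = [0, 0, 0, 1, 0, 0]

candidates = candidate_spans(starts, ends)
print(len(all_spans(len(tokens))), "spans without boundary filtering")
print(len(candidates), "candidate spans:", candidates)
```

Only the surviving candidates would then be passed to the span classifier, and nested mentions remain representable because overlapping spans are kept as separate candidates.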

1106. Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network.

Paper Link】 【Pages】:9024-9031

【Authors】: Pingjie Tang ; Meng Jiang ; Bryan (Ning) Xia ; Jed W. Pitera ; Jeffrey Welser ; Nitesh V. Chawla

【Abstract】: Patent categorization, which is to assign multiple International Patent Classification (IPC) codes to a patent document, relies heavily on expert efforts, as it requires substantial domain knowledge. When formulated as a multi-label text classification (MTC) problem, it draws two challenges to existing models: one is to learn effective document representations from text content; the other is to model the cross-section behavior of label set. In this work, we propose a label attention model based on graph convolutional network. It jointly learns the document-word associations and word-word co-occurrences to generate rich semantic embeddings of documents. It employs a non-local attention mechanism to learn label representations in the same space of document representations for multi-label classification. On a large CIRCA patent database, we evaluate the performance of our model and as many as seven competitive baselines. We find that our model outperforms all those prior state of the art by a large margin and achieves high performance on P@k and nDCG@k.

【Keywords】:
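
For readers unfamiliar with the two ranking metrics named in the abstract, the sketch below computes P@k and nDCG@k for a single document under the common binary-relevance definitions. These definitions are an assumption: the abstract does not state which variant the authors use, and the label strings are toy values chosen for illustration.

```python
import math

def precision_at_k(ranked_labels, relevant, k):
    """Fraction of the top-k predicted labels that are true labels."""
    return sum(1 for label in ranked_labels[:k] if label in relevant) / k

def dcg_at_k(ranked_labels, relevant, k):
    """Discounted cumulative gain with binary gains and a log2 rank discount."""
    return sum(
        1.0 / math.log2(rank + 2)
        for rank, label in enumerate(ranked_labels[:k])
        if label in relevant
    )

def ndcg_at_k(ranked_labels, relevant, k):
    """DCG@k divided by the DCG@k of an ideal ranking (all true labels first)."""
    ideal_hits = min(k, len(relevant))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg_at_k(ranked_labels, relevant, k) / ideal if ideal > 0 else 0.0

# Toy example: labels sorted by model score, and the true label set.
ranked = ["G06F", "H04L", "A61K", "G06N"]
relevant = {"G06F", "G06N"}

for k in (1, 3):
    p = precision_at_k(ranked, relevant, k)
    n = ndcg_at_k(ranked, relevant, k)
    print(f"P@{k} = {p:.3f}, nDCG@{k} = {n:.3f}")
```

In a multi-label evaluation these per-document values would typically be averaged over all test documents.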

1107. Capturing Sentence Relations for Answer Sentence Selection with Multi-Perspective Graph Encoding.

Paper Link】 【Pages】:9032-9039

【Authors】: Zhixing Tian ; Yuanzhe Zhang ; Xinwei Feng ; Wenbin Jiang ; Yajuan Lyu ; Kang Liu ; Jun Zhao

【Abstract】: This paper focuses on the answer sentence selection task. Unlike previous work, which only models the relation between the question and each candidate sentence, we propose Multi-Perspective Graph Encoder (MPGE) to take the relations among the candidate sentences into account and capture the relations from multiple perspectives. By utilizing MPGE as a module, we construct two answer sentence selection models which are based on traditional representation and pre-trained representation, respectively. We conduct extensive experiments on two datasets, WikiQA and SQuAD. The results show that the proposed MPGE is effective for both types of representation. Moreover, the overall performance of our proposed model surpasses the state-of-the-art on both datasets. Additionally, we further validate the robustness of our method by the adversarial examples of AddSent and AddOneSent.

【Keywords】:

1108. Image Enhanced Event Detection in News Articles.

Paper Link】 【Pages】:9040-9047

【Authors】: Meihan Tong ; Shuai Wang ; Yixin Cao ; Bin Xu ; Juanzi Li ; Lei Hou ; Tat-Seng Chua

【Abstract】: Event detection is a crucial and challenging sub-task of event extraction, which suffers from a severe ambiguity issue of trigger words. Existing works mainly focus on using textual context information, while there naturally exist many images accompanied by news articles that are yet to be explored. We believe that images not only reflect the core events of the text, but are also helpful for the disambiguation of trigger words. In this paper, we first contribute an image dataset supplement to ED benchmarks (i.e., ACE2005) for training and evaluation. We then propose a novel Dual Recurrent Multimodal Model, DRMM, to conduct deep interactions between images and sentences for modality features aggregation. DRMM utilizes pre-trained BERT and ResNet to encode sentences and images, and employs an alternating dual attention to select informative features for mutual enhancements. Our superior performance compared to six state-of-the-art baselines as well as further ablation studies demonstrate the significance of image modality and effectiveness of the proposed architecture. The code and image dataset are available at https://github.com/shuaiwa16/image-enhanced-event-extraction.

【Keywords】:

1109. Fine-Grained Argument Unit Recognition and Classification.

Paper Link】 【Pages】:9048-9056

【Authors】: Dietrich Trautmann ; Johannes Daxenberger ; Christian Stab ; Hinrich Schütze ; Iryna Gurevych

【Abstract】: Prior work has commonly defined argument retrieval from heterogeneous document collections as a sentence-level classification task. Consequently, argument retrieval suffers both from low recall and from sentence segmentation errors making it difficult for humans and machines to consume the arguments. In this work, we argue that the task should be performed on a more fine-grained level of sequence labeling. For this, we define the task as Argument Unit Recognition and Classification (AURC). We present a dataset of arguments from heterogeneous sources annotated as spans of tokens within a sentence, as well as with a corresponding stance. We show that and how such difficult argument annotations can be effectively collected through crowdsourcing with high inter-annotator agreement. The new benchmark, AURC-8, contains up to 15% more arguments per topic as compared to annotations on the sentence level. We identify a number of methods targeted at AURC sequence labeling, achieving close to human performance on known domains. Further analysis also reveals that, contrary to previous approaches, our methods are more robust against sentence segmentation errors. We publicly release our code and the AURC-8 dataset.

【Keywords】:

1110. Sentence Generation for Entity Description with Content-Plan Attention.

Paper Link】 【Pages】:9057-9064

【Authors】: Bayu Distiawan Trisedya ; Jianzhong Qi ; Rui Zhang

【Abstract】: We study neural data-to-text generation. Specifically, we consider a target entity that is associated with a set of attributes. We aim to generate a sentence to describe the target entity. Previous studies use encoder-decoder frameworks where the encoder treats the input as a linear sequence and uses LSTM to encode the sequence. However, linearizing a set of attributes may not yield the proper order of the attributes, and hence leads the encoder to produce an improper context to generate a description. To handle disordered input, recent studies propose two-stage neural models that use pointer networks to generate a content-plan (i.e., content-planner) and use the content-plan as input for an encoder-decoder model (i.e., text generator). However, in two-stage models, the content-planner may yield an incomplete content-plan, due to missing one or more salient attributes in the generated content-plan. This will in turn cause the text generator to generate an incomplete description. To address these problems, we propose a novel attention model that exploits content-plan to highlight salient attributes in a proper order. The challenge of integrating a content-plan in the attention model of an encoder-decoder framework is to align the content-plan and the generated description. We handle this problem by devising a coverage mechanism to track the extent to which the content-plan is exposed in the previous decoding time-step, and hence it helps our proposed attention model select the attributes to be mentioned in the description in a proper order. Experimental results show that our model outperforms state-of-the-art baselines by up to 3% and 5% in terms of BLEU score on two real-world datasets, respectively.

【Keywords】:

1111. Capturing Greater Context for Question Generation.

Paper Link】 【Pages】:9065-9072

【Authors】: Luu Anh Tuan ; Darsh J. Shah ; Regina Barzilay

【Abstract】: Automatic question generation can benefit many applications ranging from dialogue systems to reading comprehension. While questions are often asked with respect to long documents, there are many challenges with modeling such long documents. Many existing techniques generate questions by effectively looking at one sentence at a time, leading to questions that are easy and not reflective of the human process of question generation. Our goal is to incorporate interactions across multiple sentences to generate realistic questions for long documents. In order to link a broad document context to the target answer, we represent the relevant context via a multi-stage attention mechanism, which forms the foundation of a sequence to sequence model. We outperform state-of-the-art methods on question generation on three question-answering datasets: SQuAD, MS MARCO and NewsQA.

【Keywords】:

1112. Select, Answer and Explain: Interpretable Multi-Hop Reading Comprehension over Multiple Documents.

Paper Link】 【Pages】:9073-9080

【Authors】: Ming Tu ; Kevin Huang ; Guangtao Wang ; Jing Huang ; Xiaodong He ; Bowen Zhou

【Abstract】: Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidence. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem. Our system first filters out answer-unrelated documents and thus reduces the amount of distraction information. This is achieved by a document classifier trained with a novel pairwise learning-to-rank loss. The selected answer-related documents are then input to a model to jointly predict the answer and supporting sentences. The model is optimized with a multi-task learning objective on both token level for answer prediction and sentence level for supporting sentences prediction, together with an attention-based interaction between these two tasks. Evaluated on HotpotQA, a challenging multi-hop RC data set, the proposed SAE system achieves top competitive performance in distractor setting compared to other existing systems on the leaderboard.

【Keywords】:

1113. An Annotated Corpus of Reference Resolution for Interpreting Common Grounding.

Paper Link】 【Pages】:9081-9089

【Authors】: Takuma Udagawa ; Akiko Aizawa

【Abstract】: Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation. However, interpreting the process of common grounding is a challenging task, especially under continuous and partially-observable context where complex ambiguity, uncertainty, partial understandings and misunderstandings are introduced. Interpretation becomes even more challenging when we deal with dialogue systems which still have limited capability of natural language understanding and generation. To address this problem, we consider reference resolution as the central subtask of common grounding and propose a new resource to study its intermediate process. Based on a simple and general annotation schema, we collected a total of 40,172 referring expressions in 5,191 dialogues curated from an existing corpus, along with multiple judgements of referent interpretations. We show that our annotation is highly reliable, captures the complexity of common grounding through a natural degree of reasonable disagreements, and allows for more detailed and quantitative analyses of common grounding strategies. Finally, we demonstrate the advantages of our annotation for interpreting, analyzing and improving common grounding in baseline dialogue systems.

【Keywords】:

1114. A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings.

Paper Link】 【Pages】:9090-9097

【Authors】: Niels van der Heijden ; Samira Abnar ; Ekaterina Shutova

【Abstract】: The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part of speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.

【Keywords】:

1115. A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency.

Paper Link】 【Pages】:9098-9105

【Authors】: Amir Pouran Ben Veyseh ; Franck Dernoncourt ; Dejing Dou ; Thien Huu Nguyen

【Abstract】: Definition Extraction (DE) is one of the well-known topics in Information Extraction that aims to identify terms and their corresponding definitions in unstructured texts. This task can be formalized either as a sentence classification task (i.e., containing term-definition pairs or not) or a sequential labeling task (i.e., identifying the boundaries of the terms and definitions). The previous works for DE have only focused on one of the two approaches, failing to model the inter-dependencies between the two tasks. In this work, we propose a novel model for DE that simultaneously performs the two tasks in a single framework to benefit from their inter-dependencies. Our model features deep learning architectures to exploit the global structures of the input sentences as well as the semantic consistencies between the terms and the definitions, thereby improving the quality of the representation vectors for DE. Besides the joint inference between sentence classification and sequential labeling, the proposed model is fundamentally different from the prior work for DE in that the prior work has only employed the local structures of the input sentences (i.e., word-to-word relations), and not yet considered the semantic consistencies between terms and definitions. In order to implement these novel ideas, our model presents a multi-task learning framework that employs graph convolutional neural networks and predicts the dependency paths between the terms and the definitions. We also seek to enforce the consistency between the representations of the terms and definitions both globally (i.e., increasing semantic consistency between the representations of the entire sentences and the terms/definitions) and locally (i.e., promoting the similarity between the representations of the terms and the definitions). The extensive experiments on three benchmark datasets demonstrate the effectiveness of our approach.

【Keywords】:

1116. Multi-View Consistency for Relation Extraction via Mutual Information and Structure Prediction.

Paper Link】 【Pages】:9106-9113

【Authors】: Amir Pouran Ben Veyseh ; Franck Dernoncourt ; My Tra Thai ; Dejing Dou ; Thien Huu Nguyen

【Abstract】: Relation Extraction (RE) is one of the fundamental tasks in Information Extraction. The goal of this task is to find the semantic relations between entity mentions in text. It has been shown in much previous work that the structure of the sentences (i.e., dependency trees) can provide important information/features for the RE models. However, the common limitation of the previous work on RE is the reliance on some external parsers to obtain the syntactic trees for the sentence structures. On the one hand, it is not guaranteed that the independent external parsers can offer the optimal sentence structures for RE and the customized structures for RE might help to further improve the performance. On the other hand, the quality of the external parsers might suffer when applied to different domains, thus also affecting the performance of the RE models on such domains. In order to overcome this issue, we introduce a novel method for RE that simultaneously induces the structures and predicts the relations for the input sentences, thus avoiding the external parsers and potentially leading to better sentence structures for RE. Our general strategy to learn the RE-specific structures is to apply two different methods to infer the structures for the input sentences (i.e., two views). We then introduce several mechanisms to encourage the structure and semantic consistencies between these two views so the effective structure and semantic representations for RE can emerge. We perform extensive experiments on the ACE 2005 and SemEval 2010 datasets to demonstrate the advantages of the proposed method, leading to state-of-the-art performance on these datasets.

【Keywords】:

1117. Parsing as Pretraining.

Paper Link】 【Pages】:9114-9121

【Authors】: David Vilares ; Michalina Strzyz ; Anders Søgaard ; Carlos Gómez-Rodríguez

【Abstract】: Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures – and no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).

【Keywords】:
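
Casting dependency parsing as sequence tagging means giving each word one label from which its head can be recovered. The sketch below uses a simple relative-offset encoding to show the round trip between a tree and a label sequence. The encoding, the label format, and the function names are assumptions for illustration; the abstract does not specify which linearization the authors use.

```python
def heads_to_labels(heads, relations):
    """Encode a dependency tree as one label per word.

    heads: 1-based index of each word's head, with 0 marking the root.
    relations: dependency relation of each word to its head.
    Label format (an assumption): "<signed offset to head>_<relation>",
    or "root_<relation>" for the root word.
    """
    labels = []
    for idx, (head, rel) in enumerate(zip(heads, relations), start=1):
        if head == 0:
            labels.append(f"root_{rel}")
        else:
            labels.append(f"{head - idx:+d}_{rel}")
    return labels

def labels_to_heads(labels):
    """Decode the per-word labels back into 1-based head indices."""
    heads = []
    for idx, label in enumerate(labels, start=1):
        offset, _, _relation = label.partition("_")
        heads.append(0 if offset == "root" else idx + int(offset))
    return heads

words = ["She", "reads", "books"]
heads = [2, 0, 2]                      # "reads" is the root
relations = ["nsubj", "root", "obj"]

labels = heads_to_labels(heads, relations)
print(list(zip(words, labels)))
assert labels_to_heads(labels) == heads  # the encoding is lossless here
```

With such labels, a tagger only needs to predict one class per word, which is what allows a single feed-forward layer on top of word vectors to act as a parser in this setup.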

1118. Target-Aspect-Sentiment Joint Detection for Aspect-Based Sentiment Analysis.

Paper Link】 【Pages】:9122-9129

【Authors】: Hai Wan ; Yufei Yang ; Jianfeng Du ; Yanan Liu ; Kunxun Qi ; Jeff Z. Pan

【Abstract】: Aspect-based sentiment analysis (ABSA) aims to detect the targets (which are composed of contiguous words), aspects and sentiment polarities in text. Published datasets from SemEval-2015 and SemEval-2016 reveal that a sentiment polarity depends on both the target and the aspect. However, most of the existing methods consider predicting sentiment polarities from either targets or aspects but not from both, thus they easily make wrong predictions on sentiment polarities. In particular, where the target is implicit, i.e., it does not appear in the given text, the methods predicting sentiment polarities from targets do not work. To tackle these limitations in ABSA, this paper proposes a novel method for target-aspect-sentiment joint detection. It relies on a pre-trained language model and can capture the dependence on both targets and aspects for sentiment prediction. Experimental results on the SemEval-2015 and SemEval-2016 restaurant datasets show that the proposed method achieves a high performance in detecting target-aspect-sentiment triples even for the implicit target cases; moreover, it even outperforms the state-of-the-art methods on those subtasks of target-aspect-sentiment detection that they are able to handle.

【Keywords】:

1119. Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling.

Paper Link】 【Pages】:9130-9137

【Authors】: Yu Wan ; Baosong Yang ; Derek F. Wong ; Lidia S. Chao ; Haihua Du ; Ben C. H. Ao

【Abstract】: As a special machine translation task, dialect translation has two main characteristics: 1) lack of a parallel training corpus; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models with access only to monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to sufficiently model commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect a 20 million monolingual corpus for each of Mandarin and Cantonese, which are respectively the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based simplified and traditional Chinese conversion and conventional unsupervised translation models by over 12 BLEU points.

【Keywords】:

1120. Neural Question Generation with Answer Pivot.

Paper Link】 【Pages】:9138-9145

【Authors】: Bingning Wang ; Xiaochuan Wang ; Ting Tao ; Qi Zhang ; Jingfang Xu

【Abstract】: Neural question generation (NQG) is the task of generating questions from the given context with deep neural networks. Previous answer-aware NQG methods suffer from the problem that the generated answers focus on entities and most of the questions are trivial to answer. The answer-agnostic NQG methods reduce the bias towards named entities and increase the model's degrees of freedom, but sometimes generate unanswerable questions, which are not valuable for the subsequent machine reading comprehension system. In this paper, we treat the answers as the hidden pivot for question generation and combine the question generation and answer selection processes in a joint model. We achieve the state-of-the-art result on the SQuAD dataset according to automatic metrics and human evaluation.

【Keywords】:

1121. ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion.

Paper Link】 【Pages】:9146-9153

【Authors】: Bingning Wang ; Ting Yao ; Qi Zhang ; Jingfang Xu ; Xiaochuan Wang

【Abstract】: This paper presents ReCO, a human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers who extract the support snippet from the retrieved documents. Finally, an abstractive yes/no/uncertain answer is given by the crowdworkers. The release of ReCO consists of 300k questions, which to our knowledge makes it the largest in Chinese reading comprehension. A prominent characteristic of ReCO is that, in addition to the original context paragraph, we also provide the support evidence that can be directly used to answer the question. Quality analysis demonstrates the challenge of ReCO: it requires various types of reasoning skills such as causal inference and logical reasoning. Current QA models that perform very well on many question answering problems, such as BERT (Devlin et al. 2018), achieve only 77% accuracy on this dataset, a large margin behind the nearly 92% human performance, indicating that ReCO presents a good challenge for machine reading comprehension. The codes, dataset and leaderboard will be freely available at https://github.com/benywon/ReCO.

【Keywords】:

1122. Neural Machine Translation with Byte-Level Subwords.

Paper Link】 【Pages】:9154-9160

【Authors】: Changhan Wang ; Kyunghyun Cho ; Jiatao Gu

【Abstract】: Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords or words. Rare characters from noisy text or character-rich languages such as Japanese and Chinese, however, can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the level of bytes and using the 256 byte set as vocabulary is a potential solution to this issue. High computational cost has, however, prevented it from being widely deployed or used in practice. In this paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE), which is more compact than a character vocabulary and has no out-of-vocabulary tokens, yet is more efficient than using pure bytes only. We claim that contextualizing BBPE embeddings is necessary, which can be implemented by a convolutional or recurrent layer. Our experiments show that BBPE has comparable performance to BPE while its size is only 1/8 of that for BPE. In the multilingual setting, BBPE maximizes vocabulary sharing across many languages and achieves better translation quality. Moreover, we show that BBPE enables transferring models between languages with non-overlapping character sets.

【Keywords】:
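
To make the byte-level idea in the abstract above concrete, here is a minimal sketch of representing text as UTF-8 bytes and performing a single BPE-style merge over byte pairs. The helper names, the toy corpus, and the choice of 256 as the first merged-symbol id are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def to_byte_tokens(text):
    """Encode text as a list of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


# Toy corpus (assumed): any string maps to base tokens below 256, so no OOV.
corpus = ["日本語", "日本"]
byte_seqs = [to_byte_tokens(s) for s in corpus]

# One merge step: the new symbol gets the first id after the 256 base bytes.
pair = most_frequent_pair(byte_seqs)
merged = [merge_pair(seq, pair, 256) for seq in byte_seqs]
```

Repeating the merge step grows the vocabulary beyond the 256 base bytes while shortening the sequences, which is the trade-off between pure bytes and character-level vocabularies that the abstract describes.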

1123. Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation.

Paper Link】 【Pages】:9161-9168

【Authors】: Chengyi Wang ; Yu Wu ; Shujie Liu ; Zhenglu Yang ; Ming Zhou

【Abstract】: End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model. Conventional approaches employ multi-task learning and pre-training methods for this task, but they suffer from the huge gap between pre-training and fine-tuning. To address these issues, we propose a Tandem Connectionist Encoding Network (TCEN) which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to guarantee the speech encoder outputs and the MT encoder inputs are consistent in terms of semantic representation and sequence length. Experimental results show that our model leads to significant improvements in En-De and En-Fr translation irrespective of the backbones.

【Keywords】:

1124. Improving Knowledge-Aware Dialogue Generation via Knowledge Base Question Answering.

Paper Link】 【Pages】:9169-9176

【Authors】: Jian Wang ; Junhao Liu ; Wei Bi ; Xiaojiang Liu ; Kejing He ; Ruifeng Xu ; Min Yang

【Abstract】: Neural network models usually struggle to incorporate commonsense knowledge into open-domain dialogue systems. In this paper, we propose a novel knowledge-aware dialogue generation model (called TransDG), which transfers question representation and knowledge matching abilities from the knowledge base question answering (KBQA) task to facilitate utterance understanding and factual knowledge selection for dialogue generation. In addition, we propose a response guiding attention and a multi-step decoding strategy to steer our model to focus on relevant features for response generation. Experiments on two benchmark datasets demonstrate that our model has robust superiority over compared methods in generating informative and fluent dialogues. Our code is available at https://github.com/siat-nlp/TransDG.

【Keywords】:

1125. Sentiment Classification in Customer Service Dialogue with Topic-Aware Multi-Task Learning.

Paper Link】 【Pages】:9177-9184

【Authors】: Jiancheng Wang ; Jingjing Wang ; Changlong Sun ; Shoushan Li ; Xiaozhong Liu ; Luo Si ; Min Zhang ; Guodong Zhou

【Abstract】: Sentiment analysis in dialogues plays a critical role in dialogue data analysis. However, previous studies on sentiment classification in dialogues largely ignore topic information, which is important for capturing overall information in some types of dialogues. In this study, we focus on the sentiment classification task in an important type of dialogue, namely customer service dialogue, and propose a novel approach which captures overall information to enhance the classification performance. Specifically, we propose a topic-aware multi-task learning (TML) approach which learns topic-enriched utterance representations in customer service dialogue by capturing various kinds of topic information. In the experiment, we propose a large-scale and high-quality annotated corpus for the sentiment classification task in customer service dialogue and empirical studies on the proposed corpus show that our approach significantly outperforms several strong baselines.

【Keywords】:

1126. Storytelling from an Image Stream Using Scene Graphs.

Paper Link】 【Pages】:9185-9192

【Authors】: Ruize Wang ; Zhongyu Wei ; Piji Li ; Qi Zhang ; Xuanjing Huang

【Abstract】: Visual storytelling aims at generating a story from an image stream. Most existing methods tend to represent images directly with the extracted high-level features, which is not intuitive and is difficult to interpret. We argue that translating each image into a graph-based semantic representation, i.e., a scene graph, which explicitly encodes the objects and relationships detected within an image, would benefit representing and describing images. To this end, we propose a novel graph-based architecture for visual storytelling by modeling the two-level relationships on scene graphs. In particular, on the within-image level, we employ a Graph Convolution Network (GCN) to enrich local fine-grained region representations of objects on scene graphs. To further model the interaction among images, on the cross-images level, a Temporal Convolution Network (TCN) is utilized to refine the region representations along the temporal dimension. Then the relation-aware representations are fed into a Gated Recurrent Unit (GRU) with an attention mechanism for story generation. Experiments are conducted on the public visual storytelling dataset. Automatic and human evaluation results indicate that our method achieves state-of-the-art performance.

【Keywords】:

1127. Multi-Task Self-Supervised Learning for Disfluency Detection.

Paper Link】 【Pages】:9193-9200

【Authors】: Shaolei Wang ; Wanxiang Che ; Qi Liu ; Pengda Qin ; Ting Liu ; William Yang Wang

【Abstract】: Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.

【Keywords】:
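
The pseudo-data construction described in the abstract above (randomly adding or deleting words, then deriving tagging and sentence-level labels) can be sketched roughly as follows. The noise probabilities, the filler vocabulary, and all function names are assumptions chosen for illustration; the abstract does not give the actual sampling procedure.

```python
import random


def add_noise(words, filler_vocab, rng, p_add=0.3):
    """Insert random filler words; tag 1 marks an added word, 0 an original one."""
    noisy, tags = [], []
    for word in words:
        if rng.random() < p_add:
            noisy.append(rng.choice(filler_vocab))
            tags.append(1)
        noisy.append(word)
        tags.append(0)
    return noisy, tags


def delete_noise(words, rng, p_del=0.3):
    """Randomly drop words, keeping at least one so the sentence is non-empty."""
    kept = [w for w in words if rng.random() >= p_del]
    return kept if kept else list(words[:1])


rng = random.Random(0)                      # fixed seed for reproducibility
sentence = "the cat sat on the mat".split()
fillers = ["uh", "well", "so"]              # hypothetical filler vocabulary

# Task (i): token tagging data, detecting which words were added.
noisy_words, noise_tags = add_noise(sentence, fillers, rng)

# Task (ii): sentence classification data, original (0) vs. corrupted (1).
corrupted = delete_noise(sentence, rng)
classification_examples = [(sentence, 0), (corrupted, 1)]
```

Both label sets come for free from the corruption process, which is why no manual annotation is needed for the pre-training stage.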

1128. Probing Brain Activation Patterns by Dissociating Semantics and Syntax in Sentences.

Paper Link】 【Pages】:9201-9208

【Authors】: Shaonan Wang ; Jiajun Zhang ; Nan Lin ; Chengqing Zong

【Abstract】: The relation between semantics and syntax, and where they are represented at the neural level, has been extensively debated in the neurosciences. Existing methods use manually designed stimuli to distinguish semantic and syntactic information in a sentence, which may not generalize beyond the experimental setting. This paper proposes an alternative framework to study the brain representation of semantics and syntax. Specifically, we embed the highly controlled stimuli as objective functions in learning sentence representations and propose a disentangled feature representation model (DFRM) to extract semantic and syntactic information in sentences. This model can generate one semantic and one syntactic vector for each sentence. Then we associate these disentangled feature vectors with brain imaging data to explore the brain representation of semantics and syntax. Results show that the semantic feature is represented more robustly than the syntactic feature across the brain, including the default-mode, frontoparietal, and visual networks, etc. The brain representations of semantics and syntax largely overlap, but there are brain regions sensitive to only one of them. For instance, several frontal and temporal regions are specific to the semantic feature; parts of the right superior frontal and right inferior parietal gyrus are specific to the syntactic feature.

【Keywords】:

1129. Multi-Level Head-Wise Match and Aggregation in Transformer for Textual Sequence Matching.

Paper Link】 【Pages】:9209-9216

【Authors】: Shuohang Wang ; Yunshi Lan ; Yi Tay ; Jing Jiang ; Jingjing Liu

【Abstract】: Transformer has been successfully applied to many natural language processing tasks. However, for textual sequence matching, simple matching between the representation of a pair of sequences might bring in unnecessary noise. In this paper, we propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels. Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks that rely only on pre-computed sequence-vector-representation, such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.

【Keywords】:

1130. Masking Orchestration: Multi-Task Pretraining for Multi-Role Dialogue Representation Learning.

Paper Link】 【Pages】:9217-9224

【Authors】: Tianyi Wang ; Yating Zhang ; Xiaozhong Liu ; Changlong Sun ; Qiong Zhang

【Abstract】: Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, and dialogue summarization. While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive. In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks where the training objectives are given naturally according to the nature of the utterance and the structure of the multi-role conversation. Meanwhile, in order to locate essential information for dialogue summarization/extraction, the pretraining process enables external knowledge integration. The proposed fine-tuned pretraining mechanism is comprehensively evaluated via three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism significantly contributes to all the downstream tasks without discrimination to different encoders.

【Keywords】:

1131. Integrating Deep Learning with Logic Fusion for Information Extraction.

Paper Link】 【Pages】:9225-9232

【Authors】: Wenya Wang ; Sinno Jialin Pan

【Abstract】: Information extraction (IE) aims to produce structured information from an input text, e.g., Named Entity Recognition and Relation Extraction. Various attempts have been proposed for IE via feature engineering or deep learning. However, most of them fail to associate the complex relationships inherent in the task itself, which has proven to be especially crucial. For example, the relation between 2 entities is highly dependent on their entity types. These dependencies can be regarded as complex constraints that can be efficiently expressed as logical rules. To combine such logic reasoning capabilities with learning capabilities of deep neural networks, we propose to integrate logical knowledge in the form of first-order logic into a deep learning system, which can be trained jointly in an end-to-end manner. The integrated framework is able to enhance neural outputs with knowledge regularization via logic rules, and at the same time update the weights of logic rules to comply with the characteristics of the training data. We demonstrate the effectiveness and generalization of the proposed model on multiple IE tasks.

【Keywords】:

1132. Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks.

Paper Link】 【Pages】:9233-9241

【Authors】: Yong Wang ; Longyue Wang ; Shuming Shi ; Victor O. K. Li ; Zhaopeng Tu

【Abstract】: The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In response to this problem, we augment the NMT model with additional domain transformation networks to transform the general representations to domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves comparable results with the fine-tuning approach that requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.

【Keywords】:

1133. TextNAS: A Neural Architecture Search Space Tailored for Text Representation.

Paper Link】 【Pages】:9242-9249

【Authors】: Yujing Wang ; Yaming Yang ; Yiren Chen ; Jing Bai ; Ce Zhang ; Guinan Su ; Xiaoyu Kou ; Yunhai Tong ; Mao Yang ; Lidong Zhou

【Abstract】: Learning text representation is crucial for text classification and other language related tasks. There are a diverse set of text representation networks in the literature, and how to find the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior to the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets on text classification and natural language inference tasks. Furthermore, some of the design principles found in the automatic network agree well with human intuition.

【Keywords】:

1134. Learning Multi-Level Dependencies for Robust Word Recognition.

Paper Link】 【Pages】:9250-9257

【Authors】: Zhiwei Wang ; Hui Liu ; Jiliang Tang ; Songfan Yang ; Gale Yan Huang ; Zitao Liu

【Abstract】: Robust language processing systems are becoming increasingly important given the recent awareness of dangerous situations where brittle machine learning models can be easily broken in the presence of noise. In this paper, we introduce a robust word recognition framework that captures multi-level sequential dependencies in noised sentences. The proposed framework employs a sequence-to-sequence model over characters of each word, whose output is given to a word-level bi-directional recurrent neural network. We conduct extensive experiments to verify the effectiveness of the framework. The results show that the proposed framework outperforms state-of-the-art methods by a large margin and they also suggest that character-level dependencies can play an important role in word recognition. The code of the proposed framework and the major experiments are publicly available.

【Keywords】:

1135. GRET: Global Representation Enhanced Transformer.

Paper Link】 【Pages】:9258-9265

【Authors】: Rongxiang Weng ; Hao-Ran Wei ; Shujian Huang ; Heng Yu ; Lidong Bing ; Weihua Luo ; Jiajun Chen

【Abstract】: Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and the LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.

【Keywords】:

1136. Acquiring Knowledge from Pre-Trained Model to Neural Machine Translation.

Paper Link】 【Pages】:9266-9273

【Authors】: Rongxiang Weng ; Heng Yu ; Shujian Huang ; Shanbo Cheng ; Weihua Luo

【Abstract】: Pre-training and fine-tuning have achieved great success in the natural language processing field. The standard paradigm of exploiting them includes two steps: first, pre-training a model, e.g. BERT, with large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model with labeled data from downstream tasks. However, in neural machine translation (NMT), we address the problem that the training objective of the bilingual task is far different from that of the monolingual pre-trained model. Because of this gap, fine-tuning alone cannot fully utilize prior language knowledge in NMT. In this paper, we propose an Apt framework for acquiring knowledge from pre-trained models to NMT. The proposed approach includes two modules: 1) a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and 2) a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process. The proposed approach can integrate suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English to German, German to English and Chinese to English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.

【Keywords】:

1137. Enhanced Meta-Learning for Cross-Lingual Named Entity Recognition with Minimal Resources.

Paper Link】 【Pages】:9274-9281

【Authors】: Qianhui Wu ; Zijia Lin ; Guoxin Wang ; Hui Chen ; Börje F. Karlsson ; Biqing Huang ; Chin-Yew Lin

【Abstract】: For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER). While all existing methods directly transfer from source-learned model to a target language, in this paper, we propose to fine-tune the learned model with a few similar examples given a test case, which could benefit the prediction by leveraging the structural and semantic information conveyed in such similar examples. To this end, we present a meta-learning algorithm to find a good model parameter initialization that could fast adapt to the given test case and propose to construct multiple pseudo-NER tasks for meta-training by computing sentence similarities. To further improve the model's generalization ability across different languages, we introduce a masking scheme and augment the loss function with an additional maximum term during meta-training. We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.

【Keywords】:

1138. Importance-Aware Learning for Neural Headline Editing.

Paper Link】 【Pages】:9282-9289

【Authors】: Qingyang Wu ; Lei Li ; Hao Zhou ; Ying Zeng ; Zhou Yu

【Abstract】: Many social media news writers are not professionally trained. Therefore, social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We propose to automate this headline editing process through neural network models to provide more immediate writing support for these social media news writers. To train such a neural headline editing model, we collected a dataset which contains articles with original headlines and professionally edited headlines. However, it is expensive to collect a large number of professionally edited headlines. To solve this low-resource problem, we design an encoder-decoder model which leverages large-scale pre-trained language models. We further improve the pre-trained model's quality by introducing a headline generation task as an intermediate task before the headline editing task. Also, we propose Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences. With the help of Pre-training, Adaptation, and SIA, the model learns to generate headlines in the professional editor's style. Experimental results show that our method significantly improves the quality of headline editing compared with previous methods.

【Keywords】:

1139. A Dataset for Low-Resource Stylized Sequence-to-Sequence Generation.

Paper Link】 【Pages】:9290-9297

【Authors】: Yu Wu ; Yunli Wang ; Shujie Liu

【Abstract】: Low-resource stylized sequence-to-sequence (S2S) generation is in high demand. However, its development is hindered by the datasets which have limitations on scale and automatic evaluation methods. We construct two large-scale, multiple-reference datasets for low-resource stylized S2S, the Machine Translation Formality Corpus (MTFC) that is easy to evaluate and the Twitter Conversation Formality Corpus (TCFC) that tackles an important problem in chatbots. These datasets contain context to source style parallel data, source style to target parallel data, and non-parallel sentences in the target style to enable the semi-supervised learning. We provide three baselines, the pivot-based method, the teacher-student method, and the back-translation method. We find that the pivot-based method is the worst, and the other two methods achieve the best score on different metrics.

【Keywords】:

1140. Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction.

Paper Link】 【Pages】:9298-9305

【Authors】: Zhen Wu ; Fei Zhao ; Xin-Yu Dai ; Shujian Huang ; Jiajun Chen

【Abstract】: Target-oriented opinion words extraction (TOWE) is a new subtask of ABSA, which aims to extract the corresponding opinion words for a given opinion target in a sentence. Recently, neural network methods have been applied to this task and have achieved promising results. However, the difficulty of annotation causes the datasets of TOWE to be insufficient, which heavily limits the performance of neural models. By contrast, abundant review sentiment classification data are easily available at online review sites. These reviews contain substantial latent opinion information and semantic patterns. In this paper, we propose a novel model to transfer this opinion knowledge from resource-rich review sentiment classification datasets to the low-resource TOWE task. To address the challenges in the transfer process, we design an effective transformation method to obtain latent opinions, then integrate them into TOWE. Extensive experimental results show that our model achieves better performance compared to other state-of-the-art methods and significantly outperforms the base model without transferring opinion knowledge. Further analysis validates the effectiveness of our model.

【Keywords】:

1141. Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning.

Paper Link】 【Pages】:9306-9313

【Authors】: Liqiang Xiao ; Lu Wang ; Hao He ; Yaohui Jin

【Abstract】: Jointly using the extractive and abstractive summarization methods can combine their complementary advantages, generating summaries that are both informative and concise. Existing methods that adopt an extract-then-abstract strategy have achieved impressive results, yet they suffer from information loss in the abstraction step because they compress all the selected sentences without distinction. Especially when the whole sentence is summary-worthy, salient content would be lost by compression. To address this problem, we propose HySum, a hybrid framework for summarization that can flexibly switch between copying a sentence and rewriting a sentence according to the degree of redundancy. In this way, our approach can effectively combine the advantages of the two branches of summarization, juggling informativity and conciseness. Moreover, based on Hierarchical Reinforcement Learning, we propose an end-to-end reinforcing method to bridge the extraction module and the rewriting module, which can enhance the cooperation between them. Automatic evaluation shows that our approach significantly outperforms the state of the art on the CNN/DailyMail corpus. Human evaluation also demonstrates that our generated summaries are more informative and concise than those of popular models.

【Keywords】:

1142. Joint Entity and Relation Extraction with a Hybrid Transformer and Reinforcement Learning Based Model.

Paper Link】 【Pages】:9314-9321

【Authors】: Ya Xiao ; Chengxiang Tan ; Zhijie Fan ; Qian Xu ; Wenye Zhu

【Abstract】: Joint extraction of entities and relations is a task that extracts the entity mentions and semantic relations between entities from unstructured texts with one single model. Existing entity and relation extraction datasets usually rely on distant supervision methods, which cannot identify the correspondence between a relation and the sentence, and thus suffer from the noisy labeling problem. We propose a hybrid deep neural network model to jointly extract the entities and relations, and the model is also capable of filtering noisy data. The hybrid model contains a transformer-based encoding layer, an LSTM entity detection module and a reinforcement learning-based relation classification module. The output of the transformer encoder and the entity embedding generated from the entity detection module are combined as the input state of the reinforcement learning module to improve the relation classification and noisy data filtering. We conduct experiments on the public dataset produced by the distant supervision method to verify the effectiveness of our proposed model. Different experimental results show that our model gains better performance on entity and relation extraction than the compared methods and also has the ability to filter noisy sentences.

【Keywords】:

1143. Attentive User-Engaged Adversarial Neural Network for Community Question Answering.

Paper Link】 【Pages】:9322-9329

【Authors】: Yuexiang Xie ; Ying Shen ; Yaliang Li ; Min Yang ; Kai Lei

【Abstract】: We study the community question answering (CQA) problem that emerges with the advent of numerous community forums in the recent past. The task of finding appropriate answers to questions from informative but noisy crowdsourced answers is important yet challenging in practice. We present an Attentive User-engaged Adversarial Neural Network (AUANN), which interactively learns the context information of questions and answers, and enhances user engagement with the CQA task. A novel attentive mechanism is incorporated to model the semantic internal and external relations among questions, answers and user contexts. To handle the noise issue caused by introducing user context, we design a two-step denoise mechanism, including a coarse-grained selection process by similarity measurement, and a fine-grained selection process by applying an adversarial training module. We evaluate the proposed method on large-scale real-world datasets SemEval-2016 and SemEval-2017. Experimental results verify the benefits of incorporating user information, and show that our proposed model significantly outperforms the state-of-the-art methods.

【Keywords】:

1144. Hashing Based Answer Selection.

Paper Link】 【Pages】:9330-9337

【Authors】: Dong Xu ; Wu-Jun Li

【Abstract】: Answer selection is an important subtask of question answering (QA), in which deep models usually achieve better performance than non-deep models. Most deep models adopt question-answer interaction mechanisms, such as attention, to get vector representations for answers. When these interaction based deep models are deployed for online prediction, the representations of all answers need to be recalculated for each question. This procedure is time-consuming for deep models with complex encoders like BERT which usually have better accuracy than simple encoders. One possible solution is to store the matrix representation (encoder output) of each answer in memory to avoid recalculation. But this will bring large memory cost. In this paper, we propose a novel method, called hashing based answer selection (HAS), to tackle this problem. HAS adopts a hashing strategy to learn a binary matrix representation for each answer, which can dramatically reduce the memory cost for storing the matrix representations of answers. Hence, HAS can adopt complex encoders like BERT in the model, but the online prediction of HAS is still fast with a low memory cost. Experimental results on three popular answer selection datasets show that HAS can outperform existing models to achieve state-of-the-art performance.
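
The memory argument in the abstract above can be illustrated with a minimal sketch. It assumes a plain sign threshold as a stand-in for the learned hashing that HAS uses, and the function names and toy sizes are hypothetical; it only shows why a binary matrix is cheaper to store than a 32-bit float matrix.

```python
import random

# Hypothetical illustration only: sign thresholding stands in for the
# learned hashing layer, and all names and sizes here are assumptions.

def binarize(matrix):
    """Map each real value to one bit by its sign (1 if >= 0, else 0)."""
    return [[1 if x >= 0 else 0 for x in row] for row in matrix]

def pack_bits(bit_rows):
    """Pack each row of bits into a bytes object, 8 bits per byte."""
    packed = []
    for row in bit_rows:
        value = 0
        for bit in row:
            value = (value << 1) | bit
        n_bytes = (len(row) + 7) // 8
        packed.append(value.to_bytes(n_bytes, "big"))
    return packed

rng = random.Random(0)
seq_len, dim = 4, 16  # toy answer length and encoder width
answer_matrix = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]

codes = pack_bits(binarize(answer_matrix))
float_bytes = seq_len * dim * 4            # storage as 32-bit floats
binary_bytes = sum(len(c) for c in codes)  # storage as packed bits
print(float_bytes, binary_bytes)  # 256 8
```

With these toy sizes the packed representation is 32 times smaller, which is the ratio between a 32-bit float and a single bit.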

【Keywords】:

1145. Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation.

Paper Link】 【Pages】:9338-9345

【Authors】: Jun Xu ; Haifeng Wang ; Zhengyu Niu ; Hua Wu ; Wanxiang Che

【Abstract】: Previous neural models on open-domain conversation generation have no effective mechanisms to manage chatting topics, and tend to produce less coherent dialogs. Inspired by the strategies in human-human dialogs, we divide the task of multi-turn open-domain conversation generation into two sub-tasks: explicit goal (chatting about a topic) sequence planning and goal completion by topic elaboration. To this end, we propose a three-layer Knowledge aware Hierarchical Reinforcement Learning based Model (KnowHRL). Specifically, for the first sub-task, the upper-layer policy learns to traverse a knowledge graph (KG) in order to plan a high-level goal sequence towards a good balance between dialog coherence and topic consistency with user interests. For the second sub-task, the middle-layer policy and the lower-layer one work together to produce an in-depth multi-turn conversation about a single topic with a goal-driven generation mechanism. The capability of goal-sequence planning enables chatbots to conduct proactive open-domain conversations towards recommended topics, which has many practical applications. Experiments demonstrate that our model outperforms state of the art baselines in terms of user-interest consistency, dialog coherence, and knowledge accuracy.

【Keywords】:

1146. The Value of Paraphrase for Knowledge Base Predicates.

Paper Link】 【Pages】:9346-9353

【Authors】: Bingcong Xue ; Sen Hu ; Lei Zou ; Jiashu Cheng

【Abstract】: Paraphrase, i.e., differing textual realizations of the same meaning, has proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is the key to comprehending the RDF triples in KBs. Existing works have published some paraphrase datasets automatically extracted from large corpora, but these contain too many redundant pairs or do not cover enough predicates, shortcomings that cannot be fixed by computer alone and need human help. This paper presents a full process for collecting large-scale and high-quality paraphrase dictionaries for predicates in knowledge bases, which takes advantage of existing datasets and combines machine mining and crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, the quality of which is a great leap over previous works. We then demonstrate that such paraphrase dictionaries can greatly help natural language processing tasks such as question answering and language generation. We also publish our own dictionary for further research.

【Keywords】:

1147. Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment.

Paper Link】 【Pages】:9354-9361

【Authors】: Kun Xu ; Linfeng Song ; Yansong Feng ; Yan Song ; Dong Yu

【Abstract】: Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity. This decoding method may not only cause the “many-to-one” problem but also neglect the coordinated nature of this task, that is, each alignment decision may highly correlate to the other decisions. In this paper, we introduce two coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first retrieves the model-confident alignments from the predicted results and then incorporates them as additional knowledge to resolve the remaining model-uncertain alignments. To achieve this, we further propose an enhanced alignment model that is built on the current state-of-the-art baseline. In addition, to address the many-to-one problem, we propose to jointly predict entity alignments so that the one-to-one constraint can be naturally incorporated into the alignment prediction. Experimental results show that our model achieves the state-of-the-art performance and our reasoning methods can also significantly improve existing baselines.
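
The "many-to-one" problem and the one-to-one constraint described above can be shown on a toy score table. This is a minimal sketch under stated assumptions: the greedy confidence-ordered matcher is a simple stand-in, not the paper's joint alignment algorithm, and the entity names and scores are invented for illustration.

```python
# Hypothetical illustration: the greedy matcher below is a stand-in for a
# joint decoder, and the entities and scores are made up.

def independent_decode(scores):
    """Pick the best target for every source independently (may be many-to-one)."""
    return {s: max(row, key=row.get) for s, row in scores.items()}

def one_to_one_decode(scores):
    """Commit the most confident pairs first and never reuse a source or target."""
    pairs = sorted(
        ((score, s, t) for s, row in scores.items() for t, score in row.items()),
        reverse=True,
    )
    used_sources, used_targets, result = set(), set(), {}
    for _score, s, t in pairs:
        if s not in used_sources and t not in used_targets:
            result[s] = t
            used_sources.add(s)
            used_targets.add(t)
    return result

scores = {
    "Paris_fr": {"Paris_en": 0.95, "France_en": 0.40},
    "France_fr": {"Paris_en": 0.60, "France_en": 0.55},
}

print(independent_decode(scores))  # both sources map to "Paris_en"
print(one_to_one_decode(scores))   # each source gets a distinct target
```

Processing pairs from most to least confident mirrors the easy-to-hard intuition: the confident match for `Paris_fr` is fixed first, which removes `Paris_en` from consideration for the uncertain `France_fr`.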

【Keywords】:

1148. Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning.

Paper Link】 【Pages】:9362-9369

【Authors】: Qianming Xue ; Wei Zhang ; Hongyuan Zha

【Abstract】: Domain-adapted sentiment classification refers to training on a labeled source domain to well infer document-level sentiment on an unlabeled target domain. Most existing relevant models involve a feature extractor and a sentiment classifier, where the feature extractor works towards learning domain-invariant features from both domains, and the sentiment classifier is trained only on the source domain to guide the feature extractor. As such, they lack a mechanism to use sentiment polarity lying in the target domain. To improve domain-adapted sentiment classification by learning sentiment from the target domain as well, we devise a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers. The domain discriminators enable the feature extractors to obtain domain-invariant features. Meanwhile, the label prober in each group explores document sentiment polarity of the target domain through the sentiment prediction generated by the classifier in the peer group, and guides the learning of the feature extractor in its own group. The proposed approach achieves the mutual learning of the two groups in an end-to-end manner. Experiments on multiple public datasets indicate our method obtains the state-of-the-art performance, validating the effectiveness of mutual learning through label probers.

【Keywords】:

1149. Knowledge and Cross-Pair Pattern Guided Semantic Matching for Question Answering.

Paper Link】 【Pages】:9370-9377

【Authors】: Zihan Xu ; Hai-Tao Zheng ; Shaopeng Zhai ; Dong Wang

【Abstract】: Semantic matching is a basic problem in natural language processing, but it is far from solved because of the differences between the pairs for matching. In question answering (QA), answer selection (AS) is a popular semantic matching task, usually reformulated as a paraphrase identification (PI) problem. However, QA is different from PI because the question and the answer are not synonymous sentences and not strictly comparable. In this work, a novel knowledge and cross-pair pattern guided semantic matching system (KCG) is proposed, which considers both knowledge and pattern conditions for QA. We apply explicit cross-pair matching based on Graph Convolutional Network (GCN) to help KCG recognize general domain-independent Q-to-A patterns better. And with the incorporation of domain-specific information from knowledge bases (KB), KCG is able to capture and explore various relations within Q-A pairs. Experiments show that KCG is robust against the diversity of Q-A pairs and outperforms the state-of-the-art systems on different answer selection tasks.

【Keywords】:

1150. Towards Making the Most of BERT in Neural Machine Translation.

Paper Link】 【Pages】:9378-9385

【Authors】: Jiacheng Yang ; Mingxuan Wang ; Hao Zhou ; Chengqi Zhao ; Weinan Zhang ; Yong Yu ; Lei Li

【Abstract】: GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTnmt) that is the key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previous pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show CTnmt gains of up to 3 BLEU score on the WMT14 English-German language pair, which even surpasses the previous state-of-the-art pre-training aided NMT by 1.4 BLEU score. For the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU score.

【Keywords】:

1151. Alternating Language Modeling for Cross-Lingual Pre-Training.

Paper Link】 【Pages】:9386-9393

【Authors】: Jian Yang ; Shuming Ma ; Dongdong Zhang ; Shuangzhi Wu ; Zhoujun Li ; Ming Zhou

【Abstract】: Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt the Translation Language Model to predict masked words with the concatenation of the source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, hoping to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with target translations to create code-switched sentences. Then, we use these code-switched data to train the ALM model to predict words of different languages. We evaluate our pre-trained ALM on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM can outperform the previous pre-training methods on three benchmarks.
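
The code-switching step described above (randomly substituting source phrases with target translations) can be sketched as follows. The phrase table, the `ratio` parameter, and the longest-match rule are assumptions made for illustration; the abstract does not specify how phrases are aligned or sampled.

```python
import random

# Hypothetical illustration: the phrase table, the substitution ratio and
# the longest-match-first rule are assumptions, not details from the paper.

def code_switch(tokens, phrase_table, ratio, rng):
    """Replace source phrases found in phrase_table with their translations,
    each with probability `ratio`, scanning left to right."""
    out = []
    i = 0
    while i < len(tokens):
        replaced = False
        for n in (3, 2, 1):  # try longer phrases first
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in phrase_table and rng.random() < ratio:
                out.extend(phrase_table[phrase])
                i += n
                replaced = True
                break
        if not replaced:
            out.append(tokens[i])
            i += 1
    return out

phrase_table = {("the", "cat"): ["le", "chat"], ("mat",): ["tapis"]}
sentence = "the cat sat on the mat".split()

# ratio=1.0 switches every phrase that has a translation; ratio=0.0 switches none.
print(code_switch(sentence, phrase_table, 1.0, random.Random(0)))
print(code_switch(sentence, phrase_table, 0.0, random.Random(0)))
```

Intermediate ratios give mixed-language sentences that differ from run to run unless the random generator is seeded.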

【Keywords】:

1152. Generalize Sentence Representation with Self-Inference.

Paper Link】 【Pages】:9394-9401

【Authors】: Kai-Chou Yang ; Hung-Yu Kao

【Abstract】: In this paper, we propose Self Inference Neural Network (SINN), a simple yet efficient sentence encoder which leverages knowledge from recurrent and convolutional neural networks. SINN gathers semantic evidence in an interaction space which is subsequently fused by a shared vector gate to determine the most relevant mixture of contextual information. We evaluate the proposed method on four benchmarks among three NLP tasks. Experimental results demonstrate that our model sets a new state-of-the-art on MultiNLI, Scitail and is competitive on the remaining two datasets over all sentence encoding methods. The encoding and inference process in our model is highly interpretable. Through visualizations of the fusion component, we open the black box of our network and explore the applicability of the base encoding methods case by case.

【Keywords】:

1153. End-to-End Bootstrapping Neural Network for Entity Set Expansion.

Paper Link】 【Pages】:9402-9409

【Authors】: Lingyong Yan ; Xianpei Han ; Ben He ; Le Sun

【Abstract】: Bootstrapping for entity set expansion (ESE) has long been modeled as a multi-step pipelined process. Such a paradigm, unfortunately, often suffers from two main challenges: 1) the entities are expanded in multiple separate steps, which tends to introduce noisy entities and results in the semantic drift problem; 2) it is hard to exploit the high-order entity-pattern relations for entity set expansion. In this paper, we propose an end-to-end bootstrapping neural network for entity set expansion, named BootstrapNet, which models the bootstrapping in an encoder-decoder architecture. In the encoding stage, a graph attention network is used to capture both the first- and the high-order relations between entities and patterns, and encode useful information into their representations. In the decoding stage, the entities are sequentially expanded through a recurrent neural network, which outputs entities at each stage, and its hidden state vectors, representing the target category, are updated at each expansion step. Experimental results demonstrate substantial improvement of our model over previous ESE approaches.

【Keywords】:

1154. Be Relevant, Non-Redundant, and Timely: Deep Reinforcement Learning for Real-Time Event Summarization.

Paper Link】 【Pages】:9410-9417

【Authors】: Min Yang ; Chengming Li ; Fei Sun ; Zhou Zhao ; Ying Shen ; Chenglin Wu

【Abstract】: Real-time event summarization is an essential task in natural language processing and information retrieval. Despite the progress of previous work, generating relevant, non-redundant, and timely event summaries remains challenging in practice. In this paper, we propose a Deep Reinforcement learning framework for real-time Event Summarization (DRES), which shows promising performance in resolving all three challenges (i.e., relevance, non-redundancy, timeliness) in a unified framework. Specifically, we (i) devise a hierarchical cross-attention network with intra- and inter-document attentions to integrate important semantic features within and between the query and input document for better text matching, with relevance prediction leveraged as an auxiliary task to strengthen the document modeling and help to extract relevant documents; (ii) propose a multi-topic dynamic memory network to capture the sequential patterns of different topics belonging to the event of interest and temporally memorize the input facts from the evolving document stream, avoiding extracting redundant information at each time step; (iii) consider both historical dependencies and future uncertainty of the document stream for generating relevant and timely summaries by exploiting the reinforcement learning technique. Experimental results on two real-world datasets demonstrate the advantages of the DRES model, with significant improvement in generating relevant, non-redundant, and timely event summaries over state-of-the-art methods.

【Keywords】:

1155. Visual Agreement Regularized Training for Multi-Modal Machine Translation.

Paper Link】 【Pages】:9418-9425

【Authors】: Pengcheng Yang ; Boxing Chen ; Pei Zhang ; Xu Sun

【Abstract】: Multi-modal machine translation aims at translating the source sentence into a different language in the presence of the paired image. Previous work suggests that additional visual information only provides dispensable help to translation, which is needed in several very special cases such as translating ambiguous words. To make better use of visual information, this work presents visual agreement regularized training. The proposed approach jointly trains the source-to-target and target-to-source translation models and encourages them to share the same focus on the visual information when generating semantically equivalent visual words (e.g. “ball” in English and “ballon” in French). Besides, a simple yet effective multi-head co-attention model is also introduced to capture interactions between visual and textual features. The results show that our approaches can outperform competitive baselines by a large margin on the Multi30k dataset. Further analysis demonstrates that the proposed regularized training can effectively improve the agreement of attention on the image, leading to better use of visual information.

【Keywords】:

1156. Causally Denoise Word Embeddings Using Half-Sibling Regression.

Paper Link】 【Pages】:9426-9433

【Authors】: Zekun Yang ; Tianlin Liu

【Abstract】: Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithms has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), which allows us to identify and remove confounding noise contained in word vectors. Compared to previous work, our proposed method has the advantages of interpretability and transparency due to its causal inference grounding. Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance.

【Keywords】:

1157. A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations.

Paper Link】 【Pages】:9434-9441

【Authors】: Zekun Yang ; Juan Feng

【Abstract】: Word embedding has become essential for natural language processing as it boosts empirical performances of various tasks. However, recent research discovers that gender bias is incorporated in neural word embeddings, and downstream tasks that rely on these biased word vectors also produce gender-biased results. While some word-embedding gender-debiasing methods have been developed, these methods mainly focus on reducing gender bias associated with gender direction and fail to reduce the gender bias presented in word embedding relations. In this paper, we design a causal and simple approach for mitigating gender bias in word vector relation by utilizing the statistical dependency between gender-definition word embeddings and gender-biased word embeddings. Our method attains state-of-the-art results on gender-debiasing tasks, lexical- and sentence-level evaluation tasks, and downstream coreference resolution tasks.

【Keywords】:

1158. Integrating Relation Constraints with Neural Relation Extractors.

Paper Link】 【Pages】:9442-9449

【Authors】: Yuan Ye ; Yansong Feng ; Bingfeng Luo ; Yuxuan Lai ; Dongyan Zhao

【Abstract】: Recent years have seen rapid progress in identifying predefined relationships between entity pairs using neural networks (NNs). However, such models often make predictions for each entity pair individually, and thus often fail to resolve the inconsistency among different predictions, which can be characterized by discrete relation constraints. These constraints are often defined over combinations of entity-relation-entity triples, since explicitly well-defined type and cardinality requirements for the relations are often lacking. In this paper, we propose a unified framework to integrate relation constraints with NNs by introducing a new loss term, Constraint Loss. In particular, we develop two efficient methods to capture how well the local predictions from multiple instance pairs satisfy the relation constraints. Experiments on both English and Chinese datasets show that our approach can help NNs learn from discrete relation constraints to reduce inconsistency among local predictions, and outperform popular neural relation extraction (NRE) models even when they are enhanced with extra post-processing. Our source code and datasets will be released at https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020.

【Keywords】:

1159. MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space.

Paper Link】 【Pages】:9450-9457

【Authors】: Xiaoyuan Yi ; Ruoyu Li ; Cheng Yang ; Wenhao Li ; Maosong Sun

【Abstract】: As an essential step towards computer creativity, automatic poetry generation has gained increasing attention in recent years. Though recent neural models have made prominent progress on some criteria of poetry quality, generated poems still suffer from poor diversity. Related literary studies show that different factors, such as life experience, historical background, etc., influence the composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into some subspaces, with each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

【Keywords】:

1160. PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network.

Paper Link】 【Pages】:9458-9465

【Authors】: Dacheng Yin ; Chong Luo ; Zhiwei Xiong ; Wenjun Zeng

【Abstract】: Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, attention has also been paid to phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, for this task. Unlike previous methods which directly use a complex ideal ratio mask to supervise the DNN learning, we design a two-stream network, where the amplitude stream and the phase stream are dedicated to amplitude and phase prediction. We discover that the two streams should communicate with each other, and this is crucial to phase prediction. In addition, we propose frequency transformation blocks to catch long-range correlations along the frequency axis. Visualization shows that the learned transformation matrix implicitly captures the harmonic correlation, which has been proven to be helpful for T-F spectrogram reconstruction. With these two innovations, PHASEN acquires the ability to handle detailed phase patterns and to utilize harmonic patterns, yielding a 1.76dB SDR improvement on the AVSpeech + AudioSet dataset. It also achieves significant gains over Google's network on this dataset. On the Voice Bank + DEMAND dataset, PHASEN outperforms previous methods by a large margin on four metrics.

【Keywords】:

1161. Meta-CoTGAN: A Meta Cooperative Training Paradigm for Improving Adversarial Text Generation.

Paper Link】 【Pages】:9466-9473

【Authors】: Haiyan Yin ; Dingcheng Li ; Xu Li ; Ping Li

【Abstract】: Training generative models that can generate high-quality text with sufficient diversity is an important open problem for the Natural Language Generation (NLG) community. Recently, generative adversarial models have been applied extensively to text generation tasks, where the adversarially trained generators alleviate the exposure bias experienced by conventional maximum likelihood approaches and result in promising generation quality. However, due to the notorious defect of mode collapse in adversarial training, the adversarially trained generators face a quality-diversity trade-off, i.e., the generator models tend to severely sacrifice generation diversity for increased generation quality. In this paper, we propose a novel approach which aims to improve the performance of adversarial text generation by efficiently decelerating the mode collapse of adversarial training. To this end, we introduce a cooperative training paradigm, where a language model is cooperatively trained with the generator and we utilize the language model to efficiently shape the data distribution of the generator against mode collapse. Moreover, instead of engaging the cooperative update for the generator in a principled way, we formulate a meta learning mechanism, where the cooperative update to the generator serves as a high level meta task, with the intuition of ensuring that the parameters of the generator after the adversarial update stay resistant to mode collapse. In the experiments, we demonstrate that our proposed approach can efficiently slow down the pace of mode collapse for adversarial text generators. Overall, our proposed method is able to outperform the baseline approaches by significant margins in terms of both generation quality and diversity in the tested domains.

【Keywords】:

1162. Dialog State Tracking with Reinforced Data Augmentation.

Paper Link】 【Pages】:9474-9481

【Authors】: Yichun Yin ; Lifeng Shang ; Xin Jiang ; Xiao Chen ; Qun Liu

【Abstract】: Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that can generate high-quality data to improve the neural state tracker. Specifically, we introduce a novel contextual bandit generator to learn fine-grained augmentation policies that can generate new effective instances by choosing suitable replacements for specific context. Moreover, by alternately learning between the generator and the state tracker, we can keep refining the generative policies to generate more high-quality training data for neural state tracker. Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves the performance over the state-of-the-art models, especially with limited training data.

【Keywords】:

1163. Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions.

Paper Link】 【Pages】:9482-9489

【Authors】: Yongjing Yin ; Fandong Meng ; Jinsong Su ; Yubin Ge ; Linfeng Song ; Jie Zhou ; Jiebo Luo

【Abstract】: Dominant sentence ordering models use a pointer network decoder to generate ordering sequences in a left-to-right fashion. However, such a decoder only exploits the noisy left-side encoded context, which is insufficient to ensure correct sentence ordering. To address this deficiency, we propose to enhance the pointer network decoder by using two pairwise ordering prediction modules: The FUTURE module predicts the relative orientations of other unordered sentences with respect to the candidate sentence, and the HISTORY module measures the local coherence between several (e.g., 2) previously ordered sentences and the candidate sentence, without the influence of noisy left-side context. Using the pointer mechanism, we then incorporate this dynamically generated information into the decoder as a supplement to the left-side context for better predictions. On several commonly-used datasets, our model significantly outperforms other baselines, achieving the state-of-the-art performance. Further analyses verify that pairwise ordering predictions indeed provide extra useful context as expected, leading to better sentence ordering. We also evaluate our sentence ordering models on a downstream task, multi-document summarization, and the summaries reordered by our model achieve the best coherence scores. Our code is available at https://github.com/DeepLearnXMU/Pairwise.git.

【Keywords】:

1164. Automatic Generation of Headlines for Online Math Questions.

Paper Link】 【Pages】:9490-9497

【Authors】: Ke Yuan ; Dafang He ; Zhuoren Jiang ; Liangcai Gao ; Zhi Tang ; C. Lee Giles

【Abstract】: Mathematical equations are an important part of dissemination and communication of scientific information. Students, however, often feel challenged in reading and understanding math content and equations. With the development of the Web, students are posting their math questions online. Nevertheless, constructing a concise math headline that gives a good description of the posted detailed math question is nontrivial. In this study, we explore a novel summarization task denoted as geNerating A concise Math hEadline from a detailed math question (NAME). Compared to conventional summarization tasks, this task has two extra and essential constraints: 1) Detailed math questions consist of text and math equations which require a unified framework to jointly model textual and mathematical information; 2) Unlike text, math equations contain semantic and structural features, and both of them should be captured together. To address these issues, we propose MathSum, a novel summarization model which utilizes a pointer mechanism combined with a multi-head attention mechanism for mathematical representation augmentation. The pointer mechanism can either copy textual tokens or math tokens from source questions in order to generate math headlines. The multi-head attention mechanism is designed to enrich the representation of math equations by modeling and integrating both its semantic and structural features. For evaluation, we collect and make available two sets of real-world detailed math questions along with human-written math headlines, namely EXEQ-300k and OFEQ-10k. Experimental results demonstrate that our model (MathSum) significantly outperforms state-of-the-art models for both the EXEQ-300k and OFEQ-10k datasets.
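
The copying behaviour of a pointer mechanism, as described above, can be sketched with the widely used pointer-generator mixture. This is an assumption about the general technique, not MathSum's exact formulation: `p_gen`, the toy distributions, and the example tokens are hypothetical.

```python
# Hypothetical illustration of a generic pointer-generator mixture; the
# weighting scheme and all values are assumptions, not MathSum's equations.

def pointer_mixture(p_gen, vocab_dist, attention, source_tokens):
    """Blend a vocabulary distribution with a copy distribution over source tokens.

    p_gen         : probability of generating from the vocabulary
    vocab_dist    : dict token -> probability (sums to 1)
    attention     : attention weights over source positions (sum to 1)
    source_tokens : tokens at those positions (text or math tokens)
    """
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final

vocab_dist = {"solve": 0.7, "the": 0.3}
source_tokens = ["solve", "\\frac{a}{b}"]  # the math token is out of vocabulary
attention = [0.2, 0.8]

final = pointer_mixture(0.5, vocab_dist, attention, source_tokens)
print(final)
```

The math token receives probability mass only through the copy path, which is how a pointer mechanism lets a headline reuse equation tokens that a fixed output vocabulary would not contain.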

【Keywords】:

1165. Improving Context-Aware Neural Machine Translation Using Self-Attentive Sentence Embedding.

Paper Link】 【Pages】:9498-9506

【Authors】: Hyeongu Yun ; Yongkeun Hwang ; Kyomin Jung

【Abstract】: Fully Attentional Networks (FAN) like Transformer (Vaswani et al. 2017) have shown superior results in Neural Machine Translation (NMT) tasks and have become a solid baseline for translation tasks. More recent studies have also reported that additional contextual sentences improve the translation quality of NMT models (Voita et al. 2018; Müller et al. 2018; Zhang et al. 2018). However, those studies have exploited multiple context sentences as a single long concatenated sentence, which may cause the models to suffer from inefficient computational complexities and long-range dependencies. In this paper, we propose a Hierarchical Context Encoder (HCE) that is able to exploit multiple context sentences separately using the hierarchical FAN structure. Our proposed encoder first abstracts sentence-level information from preceding sentences in a self-attentive way, and then hierarchically encodes context-level information. Through extensive experiments, we observe that our HCE records the best performance measured in BLEU score on English-German, English-Turkish, and English-Korean corpora. In addition, we observe that our HCE records the best performance on a crowd-sourced test set which is designed to evaluate how well an encoder can exploit contextual information. Finally, evaluation on an English-Korean pronoun resolution test suite also shows that our HCE can properly exploit contextual information.

【Keywords】:

1166. CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning.

Paper Link】 【Pages】:9507-9514

【Authors】: Daojian Zeng ; Haoran Zhang ; Qianying Liu

【Abstract】: Joint extraction of entities and relations has received significant attention due to its potential of providing higher performance for both tasks. Among existing methods, CopyRE is effective and novel, using a sequence-to-sequence framework and copy mechanism to directly generate the relation triplets. However, it suffers from two fatal problems. The model is extremely weak at differentiating the head and tail entities, resulting in inaccurate entity extraction. It also cannot predict multi-token entities (e.g. Steven Jobs). To address these problems, we give a detailed analysis of the reasons behind the inaccurate entity extraction problem, and then propose a simple but extremely effective model structure to solve this problem. In addition, we propose a multi-task learning framework equipped with copy mechanism, called CopyMTL, to allow the model to predict multi-token entities. Experiments reveal the problems of CopyRE and show that our model achieves significant improvement over the current state-of-the-art method by 9% in NYT and 16% in WebNLG (F1 score). Our code is available at https://github.com/WindChimeRan/CopyMTL

【Keywords】:

1167. Neural Simile Recognition with Cyclic Multitask Learning and Local Attention.

Paper Link】 【Pages】:9515-9522

【Authors】: Jiali Zeng ; Linfeng Song ; Jinsong Su ; Jun Xie ; Wei Song ; Jiebo Luo

【Abstract】: Simile recognition is to detect simile sentences and to extract simile components, i.e., tenors and vehicles. It involves two subtasks: simile sentence classification and simile component extraction. Recent work has shown that standard multitask learning is effective for Chinese simile recognition, but it is still uncertain whether the mutual effects between the subtasks have been well captured by simple parameter sharing. We propose a novel cyclic multitask learning framework for neural simile recognition, which stacks the subtasks and makes them into a loop by connecting the last to the first. It iteratively performs each subtask, taking the outputs of the previous subtask as additional inputs to the current one, so that the interdependence between the subtasks can be better explored. Extensive experiments show that our framework significantly outperforms the current state-of-the-art model and our carefully designed baselines, and the gains are still remarkable using BERT. The source code for this paper is available at https://github.com/DeepLearnXMU/Cyclic.

【Keywords】:

1168. Span Model for Open Information Extraction on Accurate Corpus.

Paper Link】 【Pages】:9523-9530

【Authors】: Junlang Zhan ; Hai Zhao

【Abstract】: Open Information Extraction (Open IE) is a challenging task, especially due to its brittle data basis. Most Open IE systems have to be trained on an automatically built corpus and evaluated on an inaccurate test set. In this work, we first alleviate this difficulty from both sides of training and test sets. For the former, we propose an improved model design to more sufficiently exploit the training dataset. For the latter, we present our accurately re-annotated benchmark test set (Re-OIE2016) according to a series of linguistic observations and analyses. Then, we introduce a span model instead of the previously adopted sequence labeling formulation for n-ary Open IE. Our newly introduced model achieves new state-of-the-art performance on both benchmark evaluation datasets.
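
As a rough illustration of how a span formulation differs from per-token sequence labeling, the sketch below enumerates every contiguous token span up to a maximum width as a candidate predicate or argument. The function name `enumerate_spans`, the `max_width` cutoff, and the example sentence are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """Return every (start, end) pair, end inclusive, covering at most max_width tokens."""
    spans = []
    for start in range(len(tokens)):
        # Exclusive upper bound for the end index of a span starting here.
        limit = min(start + max_width, len(tokens))
        for end in range(start, limit):
            spans.append((start, end))
    return spans


# Hypothetical example sentence; a real system would score each candidate span.
tokens = ["Obama", "was", "born", "in", "Hawaii"]
candidates = enumerate_spans(tokens, max_width=2)
```

A model built this way would classify whole candidate spans rather than assign a tag to each token, which is the contrast the abstract draws.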

【Keywords】:

1169. Multi-Point Semantic Representation for Intent Classification.

Paper Link】 【Pages】:9531-9538

【Authors】: Jinghan Zhang ; Yuxiao Ye ; Yue Zhang ; Likun Qiu ; Bin Fu ; Yang Li ; Zhenglu Yang ; Jian Sun

【Abstract】: Detecting user intents from utterances is the basis of the natural language understanding (NLU) task. To understand the meaning of utterances, some work focuses on fully representing utterances via semantic parsing, for which the annotation cost is labor-intensive. While some researchers simply view this as intent classification or frequently asked questions (FAQs) retrieval, they do not leverage the shared utterances among different intents. We propose a simple and novel multi-point semantic representation framework with relatively low annotation cost to leverage the fine-grained factor information, decomposing queries into four factors, i.e., topic, predicate, object/condition, query type. Besides, we propose a compositional intent bi-attention model under multi-task learning with three kinds of attention mechanisms among queries, labels and factors, which jointly combines coarse-grained intent and fine-grained factor information. Extensive experiments show that our framework and model significantly outperform several state-of-the-art approaches with an improvement of 1.35%-2.47% in terms of accuracy.

【Keywords】:

1170. Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding.

Paper Link】 【Pages】:9539-9546

【Authors】: Linhao Zhang ; Dehong Ma ; Xiaodong Zhang ; Xiaohui Yan ; Houfeng Wang

【Abstract】: Much research in recent years has focused on spoken language understanding (SLU), which usually involves two tasks: intent detection and slot filling. Since Yao et al. (2013), almost all SLU systems have been RNN-based, and these have been shown to suffer various limitations due to their sequential nature. In this paper, we propose to tackle this task with Graph LSTM, which first converts text into a graph and then utilizes the message passing mechanism to learn the node representation. Not only does the Graph LSTM address the limitations of sequential models, but it can also help to utilize the semantic correlation between slot and intent. We further propose a context-gated mechanism to make better use of context information for slot filling. Our extensive evaluation shows that the proposed model outperforms the state-of-the-art results by a large margin.

【Keywords】:

1171. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification.

Paper Link】 【Pages】:9547-9554

【Authors】: Mozhi Zhang ; Yoshinari Fujinuma ; Jordan L. Boyd-Graber

【Abstract】: Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (caco) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.

【Keywords】:

1172. Structure Learning for Headline Generation.

Paper Link】 【Pages】:9555-9562

【Authors】: Ruqing Zhang ; Jiafeng Guo ; Yixing Fan ; Yanyan Lan ; Xueqi Cheng

【Abstract】: Headline generation is an important problem in natural language processing, which aims to describe a document by a compact and informative headline. Some recent successes on this task have been achieved by advanced graph-based neural models, which marry the representational power of deep neural networks with the structural modeling ability of the relational sentence graphs. The advantages of graph-based neural models over traditional Seq2Seq models lie in that they can encode long-distance relationships between sentences beyond the surface linear structure. However, since documents are typically weakly-structured data, modern graph-based neural models usually rely on manually designed rules or some heuristics to construct the sentence graph a priori. This may largely limit the power and increase the cost of the graph-based methods. In this paper, therefore, we propose to incorporate structure learning into the graph-based neural models for headline generation. That is, we want to automatically learn the sentence graph in a data-driven way, so that we can unveil the document structure flexibly without prior heuristics or rules. To achieve this goal, we employ a deep & wide network to encode rich relational information between sentences for the sentence graph learning. For the deep component, we leverage neural matching models, either representation-focused or interaction-focused, to learn semantic similarity between sentences. For the wide component, we encode a variety of discourse relations between sentences. A Graph Convolutional Network (GCN) is then applied over the sentence graph to generate high-level relational representations for headline generation. The whole model can be optimized end-to-end so that the structure and representation can be learned jointly. Empirical studies show that our model can significantly outperform the state-of-the-art headline generation models.

【Keywords】:

1173. DCMN+: Dual Co-Matching Network for Multi-Choice Reading Comprehension.

Paper Link】 【Pages】:9563-9570

【Authors】: Shuailiang Zhang ; Hai Zhao ; Yuwei Wu ; Zhuosheng Zhang ; Xi Zhou ; Xiang Zhou

【Abstract】: Multi-choice reading comprehension is a challenging task that requires selecting an answer from a set of candidate options given a passage and a question. Previous approaches usually only calculate question-aware passage representation and ignore passage-aware question representation when modeling the relationship between passage and question, and thus cannot effectively capture that relationship. In this work, we propose a dual co-matching network (DCMN) which models the relationship among passage, question and answer options bidirectionally. Besides, inspired by how humans solve multi-choice questions, we integrate two reading strategies into our model: (i) passage sentence selection that finds the most salient supporting sentences to answer the question, (ii) answer option interaction that encodes the comparison information between answer options. DCMN equipped with the two strategies (DCMN+) obtains state-of-the-art results on five multi-choice reading comprehension datasets from different domains: RACE, SemEval-2018 Task 11, ROCStories, COIN, MCTest.

【Keywords】:

1174. Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption.

Paper Link】 【Pages】:9571-9578

【Authors】: Wei Zhang ; Yue Ying ; Pan Lu ; Hongyuan Zha

【Abstract】: Personalized image caption, a natural extension of the standard image caption task, requires generating brief image descriptions tailored to users' writing style and traits, and is more practical for meeting users' real demands. Only a few recent studies shed light on this crucial task and learn static user representations to capture their long-term literal-preference. However, this is insufficient to achieve satisfactory performance, because users have not only long-term literal-preference but also short-term literal-preference, which is associated with their recent states. To bridge this gap, we develop a novel multimodal hierarchical transformer network (MHTN) for personalized image caption in this paper. It learns short-term user literal-preference based on users' recent captions through a short-term user encoder at the low level. At the high level, the multimodal encoder integrates target image representations with short-term literal-preference, as well as long-term literal-preference learned from user IDs. These two encoders enjoy the advantages of the powerful transformer networks. Extensive experiments on two real datasets show the effectiveness of considering two types of user literal-preference simultaneously and better performance over the state-of-the-art models.

【Keywords】:

1175. Learning Conceptual-Contextual Embeddings for Medical Text.

Paper Link】 【Pages】:9579-9586

【Authors】: Xiao Zhang ; Dejing Dou ; Ji Wu

【Abstract】: External knowledge is often useful for natural language understanding tasks. We introduce a contextual text representation model called Conceptual-Contextual (CC) embeddings, which incorporates structured knowledge into text representations. Unlike entity embedding methods, our approach encodes a knowledge graph into a context model. CC embeddings can be easily reused for a wide range of tasks in a similar fashion to pre-trained language models. Our model effectively encodes the huge UMLS database by leveraging semantic generalizability. Experiments on electronic health records (EHRs) and medical text processing benchmarks showed our model gives a major boost to the performance of supervised medical NLP tasks.

【Keywords】:

1176. Filling Conversation Ellipsis for Better Social Dialog Understanding.

Paper Link】 【Pages】:9587-9595

【Authors】: Xiyuan Zhang ; Chengxi Li ; Dian Yu ; Samuel Davidson ; Zhou Yu

【Abstract】: The phenomenon of ellipsis is prevalent in social conversations. Ellipsis increases the difficulty of a series of downstream language understanding tasks, such as dialog act prediction and semantic role labeling. We propose to resolve ellipsis through automatic sentence completion to improve language understanding. However, automatic ellipsis completion can result in output which does not accurately reflect user intent. To address this issue, we propose a method which considers both the original utterance that has ellipsis and the automatically completed utterance in dialog act and semantic role labeling tasks. Specifically, we first complete user utterances to resolve ellipsis using an end-to-end pointer network model. We then train a prediction model using both utterances containing ellipsis and our automatically completed utterances. Finally, we combine the prediction results from these two utterances using a selection model that is guided by expert knowledge. Our approach improves dialog act prediction and semantic role labeling by 1.3% and 2.5% in F1 score respectively in social conversations. We also present an open-domain human-machine conversation dataset with manually completed user utterances and annotated semantic role labeling after manual completion.

【Keywords】:

1177. CFGNN: Cross Flow Graph Neural Networks for Question Answering on Complex Tables.

Paper Link】 【Pages】:9596-9603

【Authors】: Xuanyu Zhang

【Abstract】: Question answering on complex tables is a challenging task for machines. In Spider, a large-scale complex table dataset, relationships between tables and columns can be easily modeled as a graph. But most graph neural networks (GNNs) ignore the relationship of sibling nodes and use summation as the aggregation function to model the relationship of parent-child nodes. This may cause nodes with lower degree, like column nodes in the schema graph, to obtain little information. Moreover, context information is important for natural language. To leverage more context information flow comprehensively, we propose novel cross flow graph neural networks in this paper. The information flows of parent-child and sibling nodes cross with history states between different layers. Besides, we use a hierarchical encoding layer to obtain contextualized representations in tables. Experiments on Spider show that our approach achieves substantial performance improvement compared with previous GNN models and their variants.

【Keywords】:

1178. Task-Oriented Dialog Systems That Consider Multiple Appropriate Responses under the Same Context.

Paper Link】 【Pages】:9604-9611

【Authors】: Yichi Zhang ; Zhijian Ou ; Zhou Yu

【Abstract】: Conversations have an intrinsic one-to-many property, which means that multiple responses can be appropriate for the same dialog context. In task-oriented dialogs, this property leads to different valid dialog policies towards task completion. However, none of the existing task-oriented dialog generation approaches takes this property into account. We propose a Multi-Action Data Augmentation (MADA) framework to utilize the one-to-many property to generate diverse appropriate dialog responses. Specifically, we first use dialog states to summarize the dialog history, and then discover all possible mappings from every dialog state to its different valid system actions. During dialog system training, we enable the current dialog state to map to all valid system actions discovered in the previous process to create additional state-action pairs. By incorporating these additional pairs, the dialog policy learns a balanced action distribution, which further guides the dialog model to generate diverse responses. Experimental results show that the proposed framework consistently improves dialog policy diversity, and results in improved response diversity and appropriateness. Our model obtains state-of-the-art results on MultiWOZ.
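
The augmentation step described above (collect every valid system action seen for a dialog state, then pair each occurrence of that state with all of them) can be sketched as follows. This is a minimal sketch under stated assumptions: dialog states and actions are represented as plain strings, and the names `build_state_action_map` and `augment` are hypothetical, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Turn = Tuple[str, str]  # (dialog state, system action), both simplified to strings


def build_state_action_map(turns: List[Turn]) -> Dict[str, Set[str]]:
    """Collect every system action observed for each dialog state."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for state, action in turns:
        mapping[state].add(action)
    return dict(mapping)


def augment(turns: List[Turn], mapping: Dict[str, Set[str]]) -> List[Turn]:
    """Pair each turn's state with all actions valid for that state."""
    augmented = []
    for state, _ in turns:
        for action in sorted(mapping[state]):
            augmented.append((state, action))
    return augmented


# Hypothetical toy corpus: the same state is answered in two different ways.
turns = [
    ("food=italian", "request_area"),
    ("food=italian", "recommend"),
    ("area=north", "inform"),
]
mapping = build_state_action_map(turns)
augmented = augment(turns, mapping)
```

In this toy corpus the state `food=italian` now appears with both of its observed actions at every occurrence, which is the kind of balanced action distribution the abstract refers to.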

【Keywords】:

1179. Relational Graph Neural Network with Hierarchical Attention for Knowledge Graph Completion.

Paper Link】 【Pages】:9612-9619

【Authors】: Zhao Zhang ; Fuzhen Zhuang ; Hengshu Zhu ; Zhi-Ping Shi ; Hui Xiong ; Qing He

【Abstract】: The rapid proliferation of knowledge graphs (KGs) has changed the paradigm for various AI-related applications. Despite their large sizes, modern KGs are far from complete and comprehensive. This has motivated the research in knowledge graph completion (KGC), which aims to infer missing values in incomplete knowledge triples. However, most existing KGC models treat the triples in KGs independently without leveraging the inherent and valuable information from the local neighborhood surrounding an entity. To this end, we propose a Relational Graph neural network with Hierarchical ATtention (RGHAT) for the KGC task. The proposed model is equipped with a two-level attention mechanism: (i) the first level is the relation-level attention, which is inspired by the intuition that different relations have different weights for indicating an entity; (ii) the second level is the entity-level attention, which enables our model to highlight the importance of different neighboring entities under the same relation. The hierarchical attention mechanism makes our model more effective to utilize the neighborhood information of an entity. Finally, we extensively validate the superiority of RGHAT against various state-of-the-art baselines.
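
To make the two-level attention concrete, the sketch below normalizes relation-level scores first, then entity-level scores within each relation, and combines the two levels by multiplication. The raw scores would be produced by learned functions in a real model; here they are fixed numbers, and combining the levels by a product is an assumption made for illustration rather than a detail stated in the abstract.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_weights(
    relation_scores: Dict[str, float],
    entity_scores: Dict[str, Dict[str, float]],
) -> Dict[Tuple[str, str], float]:
    """Weight each (relation, neighbor) pair by relation-level times entity-level attention."""
    relations = list(relation_scores)
    relation_weights = softmax([relation_scores[r] for r in relations])
    weights = {}
    for relation, rel_w in zip(relations, relation_weights):
        neighbors = list(entity_scores[relation])
        # Entity-level attention is normalized among neighbors under the same relation.
        entity_weights = softmax([entity_scores[relation][n] for n in neighbors])
        for neighbor, ent_w in zip(neighbors, entity_weights):
            weights[(relation, neighbor)] = rel_w * ent_w
    return weights


# Hypothetical scores for the neighborhood of one entity.
weights = hierarchical_weights(
    {"born_in": 2.0, "works_for": 0.5},
    {"born_in": {"Hawaii": 1.0}, "works_for": {"US_Gov": 0.3, "Harvard": 0.9}},
)
```

Because each level is a proper distribution, the combined weights over all neighbors sum to one.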

【Keywords】:

1180. Distilling Knowledge from Well-Informed Soft Labels for Neural Relation Extraction.

Paper Link】 【Pages】:9620-9627

【Authors】: Zhenyu Zhang ; Xiaobo Shu ; Bowen Yu ; Tingwen Liu ; Jiapeng Zhao ; Quangang Li ; Li Guo

【Abstract】: Extracting relations from plain text is an important task with wide application. Most existing methods formulate it as a supervised problem and utilize one-hot hard labels as the sole target in training, neglecting the rich semantic information among relations. In this paper, we aim to explore the supervision with soft labels in relation extraction, which makes it possible to integrate prior knowledge. Specifically, a bipartite graph is first devised to discover type constraints between entities and relations based on the entire corpus. Then, we combine such type constraints with neural networks to achieve a knowledgeable model. Furthermore, this model is regarded as a teacher to generate well-informed soft labels and guide the optimization of a student network via knowledge distillation. Besides, a multi-aspect attention mechanism is introduced to help the student mine latent information from text. In this way, the enhanced student inherits the dark knowledge (e.g., type constraints and relevance among relations) from the teacher, and directly serves the testing scenarios without any extra constraints. We conduct extensive experiments on the TACRED and SemEval datasets, and the experimental results justify the effectiveness of our approach.
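
The contrast between one-hot hard labels and teacher-provided soft labels can be sketched with a generic distillation objective: a weighted sum of cross-entropy against the gold relation and cross-entropy against the teacher's distribution. This is a standard formulation offered as an illustration; the mixing weight `alpha` and the function names are assumptions, and the paper's exact loss may differ.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_probs: List[float],
    gold_index: int,
    alpha: float = 0.5,
) -> float:
    """Mix hard-label and soft-label cross-entropy for one training example."""
    student_probs = softmax(student_logits)
    # Hard term: the usual one-hot supervision.
    hard = -math.log(student_probs[gold_index])
    # Soft term: cross-entropy against the teacher's full distribution over relations.
    soft = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
    return alpha * hard + (1.0 - alpha) * soft


# Hypothetical three-relation example; the teacher spreads mass over related relations.
loss = distillation_loss([2.0, 0.5, 0.1], [0.7, 0.2, 0.1], gold_index=0)
```

When the teacher's distribution is itself one-hot on the gold relation, the soft term reduces to the hard term, so the extra signal comes only from the probability mass the teacher assigns to other relations.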

【Keywords】:

1181. Semantics-Aware BERT for Language Understanding.

Paper Link】 【Pages】:9628-9635

【Authors】: Zhuosheng Zhang ; Yuwei Wu ; Hai Zhao ; Zuchao Li ; Shuailiang Zhang ; Xi Zhou ; Xiang Zhou

【Abstract】: The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, the existing language representation models including ELMo, GPT and BERT only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is as simple in concept but more powerful. It obtains new state-of-the-art or substantially improves results on ten reading comprehension and language inference tasks.

【Keywords】:

1182. SG-Net: Syntax-Guided Machine Reading Comprehension.

Paper Link】 【Pages】:9636-9643

【Authors】: Zhuosheng Zhang ; Yuwei Wu ; Junru Zhou ; Sufeng Duan ; Hai Zhao ; Rui Wang

【Abstract】: For machine reading comprehension, the capacity of effectively modeling the linguistic knowledge from the detail-riddled and lengthy passages and getting rid of the noise is essential to improve its performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into the attention mechanism for better linguistically motivated word representations. In detail, for a self-attention network (SAN) sponsored Transformer-based encoder, we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistics-inspired representation. To verify its effectiveness, the proposed SG-Net is applied to the typical pre-trained language model BERT, which is itself based on a Transformer encoder. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE show that the proposed SG-Net design helps achieve substantial performance improvement over strong baselines.
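
One way to picture a syntactic constraint on self-attention is as a binary mask derived from a dependency parse. The sketch below lets each token attend only to itself and its ancestors in the dependency tree. Treating the "dependency of interest" as the ancestor set is an assumption made for this illustration, as are the function name `ancestor_mask` and the head-index encoding; the input is assumed to be a well-formed tree.

```python
from typing import List


def ancestor_mask(heads: List[int]) -> List[List[int]]:
    """Build an attention mask from dependency heads.

    heads[i] is the index of token i's head, or -1 if token i is the root.
    mask[i][j] is 1 when token i may attend to token j (itself or an ancestor).
    """
    size = len(heads)
    mask = [[0] * size for _ in range(size)]
    for token in range(size):
        node = token
        # Walk up the tree until the root's head (-1) is reached.
        while node != -1:
            mask[token][node] = 1
            node = heads[node]
    return mask


# Hypothetical parse of "She reads books": "reads" is the root.
mask = ancestor_mask([1, -1, 1])
```

In a masked self-attention layer, positions where the mask is 0 would have their attention scores suppressed before the softmax, so that "She" attends to "reads" but not to "books".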

【Keywords】:

1183. Weakly-Supervised Opinion Summarization by Leveraging External Information.

Paper Link】 【Pages】:9644-9651

【Authors】: Chao Zhao ; Snigdha Chaturvedi

【Abstract】: Opinion summarization from online product reviews is a challenging task, which involves identifying opinions related to various aspects of the product being reviewed. While previous works require additional human effort to identify relevant aspects, we instead apply domain knowledge from external sources to automatically achieve the same goal. This work proposes AspMem, a generative method that contains an array of memory cells to store aspect-related knowledge. This explicit memory can help obtain a better opinion representation and infer the aspect information more precisely. We evaluate this method on both aspect identification and opinion summarization tasks. Our experiments show that AspMem outperforms the state-of-the-art methods even though, unlike the baselines, it does not rely on human supervision which is carefully handcrafted for the given tasks.

【Keywords】:

1184. Reinforced Curriculum Learning on Pre-Trained Neural Machine Translation Models.

Paper Link】 【Pages】:9652-9659

【Authors】: Mingjun Zhao ; Haijiang Wu ; Di Niu ; Xiaoli Wang

【Abstract】: The competitive performance of neural machine translation (NMT) critically relies on large amounts of training data. However, acquiring high-quality translation pairs requires expert knowledge and is costly. Therefore, how to best utilize a given dataset of samples with diverse quality and characteristics becomes an important yet understudied question in NMT. Curriculum learning methods have been introduced to NMT to optimize a model's performance by prescribing the data input order, based on heuristics such as the assessment of noise and difficulty levels. However, existing methods require training from scratch, while in practice most NMT models are pre-trained on big data already. Moreover, as heuristics, they do not generalize well. In this paper, we aim to learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set and formulate this task as a reinforcement learning problem. Specifically, we propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance due to a certain sample, while an actor network learns to select the best sample out of a random batch of samples presented to it. Experiments on several translation datasets show that our method can further improve the performance of NMT when original batch training reaches its ceiling, without using additional new training data, and significantly outperforms several strong baseline methods.

【Keywords】:

1185. Balancing Quality and Human Involvement: An Effective Approach to Interactive Neural Machine Translation.

Paper Link】 【Pages】:9660-9667

【Authors】: Tianxiang Zhao ; Lemao Liu ; Guoping Huang ; Huayang Li ; Yingling Liu ; Guiquan Liu ; Shuming Shi

【Abstract】: Conventional interactive machine translation typically requires a human translator to validate every generated target word, even though most of them are correct in the advanced neural machine translation (NMT) scenario. Previous studies have exploited confidence approaches to address the intensive human involvement issue, which request human guidance only for a small number of words with low confidence. However, such approaches do not take the history of human involvement into account, and optimize the models only for the translation quality while ignoring the cost of human involvement. In response to these pitfalls, we propose a novel interactive NMT model, which explicitly accounts for the history of human involvement and is optimized towards two objectives corresponding to the translation quality and the cost of human involvement, respectively. Specifically, the model jointly predicts a target word and a decision on whether to request human guidance, which is based on both the partial translation and the history of human involvement. Since there are no explicit signals on the decisions of requesting human guidance in the bilingual corpus, we optimize the model with the reinforcement learning technique, which enables our model to accurately predict when to request human guidance. Simulated and real experiments show that the proposed model can achieve higher translation quality with similar or less human involvement than the confidence-based baseline.

【Keywords】:

1186. Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders.

Paper Link】 【Pages】:9668-9675

【Authors】: Yanbin Zhao ; Lu Chen ; Zhi Chen ; Kai Yu

【Abstract】: Text simplification (TS) rephrases long sentences into simplified variants while preserving inherent semantics. Traditional sequence-to-sequence models heavily rely on the quantity and quality of parallel sentences, which limits their applicability in different languages and domains. This work investigates how to leverage large amounts of unpaired corpora in the TS task. We adopt the back-translation architecture in unsupervised neural machine translation (NMT), including denoising autoencoders for language modeling and automatic generation of parallel data by iterative back-translation. However, it is non-trivial to generate appropriate complex-simple pairs if we directly treat the sets of simple and complex corpora as two different languages, since the two types of sentences are quite similar and it is hard for the model to capture the characteristics of the different types of sentences. To tackle this problem, we propose asymmetric denoising methods for sentences with separate complexity. When modeling simple and complex sentences with autoencoders, we introduce different types of noise into the training process. Such a method can significantly improve the simplification performance. Our model can be trained in both an unsupervised and a semi-supervised manner. Automatic and human evaluations show that our unsupervised model outperforms the previous systems, and with limited supervision, our model can perform competitively with multiple state-of-the-art simplification systems.

【Keywords】:

1187. Dynamic Reward-Based Dueling Deep Dyna-Q: Robust Policy Learning in Noisy Environments.

Paper Link】 【Pages】:9676-9684

【Authors】: Yangyang Zhao ; Zhenyu Wang ; Kai Yin ; Rui Zhang ; Zhenhua Huang ; Pei Wang

【Abstract】: Task-oriented dialogue systems provide a convenient interface to help users complete tasks. An important consideration for task-oriented dialogue systems is the ability to withstand the noise that commonly exists in real-world conversation. Both rule-based strategies and statistical modeling techniques can solve noise problems, but they are costly. In this paper, we propose a new approach, called Dynamic Reward-based Dueling Deep Dyna-Q (DR-D3Q). The DR-D3Q can learn policies robustly in noise, and it is easy to implement by combining dynamic reward and the Dueling Deep Q-Network (Dueling DQN) into the Deep Dyna-Q (DDQ) framework. The Dueling DQN can mitigate the negative impact of noise on learning policies, but it is inapplicable to the dialogue domain due to different reward mechanisms. Unlike the typical dialogue reward function, we integrate a dynamic reward that provides rewards in real time for the agent, to make the Dueling DQN adapt to the dialogue domain. For the purpose of supplementing the limited amount of real user experiences, we take the DDQ framework as the basic framework. Experiments using simulation and human evaluation show that the DR-D3Q significantly improves the performance of policy learning tasks in noisy environments.

【Keywords】:

1188. Replicate, Walk, and Stop on Syntax: An Effective Neural Network Model for Aspect-Level Sentiment Classification.

Paper Link】 【Pages】:9685-9692

【Authors】: Yaowei Zheng ; Richong Zhang ; Samuel Mensah ; Yongyi Mao

【Abstract】: Aspect-level sentiment classification (ALSC) aims at predicting the sentiment polarity of a specific aspect term occurring in a sentence. This task requires learning a representation by aggregating the relevant contextual features concerning the aspect term. Existing methods cannot sufficiently leverage the syntactic structure of the sentence, and hence have difficulty distinguishing different sentiments for multiple aspects in a sentence. We perceive the limitations of the previous methods and propose a hypothesis about finding crucial contextual information with the help of syntactic structure. For this purpose, we present a neural network model named RepWalk which performs a replicated random walk on a syntax graph, to effectively focus on the informative contextual words. Empirical studies show that our model outperforms recent models on most of the benchmark datasets for the ALSC task. The results suggest that our method for incorporating syntactic structure enriches the representation for the classification.

【Keywords】:

1189. A Pre-Training Based Personalized Dialogue Generation Model with Persona-Sparse Data.

Paper Link】 【Pages】:9693-9700

【Authors】: Yinhe Zheng ; Rongsheng Zhang ; Minlie Huang ; Xiaoxi Mao

【Abstract】: Endowing dialogue systems with personas is essential to deliver more human-like conversations. However, this problem is still far from well explored due to the difficulties of both embodying personalities in natural languages and the persona sparsity issue observed in most dialogue corpora. This paper proposes a pre-training based personalized dialogue model that can generate coherent responses using persona-sparse dialogue data. In this method, a pre-trained language model is used to initialize an encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers' personas together with dialogue histories. Further, to incorporate the target persona in the decoding process and to balance its contribution, an attention routing structure is devised in the decoder to merge features extracted from the target persona and dialogue contexts using dynamically predicted weights. Our model can utilize persona-sparse dialogues in a unified manner during the training process, and can also control the amount of persona-related features to exhibit during the inference process. Both automatic and manual evaluations demonstrate that the proposed model outperforms state-of-the-art methods for generating more coherent and persona-consistent responses with persona-sparse data.

【Keywords】:

1190. JEC-QA: A Legal-Domain Question Answering Dataset.

Paper Link】 【Pages】:9701-9708

【Authors】: Haoxi Zhong ; Chaojun Xiao ; Cunchao Tu ; Tianyang Zhang ; Zhiyuan Liu ; Maosong Sun

【Abstract】: We present JEC-QA, the largest question answering dataset in the legal domain, collected from the National Judicial Examination of China. The examination is a comprehensive evaluation of professional skills for legal practitioners. College students are required to pass the examination to be certified as a lawyer or a judge. The dataset is challenging for existing question answering methods, because both retrieving relevant materials and answering questions require the ability of logical reasoning. Because answering legal questions demands multiple reasoning abilities, state-of-the-art models can only achieve about 28% accuracy on JEC-QA, while skilled humans and unskilled humans can reach 81% and 64% accuracy respectively, which indicates a huge gap between humans and machines on this task. We will release JEC-QA and our baselines to help improve the reasoning ability of machine comprehension models. You can access the dataset from http://jecqa.thunlp.org/.

【Keywords】:

1191. Discourse Level Factors for Sentence Deletion in Text Simplification.

Paper Link】 【Pages】:9709-9716

【Authors】: Yang Zhong ; Chao Jiang ; Wei Xu ; Junyi Jessy Li

【Abstract】: This paper presents a data-driven study focusing on analyzing and predicting sentence deletion — a prevalent but understudied phenomenon in document simplification — on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. Evaluated on our manually annotated data, our best models reached F1 scores of 65.2 and 59.7 for this task at the levels of elementary and middle school, respectively. We find that discourse level factors contribute to the challenging task of predicting sentence deletion for simplification.

【Keywords】:

1192. Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models.

Paper Link】 【Pages】:9717-9724

【Authors】: Wangchunshu Zhou ; Ke Xu

【Abstract】: Automated evaluation of open domain natural language generation (NLG) models remains a challenge, and widely used metrics such as BLEU and Perplexity can be misleading in some cases. In our paper, we propose to evaluate natural language generation models by learning to compare a pair of generated sentences by fine-tuning BERT, which has been shown to have good natural language understanding ability. We also propose to evaluate the model-level quality of NLG models by aggregating sample-level comparison results with a skill rating system. While it can be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotation to better imitate human judgment. In addition to evaluating trained models, we propose to apply our model as a performance indicator during training for better hyperparameter tuning and early-stopping. We evaluate our approach on both story generation and chit-chat dialogue response generation. Experimental results show that our model correlates better with human preference compared with previous automated evaluation approaches. Training with the proposed metric yields better performance in human evaluation, which further demonstrates the effectiveness of the proposed model.

【Keywords】:

1193. Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension.

Paper Link】 【Pages】:9725-9732

【Authors】: Xiaorui Zhou ; Senlin Luo ; Yunfang Wu

【Abstract】: In reading comprehension, generating sentence-level distractors is a significant task, which requires a deep understanding of the article and question. The traditional entity-centered methods can only generate word-level or phrase-level distractors. Although recently proposed neural methods such as the sequence-to-sequence (Seq2Seq) model show great potential in generating creative text, the previous neural methods for distractor generation ignore two important aspects. First, they do not model the interactions between the article and question, so the generated distractors tend to be too general or irrelevant to the question context. Second, they do not emphasize the relationship between the distractor and article, so the generated distractors are not semantically relevant to the article and thus fail to form a set of meaningful options. To solve the first problem, we propose a co-attention enhanced hierarchical architecture to better capture the interactions between the article and question, thus guiding the decoder to generate more coherent distractors. To alleviate the second problem, we add an additional semantic similarity loss to push the generated distractors to be more relevant to the article. Experimental results show that our model outperforms several strong baselines on automatic metrics, achieving state-of-the-art performance. Further human evaluation indicates that our generated distractors are more coherent and more educative compared with those distractors generated by baselines.

【Keywords】:

1194. Evaluating Commonsense in Pre-Trained Language Models.

Paper Link】 【Pages】:9733-9740

【Authors】: Xuhui Zhou ; Yue Zhang ; Leyang Cui ; Dandan Huang

【Abstract】: Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. There have been works showing that syntactic, semantic and word sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability while bi-directional context and a larger training set are bonuses. We additionally find that current models do poorly on tasks that require more inference steps. Finally, we test the robustness of models by making dual test cases, which are correlated so that the correct prediction of one sample should lead to correct prediction of the other. Interestingly, the models show confusion on these test cases, which suggests that they learn commonsense at the surface rather than the deep level. We publicly release a test set, named CATs, for future research.

【Keywords】:

1195. Who Did They Respond to? Conversation Structure Modeling Using Masked Hierarchical Transformer.

Paper Link】 【Pages】:9741-9748

【Authors】: Henghui Zhu ; Feng Nan ; Zhiguo Wang ; Ramesh Nallapati ; Bing Xiang

【Abstract】: Conversation structure is useful both for understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds. Previous work usually took a pair of utterances to decide whether one utterance is the parent of the other. We believe the entire ancestral history is a very important information source for making accurate predictions. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition, we also report experiments on a new larger corpus from the Reddit platform and release this dataset. We show that the proposed model, which takes into account the ancestral history of the conversation, significantly outperforms several strong baselines including the BERT model on all datasets.

【Keywords】:

1196. Multimodal Summarization with Guidance of Multimodal Reference.

Paper Link】 【Pages】:9749-9756

【Authors】: Junnan Zhu ; Yu Zhou ; Jiajun Zhang ; Haoran Li ; Chengqing Zong ; Changliang Li

【Abstract】: Multimodal summarization with multimodal output (MSMO) aims to generate a multimodal summary for a multimodal news report, which has been proven to effectively improve users' satisfaction. The existing MSMO methods are trained only against the text-modality target, leading to a modality-bias problem in which the quality of the model-selected image is ignored during training. To alleviate this problem, we propose a multimodal objective function, guided by a multimodal reference, that uses the losses from both summary generation and image selection. Due to the lack of multimodal reference data, we present two strategies, i.e., ROUGE-ranking and Order-ranking, to construct the multimodal reference by extending the text reference. Meanwhile, to better evaluate multimodal outputs, we propose a novel evaluation metric based on joint multimodal representation, projecting the model output and multimodal reference into a joint semantic space during evaluation. Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments.

【Keywords】:

1197. LATTE: Latent Type Modeling for Biomedical Entity Linking.

Paper Link】 【Pages】:9757-9764

【Authors】: Ming Zhu ; Busra Celikkaya ; Parminder Bhatia ; Chandan K. Reddy

【Abstract】: Entity linking is the task of linking mentions of named entities in natural language text to entities in a curated knowledge-base. This is of significant importance in the biomedical domain, where it could be used to semantically annotate a large volume of clinical records and biomedical literature with standardized concepts described in an ontology such as the Unified Medical Language System (UMLS). We observe that with precise type information, entity disambiguation becomes a straightforward task. However, fine-grained type information is usually not available in the biomedical domain. Thus, we propose LATTE, a LATent Type Entity Linking model, that improves entity linking by modeling the latent fine-grained type information about mentions and entities. Unlike previous methods that perform entity linking directly between the mentions and the entities, LATTE jointly performs entity disambiguation and latent fine-grained type learning, without direct supervision. We evaluate our model on two biomedical datasets: MedMentions, a large scale public dataset annotated with UMLS concepts, and a de-identified corpus of dictated doctor's notes that has been annotated with ICD concepts. Extensive experimental evaluation shows our model achieves significant performance improvements over several state-of-the-art techniques.

【Keywords】:

AAAI Technical Track: Planning, Routing, and Scheduling 31

1198. Hybrid Compositional Reasoning for Reactive Synthesis from Finite-Horizon Specifications.

Paper Link】 【Pages】:9766-9774

【Authors】: Suguman Bansal ; Yong Li ; Lucas M. Tabajara ; Moshe Y. Vardi

【Abstract】: LTLf synthesis is the automated construction of a reactive system from a high-level description, expressed in LTLf, of its finite-horizon behavior. So far, the conversion of LTLf formulas to deterministic finite-state automata (DFAs) has been identified as the primary bottleneck to the scalability of synthesis. Recent investigations have also shown that the size of the DFA state space plays a critical role in synthesis as well. Therefore, effective resolution of the bottleneck for synthesis requires the conversion to be time and memory performant, and to prevent state-space explosion. Current conversion approaches, however, which are based either on explicit-state representation or symbolic-state representation, fail to address these necessities adequately at scale: Explicit-state approaches generate minimal DFAs but are slow due to expensive DFA minimization. Symbolic-state representations can be succinct, but due to the lack of DFA minimization they generate such large state spaces that even their symbolic representations cannot compensate for the blow-up. This work proposes a hybrid representation approach for the conversion. Our approach utilizes both explicit and symbolic representations of the state-space, and effectively leverages their complementary strengths. In doing so, we offer an LTLf to DFA conversion technique that addresses all three necessities, hence resolving the bottleneck. A comprehensive empirical evaluation on conversion and synthesis benchmarks supports the merits of our hybrid approach.

【Keywords】:

1199. On Succinct Groundings of HTN Planning Problems.

Paper Link】 【Pages】:9775-9784

【Authors】: Gregor Behnke ; Daniel Höller ; Alexander Schmid ; Pascal Bercher ; Susanne Biundo

【Abstract】: Both search-based and translation-based planning systems usually operate on grounded representations of the problem. Planning models, however, are commonly defined using lifted description languages. Thus, planning systems usually generate a grounded representation of the lifted model as a preprocessing step. For HTN planning models, only one method to ground lifted models has been published so far. In this paper we present a new approach for grounding HTN planning problems that produces smaller groundings in a shorter timespan than the previously published method.

【Keywords】:

1200. POP ≡ POCL, Right? Complexity Results for Partial Order (Causal Link) Makespan Minimization.

Paper Link】 【Pages】:9785-9793

【Authors】: Pascal Bercher ; Conny Olz

【Abstract】: We study PO and POCL plans with regard to their makespan – the execution time when allowing the parallel execution of causally independent actions. Partially ordered (PO) plans are often assumed to be equivalent to partial order causal link (POCL) plans, where the causal relationships between actions are explicitly represented via causal links. As a first contribution, we study the similarities and differences of PO and POCL plans, thereby clarifying a common misconception about their relationship: There are PO plans for which there does not exist a POCL plan with the same orderings. We prove that we can still always find a POCL plan with the same makespan in polynomial time. As another main result we prove that turning a PO or POCL plan into one with minimal makespan by only removing ordering constraints (called deordering) is NP-complete. We provide a series of further results on special cases and implications, such as reordering, where orderings can be changed arbitrarily.

【Keywords】:
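
As a rough illustration of the makespan notion used in the abstract above, the following sketch computes the makespan of a partially ordered plan as the length of its longest chain of ordering constraints. It assumes unit-duration actions and an acyclic ordering relation; the function name `makespan` and the toy plan are hypothetical and not taken from the paper.

```python
from functools import lru_cache


def makespan(actions, orderings):
    """Makespan of a partially ordered plan with unit-duration actions.

    actions   -- iterable of hashable action names (assumption: unit duration)
    orderings -- iterable of (before, after) pairs, assumed to be acyclic
    """
    actions = list(actions)
    preds = {a: [] for a in actions}
    for before, after in orderings:
        preds[after].append(before)

    @lru_cache(maxsize=None)
    def finish(a):
        # An action finishes one step after its latest predecessor finishes.
        return 1 + max((finish(p) for p in preds[a]), default=0)

    return max((finish(a) for a in actions), default=0)


# Toy plan: a totally ordered chain versus a version with one ordering removed.
chain = [("a", "b"), ("b", "c")]
relaxed = [("a", "b")]
print(makespan("abc", chain), makespan("abc", relaxed))
```

The sketch only measures makespan. Whether an ordering constraint may actually be removed depends on the causal structure of the plan, which this code does not check.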

1201. Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes.

Paper Link】 【Pages】:9794-9801

【Authors】: Tomás Brázdil ; Krishnendu Chatterjee ; Petr Novotný ; Jiri Vahala

【Abstract】: Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with an order of 10^6 states.

【Keywords】:

1202. Planning and Acting with Non-Deterministic Events: Navigating between Safe States.

Paper Link】 【Pages】:9802-9809

【Authors】: Lukás Chrpa ; Jakub Gemrot ; Martin Pilát

【Abstract】: Automated Planning addresses the problem of finding a sequence of actions, a plan, transforming the environment from its initial state to some goal state. In real-world environments, exogenous events might occur and might modify the environment without the agent's consent. Besides disrupting the agent's plan, events might hinder the agent's pursuit of its goals and even cause damage (e.g. destroying the robot). In this paper, we leverage the notion of Safe States in dynamic environments under the presence of non-deterministic exogenous events that might eventually cause dead-ends (e.g. “damage” the agent) if the agent is not careful while executing its plan. We introduce a technique for generating plans that constrains the number of consecutive “unsafe” actions in a plan and a technique for generating “robust” plans that effectively evade event effects. The combination of both approaches plans and executes robust plans between safe states. We empirically show that such an approach effectively navigates the agent towards its goals in spite of the presence of dead-ends.

【Keywords】:

1203. Optimizing Reachability Sets in Temporal Graphs by Delaying.

Paper Link】 【Pages】:9810-9817

【Authors】: Argyrios Deligkas ; Igor Potapov

【Abstract】: A temporal graph is a dynamic graph where every edge is assigned a set of integer time labels that indicate at which discrete time steps the edge is available. In this paper, we study how changes of the time labels, corresponding to delays on the availability of the edges, affect the reachability sets from given sources. The questions about reachability sets are motivated by numerous applications of temporal graphs in network epidemiology and scheduling problems in supply networks in manufacturing. We introduce control mechanisms for reachability sets that are based on two natural operations of delaying time events. The first operation, termed merging, is global and batches together consecutive time labels in the whole network simultaneously. This corresponds to postponing all events until a particular time. The second operation imposes independent delays on the time labels of every edge of the graph. We provide a thorough investigation of the computational complexity of different objectives related to reachability sets when these operations are used. For the merging operation, we prove NP-hardness results for several minimization and maximization reachability objectives, even for very simple graph structures. For the second operation, we prove that the minimization problems are NP-hard when the number of allowed delays is bounded. We complement this with a polynomial-time algorithm for the case of unbounded delays.

【Keywords】:
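
To make the notion of a reachability set in a temporal graph more concrete, here is a minimal sketch that computes the vertices reachable from a source and shows how delaying one time label can change that set. It assumes directed edges, positive integer labels, and strictly increasing labels along a path; the paper's exact definitions may differ, and the names `reachability_set` and `delay_edge` are hypothetical.

```python
def reachability_set(edges, source):
    """Vertices reachable from `source` along paths with strictly increasing labels.

    edges -- dict mapping a directed edge (u, v) to a set of positive integer
             time labels (assumption: the edge is usable exactly at those steps)
    """
    events = sorted((t, u, v) for (u, v), labels in edges.items() for t in labels)
    arrival = {source: 0}  # earliest arrival time; the source is present at time 0
    i = 0
    while i < len(events):
        t = events[i][0]
        batch = []
        while i < len(events) and events[i][0] == t:
            batch.append(events[i])
            i += 1
        # Strictness: only vertices reached before time t can use an edge at time t.
        newly = [v for (_, u, v) in batch
                 if u in arrival and arrival[u] < t and v not in arrival]
        for v in newly:
            arrival[v] = t
    return set(arrival)


def delay_edge(edges, edge, delta):
    """Return a copy of `edges` with every label of `edge` shifted by `delta`."""
    delayed = {e: set(labels) for e, labels in edges.items()}
    delayed[edge] = {t + delta for t in edges[edge]}
    return delayed


# Toy graph: both edges are available only at time step 1.
g = {("a", "b"): {1}, ("b", "c"): {1}}
print(reachability_set(g, "a"))
print(reachability_set(delay_edge(g, ("b", "c"), 1), "a"))
```

In this toy graph, delaying the edge from b to c by one step lets c be reached from a, so a delay can enlarge a reachability set as well as shrink it.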

1204. A New Approach to Plan-Space Explanation: Analyzing Plan-Property Dependencies in Oversubscription Planning.

Paper Link】 【Pages】:9818-9826

【Authors】: Rebecca Eifler ; Michael Cashmore ; Jörg Hoffmann ; Daniele Magazzeni ; Marcel Steinmetz

【Abstract】: In many usage scenarios of AI Planning technology, users will want not just a plan π but an explanation of the space of possible plans, justifying π. In particular, in oversubscription planning where not all goals can be achieved, users may ask why a conjunction A of goals is not achieved by π. We propose to answer this kind of question with the goal conjunctions B excluded by A, i.e., that could not be achieved if A were to be enforced. We formalize this approach in terms of plan-property dependencies, where plan properties are propositional formulas over the goals achieved by a plan, and dependencies are entailment relations in plan space. We focus on entailment relations of the form ⋀_{g∈A} g ⇒ ¬⋀_{g∈B} g, and devise analysis techniques globally identifying all such relations, or locally identifying the implications of a single given plan property (user question) ⋀_{g∈A} g. We show how, via compilation, one can analyze dependencies between a richer form of plan properties, specifying formulas over action subsets touched by the plan. We run comprehensive experiments on adapted IPC benchmarks, and find that the suggested analyses are reasonably feasible at the global level, and become significantly more effective at the local level.

【Keywords】:

1205. Beliefs We Can Believe in: Replacing Assumptions with Data in Real-Time Search.

Paper Link】 【Pages】:9827-9834

【Authors】: Maximilian Fickert ; Tianyi Gu ; Leonhard Staut ; Wheeler Ruml ; Jörg Hoffmann ; Marek Petrik

【Abstract】: Suboptimal heuristic search algorithms can benefit from reasoning about heuristic error, especially in a real-time setting where there is not enough time to search all the way to a goal. However, current reasoning methods implicitly or explicitly incorporate assumptions about the cost-to-go function. We consider a recent real-time search algorithm, called Nancy, that manipulates explicit beliefs about the cost-to-go. The original presentation of Nancy assumed that these beliefs are Gaussian, with parameters following a certain form. In this paper, we explore how to replace these assumptions with actual data. We develop a data-driven variant of Nancy, DDNancy, that bases its beliefs on heuristic performance statistics from the same domain. We extend Nancy and DDNancy with the notion of persistence and prove their completeness. Experimental results show that DDNancy can perform well in domains in which the original assumption-based Nancy performs poorly.

【Keywords】:

1206. Lifted Fact-Alternating Mutex Groups and Pruned Grounding of Classical Planning Problems.

Paper Link】 【Pages】:9835-9842

【Authors】: Daniel Fiser

【Abstract】: In this paper, we focus on the inference of mutex groups in the lifted (PDDL) representation. We formalize the inference and prove that the most commonly used translator from the Fast Downward (FD) planning system infers a certain subclass of mutex groups, called fact-alternating mutex groups (fam-groups). Based on that, we show that the previously proposed fam-groups-based pruning techniques for the STRIPS representation can be utilized during the grounding process with lifted fam-groups, i.e., before the full STRIPS representation is known. Furthermore, we propose an improved inference algorithm for lifted fam-groups that produces a richer set of fam-groups than the FD translator and we demonstrate a positive impact on the number of pruned operators and overall coverage.

【Keywords】:

1207. Time-Inconsistent Planning: Simple Motivation Is Hard to Find.

Paper Link】 【Pages】:9843-9850

【Authors】: Fedor V. Fomin ; Torstein J. F. Strømme

【Abstract】: People sometimes act differently when making decisions affecting the present moment versus decisions affecting the future only. This is referred to as time-inconsistent behaviour, and can be modeled as agents exhibiting present bias. A resulting phenomenon is abandonment, which is when an agent initially pursues a task, but ultimately gives up before reaping the rewards. With the introduction of the graph-theoretic time-inconsistent planning model due to Kleinberg and Oren, it has been possible to investigate the computational complexity of how a task designer can best support a present-biased agent in completing the task. In this paper, we study the complexity of finding a choice reduction for the agent; that is, how to remove edges and vertices from the task graph such that a present-biased agent will remain motivated to reach his target even for a limited reward. While this problem is NP-complete in general, this is not necessarily true for instances which occur in practice, or for solutions which are of interest to task designers. For instance, a task designer may desire to find the best task graph which is not too complicated. We therefore investigate the problem of finding simple motivating subgraphs. These are structures where the agent will modify his plan at most k times along the way. We quantify this simplicity in the time-inconsistency model as a structural parameter: the number of branching vertices (vertices with out-degree at least 2) in a minimal motivating subgraph. Our results are as follows: We give a linear algorithm for finding an optimal motivating path, i.e., when k = 0. On the negative side, we show that finding a simple motivating subgraph is NP-complete even if we allow only a single branching vertex, revealing that simple motivating subgraphs are indeed hard to find. However, we give a pseudo-polynomial algorithm for the case when k is fixed and edge weights are rationals, which might be a reasonable assumption in practice.

【Keywords】:

1208. Dynamic Control of Probabilistic Simple Temporal Networks.

Paper Link】 【Pages】:9851-9858

【Authors】: Michael Gao ; Lindsay Popowski ; Jim Boerkoel

【Abstract】: The controllability of a temporal network is defined as an agent's ability to navigate around the uncertainty in its schedule and is well-studied for certain networks of temporal constraints. However, many interesting real-world problems can be better represented as Probabilistic Simple Temporal Networks (PSTNs) in which the uncertain durations are represented using potentially-unbounded probability density functions. This can make it inherently impossible to control for all eventualities. In this paper, we propose two new dynamic controllability algorithms that attempt to maximize the likelihood of successfully executing a schedule within a PSTN. The first approach, which we call Min-Loss DC, finds a dynamic scheduling strategy that minimizes loss of control by using a conflict-directed search to decide where to sacrifice the control in a way that optimizes overall success. The second approach, which we call Max-Gain DC, works in the other direction: it finds a dynamically controllable schedule and then attempts to progressively strengthen it by capturing additional uncertainty. Our approaches are the first known that work by finding maximally dynamically controllable schedules. We empirically compare our approaches against two existing PSTN offline dispatch approaches and one online approach and show that our Min-Loss DC algorithm outperforms the others in terms of maximizing execution success while maintaining competitive runtimes.

【Keywords】:

1209. Decidability and Complexity of Action-Based Temporal Planning over Dense Time.

Paper Link】 【Pages】:9859-9866

【Authors】: Nicola Gigante ; Andrea Micheli ; Angelo Montanari ; Enrico Scala

【Abstract】: This paper studies the computational complexity of temporal planning, as represented by PDDL 2.1, interpreted over dense time. When time is considered discrete, the problem is known to be EXPSPACE-complete. However, the official PDDL 2.1 semantics, and many implementations, interpret time as a dense domain. This work provides several results about the complexity of the problem, studying a few interesting cases: whether a minimum amount ϵ of separation between mutually exclusive events is given, in contrast to the separation being simply required to be non-zero, and whether or not actions are allowed to overlap already running instances of themselves. We prove the problem to be PSPACE-complete when self-overlap is forbidden, whereas, when allowed, it becomes EXPSPACE-complete with ϵ-separation and undecidable with non-zero separation. These results clarify the computational consequences of different choices in the definition of the PDDL 2.1 semantics, which were vague until now.

【Keywords】:

1210. Solving Sum-of-Costs Multi-Agent Pathfinding with Answer-Set Programming.

Paper Link】 【Pages】:9867-9874

【Authors】: Rodrigo N. Gómez ; Carlos Hernández ; Jorge A. Baier

【Abstract】: Solving a Multi-Agent Pathfinding (MAPF) problem involves finding non-conflicting paths that lead a number of agents to their goal locations. In the sum-of-costs variant of MAPF, one is also required to minimize the total number of moves performed by agents before stopping at the goal. Not surprisingly, since MAPF is combinatorial, a number of compilations to Satisfiability solving (SAT) and Answer Set Programming (ASP) exist. In this paper, we propose the first family of compilations to ASP that solve sum-of-costs MAPF over 4-connected grids. Unlike existing compilations to ASP that we are aware of, our encoding is the first that, after grounding, produces a number of clauses that is linear in the number of agents. In addition, the representation of the optimization objective is also carefully written, such that its size after grounding does not depend on the size of the grid. In our experimental evaluation, we show that our approach outperforms search- and SAT-based sum-of-costs MAPF solvers when grids are congested with agents.

【Keywords】:
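
The sum-of-costs objective and the non-conflict requirement described in the abstract above can be sketched in a few lines. This is only a checker for given paths, not the ASP compilation the paper proposes. It assumes each path lists one grid cell per time step, that agents stay at their goal after arriving, and that both vertex and swap conflicts are forbidden; the function names are hypothetical.

```python
from itertools import combinations


def path_cost(path):
    """Number of steps (moves or waits) taken before the agent stops at its goal."""
    goal = path[-1]
    t = len(path) - 1
    # Trailing waits at the goal do not count towards the cost.
    while t > 0 and path[t - 1] == goal:
        t -= 1
    return t


def sum_of_costs(paths):
    """Sum of the individual path costs over all agents."""
    return sum(path_cost(p) for p in paths)


def has_conflict(paths):
    """True if two agents share a cell at some step or swap cells in one step."""
    horizon = max(len(p) for p in paths)
    # Assumption: an agent that has finished keeps occupying its goal cell.
    padded = [p + [p[-1]] * (horizon - len(p)) for p in paths]
    for a, b in combinations(padded, 2):
        for t in range(horizon):
            if a[t] == b[t]:
                return True  # vertex conflict
            if t + 1 < horizon and a[t] == b[t + 1] and a[t + 1] == b[t]:
                return True  # swap conflict
    return False


# Two agents on a grid; the second waits one step before entering (0, 1).
p1 = [(0, 0), (0, 1), (0, 2)]
p2 = [(1, 1), (1, 1), (0, 1)]
print(has_conflict([p1, p2]), sum_of_costs([p1, p2]))
```

Counting the wait of the second agent as a unit of cost follows the common sum-of-costs convention and is an assumption here.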

1211. Novel Is Not Always Better: On the Relation between Novelty and Dominance Pruning.

Paper Link】 【Pages】:9875-9882

【Authors】: Joschka Gross ; Álvaro Torralba ; Maximilian Fickert

【Abstract】: Novelty pruning is a planning technique that focuses on exploring states that are novel, i.e., those containing facts that have not been seen before. This seemingly simple idea has had a huge impact on the state of the art in planning, though its effectiveness is not entirely understood yet. We relate novelty to dominance pruning, which compares states to previously seen states to eliminate those that are provably worse in terms of goal distance. Novelty can be interpreted as an unsafe approximation of dominance, where states containing novel facts are relevant because they enable new paths to the goal and, therefore, they are less likely to be dominated by others. This provides a framework to understand the success of novelty, resulting in new variants that combine both techniques.

【Keywords】:
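
The novelty test mentioned in the abstract above can be illustrated with a small sketch. It assumes the simplest variant, in which a state is a set of facts and counts as novel if it contains at least one fact never seen in any earlier state. The helper `make_novelty_filter` is a hypothetical name, and the paper's variants that combine novelty with dominance are not shown.

```python
def make_novelty_filter():
    """Return a predicate that reports whether a state contains an unseen fact."""
    seen = set()

    def is_novel(state):
        new_facts = set(state) - seen
        seen.update(new_facts)  # remember the facts for later states
        return bool(new_facts)

    return is_novel


# States are sets of facts, given here as strings; non-novel states would be pruned.
is_novel = make_novelty_filter()
for state in [{"p", "q"}, {"p"}, {"p", "r"}, {"q", "r"}]:
    print(sorted(state), is_novel(state))
```

In the loop, the last state is pruned even though the combination of q and r has not occurred before, which shows why this kind of pruning is an unsafe approximation.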

1212. HDDL: An Extension to PDDL for Expressing Hierarchical Planning Problems.

Paper Link】 【Pages】:9883-9891

【Authors】: Daniel Höller ; Gregor Behnke ; Pascal Bercher ; Susanne Biundo ; Humbert Fiorino ; Damien Pellier ; Ron Alford

【Abstract】: The research in hierarchical planning has made considerable progress in the last few years. Many recent systems do not rely on hand-tailored advice anymore to find solutions, but are supposed to be domain-independent systems that come with sophisticated solving techniques. In principle, this development would make the comparison between systems easier (because the domains are not tailored to a single system anymore) and – much more important – also the integration into other systems, because the modeling process is less tedious (due to the lack of advice) and there is no (or less) commitment to a certain planning system the model is created for. However, these advantages are destroyed by the lack of a common input language and feature set supported by the different systems. In this paper, we propose an extension to PDDL, the description language used in non-hierarchical planning, to the needs of hierarchical planning systems.

【Keywords】:

1213. Reshaping Diverse Planning.

Paper Link】 【Pages】:9892-9899

【Authors】: Michael Katz ; Shirin Sohrabi

【Abstract】: The need for multiple plans has been established by various planning applications. In some, solution quality has the predominant role, while in others diversity is the key factor. Most recent work takes both plan quality and solution diversity into account under the generic umbrella of diverse planning. There is no common agreement, however, on a collection of computational problems that fall under that generic umbrella. This in particular might lead to a comparison between planners that have different solution guarantees or optimization criteria in mind. In this work we revisit the diverse planning literature in search of such a collection of computational problems, classifying the existing planners according to these problems. We formally define a taxonomy of computational problems with respect to both plan quality and solution diversity, extending the existing work. We propose a novel approach to diverse planning, exploiting existing classical planners via planning task reformulation and choosing a subset of plans of required size in post-processing. Based on that, we present planners for the two computational problems that most existing planners solve. Our experiments show that the proposed approach significantly improves over the best performing existing planners in terms of coverage, the overall solution quality, and the overall diversity according to various diversity metrics.

【Keywords】:

1214. Top-Quality Planning: Finding Practically Useful Sets of Best Plans.

Paper Link】 【Pages】:9900-9907

【Authors】: Michael Katz ; Shirin Sohrabi ; Octavian Udrea

【Abstract】: The need for finding a set of plans rather than one has been motivated by a variety of planning applications. The problem is studied in the context of both diverse and top-k planning: while diverse planning focuses on the difference between pairs of plans, the focus of top-k planning is on the quality of each individual plan. Recent work in diverse planning additionally introduced restrictions on solution quality. Naturally, there are application domains where diversity plays the major role and domains where quality is the predominant feature. In both cases, however, the number of produced plans is often an artificial constraint, and therefore the actual number has little meaning. Inspired by the recent work in diverse planning, we propose a new family of computational problems called top-quality planning, where solution validity is defined through a plan quality bound rather than an arbitrary number of plans. Switching to bounding plan quality allows us to implicitly represent sets of plans. In particular, it makes it possible to represent sets of plans that correspond to valid plan reorderings with a single plan. We formally define the unordered top-quality planning computational problem and present the first planner for that problem. We empirically demonstrate the superior performance of our approach compared to a top-k planner-based baseline, ranging from a 41% increase in coverage for finding all optimal plans to a 69% increase in coverage for finding all plans of quality up to 120% of optimal plan cost. Finally, complementing the new approach by a complete procedure for generating all valid reorderings of a given plan, we derive a top-quality planner. We show the planner to be competitive with a top-k planner-based baseline.
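The difference between bounding the number of plans (top-k) and bounding their quality (top-quality) can be illustrated on a toy problem. The sketch below is not the planner from the paper: it treats a plan as a path in a small hand-written graph with non-negative edge costs, and enumerates paths in nondecreasing cost order with a priority queue. The graph, the function names, and the multiplier `q` are all assumptions made for illustration.

```python
import heapq
from itertools import islice, takewhile


def plans_by_cost(graph, start, goal):
    """Yield (cost, path) for start-to-goal paths in nondecreasing cost order.

    Assumes non-negative edge costs; `graph` maps node -> {successor: cost}.
    """
    frontier = [(0, [start])]
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            yield cost, path
            continue  # a plan ends once the goal is reached
        for succ, step in graph.get(node, {}).items():
            heapq.heappush(frontier, (cost + step, path + [succ]))


def top_k(graph, start, goal, k):
    """Return the k cheapest plans (fewer if fewer exist)."""
    return list(islice(plans_by_cost(graph, start, goal), k))


def top_quality(graph, start, goal, q):
    """Return every plan whose cost is at most q times the optimal cost."""
    plans = plans_by_cost(graph, start, goal)
    best = next(plans, None)
    if best is None:
        return []
    bound = q * best[0]
    return [best] + list(takewhile(lambda p: p[0] <= bound, plans))


# Hypothetical toy task: three plans with costs 5, 5 and 7.
toy = {"s": {"a": 1, "b": 2}, "a": {"g": 4}, "b": {"g": 3, "a": 1}}
print(top_k(toy, "s", "g", 1))          # one plan, although two are optimal
print(top_quality(toy, "s", "g", 1.0))  # both optimal plans
print(top_quality(toy, "s", "g", 1.5))  # all three plans (bound 7.5)
```

With `k = 1` the top-k answer arbitrarily drops one of the two optimal plans, whereas the quality bound returns a set whose size is determined by the task itself.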

【Keywords】:

1215. Information Shaping for Enhanced Goal Recognition of Partially-Informed Agents.

Paper Link】 【Pages】:9908-9915

【Authors】: Sarah Keren ; Haifeng Xu ; Kofi Kwapong ; David C. Parkes ; Barbara Grosz

【Abstract】: We extend goal recognition design to account for partially informed agents. In particular, we consider a two-agent setting in which one agent, the actor, seeks to achieve a goal but has only incomplete information about the environment. The second agent, the recognizer, has perfect information and aims to recognize the actor's goal from its behavior as quickly as possible. As a one-time offline intervention and with the objective of facilitating the recognition task, the recognizer can selectively reveal information to the actor. The problem of selecting which information to reveal, which we call information shaping, is challenging not only because the space of information shaping options may be large, but also because more information revelation need not make it easier to recognize an agent's goal. We formally define this problem, and suggest a pruning approach for efficiently searching the search space. We demonstrate the effectiveness and efficiency of the suggested method on standard benchmarks.

【Keywords】:

Paper Link】 【Pages】:9916-9924

【Authors】: Beomjoon Kim ; Kyungjae Lee ; Sungbin Lim ; Leslie Pack Kaelbling ; Tomás Lozano-Pérez

【Abstract】: Many important applications, including robotics, data-center management, and process control, require planning action sequences in domains with continuous state and action spaces and discontinuous objective functions. Monte Carlo tree search (MCTS) is an effective strategy for planning in discrete action spaces. We provide a novel MCTS algorithm (voot) for deterministic environments with continuous action spaces, which, in turn, is based on a novel black-box function-optimization algorithm (voo) to efficiently sample actions. The voo algorithm uses Voronoi partitioning to guide sampling, and is particularly efficient in high-dimensional spaces. The voot algorithm has an instance of voo at each node in the tree. We provide regret bounds for both algorithms and demonstrate their empirical effectiveness in several high-dimensional problems including two difficult robotics planning problems.

【Keywords】:

1217. Idle Time Optimization for Target Assignment and Path Finding in Sortation Centers.

Paper Link】 【Pages】:9925-9932

【Authors】: Ngai Meng Kou ; Cheng Peng ; Hang Ma ; T. K. Satish Kumar ; Sven Koenig

【Abstract】: In this paper, we study the one-shot and lifelong versions of the Target Assignment and Path Finding problem in automated sortation centers, where each agent needs to constantly assign itself a sorting station, move to its assigned station without colliding with obstacles or other agents, wait in the queue of that station to obtain a parcel for delivery, and then deliver the parcel to a sorting bin. The throughput of such centers is largely determined by the total idle time of all stations since their queues can frequently become empty. To address this problem, we first formalize and study the one-shot version that assigns stations to a set of agents and finds collision-free paths for the agents to their assigned stations. We present efficient algorithms for this task based on a novel min-cost max-flow formulation that minimizes the total idle time of all stations in a fixed time window. We then demonstrate how our algorithms for solving the one-shot problem can be applied to solving the lifelong problem as well. Experimentally, we believe we are the first researchers to consider real-world automated sortation centers using an industrial simulator with realistic data and a kinodynamic model of real robots. On this simulator, we showcase the benefits of our algorithms by demonstrating their efficiency and effectiveness for up to 350 agents.

【Keywords】:

1218. Semantic Attachments for HTN Planning.

Paper Link】 【Pages】:9933-9940

【Authors】: Mauricio Cecilio Magnaguagno ; Felipe Meneguzzi

【Abstract】: Hierarchical Task Networks (HTN) planning uses a decomposition process guided by domain knowledge to guide search towards a planning task. While many HTN planners allow calls to external processes (e.g. to a simulator interface) during the decomposition process, this is a computationally expensive process, so planner implementations often use such calls in an ad-hoc way using very specialized domain knowledge to limit the number of calls. Conversely, the classical planners that are capable of using external calls (often called semantic attachments) during planning are limited to generating a fixed number of ground operators at problem grounding time. We formalize Semantic Attachments for HTN planning using semi coroutines, allowing such procedurally defined predicates to link the planning process to custom unifications outside of the planner, such as numerical results from a robotics simulator. The resulting planner then uses such coroutines as part of its backtracking mechanism to search through parallel dimensions of the state-space (e.g. through numeric variables). We show empirically that our planner outperforms the state-of-the-art numeric planners in a number of domains using minimal extra domain knowledge.

【Keywords】:

1219. Automated Synthesis of Social Laws in STRIPS.

Paper Link】 【Pages】:9941-9948

【Authors】: Ronen Nir ; Alexander Shleyfman ; Erez Karpas

【Abstract】: Agents operating in a multi-agent environment must consider not just their actions, but also those of the other agents in the system. Artificial social systems are a well-known means for coordinating a set of agents, without requiring centralized planning or online negotiation between agents. Artificial social systems enact a social law which restricts the agents from performing some actions under some circumstances. A robust social law prevents the agents from interfering with each other, but does not prevent them from achieving their goals. Previous work has addressed how to check if a given social law, formulated in a variant of ma-strips, is robust, via compilation to planning. However, the social law was manually specified. In this paper, we address the problem of automatically synthesizing a robust social law for a given multi-agent environment. We treat the problem of social law synthesis as a search through the space of possible social laws, relying on the robustness verification procedure as a goal test. We also show how to exploit additional information produced by the robustness verification procedure to guide the search.

【Keywords】:

1220. Generalized Planning with Positive and Negative Examples.

Paper Link】 【Pages】:9949-9956

【Authors】: Javier Segovia Aguas ; Sergio Jiménez ; Anders Jonsson

【Abstract】: Generalized planning aims at computing an algorithm-like structure (generalized plan) that solves a set of multiple planning instances. In this paper we define negative examples for generalized planning as planning instances that must not be solved by a generalized plan. In this regard, the paper extends the notion of validation of a generalized plan as the problem of verifying that a given generalized plan solves the set of input positive instances while it fails to solve a given input set of negative examples. This notion of plan validation allows us to define quantitative metrics to assess the generalization capacity of generalized plans. The paper also shows how to incorporate this new notion of plan validation into a compilation for plan synthesis that takes both positive and negative instances as input. Experiments show that incorporating negative examples can accelerate plan synthesis in several domains and leverage quantitative metrics to evaluate the generalization capacity of the synthesized plans.

【Keywords】:

1221. Active Goal Recognition.

Paper Link】 【Pages】:9957-9966

【Authors】: Maayan Shvo ; Sheila A. McIlraith

【Abstract】: The objective of goal recognition is to infer a goal that accounts for the observed behavior of an actor. In this work, we introduce and formalize the notion of active goal recognition in which we endow the observer with agency to sense, reason, and act in the world with a view to enhancing and possibly expediting goal recognition, and/or to intervening in goal achievement. To this end, we present an algorithm for active goal recognition and a landmark-based approach to the elimination of hypothesized goals which leverages automated planning. Experiments demonstrate the merits of providing agency to the observer, and the effectiveness of our approach in potentially enhancing the observational power of the observer, as well as expediting and in some cases making possible the recognition of the actor's goal.

【Keywords】:

1222. Symbolic Top-k Planning.

Paper Link】 【Pages】:9967-9974

【Authors】: David Speck ; Robert Mattmüller ; Bernhard Nebel

【Abstract】: The objective of top-k planning is to determine a set of k different plans with lowest cost for a given planning task. In practice, such a set of best plans can be preferred to a single best plan generated by ordinary optimal planners, as it allows the user to choose between different alternatives and thus take into account preferences that may be difficult to model. In this paper we show that, in general, the decision problem version of top-k planning is PSPACE-complete, as is the decision problem version of ordinary classical planning. This does not hold for polynomially bounded plans for which the decision problem turns out to be PP-hard, while the ordinary case is NP-hard. We present a novel approach to top-k planning, called sym-k, which is based on symbolic search, and prove that sym-k is sound and complete. Our empirical analysis shows that sym-k exceeds the current state of the art for both small and large k.

【Keywords】:

1223. Temporal Planning with Intermediate Conditions and Effects.

Paper Link】 【Pages】:9975-9982

【Authors】: Alessandro Valentini ; Andrea Micheli ; Alessandro Cimatti

【Abstract】: Automated temporal planning is the technology of choice when controlling systems that can execute more actions in parallel and when temporal constraints, such as deadlines, are needed in the model. One limitation of several action-based planning systems is that actions are modeled as intervals having conditions and effects only at the extremes and as invariants, but no conditions nor effects can be specified at arbitrary points or sub-intervals. In this paper, we address this limitation by providing an effective heuristic-search technique for temporal planning, allowing the definition of actions with conditions and effects at any arbitrary time within the action duration. We experimentally demonstrate that our approach is far better than standard encodings in PDDL 2.1 and is competitive with other approaches that can (directly or indirectly) represent intermediate action conditions or effects.

【Keywords】:

Paper Link】 【Pages】:9983-9991

【Authors】: Linnan Wang ; Yiyang Zhao ; Yuu Jinnai ; Yuandong Tian ; Rodrigo Fonseca

【Abstract】: Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive amount of computations behind current NAS methods requires further investigations in improving the sample efficiency and the network evaluation cost to get better results in a shorter time. In this paper, we present a novel scalable Monte Carlo Tree Search (MCTS) based NAS agent, named AlphaX, to tackle these two aspects. AlphaX improves the search efficiency by adaptively balancing the exploration and exploitation at the state level, and by a Meta-Deep Neural Network (DNN) to predict network accuracies for biasing the search toward a promising region. To amortize the network evaluation cost, AlphaX accelerates MCTS rollouts with a distributed design and reduces the number of epochs in evaluating a network by transfer learning, which is guided with the tree structure in MCTS. In 12 GPU days and 1000 samples, AlphaX found an architecture that reaches 97.84% top-1 accuracy on CIFAR-10, and 75.5% top-1 accuracy on ImageNet, exceeding SOTA NAS methods in both the accuracy and sampling efficiency. Particularly, we also evaluate AlphaX on NASBench-101, a large scale NAS dataset; AlphaX is 3x and 2.8x more sample efficient than Random Search and Regularized Evolution in finding the global optimum. Finally, we show the searched architecture improves a variety of vision applications from Neural Style Transfer, to Image Captioning and Object Detection.

【Keywords】:

1225. Planning with Abstract Learned Models While Learning Transferable Subtasks.

Paper Link】 【Pages】:9992-10000

【Authors】: John Winder ; Stephanie Milani ; Matthew Landen ; Erebus Oh ; Shane Parr ; Shawn Squire ; Marie desJardins ; Cynthia Matuszek

【Abstract】: We introduce an algorithm for model-based hierarchical reinforcement learning to acquire self-contained transition and reward models suitable for probabilistic planning at multiple levels of abstraction. We call this framework Planning with Abstract Learned Models (PALM). By representing subtasks symbolically using a new formal structure, the lifted abstract Markov decision process (L-AMDP), PALM learns models that are independent and modular. Through our experiments, we show how PALM integrates planning and execution, facilitating a rapid and efficient learning of abstract, hierarchical models. We also demonstrate the increased potential for learned models to be transferred to new and related tasks.

【Keywords】:

Paper Link】 【Pages】:10001-10008

【Authors】: Qiaoyun Wu ; Dinesh Manocha ; Jun Wang ; Kai Xu

【Abstract】: We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to a model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of mixture-of-posteriors prior effectively alleviates the issue of over-regularized latent space, thus significantly boosting the model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization.

【Keywords】:

1227. Refining HTN Methods via Task Insertion with Preferences.

Paper Link】 【Pages】:10009-10016

【Authors】: Zhanhao Xiao ; Hai Wan ; Hankz Hankui Zhuo ; Andreas Herzig ; Laurent Perrussel ; Peilin Chen

【Abstract】: Hierarchical Task Network (HTN) planning is showing its power in real-world planning. Although domain experts have partial hierarchical domain knowledge, it is time-consuming to specify all HTN methods, leaving them incomplete. On the other hand, traditional HTN learning approaches focus only on declarative goals, omitting the hierarchical domain knowledge. In this paper, we propose a novel learning framework to refine HTN methods via task insertion while completely preserving the original methods. As it is difficult to identify incomplete methods without designating declarative goals for compound tasks, we introduce the notion of prioritized preference to capture the incompleteness possibility of methods. Specifically, the framework first computes the preferred completion profile w.r.t. the prioritized preference to refine the incomplete methods. Then it finds the minimal set of refined methods via a method substitution operation. Experimental analysis demonstrates that our approach is effective, especially in solving new HTN planning instances.

【Keywords】:

1228. Computing Superior Counter-Examples for Conformant Planning.

Paper Link】 【Pages】:10017-10024

【Authors】: Xiaodi Zhang ; Alban Grastien ; Enrico Scala

【Abstract】: In a counter-example based approach to conformant planning, choosing the right counter-example can improve performance. We formalise this observation by introducing the notion of “superiority” of a counter-example over another one, that holds whenever the superior counter-example exhibits more tags than the latter. We provide a theoretical explanation that supports the strategy of searching for maximally superior counter-examples, and we show how this strategy can be implemented. The empirical experiments validate our approach.

【Keywords】:

AAAI Technical Track: Reasoning under Uncertainty 38

1229. Deep Bayesian Nonparametric Learning of Rules and Plans from Demonstrations with a Learned Automaton Prior.

Paper Link】 【Pages】:10026-10034

【Authors】: Brandon Araki ; Kiran Vodrahalli ; Thomas Leech ; Cristian Ioan Vasile ; Mark Donahue ; Daniela Rus

【Abstract】: We introduce a method to learn imitative policies from expert demonstrations that are interpretable and manipulable. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or zero-shot generalize to new, similar tasks. We build upon previous work by no longer requiring additional supervised information which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.

【Keywords】:

Paper Link】 【Pages】:10035-10043

【Authors】: Syrine Belakaria ; Aryan Deshwal ; Janardhan Rao Doppa

【Abstract】: We study the novel problem of blackbox optimization of multiple objectives via multi-fidelity function evaluations that vary in the amount of resources consumed and their accuracy. The overall goal is to approximate the true Pareto set of solutions by minimizing the resources consumed for function evaluations. For example, in power system design optimization, we need to find designs that trade off cost, size, efficiency, and thermal tolerance using multi-fidelity simulators for design evaluations. In this paper, we propose a novel approach referred to as Multi-Fidelity Output Space Entropy Search for Multi-objective Optimization (MF-OSEMO) to solve this problem. The key idea is to select the sequence of candidate input and fidelity-vector pairs that maximize the information gained about the true Pareto front per unit resource cost. Our experiments on several synthetic and real-world benchmark problems show that MF-OSEMO, with both approximations, significantly improves over the state-of-the-art single-fidelity algorithms for multi-objective optimization. Please note: A corrigendum was submitted for this paper on 24 September 2020.

【Keywords】:

Paper Link】 【Pages】:10044-10052

【Authors】: Syrine Belakaria ; Aryan Deshwal ; Nitthilan Kannappan Jayakodi ; Janardhan Rao Doppa

【Abstract】: We consider the problem of multi-objective (MO) blackbox optimization using expensive function evaluations, where the goal is to approximate the true Pareto set of solutions while minimizing the number of function evaluations. For example, in hardware design optimization, we need to find the designs that trade-off performance, energy, and area overhead using expensive simulations. We propose a novel uncertainty-aware search framework referred to as USeMO to efficiently select the sequence of inputs for evaluation to solve this problem. The selection method of USeMO consists of solving a cheap MO optimization problem via surrogate models of the true functions to identify the most promising candidates and picking the best candidate based on a measure of uncertainty. We also provide theoretical analysis to characterize the efficacy of our approach. Our experiments on several synthetic and six diverse real-world benchmark problems show that USeMO consistently outperforms the state-of-the-art algorithms.
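As a reminder of what the "Pareto set" targeted by this line of work is, the following minimal sketch filters a finite list of already-evaluated candidates down to the non-dominated ones. It is not USeMO: there are no surrogate models and no uncertainty measure here. It assumes every objective is to be minimized, and the sample design points are invented for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (energy, area) measurements for five hardware designs.
designs = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
print(pareto_set(designs))  # [(1, 5), (2, 3), (4, 1)]
```

This brute-force filter is quadratic in the number of points, which is fine for a handful of evaluated designs. The difficulty the abstract addresses is choosing which designs to evaluate when each evaluation is expensive.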

【Keywords】:

1232. Exchangeable Generative Models with Flow Scans.

Paper Link】 【Pages】:10053-10060

【Authors】: Christopher M. Bender ; Kevin O'Connor ; Yang Li ; Juan Jose Garcia ; Junier Oliva ; Manzil Zaheer

【Abstract】: In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling.

【Keywords】:

1233. Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes.

Paper Link】 【Pages】:10061-10068

【Authors】: Maxime Bouton ; Jana Tumova ; Mykel J. Kochenderfer

【Abstract】: Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.

【Keywords】:

1234. Scalable Methods for Computing State Similarity in Deterministic Markov Decision Processes.

Paper Link】 【Pages】:10069-10076

【Authors】: Pablo Samuel Castro

【Abstract】: We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous state MDPs, which prior to this work had not been possible.
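For readers unfamiliar with bisimulation metrics, the sketch below shows the plain tabular fixed-point iteration that becomes impractical at scale. It is not either of the paper's two algorithms. It assumes a deterministic MDP, where the transport term between next-state distributions reduces to the distance between the two successor states, and it uses the common weighting d(s, t) = max over actions of |R(s, a) - R(t, a)| + γ · d(next(s, a), next(t, a)). The three-state MDP and all names are invented for illustration.

```python
def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-8):
    """Iterate the bisimulation operator to a fixed point (tabular case).

    rewards[s][a]    -> immediate reward for action a in state s
    next_state[s][a] -> the unique successor state (deterministic MDP)
    """
    states = list(rewards)
    actions = list(next(iter(rewards.values())))
    d = {(s, t): 0.0 for s in states for t in states}
    while True:
        new_d = {
            (s, t): max(
                abs(rewards[s][a] - rewards[t][a])
                + gamma * d[next_state[s][a], next_state[t][a]]
                for a in actions
            )
            for s in states
            for t in states
        }
        delta = max(abs(new_d[key] - d[key]) for key in d)
        d = new_d
        if delta < tol:
            return d


# Hypothetical MDP: A and B behave identically, C never earns reward.
rewards = {"A": {"l": 0, "r": 1}, "B": {"l": 0, "r": 1}, "C": {"l": 0, "r": 0}}
next_state = {
    "A": {"l": "A", "r": "C"},
    "B": {"l": "B", "r": "C"},
    "C": {"l": "C", "r": "C"},
}
d = bisimulation_metric(rewards, next_state)
print(d["A", "B"], d["A", "C"])  # A and B are at distance 0; A and C are not
```

Each sweep touches every pair of states and every action, which is the quadratic, table-bound cost that motivates the sampling-based and learned approximations described in the abstract.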

【Keywords】:

1235. Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns.

Paper Link】 【Pages】:10077-10084

【Authors】: YooJung Choi ; Golnoosh Farnadi ; Behrouz Babaki ; Guy Van den Broeck

【Abstract】: As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.

【Keywords】:

1236. Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory.

Paper Link】 【Pages】:10085-10092

【Authors】: Arghya Roy Chaudhuri ; Shivaram Kalyanakrishnan

【Abstract】: Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several optimal algorithms have been proposed. Such algorithms depend on (sufficient statistics of) the empirical reward distributions of the arms to decide which arm to pull next. In this paper, we consider the design of algorithms that are constrained to store statistics from only a bounded number of arms. For bandits with a finite set of arms, we derive a sub-linear upper bound on the regret that decreases with the “arm memory” size M. For instances with a large, possibly infinite, set of arms, we show a sub-linear bound on the quantile regret. Our problem formulation generalises that of Liau et al. (2018), who fix M = O(1), and so do not obtain bounds that depend on M. More importantly, our algorithms keep exploration and exploitation tightly coupled, without a dedicated exploration phase as employed by Liau et al. (2018). Although this choice makes our analysis harder, it leads to much-improved practical performance. For bandits with a large number of arms and no known structure on the rewards, our algorithms serve as a viable option. Unlike many other approaches to restrict the memory of bandit algorithms, our algorithms do not need any additional technical assumptions.

【Keywords】:

1237. A Calculus for Stochastic Interventions: Causal Effect Identification and Surrogate Experiments.

Paper Link】 【Pages】:10093-10100

【Authors】: Juan D. Correa ; Elias Bareinboim

【Abstract】: Some of the most prominent results in causal inference have been developed in the context of atomic interventions, following the semantics of the do-operator and the inferential power of the do-calculus. In practice, many real-world settings require more complex types of interventions that cannot be represented by a simple atomic intervention. In this paper, we investigate a general class of interventions that covers some non-trivial types of policies (conditional and stochastic), which goes beyond the atomic class. Our goal is to develop general understanding and formal machinery to be able to reason about the effects of those policies, similar to the robust treatment developed to handle the atomic case. Specifically, in this paper, we introduce a new set of inference rules (akin to do-calculus) that can be used to derive claims about general interventions, which we call σ-calculus. We develop a systematic and efficient procedure for finding estimands of the effect of general policies as a function of the available observational and experimental distributions. We then prove that our algorithm and σ-calculus are both sound for the tasks of identification (Pearl, 1995) and z-identification (Bareinboim and Pearl, 2012) under this class of interventions.

【Keywords】:

1238. Reliable and Efficient Anytime Skeleton Learning.

Paper Link】 【Pages】:10101-10109

【Authors】: Rui Ding ; Yanzhi Liu ; Jingjing Tian ; Zhouyu Fu ; Shi Han ; Dongmei Zhang

【Abstract】: Skeleton Learning (SL) is the task of learning an undirected graph from the input data that captures their dependency relations. SL plays a pivotal role in causal learning and has attracted growing attention in the research community lately. Due to the high time complexity, anytime SL has emerged which learns a skeleton incrementally and improves it over time. In this paper, we first propose and advocate the reliability requirement for anytime SL to be practically useful. Reliability requires the intermediately learned skeleton to have precision and persistency. We also present REAL, a novel Reliable and Efficient Anytime Learning algorithm of skeleton. Specifically, we point out that the commonly existing Functional Dependency (FD) among variables could make the learned skeleton violate the faithfulness assumption; thus we propose a theory to resolve such incompatibility. Based on this, REAL conducts SL on a reduced set of variables with guaranteed correctness and thus drastically improves efficiency. Furthermore, it employs a novel edge-insertion and best-first strategy in anytime fashion for skeleton growing to achieve high reliability and efficiency. We prove that the skeleton learned by REAL converges to the correct skeleton under standard assumptions. Thorough experiments conducted on both benchmark and real-world datasets demonstrate that REAL significantly outperforms the other state-of-the-art algorithms.

【Keywords】:

1239. Deception through Half-Truths.

Paper Link】 【Pages】:10110-10117

【Authors】: Andrew Estornell ; Sanmay Das ; Yevgeniy Vorobeychik

【Abstract】: Deception is a fundamental issue across a diverse array of settings, from cybersecurity, where decoys (e.g., honeypots) are an important tool, to politics that can feature politically motivated “leaks” and fake news about candidates. Typical considerations of deception view it as providing false information. However, just as important but less frequently studied is a more tacit form where information is strategically hidden or leaked. We consider the problem of how much an adversary can affect a principal's decision by “half-truths”, that is, by masking or hiding bits of information, when the principal is oblivious to the presence of the adversary. The principal's problem can be modeled as one of predicting future states of variables in a dynamic Bayes network, and we show that, while theoretically the principal's decisions can be made arbitrarily bad, the optimal attack is NP-hard to approximate, even under strong assumptions favoring the attacker. However, we also describe an important special case where the dependency of future states on past states is additive, in which we can efficiently compute an approximately optimal attack. Moreover, in networks with a linear transition function we can solve the problem optimally in polynomial time.

【Keywords】:

1240. Causal Transfer for Imitation Learning and Decision Making under Sensor-Shift.

Paper Link】 【Pages】:10118-10125

【Authors】: Jalal Etesami ; Philipp Geiger

【Abstract】: Learning from demonstrations (LfD) is an efficient paradigm to train AI agents. But major issues arise when there are differences between (a) the demonstrator's own sensory input, (b) our sensors that observe the demonstrator and (c) the sensory input of the agent we train. In this paper, we propose a causal model-based framework for transfer learning under such “sensor-shifts”, for two common LfD tasks: (1) inferring the effect of the demonstrator's actions and (2) imitation learning. First we rigorously analyze, on the population-level, to what extent the relevant underlying mechanisms (the action effects and the demonstrator policy) can be identified and transferred from the available observations together with prior knowledge of sensor characteristics. And we devise an algorithm to infer these mechanisms. Then we introduce several proxy methods which are easier to calculate, estimate from finite data and interpret than the exact solutions, alongside theoretical bounds on their closeness to the exact ones. We validate our two main methods on simulated and semi-real world data.

【Keywords】:

1241. Low-Variance Black-Box Gradient Estimates for the Plackett-Luce Distribution.

Paper Link】 【Pages】:10126-10135

【Authors】: Artyom Gadetsky ; Kirill Struminsky ; Christopher Robinson ; Novi Quadrianto ; Dmitry P. Vetrov

【Abstract】: Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.

【Keywords】:
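
For readers unfamiliar with the Plackett-Luce distribution mentioned in the abstract above, the following is a minimal sketch of the distribution itself: a permutation is built by repeatedly drawing one of the remaining items with probability proportional to the exponential of its score. It does not implement the paper's control variates. The function names `pl_log_prob` and `pl_sample` and the example scores are assumptions made for illustration.

```python
import itertools
import math
import random


def pl_log_prob(scores, perm):
    """Log-probability of the ordering `perm` under Plackett-Luce log-weights `scores`."""
    remaining = list(perm)
    total = 0.0
    for idx in perm:
        # Numerically stable log-sum-exp over the items not yet placed.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        total += scores[idx] - log_z
        remaining.remove(idx)
    return total


def pl_sample(scores, rng):
    """Draw a permutation by sequential sampling without replacement."""
    remaining = list(range(len(scores)))
    perm = []
    while remaining:
        weights = [math.exp(scores[j]) for j in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        perm.append(pick)
        remaining.remove(pick)
    return tuple(perm)


# Illustrative scores (hypothetical values); probabilities over all orderings sum to 1.
example_scores = [0.5, -1.0, 2.0]
total_mass = sum(
    math.exp(pl_log_prob(example_scores, p))
    for p in itertools.permutations(range(len(example_scores)))
)
```

A score-function gradient estimator would multiply a black-box objective evaluated at a sampled permutation by the gradient of `pl_log_prob`. That plain estimator is the kind of high-variance baseline that variance reduction techniques aim to improve on.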

1242. An Efficient Algorithm for Counting Markov Equivalent DAGs.

Paper Link】 【Pages】:10136-10143

【Authors】: Robert Ganian ; Thekla Hamm ; Topi Talvitie

【Abstract】: We consider the problem of counting the number of DAGs which are Markov-equivalent, i.e., which encode the same conditional independencies between random variables. The problem has been studied, among others, in the context of causal discovery, and it is known that it reduces to counting the number of so-called moral acyclic orientations of certain undirected graphs, notably chordal graphs. Our main empirical contribution is a new algorithm which outperforms previously known exact algorithms for the considered problem by a significant margin. On the theoretical side, we show that our algorithm is guaranteed to run in polynomial time on a broad class of chordal graphs, including interval graphs.

【Keywords】:

1243. A MaxSAT-Based Framework for Group Testing.

Paper Link】 【Pages】:10144-10152

【Authors】: Lorenzo Ciampiconi ; Bishwamittra Ghosh ; Jonathan Scarlett ; Kuldeep S. Meel

【Abstract】: The success of MaxSAT (maximum satisfiability) solving in recent years has motivated researchers to apply MaxSAT solvers in diverse discrete combinatorial optimization problems. Group testing has been studied as a combinatorial optimization problem, where the goal is to find defective items among a set of items by performing sets of tests on items. In this paper, we propose a MaxSAT-based framework, called MGT, that solves group testing, in particular, the decoding phase of non-adaptive group testing. We extend this approach to the noisy variant of group testing, and propose a compact MaxSAT-based encoding that guarantees an optimal solution. Our extensive experimental results show that MGT can solve group testing instances of 10000 items with 3% defectivity, which no prior work can handle to the best of our knowledge. Furthermore, MGT has better accuracy than the LP-based approach. We also discover an interesting phase transition behavior in the runtime, which reveals the easy-hard-easy nature of group testing.

【Keywords】:
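
The decoding phase of noiseless non-adaptive group testing described in the abstract above can be read as: find a smallest set of defective items that explains every test outcome. The sketch below is a brute-force stand-in for that objective on tiny instances. It is not the MGT encoding and does not call a MaxSAT solver, and the function name and input layout are assumptions made for illustration.

```python
from itertools import combinations


def decode_smallest_defective_set(num_items, tests, outcomes):
    """Return a smallest defective set consistent with noiseless test outcomes.

    tests[i] lists the item indices pooled in test i, and outcomes[i] is True
    when that test was positive. Returns None if no set explains the outcomes.
    """
    pools = [frozenset(t) for t in tests]
    for size in range(num_items + 1):
        for candidate in combinations(range(num_items), size):
            defective = set(candidate)
            # A noiseless test is positive exactly when it pools a defective item.
            if all(bool(pool & defective) == outcome
                   for pool, outcome in zip(pools, outcomes)):
                return defective
    return None


# Hypothetical instance: 4 items, 3 pooled tests, item 1 is the only defective.
decoded = decode_smallest_defective_set(
    4, [[0, 1], [1, 2], [2, 3]], [True, True, False]
)
```

The exhaustive search is exponential in the number of items. A MaxSAT formulation would instead express test consistency as hard constraints and "item is not defective" as soft constraints, leaving the search to the solver.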

1244. Causal Discovery from Multiple Data Sets with Non-Identical Variable Sets.

Paper Link】 【Pages】:10153-10161

【Authors】: Biwei Huang ; Kun Zhang ; Mingming Gong ; Clark Glymour

【Abstract】: A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to technical constraints or to save time and cost. This raises a question of how to handle causal discovery from multiple data sets with non-identical variable sets, and at the same time, it would be interesting to see how more recorded variables can help to mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases. The proposed method also allows distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identifiable under testable conditions. Furthermore, we present two types of approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood free and leverages generative adversarial nets to improve scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

【Keywords】:

1245. Introducing Probabilistic Bézier Curves for N-Step Sequence Prediction.

Paper Link】 【Pages】:10162-10169

【Authors】: Ronny Hug ; Wolfgang Hübner ; Michael Arens

【Abstract】: Representations of sequential data are commonly based on the assumption that observed sequences are realizations of an unknown underlying stochastic process, where the learning problem includes determination of the model parameters. In this context, a model must be able to capture the multi-modal nature of the data, without blurring between single modes. This paper proposes probabilistic Bézier curves (𝒩-Curves) as a basis for effectively modeling continuous-time stochastic processes. The model is based on Mixture Density Networks (MDN) and Bézier curves with Gaussian random variables as control points. Key advantages of the model include the ability to generate smooth multi-mode predictions in a single inference step, which reduces the need for Monte Carlo simulation. This property is in line with recent attempts to address the problem of quantifying uncertainty as a regression problem. Essential properties of the proposed approach are illustrated by several toy examples and the task of multi-step sequence prediction. As an initial proof of concept, the model performance is compared to an LSTM-MDN model and recurrent Gaussian processes on two real world use-cases, trajectory prediction and motion capture sequence prediction.

【Keywords】:
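
To make the idea of a Bézier curve with Gaussian control points more concrete, here is a minimal sketch under a simplifying assumption that is not taken from the abstract: the control points are scalar and mutually independent. Each curve point is then a Bernstein-weighted sum of Gaussians, so it is itself Gaussian with the mean and variance computed below. The function names are hypothetical, and the mixture density network part of the model is not shown.

```python
from math import comb


def bernstein_weights(num_points, t):
    """Bernstein basis values of degree num_points - 1 at parameter t."""
    n = num_points - 1
    return [comb(n, i) * (1 - t) ** (n - i) * t ** i for i in range(num_points)]


def gaussian_bezier_point(means, variances, t):
    """Mean and variance of the curve point at parameter t in [0, 1].

    Assumes scalar, mutually independent Gaussian control points.
    """
    weights = bernstein_weights(len(means), t)
    mean = sum(w * m for w, m in zip(weights, means))
    # Variances of independent Gaussians add with squared weights.
    variance = sum(w ** 2 * v for w, v in zip(weights, variances))
    return mean, variance


# Hypothetical quadratic curve with three control points.
mid_mean, mid_var = gaussian_bezier_point([0.0, 2.0, 4.0], [1.0, 4.0, 1.0], 0.5)
```

Evaluating the function on a grid of `t` values gives a mean trajectory with a variance band in closed form, with no sampling required.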

1246. Probabilistic Reasoning Across the Causal Hierarchy.

Paper Link】 【Pages】:10170-10177

【Authors】: Duligur Ibeling ; Thomas Icard

【Abstract】: We propose a formalization of the three-tier causal hierarchy of association, intervention, and counterfactuals as a series of probabilistic logical languages. Our languages are of strictly increasing expressivity, the first capable of expressing quantitative probabilistic reasoning—including conditional independence and Bayesian inference—the second encoding do-calculus reasoning for causal effects, and the third capturing a fully expressive do-calculus for arbitrary counterfactual queries. We give a corresponding series of finitary axiomatizations complete over both structural causal models and probabilistic programs, and show that satisfiability and validity for each language are decidable in polynomial space.

【Keywords】:

1247. The Choice Function Framework for Online Policy Improvement.

Paper Link】 【Pages】:10178-10185

【Authors】: Murugeswari Issakkimuthu ; Alan Fern ; Prasad Tadepalli

【Abstract】: There are notable examples of online search improving over hand-coded or learned policies (e.g. AlphaZero) for sequential decision making. It is not clear, however, whether or not policy improvement is guaranteed for many of these approaches, even when given a perfect leaf evaluation function and transition model. Indeed, simple counterexamples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions for stationary and non-stationary choice functions to guarantee that the value achieved by online search is no worse than the original policy. In addition, we describe a general parametric class of choice functions that satisfy those conditions and present an illustrative use case of the empirical utility of the framework.

【Keywords】:

1248. Estimating Causal Effects Using Weighting-Based Estimators.

Paper Link】 【Pages】:10186-10193

【Authors】: Yonghan Jung ; Jin Tian ; Elias Bareinboim

【Abstract】: Causal effect identification is one of the most prominent and well-understood problems in causal inference. Despite the generality and power of the results developed so far, there are still challenges in their applicability to practical settings, arguably due to the finitude of the samples. Simply put, there is a gap between causal effect identification and estimation. One popular setting in which sample-efficient estimators from finite samples exist is when the celebrated back-door condition holds. In this paper, we extend weighting-based methods developed for the back-door case to more general settings, and develop novel machinery for estimating causal effects using the weighting-based method as a building block. We derive graphical criteria under which causal effects can be estimated using this new machinery and demonstrate the effectiveness of the proposed method through simulation studies.

【Keywords】:
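
As background for the back-door case that the abstract above takes as its starting point, the following sketch shows a standard inverse probability weighting estimate of E[Y | do(X = x)] for discrete data, with propensities estimated by empirical frequencies. It illustrates only the classical back-door setting, not the paper's extended machinery. The function name and the `(z, x, y)` sample layout are assumptions made for illustration.

```python
from collections import Counter


def ipw_backdoor(samples, x_value):
    """Estimate E[Y | do(X = x_value)] from (z, x, y) samples.

    Assumes Z is a discrete back-door adjustment set and that x_value occurs
    in every observed stratum of Z (positivity). Strata where it never occurs
    silently contribute zero.
    """
    z_counts = Counter(z for z, _, _ in samples)
    zx_counts = Counter((z, x) for z, x, _ in samples)
    total = 0.0
    for z, x, y in samples:
        if x == x_value:
            propensity = zx_counts[(z, x)] / z_counts[z]  # estimate of P(x | z)
            total += y / propensity
    return total / len(samples)


# Hypothetical toy data: two strata of Z with different outcome levels.
toy_samples = [
    (0, 1, 1.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 0, 1.0), (1, 0, 1.0),
]
effect_do_x1 = ipw_backdoor(toy_samples, 1)
```

With empirical propensities the estimate coincides with the plug-in adjustment formula, here (2/6)·1 + (4/6)·3 = 7/3.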

1249. Error-Correcting and Verifiable Parallel Inference in Graphical Models.

Paper Link】 【Pages】:10194-10201

【Authors】: Negin Karimi ; Petteri Kaski ; Mikko Koivisto

【Abstract】: We present a novel framework for parallel exact inference in graphical models. Our framework supports error-correction during inference and enables fast verification that the result of inference is correct, with probabilistic soundness. The computational complexity of inference essentially matches the cost of w-cutset conditioning, a known generalization of Pearl's classical loop-cutset conditioning for inference. Verifying the result for correctness can be done with as little as essentially the square root of the cost of inference. Our main technical contribution amounts to designing a low-degree polynomial extension of the cutset approach, and then reducing to a univariate polynomial employing techniques recently developed for noninteractive probabilistic proof systems.

【Keywords】:

1250. Safe Linear Stochastic Bandits.

Paper Link】 【Pages】:10202-10209

【Authors】: Kia Khezeli ; Eilyan Bitar

【Abstract】: We introduce the safe linear stochastic bandit framework, a generalization of linear stochastic bandits, where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe) threshold with high probability. We assume that the learner initially has knowledge of an arm that is known to be safe, but not necessarily optimal. Leveraging this assumption, we introduce a learning algorithm that systematically combines known safe arms with exploratory arms to safely expand the set of safe arms over time, while facilitating safe greedy exploitation in subsequent stages. In addition to ensuring the satisfaction of the safety constraint at every stage of play, the proposed algorithm is shown to exhibit an expected regret that is no more than O(√T log(T)) after T stages of play.

【Keywords】:

1251. General Transportability - Synthesizing Observations and Experiments from Heterogeneous Domains.

Paper Link】 【Pages】:10210-10217

【Authors】: Sanghack Lee ; Juan D. Correa ; Elias Bareinboim

【Abstract】: The process of transporting and synthesizing experimental findings from heterogeneous data collections to construct causal explanations is arguably one of the most central and challenging problems in modern data science. This problem has been studied in the causal inference literature under the rubric of causal effect identifiability and transportability (Bareinboim and Pearl 2016). In this paper, we investigate a general version of this challenge where the goal is to learn conditional causal effects from an arbitrary combination of datasets collected under different conditions, observational or experimental, and from heterogeneous populations. Specifically, we introduce a unified graphical criterion that characterizes the conditions under which conditional causal effects can be uniquely determined from the disparate data collections. We further develop an efficient, sound, and complete algorithm that outputs an expression for the conditional effect whenever it exists, which synthesizes the available causal knowledge and empirical evidence; if the algorithm is unable to find a formula, then such synthesis is provably impossible, unless further parametric assumptions are made. Finally, we prove that do-calculus (Pearl 1995) is complete for this task, i.e., the inexistence of a do-calculus derivation implies the impossibility of constructing the targeted causal explanation.

【Keywords】:

1252. Temporal Logics Over Finite Traces with Uncertainty.

Paper Link】 【Pages】:10218-10225

【Authors】: Fabrizio Maria Maggi ; Marco Montali ; Rafael Peñaloza

【Abstract】: Temporal logics over finite traces have recently seen wide application in a number of areas, from business process modelling, monitoring, and mining to planning and decision making. However, real-life dynamic systems contain a degree of uncertainty which cannot be handled with classical logics. We thus propose a new probabilistic temporal logic over finite traces using superposition semantics, where all possible evolutions are possible, until observed. We study the properties of the logic and provide automata-based mechanisms for deriving probabilistic inferences from its formulas. We then study a fragment of the logic with better computational properties. Notably, formulas in this fragment can be discovered from event log data using off-the-shelf existing declarative process discovery techniques.

【Keywords】:

Paper Link】 【Pages】:10226-10234

【Authors】: Radu Marinescu ; Akihiro Kishimoto ; Adi Botea

【Abstract】: Marginal MAP is a difficult mixed inference task for graphical models. Existing state-of-the-art algorithms for solving this task exactly are based on either depth-first or best-first sequential search over an AND/OR search space. In this paper, we explore and evaluate for the first time the power of parallel search for exact Marginal MAP inference. We introduce a new parallel shared-memory recursive best-first AND/OR search algorithm that explores the search space in a best-first manner while operating with limited memory. Subsequently, we develop a complete parallel search scheme that only parallelizes the conditional likelihood computations. We also extend the proposed algorithms into depth-first parallel search schemes. Our experiments on difficult benchmarks demonstrate the effectiveness of the parallel search algorithms against current sequential methods for solving Marginal MAP exactly.

【Keywords】:

1254. Experimental Design for Optimization of Orthogonal Projection Pursuit Models.

Paper Link】 【Pages】:10235-10242

【Authors】: Mojmir Mutny ; Johannes Kirschner ; Andreas Krause

【Abstract】: Bayesian optimization and kernelized bandit algorithms are widely used techniques for sequential black box function optimization, with applications in parameter tuning, control, and robotics, among many others. To be effective in high dimensional settings, previous approaches make additional assumptions, for example on low-dimensional subspaces or an additive structure. In this work, we go beyond the additivity assumption and use an orthogonal projection pursuit regression model, which strictly generalizes additive models. We present a two-stage algorithm motivated by experimental design to first decorrelate the additive components. Subsequently, the bandit optimization benefits from the statistically efficient additive model. Our method provably decorrelates the fully additive model and achieves optimal sublinear simple regret in terms of the number of function evaluations. To prove the rotation recovery, we derive novel concentration inequalities for linear regression on subspaces. In addition, we specifically address the issue of acquisition function optimization and present two domain dependent efficient algorithms. We validate the algorithm numerically on synthetic as well as real-world optimization problems.

【Keywords】:

1255. Adversarial Disentanglement with Grouped Observations.

Paper Link】 【Pages】:10243-10250

【Authors】: József Németh

【Abstract】: We consider the disentanglement of the representations of the relevant attributes of the data (content) from all other factors of variation (style) using Variational Autoencoders. Some recent works addressed this problem by utilizing grouped observations, where the content attributes are assumed to be common within each group, while there is no supervised information on the style factors. In many cases, however, these methods fail to prevent the models from using the style variables to encode content related features as well. This work supplements these algorithms with a method that eliminates the content information in the style representations. For that purpose the training objective is augmented to minimize an appropriately defined mutual information term in an adversarial way. Experimental results and comparisons on image datasets show that the resulting method can efficiently separate the content and style related attributes and generalizes to unseen data.

【Keywords】:

1256. Few-Shot Bayesian Imitation Learning with Logical Program Policies.

Paper Link】 【Pages】:10251-10258

【Authors】: Tom Silver ; Kelsey R. Allen ; Alex K. Lew ; Leslie Pack Kaelbling ; Josh Tenenbaum

【Abstract】: Humans can learn many novel tasks from a very small number (1–5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study six strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20–1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

【Keywords】:

1257. Tandem Inference: An Out-of-Core Streaming Algorithm for Very Large-Scale Relational Inference.

Paper Link】 【Pages】:10259-10266

【Authors】: Sriram Srinivasan ; Eriq Augustine ; Lise Getoor

【Abstract】: Statistical relational learning (SRL) frameworks allow users to create large, complex graphical models using a compact, rule-based representation. However, these models can quickly become prohibitively large and not fit into machine memory. In this work we address this issue by introducing a novel technique called tandem inference (ti). The primary idea of ti is to combine grounding and inference such that both processes happen in tandem. ti uses an out-of-core streaming approach to overcome memory limitations. Even when memory is not an issue, we show that our proposed approach is able to do inference faster while using less memory than existing approaches. To show the effectiveness of ti, we use a popular SRL framework called Probabilistic Soft Logic (PSL). We implement ti for PSL by proposing a gradient-based inference engine and a streaming approach to grounding. We show that we are able to run an SRL model with over 1B cliques in under nine hours and using only 10 GB of RAM; previous approaches required more than 800 GB for this model and are infeasible on common hardware. To the best of our knowledge, this is the largest SRL model ever run.

【Keywords】:

1258. BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic.

Paper Link】 【Pages】:10267-10275

【Authors】: Sriram Srinivasan ; Golnoosh Farnadi ; Lise Getoor

【Abstract】: Probabilistic soft logic (PSL) is a statistical relational learning framework that represents complex relational models with weighted first-order logical rules. The weights of the rules in PSL indicate their importance in the model and influence the effectiveness of the model on a given task. Existing weight learning approaches often attempt to learn a set of weights that maximizes some function of data likelihood. However, this does not always translate to optimal performance on a desired domain metric, such as accuracy or F1 score. In this paper, we introduce a new weight learning approach called Bayesian optimization for weight learning (BOWL) based on Gaussian process regression that directly optimizes weights on a chosen domain performance metric. The key to the success of our approach is a novel projection that captures the semantic distance between the possible weight configurations. Our experimental results show that our proposed approach outperforms likelihood-based approaches and yields up to a 10% improvement across a variety of performance metrics. Further, we performed experiments to measure the scalability and robustness of our approach on various real-world datasets.

【Keywords】:

1259. Off-Policy Evaluation in Partially Observable Environments.

Paper Link】 【Pages】:10276-10283

【Authors】: Guy Tennenholtz ; Uri Shalit ; Shie Mannor

【Abstract】: This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.

【Keywords】:
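
The abstract above names Importance Sampling as the well-known off-policy method it examines. As a reference point, here is a minimal sketch of the standard trajectory-wise importance sampling estimator. It assumes, purely for illustration, that both policies depend only on the recorded observation. The abstract's point is that this kind of estimate can be badly biased when the behavior policy depends on unobserved state. The function name and data layout are hypothetical, and the paper's own estimator is not shown.

```python
def importance_sampling_value(trajectories, eval_policy, behavior_policy):
    """Trajectory-wise importance sampling estimate of the evaluation policy's value.

    Each trajectory is a list of (observation, action, reward) steps. Each
    policy maps (observation, action) to the probability of that action.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0
        undiscounted_return = 0.0
        for obs, action, reward in trajectory:
            # Reweight by how much more (or less) likely the evaluation
            # policy is to take the logged action than the behavior policy.
            weight *= eval_policy(obs, action) / behavior_policy(obs, action)
            undiscounted_return += reward
        total += weight * undiscounted_return
    return total / len(trajectories)


# Hypothetical one-step example: uniform behavior policy over two actions,
# evaluation policy that always takes action 1.
logged = [[("o", 1, 1.0)], [("o", 0, 0.0)]]
estimate = importance_sampling_value(
    logged,
    eval_policy=lambda obs, a: 1.0 if a == 1 else 0.0,
    behavior_policy=lambda obs, a: 0.5,
)
```

In this toy case the estimate is 1.0, the reward of always taking action 1.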

1260. Beyond the Grounding Bottleneck: Datalog Techniques for Inference in Probabilistic Logic Programs.

Paper Link】 【Pages】:10284-10291

【Authors】: Efthymia Tsamoura ; Víctor Gutiérrez-Basulto ; Angelika Kimmig

【Abstract】: State-of-the-art inference approaches in probabilistic logic programming typically start by computing the relevant ground program with respect to the queries of interest, and then use this program for probabilistic inference using knowledge compilation and weighted model counting. We propose an alternative approach that uses efficient Datalog techniques to integrate knowledge compilation with forward reasoning with a non-ground program. This effectively eliminates the grounding bottleneck that so far has prohibited the application of probabilistic logic programming in query answering scenarios over knowledge graphs, while also providing fast approximations on classical benchmarks in the field.

【Keywords】:

1261. Gradient-Based Optimization for Bayesian Preference Elicitation.

Paper Link】 【Pages】:10292-10301

【Authors】: Ivan Vendrov ; Tyler Lu ; Qingqing Huang ; Craig Boutilier

【Abstract】: Effective techniques for eliciting user preferences have taken on added importance as recommender systems (RSs) become increasingly interactive and conversational. A common and conceptually appealing Bayesian criterion for selecting queries is expected value of information (EVOI). Unfortunately, it is computationally prohibitive to construct queries with maximum EVOI in RSs with large item spaces. We tackle this issue by introducing a continuous formulation of EVOI as a differentiable network that can be optimized using gradient methods available in modern machine learning computational frameworks (e.g., TensorFlow, PyTorch). We exploit this to develop a novel Monte Carlo method for EVOI optimization, which is much more scalable for large item spaces than methods requiring explicit enumeration of items. While we emphasize the use of this approach for pairwise (or k-wise) comparisons of items, we also demonstrate how our method can be adapted to queries involving subsets of item attributes or “partial items,” which are often more cognitively manageable for users. Experiments show that our gradient-based EVOI technique achieves state-of-the-art performance across several domains while scaling to large item spaces.

【Keywords】:

1262. Recovering Causal Structures from Low-Order Conditional Independencies.

Paper Link】 【Pages】:10302-10309

【Authors】: Marcel Wienöbst ; Maciej Liskiewicz

【Abstract】: One of the common obstacles for learning causal models from data is that high-order conditional independence (CI) relationships between random variables are difficult to estimate. Since CI tests with conditioning sets of low order can be performed accurately even for a small number of observations, a reasonable approach to determine causal structures is to rely solely on the low-order CIs. Recent research has confirmed that, e.g., in the case of sparse true causal models, structures learned even from zero- and first-order conditional independencies yield good approximations of the models. However, a challenging task here is to provide methods that faithfully explain a given set of low-order CIs. In this paper, we propose an algorithm which, for a given set of conditional independencies of order less than or equal to k, where k is a small fixed number, computes a faithful graphical representation of the given set. Our results complete and generalize the previous work on learning from pairwise marginal independencies. Moreover, they make it possible to improve upon the 0-1 graph model, which is heavily used, e.g., in the estimation of genome networks.

【Keywords】:

1263. A New Framework for Online Testing of Heterogeneous Treatment Effect.

Paper Link】 【Pages】:10310-10317

【Authors】: Miao Yu ; Wenbin Lu ; Rui Song

【Abstract】: We propose a new framework for online testing of heterogeneous treatment effects. The proposed test, named sequential score test (SST), is able to control type I error under continuous monitoring and detect multi-dimensional heterogeneous treatment effects. We provide an online p-value calculation for SST, making it convenient for continuous monitoring, and extend our tests to online multiple testing settings by controlling the false discovery rate. We examine the empirical performance of the proposed tests and compare them with a state-of-the-art online test, named mSPRT, using simulations and real data. The results show that our proposed test controls type I error at any time, has higher detection power and allows quick inference on online A/B testing.

【Keywords】:

1264. A Simultaneous Discover-Identify Approach to Causal Inference in Linear Models.

Paper Link】 【Pages】:10318-10325

【Authors】: Chi Zhang ; Bryant Chen ; Judea Pearl

【Abstract】: Modern causal analysis involves two major tasks, discovery and identification. The first aims to learn a causal structure compatible with the available data; the second leverages that structure to estimate causal effects. Rather than performing the two tasks in tandem, as is usually done in the literature, we propose a symbiotic approach in which the two are performed simultaneously for mutual benefit; information gained through identification helps causal discovery and vice versa. This approach enables the usage of Verma constraints, which remain dormant in constraint-based methods of discovery, and permits us to learn more complete structures, and hence identify a larger set of causal effects than previously achievable with standard methods.

【Keywords】:

1265. Modeling Probabilistic Commitments for Maintenance Is Inherently Harder than for Achievement.

Paper Link】 【Pages】:10326-10333

【Authors】: Qi Zhang ; Edmund H. Durfee ; Satinder Singh

【Abstract】: Most research on probabilistic commitments focuses on commitments to achieve enabling preconditions for other agents. Our work reveals that probabilistic commitments to instead maintain preconditions for others are surprisingly harder to use well than their achievement counterparts, despite strong semantic similarities. We isolate the key difference as being not in how the commitment provider is constrained, but rather in how the commitment recipient can locally use the commitment specification to approximately model the provider's effects on the preconditions of interest. Our theoretic analyses show that we can more tightly bound the potential suboptimality due to approximate modeling for achievement than for maintenance commitments. We empirically evaluate alternative approximate modeling strategies, confirming that probabilistic maintenance commitments are qualitatively more challenging for the recipient to model well, and indicating the need for more detailed specifications that can sacrifice some of the agents' autonomy.

【Keywords】:

1266. Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series.

Paper Link】 【Pages】:10334-10341

【Authors】: Zhi-Xuan Tan ; Harold Soh ; Desmond Ong

【Abstract】: Integrating deep learning with latent state space models has the potential to yield temporal models that are powerful, yet tractable and interpretable. Unfortunately, current models are not designed to handle missing data or multiple data modalities, which are both prevalent in real-world data. In this work, we introduce a factorized inference method for Multimodal Deep Markov Models (MDMMs), allowing us to filter and smooth in the presence of missing data, while also performing uncertainty-aware multimodal fusion. We derive this method by factorizing the posterior p(z|x) for non-linear state space models, and develop a variational backward-forward algorithm for inference. Because our method handles incompleteness over both time and modalities, it is capable of interpolation, extrapolation, conditional generation, label prediction, and weakly supervised learning of multimodal time series. We demonstrate these capabilities on both synthetic and real-world multimodal data under high levels of data deletion. Our method performs well even with more than 50% missing data, and outperforms existing deep approaches to inference in latent time series.

【Keywords】:

AAAI Technical Track: Robotics 11

1267. That and There: Judging the Intent of Pointing Actions with Robotic Arms.

Paper Link】 【Pages】:10343-10351

【Authors】: Malihe Alikhani ; Baber Khalid ; Rahul Shome ; Chaitanya Mitash ; Kostas E. Bekris ; Matthew Stone

【Abstract】: Collaborative robotics requires effective communication between a robot and a human partner. This work proposes a set of interpretive principles for how a robotic arm can use pointing actions to communicate task information to people by extending existing models from the related literature. These principles are evaluated through studies where English-speaking human subjects view animations of simulated robots instructing pick-and-place tasks. The evaluation distinguishes two classes of pointing actions that arise in pick-and-place tasks: referential pointing (identifying objects) and locating pointing (identifying locations). The study indicates that human subjects show greater flexibility in interpreting the intent of referential pointing compared to locating pointing, which needs to be more deliberate. The results also demonstrate the effects of variation in the environment and task context on the interpretation of pointing. Our corpus, experiments and design principles advance models of context, common sense reasoning and communication in embodied communication.

【Keywords】:

1268. Learning from Interventions Using Hierarchical Policies for Safe Learning.

Paper Link】 【Pages】:10352-10360

【Authors】: Jing Bi ; Vikas Dhiman ; Tianyou Xiao ; Chenliang Xu

【Abstract】: Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well on multiple complex tasks. However, a limitation of the typical LfD approach is that it requires expert demonstrations for all scenarios, including those in which the algorithm is already well-trained. The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer. The expert overseer only intervenes when it suspects that an unsafe action is about to be taken. Although LfI significantly improves over LfD, the state-of-the-art LfI fails to account for delay caused by the expert's reaction time and only learns short-term behavior. We address these limitations by 1) interpolating the expert's interventions back in time, and 2) by splitting the policy into two hierarchical levels, one that generates sub-goals for the future and another that generates actions to reach those desired sub-goals. This sub-goal prediction forces the algorithm to learn long-term behavior while also being robust to the expert's reaction time. Our experiments show that LfI using sub-goals in a hierarchical policy framework trains faster and achieves better asymptotic performance than typical LfD.

【Keywords】:

1269. On the Problem of Covering a 3-D Terrain.

Paper Link】 【Pages】:10361-10368

【Authors】: Eduard Eiben ; Isuru S. Godage ; Iyad Kanj ; Ge Xia

【Abstract】: We study the problem of covering a 3-dimensional terrain by a sweeping robot that is equipped with a camera. We model the terrain as a mesh in a way that captures the elevation levels of the terrain; this enables a graph-theoretic formulation of the problem in which the underlying graph is a weighted plane graph. We show that the associated graph problem is NP-hard, and that it admits a polynomial time approximation scheme (PTAS). Finally, we implement two heuristic algorithms based on greedy approaches and report our findings.

【Keywords】:

1270. Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching.

Paper Link】 【Pages】:10369-10376

【Authors】: Peng Gao ; Hao Zhang

【Abstract】: Loop closure detection is a fundamental problem for simultaneous localization and mapping (SLAM) in robotics. Most previous methods consider only one type of information, based on either visual appearances or spatial relationships of landmarks. In this paper, we introduce a novel visual-spatial information preserving multi-order graph matching approach for long-term loop closure detection. Our approach constructs a graph representation of a place from an input image to integrate visual-spatial information, including visual appearances of the landmarks and the background environment, as well as the second and third-order spatial relationships between two and three landmarks, respectively. Furthermore, we introduce a new formulation that casts loop closure detection as a multi-order graph matching problem to compute a similarity score directly from the graph representations of the query and template images, instead of performing conventional vector-based image matching. We evaluate the proposed multi-order graph matching approach on two public long-term loop closure detection benchmark datasets, the St. Lucia and CMU-VL datasets. Experimental results show that our approach is effective for long-term loop closure detection and that it outperforms the previous state-of-the-art methods.

【Keywords】:

1271. Adversarial Fence Patrolling: Non-Uniform Policies for Asymmetric Environments.

Paper Link】 【Pages】:10377-10384

【Authors】: Yaniv Oshrat ; Noa Agmon ; Sarit Kraus

【Abstract】: Robot teams are very useful in patrol tasks, where the robots are required to repeatedly visit a target area in order to detect an adversary. In this work we examine the Fence Patrol problem, in which the robots must travel back and forth along an open polyline and the adversary is aware of the robots' patrol strategy. Previous work has suggested non-deterministic patrol schemes, characterized by a uniform policy along the entire area, guaranteeing that the minimal probability of penetration detection throughout the area is maximized. We present a patrol strategy with a non-uniform policy along different points of the fence, based on the location and other properties of the point. We explore this strategy in different kinds of tracks and show that the minimal probability of penetration detection achieved by this non-uniform (variant) policy is higher than that achieved by former policies. We further consider applying this model in multi-robot scenarios, exploiting robot cooperation to enhance patrol efficiency. We propose novel methods for calculating the variant values, and demonstrate their performance empirically.

【Keywords】:

1272. Task and Motion Planning Is PSPACE-Complete.

Paper Link】 【Pages】:10385-10392

【Authors】: William Vega-Brown ; Nicholas Roy

【Abstract】: We present a new representation for task and motion planning that uses constraints to capture both continuous and discrete phenomena in a unified framework. We show that we can decide if a feasible plan exists for a given problem instance using only polynomial space if the constraints are semialgebraic and all actions have uniform stratified accessibility, a technical condition closely related to both controllability and to the existence of a symbolic representation of a planning domain. We show that there cannot exist an algorithm that solves the more general problem of deciding if a plan exists for an instance with arbitrary semialgebraic constraints. Finally, we show that our formalism is universal, in the sense that every deterministic robotic planning problem can be well-approximated within our formalism. Together, these results imply task and motion planning is PSPACE-complete.

【Keywords】:

1273. AtLoc: Attention Guided Camera Localization.

Paper Link】 【Pages】:10393-10401

【Authors】: Bing Wang ; Changhao Chen ; Chris Xiaoxuan Lu ; Peijun Zhao ; Niki Trigoni ; Andrew Markham

【Abstract】: Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-image) or geometry constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance on common benchmarks, even when using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is available at https://github.com/BingCS/AtLoc.

【Keywords】:

1274. RoboCoDraw: Robotic Avatar Drawing with GAN-Based Style Transfer and Time-Efficient Path Optimization.

Paper Link】 【Pages】:10402-10409

【Authors】: Tianying Wang ; Wei Qi Toh ; Hao Zhang ; Xiuchao Sui ; Shaohua Li ; Yong Liu ; Wei Jing

【Abstract】: Robotic drawing has become increasingly popular as an entertainment and interactive tool. In this paper we present RoboCoDraw, a real-time collaborative robot-based drawing system that draws stylized human face sketches interactively in front of human users, by using the Generative Adversarial Network (GAN)-based style transfer and a Random-Key Genetic Algorithm (RKGA)-based path optimization. The proposed RoboCoDraw system takes a real human face image as input, converts it to a stylized avatar, then draws it with a robotic arm. A core component in this system is the AvatarGAN proposed by us, which generates a cartoon avatar face image from a real human face. AvatarGAN is trained with unpaired face and avatar images only and can generate avatar images of much better likeness with human face images in comparison with the vanilla CycleGAN. After the avatar image is generated, it is fed to a line extraction algorithm and converted to sketches. An RKGA-based path optimization algorithm is applied to find a time-efficient robotic drawing path to be executed by the robotic arm. We demonstrate the capability of RoboCoDraw on various face images using a lightweight, safe collaborative robot UR5.

【Keywords】:

1275. Dempster-Shafer Theoretic Learning of Indirect Speech Act Comprehension Norms.

Paper Link】 【Pages】:10410-10417

【Authors】: Ruchen Wen ; Mohammed Aun Siddiqui ; Tom Williams

【Abstract】: For robots to successfully operate as members of human-robot teams, it is crucial for robots to correctly understand the intentions of their human teammates. This task is particularly difficult due to human sociocultural norms: for reasons of social courtesy (e.g., politeness), people rarely express their intentions directly, instead typically employing polite utterance forms such as Indirect Speech Acts (ISAs). It is thus critical for robots to be capable of inferring the intentions behind their teammates' utterances based on both their interaction context (including, e.g., social roles) and their knowledge of the sociocultural norms that are applicable within that context. This work builds off of previous research on understanding and generation of ISAs using Dempster-Shafer Theoretic Uncertain Logic, by showing how other recent work in Dempster-Shafer Theoretic rule learning can be used to learn appropriate uncertainty intervals for robots' representations of sociocultural politeness norms.

【Keywords】:

1276. Modular Robot Design Synthesis with Deep Reinforcement Learning.

Paper Link】 【Pages】:10418-10425

【Authors】: Julian Whitman ; Raunaq M. Bhirangi ; Matthew J. Travers ; Howie Choset

【Abstract】: Modular robots hold the promise of versatility in that their components can be re-arranged to adapt the robot design to a task at deployment time. Even for the simplest designs, determining the optimal design is exponentially complex due to the number of permutations of ways the modules can be connected. Further, when selecting the design for a given task, there is an additional computational burden in evaluating the capability of each robot, e.g., whether it can reach certain points in the workspace. This work uses deep reinforcement learning to create a search heuristic that allows us to efficiently search the space of modular serial manipulator designs. We show that our algorithm is more computationally efficient in determining robot designs for given tasks in comparison to the current state-of-the-art.

【Keywords】:

1277. Visual Tactile Fusion Object Clustering.

Paper Link】 【Pages】:10426-10433

【Authors】: Tao Zhang ; Yang Cong ; Gan Sun ; Qianqian Wang ; Zhengming Ding

【Abstract】: Object clustering, which aims at grouping similar objects into one cluster with an unsupervised strategy, has been extensively studied in various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information, while ignoring one of the most important sensing modalities, i.e., tactile information, which can help capture different object properties and further boost the performance of the object clustering task. To effectively benefit from both visual and tactile modalities for object clustering, in this paper, we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn a hierarchical expression of visual-tactile fusion data, and preserve the local structure of the data generating distribution of the visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to effectively align the visual and tactile data in a common subspace in which the gap between visual and tactile data is mitigated. For the model optimization, we present an efficient alternating minimization strategy to solve our proposed model. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.

【Keywords】:

AAAI Technical Track: Vision 333

1278. Learning End-to-End Scene Flow by Distilling Single Tasks Knowledge.

Paper Link】 【Pages】:10435-10442

【Authors】: Filippo Aleotti ; Matteo Poggi ; Fabio Tosi ; Stefano Mattoccia

【Abstract】: Scene flow is a challenging task aimed at jointly estimating the 3D structure and motion of the sensed environment. Although deep learning solutions achieve outstanding performance in terms of accuracy, these approaches divide the whole problem into standalone tasks (stereo and optical flow), addressing them with independent networks. Such a strategy dramatically increases the complexity of the training procedure and requires power-hungry GPUs to infer scene flow at barely 1 FPS. Conversely, we propose DWARF, a novel and lightweight architecture that infers full scene flow by jointly reasoning about depth and optical flow, and that is easily and elegantly trainable end-to-end from scratch. Moreover, since ground truth images for full scene flow are scarce, we propose to leverage the knowledge learned by networks specialized in stereo or flow, for which much more data are available, to distill proxy annotations. Exhaustive experiments show that i) DWARF runs at about 10 FPS on a single high-end GPU and about 1 FPS on the embedded NVIDIA Jetson TX2 at KITTI resolution, with a moderate drop in accuracy compared to 10× deeper models, and ii) learning from many distilled samples is more effective than learning from the few annotated ones available.

【Keywords】:

1279. Ultrafast Photorealistic Style Transfer via Neural Architecture Search.

Paper Link】 【Pages】:10443-10450

【Authors】: Jie An ; Haoyi Xiong ; Jun Huan ; Jiebo Luo

【Abstract】: The key challenge in photorealistic style transfer is that an algorithm should faithfully transfer the style of a reference photo to a content photo while the generated image should look like one captured by a camera. Although several photorealistic style transfer algorithms have been proposed, they rely on post- and/or pre-processing to make the generated images look photorealistic. If we disable the additional processing, these algorithms fail to produce plausible photorealistic stylization in terms of detail preservation and photorealism. In this work, we propose an effective solution to these issues. Our method consists of a construction step (C-step) to build a photorealistic stylization network and a pruning step (P-step) for acceleration. In the C-step, we propose a dense auto-encoder named PhotoNet based on a carefully designed pre-analysis. PhotoNet integrates a feature aggregation module (BFA) and instance normalized skip links (INSL). To generate faithful stylization, we introduce multiple style transfer modules in the decoder and INSLs. PhotoNet significantly outperforms existing algorithms in terms of both efficiency and effectiveness. In the P-step, we adopt a neural architecture search method to accelerate PhotoNet. We propose an automatic network pruning framework in the manner of teacher-student learning for photorealistic stylization. The network architecture resulting from the search, named PhotoNAS, achieves significant acceleration over PhotoNet while keeping the stylization effects almost intact. We conduct extensive experiments on both image and video transfer. The results show that our method can produce favorable results while achieving 20-30 times acceleration in comparison with the existing state-of-the-art approaches. It is worth noting that the proposed algorithm accomplishes better performance without any pre- or post-processing.

【Keywords】:

1280. PsyNet: Self-Supervised Approach to Object Localization Using Point Symmetric Transformation.

Paper Link】 【Pages】:10451-10459

【Authors】: Kyungjune Baek ; Minhyun Lee ; Hyunjung Shim

【Abstract】: Existing co-localization techniques fall significantly behind weakly or fully supervised methods in accuracy and inference time. In this paper, we overcome common drawbacks of co-localization techniques by utilizing a self-supervised learning approach. The major technical contributions of the proposed method are two-fold. 1) We devise a new geometric transformation, namely point symmetric transformation, and utilize its parameters as an artificial label for self-supervised learning. This new transformation can also play the role of region-drop based regularization. 2) We suggest a heat map extraction method for computing the heat map from the network trained by self-supervision, namely class-agnostic activation mapping. It is done by computing the spatial attention map. Based on extensive evaluations, we observe that the proposed method records new state-of-the-art performance on three fine-grained datasets for unsupervised object localization. Moreover, we show that the idea of the proposed method can be adopted in a modified manner to solve the weakly supervised object localization task. As a result, we outperform the current state-of-the-art technique in weakly supervised object localization by a significant gap.

【Keywords】:
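
The abstract of entry 1280 says only that the heat map is obtained "by computing the spatial attention map". The following is a minimal sketch of one common way to do that, not the authors' implementation: it assumes the attention map is the channel-wise mean of a convolutional feature tensor, and that a box is read off by thresholding the normalized map. The function name `spatial_attention_box` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def spatial_attention_box(features, threshold=0.5):
    """Return (heat_map, box) from a (channels, height, width) feature tensor.

    Assumption: the attention map is the mean over channels, min-max
    normalized to [0, 1]; the box is the tightest rectangle enclosing
    all locations whose attention is at or above `threshold`.
    """
    heat = features.mean(axis=0)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    ys, xs = np.nonzero(heat >= threshold)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return heat, box

# Toy feature tensor with one active region (rows 2-3, columns 1-4).
toy_features = np.zeros((3, 6, 6), dtype=np.float32)
toy_features[:, 2:4, 1:5] = 1.0
toy_heat, toy_box = spatial_attention_box(toy_features)
```

The box is returned as `(x_min, y_min, x_max, y_max)` in feature-map coordinates; mapping it back to image coordinates would depend on the network stride, which the abstract does not specify.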

1281. Detecting Human-Object Interactions via Functional Generalization.

Paper Link】 【Pages】:10460-10469

【Authors】: Ankan Bansal ; Sai Saketh Rambhatla ; Abhinav Shrivastava ; Rama Chellappa

【Abstract】: We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and efficiently uses the data, visual features of the human, relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation for our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset our method achieves a gain of over 2.5% absolute points in mean average precision (mAP) over state-of-the-art. We also show that our approach leads to significant performance gains for zero-shot HOI detection in the seen object setting. We further demonstrate that using a generic object detector, our model can generalize to interactions involving previously unseen objects.

【Keywords】:

1282. Incremental Multi-Domain Learning with Network Latent Tensor Factorization.

Paper Link】 【Pages】:10470-10477

【Authors】: Adrian Bulat ; Jean Kossaifi ; Georgios Tzimiropoulos ; Maja Pantic

【Abstract】: The prominence of deep learning, large amounts of annotated data and increasingly powerful hardware have made it possible to reach remarkable performance for supervised classification tasks, in many cases saturating the training sets. However, the resulting models are specialized to a single very specific task and domain. Adapting the learned classification to new domains is a hard problem for at least three reasons: (1) the new domains and the tasks might be drastically different; (2) there might be a very limited amount of annotated data on the new domain; and (3) full training of a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs. In this paper, we present a method to learn new domains and tasks incrementally, building on prior knowledge from already learned tasks and without catastrophic forgetting. We do so by jointly parametrizing weights across layers using a low-rank Tucker structure. The core is task agnostic while a set of task specific factors are learnt on each new domain. We show that leveraging tensor structure enables better performance than simply using matrix operations. Joint tensor modelling also naturally leverages correlations across different layers. Compared with previous methods, which have focused on adapting each layer separately, our approach results in more compact representations for each new task/domain. We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that our method offers on average about a 7.5× reduction in the number of parameters and competitive performance in terms of both classification accuracy and Decathlon score.

【Keywords】:
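
To illustrate the parametrization described in the abstract of entry 1282 (a task-agnostic core with task-specific factors in Tucker form), here is a minimal NumPy sketch. It is not the authors' code: the tensor shape, the ranks, and the names `tucker_reconstruct`, `shared_core` and `task_factors` are all hypothetical, and no training is shown.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    """Multiply a core tensor by one factor matrix per mode (Tucker form)."""
    out = core
    for mode, factor in enumerate(factors):
        # Contract the rank axis at `mode` with the factor's column axis,
        # then move the new full-size axis back into position `mode`.
        out = np.moveaxis(np.tensordot(factor, out, axes=(1, mode)), 0, mode)
    return out

rng = np.random.default_rng(0)
full_shape = (4, 16, 16, 3)  # hypothetical: (layers, out_ch, in_ch, kernel)
ranks = (2, 4, 4, 2)         # hypothetical Tucker ranks

shared_core = rng.standard_normal(ranks)  # shared across tasks
task_factors = {
    task: [rng.standard_normal((d, r)) for d, r in zip(full_shape, ranks)]
    for task in ("task_a", "task_b")
}

# Each task rebuilds its own full weight tensor from the shared core.
weights = {t: tucker_reconstruct(shared_core, f) for t, f in task_factors.items()}

per_task_params = sum(d * r for d, r in zip(full_shape, ranks))
full_params = int(np.prod(full_shape))
```

In this toy setting each new task stores only the factor matrices (`per_task_params` values) rather than a full tensor (`full_params` values), which is the source of the compactness the abstract refers to. The specific numbers here are illustrative and unrelated to the reported 7.5× figure.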

1283. Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation.

Paper Link】 【Pages】:10478-10485

【Authors】: Yingjie Cai ; Buyu Li ; Zeyu Jiao ; Hongsheng Li ; Xingyu Zeng ; Xiaogang Wang

【Abstract】: The monocular 3D object detection task aims to predict the 3D bounding boxes of objects based on monocular RGB images. Since location recovery in 3D space is quite difficult owing to the absence of depth information, this paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Different from the widely studied 2D bounding boxes, the proposed novel structured polygon in the 2D image consists of several projected surfaces of the target object. Compared to the widely used 3D bounding box proposals, it is shown to be a better representation for 3D detection. In order to inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses the object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.

【Keywords】:

1284. Auto-GAN: Self-Supervised Collaborative Learning for Medical Image Synthesis.

Paper Link】 【Pages】:10486-10493

【Authors】: Bing Cao ; Han Zhang ; Nannan Wang ; Xinbo Gao ; Dinggang Shen

【Abstract】: In various clinical scenarios, medical imaging is crucial in disease diagnosis and treatment. Different modalities of medical images provide complementary information and jointly help doctors to make accurate clinical decisions. However, due to clinical and practical restrictions, certain imaging modalities may be unavailable or incomplete. To impute missing data with adequate clinical accuracy, here we propose a framework called self-supervised collaborative learning to synthesize missing modalities for medical images. The proposed method comprehensively utilizes all available information correlated to the target modality from multi-source-modality images to generate any missing modality in a single model. Different from the existing methods, we introduce an auto-encoder network as a novel, self-supervised constraint, which provides target-modality-specific information to guide generator training. In addition, we design a modality mask vector as the target modality label. With experiments on multiple medical image databases, we demonstrate the strong generalization ability as well as the specialty of our method compared with other state-of-the-art methods.

【Keywords】:

1285. Feature Deformation Meta-Networks in Image Captioning of Novel Objects.

Paper Link】 【Pages】:10494-10501

【Authors】: Tingjia Cao ; Ke Han ; Xiaomei Wang ; Lin Ma ; Yanwei Fu ; Yu-Gang Jiang ; Xiangyang Xue

【Abstract】: This paper studies the task of image captioning with novel objects, which only exist in testing images. Intrinsically, this task reflects the generalization ability of models in understanding and captioning the semantic meanings of visual concepts and objects unseen in the training set, and is similar to one/zero-shot learning. The critical difficulty thus comes from the fact that no paired images and sentences of the novel objects can be used to help train the captioning model. Inspired by recent work (Chen et al. 2019b) that boosts one-shot learning by learning to generate various image deformations, we propose learning meta-networks for deforming features for novel object captioning. To this end, we introduce the feature deformation meta-networks (FDM-net), which is trained on source data and learns to adapt to the novel object features detected by the auxiliary detection model. FDM-net includes two sub-nets: feature deformation, and scene graph sentence reconstruction, which produce the augmented image features and corresponding sentences, respectively. Thus, rather than directly deforming images, FDM-net can efficiently and dynamically enlarge the set of paired images and texts by learning to deform image features. Extensive experiments are conducted on the widely used novel object captioning dataset, and the results show the effectiveness of our FDM-net. An ablation study and qualitative visualization give further insights into our model.

【Keywords】:

1286. General Partial Label Learning via Dual Bipartite Graph Autoencoder.

Paper Link】 【Pages】:10502-10509

【Authors】: Brian Chen ; Bo Wu ; Alireza Zareian ; Hanwang Zhang ; Shih-Fu Chang

【Abstract】: We formulate a practical yet challenging problem: General Partial Label Learning (GPLL). Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level (a label set partially labels an instance) to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed, i.e., instances in a group may be partially linked to the label set from another group. Such ambiguous group-level supervision is more practical in real-world scenarios as additional annotation on the instance-level is no longer required, e.g., face-naming in videos where the group consists of faces in a frame, labeled by a name set in the corresponding caption. In this paper, we propose a novel graph convolutional network (GCN) called Dual Bipartite Graph Autoencoder (DB-GAE) to tackle the label ambiguity challenge of GPLL. First, we exploit the cross-group correlations to represent the instance groups as dual bipartite graphs: within-group and cross-group, which reciprocally complement each other to resolve the linking ambiguities. Second, we design a GCN autoencoder to encode and decode them, where the decodings are considered as the refined results. It is worth noting that DB-GAE is self-supervised and transductive, as it only uses the group-level supervision without a separate offline training stage. Extensive experiments on two real-world datasets demonstrate that DB-GAE significantly outperforms the best baseline, by over 0.159 absolute F1-score and 24.8% accuracy. We further offer analysis on various levels of label ambiguities.

【Keywords】:

1287. Learning Deep Relations to Promote Saliency Detection.

Paper Link】 【Pages】:10510-10517

【Authors】: Changrui Chen ; Xin Sun ; Yang Hua ; Junyu Dong ; Hongwei Xv

【Abstract】: Though saliency detectors have made stunning progress recently, the performance of the state-of-the-art saliency detectors is not acceptable in some confusing areas, e.g., object boundaries. We argue that feature spatial independence is one of the root causes. This paper explores the ubiquitous relations among deep features to efficiently improve existing saliency detectors. To break this independence, we establish the relation by maximizing the mutual information of the deep features of the same category via deep neural networks. We introduce a threshold-constrained training pair construction strategy to ensure that we can accurately estimate the relations between different image parts in a self-supervised way. The relation can be utilized to further excavate the salient areas and inhibit confusing backgrounds. The experiments demonstrate that our method can significantly boost the performance of the state-of-the-art saliency detectors on various benchmark datasets. Besides, our model is label-free and extremely efficient. The inference speed is 140 FPS on a single GTX1080 GPU.

【Keywords】:

1288. Hierarchical Online Instance Matching for Person Search.

Paper Link】 【Pages】:10518-10525

【Authors】: Di Chen ; Shanshan Zhang ; Wanli Ouyang ; Jian Yang ; Bernt Schiele

【Abstract】: Person Search is a challenging task which requires retrieving a person's image and the corresponding position from an image dataset. It consists of two sub-tasks: pedestrian detection and person re-identification (re-ID). One of the key challenges is to properly combine the two sub-tasks into a unified framework. Existing works usually adopt a straightforward strategy by concatenating a detector and a re-ID model directly, either into an integrated model or into separate models. We argue that simply concatenating detection and re-ID is a sub-optimal solution, and we propose a Hierarchical Online Instance Matching (HOIM) loss which exploits the hierarchical relationship between detection and re-ID to guide the learning of our network. Our novel HOIM loss function harmonizes the objectives of the two sub-tasks and encourages better feature learning. In addition, we improve the loss update policy by introducing Selective Memory Refreshment (SMR) for unlabeled persons, which takes advantage of the potential discrimination power of unlabeled data. In experiments on two standard person search benchmarks, i.e., CUHK-SYSU and PRW, we achieve state-of-the-art performance, which confirms the effectiveness of our proposed HOIM loss in learning robust features.

【Keywords】:

1289. Binarized Neural Architecture Search.

Paper Link】 【Pages】:10526-10533

【Authors】: Hanlin Chen ; Li'an Zhuo ; Baochang Zhang ; Xiawu Zheng ; Jianzhuang Liu ; David S. Doermann ; Rongrong Ji

【Abstract】: Neural architecture search (NAS) can have a significant impact in computer vision by automatically designing optimal neural network architectures for various tasks. A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models. Unfortunately, this area remains largely unexplored. BNAS is more challenging than NAS due to the learning inefficiency caused by optimization requirements and the huge architecture space. To address these issues, we introduce channel sampling and operation space reduction into a differentiable NAS to significantly reduce the cost of searching. This is accomplished through a performance-based strategy used to abandon operations with less potential. Two optimization methods for binarized neural networks are used to validate the effectiveness of our BNAS. Extensive experiments demonstrate that the proposed BNAS achieves a performance comparable to NAS on both CIFAR and ImageNet databases. An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.

【Keywords】:

1290. End-to-End Learning of Object Motion Estimation from Retinal Events for Event-Based Object Tracking.

Paper Link】 【Pages】:10534-10541

【Authors】: Haosheng Chen ; David Suter ; Qiangqiang Wu ; Hanzi Wang

【Abstract】: Event cameras, which are asynchronous bio-inspired vision sensors, have shown great potential in computer vision and artificial intelligence. However, the application of event cameras to object-level motion estimation or tracking is still in its infancy. The main idea behind this work is to propose a novel deep neural network to learn and regress a parametric object-level motion/transform model for event-based object tracking. To achieve this goal, we propose a synchronous Time-Surface with Linear Time Decay (TSLTD) representation, which effectively encodes the spatio-temporal information of asynchronous retinal events into TSLTD frames with clear motion patterns. We feed the sequence of TSLTD frames to a novel Retinal Motion Regression Network (RMRNet) to perform an end-to-end 5-DoF object motion regression. Our method is compared with state-of-the-art object tracking methods that are based on conventional cameras or event cameras. The experimental results show the superiority of our method in handling various challenging environments such as fast motion and low illumination conditions.
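
As a rough illustration of the time-surface idea mentioned in this abstract, the sketch below accumulates the events of one time window into a frame in which each pixel stores a value that grows linearly with the timestamp of its most recent event. The event layout `(x, y, t, polarity)`, the function name `tsltd_frame`, and the exact decay formula are assumptions made for illustration; the abstract does not specify them, so this is not the authors' formulation.

```python
def tsltd_frame(events, width, height, t_start, t_end):
    """Build one time-surface frame with a linear time decay (illustrative sketch).

    events: iterable of (x, y, t, polarity), polarity in {+1, -1} (assumed layout).
    Pixels without events stay at 0.0; more recent events get a larger magnitude.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    frame = [[0.0 for _ in range(width)] for _ in range(height)]
    span = t_end - t_start
    # Visit events in time order so a later event overwrites an earlier one
    # that fell on the same pixel.
    for x, y, t, polarity in sorted(events, key=lambda e: e[2]):
        if not (t_start <= t <= t_end):
            continue  # event lies outside this time window
        if not (0 <= x < width and 0 <= y < height):
            continue  # event lies outside the sensor area
        frame[y][x] = polarity * (t - t_start) / span
    return frame
```

A sequence of such frames, one per consecutive time window, would then play the role of the synchronous input that a regression network consumes.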

【Keywords】:

1291. Zero-Shot Ingredient Recognition by Multi-Relational Graph Convolutional Network.

Paper Link】 【Pages】:10542-10550

【Authors】: Jingjing Chen ; Liangming Pan ; Zhipeng Wei ; Xiang Wang ; Chong-Wah Ngo ; Tat-Seng Chua

【Abstract】: Recognizing ingredients for a given dish image is at the core of automatic dietary assessment, attracting increasing attention from both industry and academia. Nevertheless, the task is challenging due to the difficulty of collecting and labeling sufficient training data. On one hand, there are hundreds of thousands of food ingredients in the world, ranging from the common to the rare. Collecting training samples for all of the ingredient categories is difficult. On the other hand, as ingredient appearances exhibit huge visual variance during food preparation, training samples must be collected under different cooking and cutting methods for robust recognition. Since obtaining sufficient fully annotated training data is not easy, a more practical way of scaling up the recognition is to develop models that are capable of recognizing unseen ingredients. Therefore, in this paper, we target the problem of ingredient recognition with zero training samples. More specifically, we introduce a multi-relational GCN (graph convolutional network) that integrates ingredient hierarchy, attributes, and co-occurrence for zero-shot ingredient recognition. Extensive experiments on both Chinese and Japanese food datasets are performed to demonstrate the superior performance of the multi-relational GCN and shed light on zero-shot ingredient recognition.

【Keywords】:

1292. Rethinking the Bottom-Up Framework for Query-Based Video Localization.

Paper Link】 【Pages】:10551-10558

【Authors】: Long Chen ; Chujie Lu ; Siliang Tang ; Jun Xiao ; Dong Zhang ; Chilie Tan ; Xiaolin Li

【Abstract】: In this paper, we focus on the task of query-based video localization, i.e., localizing a query in a long and untrimmed video. The prevailing solutions for this problem can be grouped into two categories: i) Top-down approach: It pre-cuts the video into a set of moment candidates, then it does classification and regression for each candidate; ii) Bottom-up approach: It injects the whole query content into each video frame, then it predicts the probability of each frame being a ground truth segment boundary (i.e., start or end). Both frameworks have their respective shortcomings: the top-down models suffer from heavy computation and are sensitive to heuristic rules, while the performance of bottom-up models has so far lagged behind that of their top-down counterparts. However, we argue that the performance of the bottom-up framework is severely underestimated because of current unreasonable designs, including both the backbone and the head network. To this end, we design a novel bottom-up model: Graph-FPN with Dense Predictions (GDP). For the backbone, GDP first generates a frame feature pyramid to capture multi-level semantics, then it utilizes graph convolution to encode the plentiful scene relationships, which incidentally mitigates the semantic gaps in the multi-scale feature pyramid. For the head network, GDP regards all frames falling in the ground truth segment as the foreground, and each foreground frame regresses the unique distances from its location to the bi-directional boundaries. Extensive experiments on two challenging query-based video localization tasks (natural language video localization and video relocalization), involving four challenging benchmarks (TACoS, Charades-STA, ActivityNet Captions, and Activity-VRL), have shown that GDP surpasses the state-of-the-art top-down models.
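
The dense-prediction head described in this abstract can be pictured with a small sketch: every frame inside the ground-truth segment is a foreground frame and carries two regression targets, its distance to the start boundary and its distance to the end boundary. The function names `dense_boundary_targets` and `decode_segment`, the use of inclusive frame indices, and the `None` placeholders for background frames are assumptions for illustration only.

```python
def dense_boundary_targets(num_frames, gt_start, gt_end):
    """Return per-frame targets (is_foreground, dist_to_start, dist_to_end).

    gt_start and gt_end are inclusive frame indices of the ground-truth segment
    (an assumed convention). Background frames get (False, None, None).
    """
    if not (0 <= gt_start <= gt_end < num_frames):
        raise ValueError("segment must lie inside the video")
    targets = []
    for i in range(num_frames):
        if gt_start <= i <= gt_end:
            targets.append((True, i - gt_start, gt_end - i))
        else:
            targets.append((False, None, None))
    return targets


def decode_segment(frame_index, dist_to_start, dist_to_end):
    """Recover (start, end) from one foreground frame's two distances."""
    return frame_index - dist_to_start, frame_index + dist_to_end
```

Under this convention any single foreground frame is enough to recover the whole segment, which is why every frame inside the segment can serve as a training sample rather than only the two boundary frames.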

【Keywords】:

1293. Diversity Transfer Network for Few-Shot Learning.

Paper Link】 【Pages】:10559-10566

【Authors】: Mengting Chen ; Yuxin Fang ; Xinggang Wang ; Heng Luo ; Yifeng Geng ; Xinyu Zhang ; Chang Huang ; Wenyu Liu ; Bo Wang

【Abstract】: Few-shot learning is a challenging task that aims at training a classifier for unseen classes with only a few training examples. The main difficulty of few-shot learning lies in the lack of intra-class diversity within insufficient training samples. To alleviate this problem, we propose a novel generative framework, Diversity Transfer Network (DTN), that learns to transfer latent diversities from known categories and composite them with support features to generate diverse samples for novel categories in feature space. The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works. Besides, an organized auxiliary task co-training over known categories is proposed to stabilize the meta-training process of DTN. We perform extensive experiments and ablation studies on three datasets, i.e., miniImageNet, CIFAR100 and CUB. The results show that DTN, with single-stage training and faster convergence speed, obtains the state-of-the-art results among the feature generation based few-shot learning methods. Code and supplementary material are available at: https://github.com/Yuxin-CV/DTN.

【Keywords】:

1294. Structure-Aware Feature Fusion for Unsupervised Domain Adaptation.

Paper Link】 【Pages】:10567-10574

【Authors】: Qingchao Chen ; Yang Liu

【Abstract】: Unsupervised domain adaptation (UDA) aims to learn and transfer generalized features from a labelled source domain to a target domain without any annotations. Existing methods only align high-level representations, without exploiting the complex multi-class structure and local spatial structure. This is problematic as 1) the model is prone to negative transfer when the features from different classes are misaligned; 2) missing the local spatial structure poses a major obstacle to performing fine-grained feature alignment. In this paper, we integrate the valuable information conveyed in the classifier prediction and local feature maps into the global feature representation and then perform a single mini-max game to make it domain invariant. In this way, the domain-invariant feature not only describes the holistic representation of the original image but also preserves mode-structure and fine-grained spatial structural information. The feature integration is achieved by estimating and maximizing the mutual information (MI) among the global feature, local feature and classifier prediction simultaneously. As the MI is hard to measure directly in high-dimensional spaces, we adopt a new objective function that implicitly maximizes the MI via an effective sampling strategy and a discriminator design. Our STructure-Aware Feature Fusion (STAFF) network achieves state-of-the-art performance on various UDA datasets.

【Keywords】:

1295. Knowledge Graph Transfer Network for Few-Shot Recognition.

Paper Link】 【Pages】:10575-10582

【Authors】: Riquan Chen ; Tianshui Chen ; Xiaolu Hui ; Hefeng Wu ; Guanbin Li ; Liang Lin

【Abstract】: Few-shot learning aims to learn novel categories from very few samples given some base categories with sufficient training samples. The main challenge of this task is that the novel categories are prone to being dominated by the color, texture, and shape of the object or by the background context (namely specificity), which are distinct for the given few training samples but not common for the corresponding categories (see Figure 1). Fortunately, we find that transferring information from the correlated base categories can help learn the novel concepts and thus avoid the novel concept being dominated by the specificity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classifier weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node messages through the graph to explore node interaction and transfer classifier information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers larger-scale categories, i.e., 6,000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model.

【Keywords】:

1296. Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching.

Paper Link】 【Pages】:10583-10590

【Authors】: Tianlang Chen ; Jiebo Luo

【Abstract】: Existing image-text matching approaches typically infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image. However, they ignore the connections between the objects that are semantically related. These objects may collectively determine whether the image corresponds to a text or not. To address this problem, we propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNN). In particular, given an input image-text pair, our model reorders the image objects based on the positions of their most related words in the text. In the same way as extracting the hidden features from word embeddings, the model leverages RNN to extract high-level object features from the reordered object inputs. We validate that the high-level object features contain useful joint information of semantically related objects, which benefit the retrieval task. To compute the image-text similarity, we incorporate a Multi-attention Cross Matching Model into DP-RNN. It aggregates the affinity between objects and words with cross-modality guided attention and self-attention. Our model achieves the state-of-the-art performance on Flickr30K dataset and competitive performance on MS-COCO dataset. Extensive experiments demonstrate the effectiveness of our model.

【Keywords】:

1297. Frame-Guided Region-Aligned Representation for Video Person Re-Identification.

Paper Link】 【Pages】:10591-10598

【Authors】: Zengqun Chen ; Zhiheng Zhou ; Junchu Huang ; Pengyu Zhang ; Bo Li

【Abstract】: Pedestrians in videos are usually in a moving state, resulting in serious spatial misalignment like scale variations and pose changes, which makes the video-based person re-identification problem more challenging. To address the above issue, in this paper, we propose a Frame-Guided Region-Aligned model (FGRA) for discriminative representation learning in two steps in an end-to-end manner. Firstly, based on a frame-guided feature learning strategy and a non-parametric alignment module, a novel alignment mechanism is proposed to extract well-aligned region features. Secondly, in order to form a sequence representation, an effective feature aggregation strategy that utilizes temporal alignment score and spatial attention is adopted to fuse region features in the temporal and spatial dimensions, respectively. Experiments are conducted on benchmark datasets to demonstrate the effectiveness of the proposed method to solve the misalignment problem and the superiority of the proposed method to the existing video-based person re-identification methods.

【Keywords】:

1298. Global Context-Aware Progressive Aggregation Network for Salient Object Detection.

Paper Link】 【Pages】:10599-10606

【Authors】: Zuyao Chen ; Qianqian Xu ; Runmin Cong ; Qingming Huang

【Abstract】: Deep convolutional neural networks have achieved competitive performance in salient object detection, in which how to learn effective and comprehensive features plays a critical role. Most previous works adopted multi-level feature integration yet ignored the gap between different features. Besides, high-level features are diluted as they are passed along the top-down pathway. To remedy these issues, we propose a novel network named GCPANet to effectively integrate low-level appearance features, high-level semantic features, and global context features through some progressive context-aware Feature Interweaved Aggregation (FIA) modules and generate the saliency map in a supervised way. Moreover, a Head Attention (HA) module is used to reduce information redundancy and enhance the top-layer features by leveraging spatial and channel-wise attention, and the Self Refinement (SR) module is utilized to further refine and heighten the input features. Furthermore, we design the Global Context Flow (GCF) module to generate the global context information at different stages, which aims to learn the relationship among different salient regions and alleviate the dilution effect of high-level features. Experimental results on six benchmark datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.

【Keywords】:

1299. Video Frame Interpolation via Deformable Separable Convolution.

Paper Link】 【Pages】:10607-10614

【Authors】: Xianhang Cheng ; Zhenzhong Chen

【Abstract】: Learning to synthesize non-existing frames from the original consecutive video frames is a challenging task. Recent kernel-based interpolation methods predict pixels with a single convolution process to remove the dependency on optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods yield poor results even though they take thousands of neighboring pixels into account. To solve this problem, we propose to use deformable separable convolution (DSepConv) to adaptively estimate kernels, offsets and masks to allow the network to obtain information from far fewer but more relevant pixels. In addition, we show that the kernel-based methods and conventional flow-based methods are specific instances of the proposed DSepConv. Experimental results demonstrate that our method significantly outperforms the other kernel-based interpolation methods and performs on par with or even better than the state-of-the-art algorithms both qualitatively and quantitatively.

【Keywords】:

1300. CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion.

Paper Link】 【Pages】:10615-10622

【Authors】: Xinjing Cheng ; Peng Wang ; Chenye Guan ; Ruigang Yang

【Abstract】: Depth Completion deals with the problem of converting a sparse depth map to a dense one, given the corresponding color image. Convolutional spatial propagation network (CSPN) is one of the state-of-the-art (SoTA) methods of depth completion, which recovers structural details of the scene. In this paper, we propose CSPN++, which further improves its effectiveness and efficiency by learning adaptive convolutional kernel sizes and the number of iterations for the propagation, so that the context and computational resources needed at each pixel can be dynamically assigned on demand. Specifically, we formulate the learning of the two hyper-parameters as an architecture selection problem where various configurations of kernel sizes and numbers of iterations are first defined, and then a set of soft weighting parameters is trained to either properly assemble or select from the pre-defined configurations at each pixel. In our experiments, we find that weighted assembling, which we refer to as "context-aware CSPN", can lead to significant accuracy improvements, while weighted selection, "resource-aware CSPN", can reduce the computational resources significantly with similar or better accuracy. Besides, the resources needed for CSPN++ can be adjusted w.r.t. the computational budget automatically. Finally, to avoid the side effects of noisy or inaccurate sparse depths, we embed a gated network inside CSPN++, which further improves the performance. We demonstrate the effectiveness of CSPN++ on the KITTI depth completion benchmark, where it significantly improves over CSPN and other SoTA methods.
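
The difference between "assembling" and "selecting" from pre-defined configurations at a single pixel can be sketched as follows. Here `candidates` stands for the depth values that different kernel-size and iteration configurations would produce at one pixel, and `scores` for the learned soft weighting parameters. Normalizing the scores with a softmax, and all function names, are assumptions for illustration; the abstract only states that soft weights are trained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assemble(candidates, scores):
    """Weighted assembling: blend all candidate outputs with soft weights."""
    weights = softmax(scores)
    return sum(w * c for w, c in zip(weights, candidates))


def select(candidates, scores):
    """Weighted selection: keep only the highest-scoring candidate."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best]
```

Assembling needs every configuration to be evaluated before blending, whereas selection only needs the chosen one at inference time, which is consistent with the accuracy versus resource trade-off the abstract describes.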

【Keywords】:

1301. A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation.

Paper Link】 【Pages】:10623-10630

【Authors】: Yihua Cheng ; Shiyao Huang ; Fei Wang ; Chen Qian ; Feng Lu

【Abstract】: Human gaze is essential for various appealing applications. Aiming at more accurate gaze estimation, a series of recent works propose to utilize face and eye images simultaneously. Nevertheless, face and eye images only serve as independent or parallel feature sources in those works, and the intrinsic correlation between their features is overlooked. In this paper we make the following contributions: 1) We propose a coarse-to-fine strategy which estimates a basic gaze direction from the face image and refines it with the corresponding residual predicted from eye images. 2) Guided by the proposed strategy, we design a framework which introduces a bi-gram model to bridge the gaze residual and the basic gaze direction, and an attention component to adaptively acquire suitable fine-grained features. 3) Integrating the above innovations, we construct a coarse-to-fine adaptive network named CA-Net and achieve state-of-the-art performance on MPIIGaze and EyeDiap.

【Keywords】:

1302. 3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training.

Paper Link】 【Pages】:10631-10638

【Authors】: Yu Cheng ; Bo Yang ; Bo Wang ; Robby T. Tan

【Abstract】: Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years. Generally, the performance of existing methods drops when the target person is too small/large, or the motion is too fast/slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained for severe occlusion, which compromises their performance in handling occlusion. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear at different scales and have various motion speeds, we apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe occlusion, so that our network can learn better and become robust to various degrees of occlusion. As there are limited 3D ground truth data, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public data sets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.
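
The explicit occlusion training mentioned in this abstract amounts to hiding some keypoints from the network during training. A minimal sketch, assuming keypoints are given as a list of 2D tuples and that a masked keypoint is represented by `None` plus a visibility flag (both conventions, and the name `mask_keypoints`, are hypothetical), might look like this:

```python
import random


def mask_keypoints(keypoints, mask_ratio, rng=None):
    """Randomly hide a fraction of keypoints to simulate occlusion.

    Returns (masked_keypoints, visible_flags). The number of hidden keypoints
    is round(mask_ratio * len(keypoints)); the input list is not modified.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng or random.Random()
    num_masked = round(mask_ratio * len(keypoints))
    masked_idx = set(rng.sample(range(len(keypoints)), num_masked))
    masked = [None if i in masked_idx else kp for i, kp in enumerate(keypoints)]
    visible = [i not in masked_idx for i in range(len(keypoints))]
    return masked, visible
```

Varying `mask_ratio` across training samples would cover the range from minor to severe occlusion; how the actual method chooses which keypoints to mask is not stated in the abstract.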

【Keywords】:

1303. PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes.

Paper Link】 【Pages】:10639-10646

【Authors】: Cheng Chi ; Shifeng Zhang ; Junliang Xing ; Zhen Lei ; Stan Z. Li ; Xudong Zou

【Abstract】: Pedestrian detection in crowded scenes is a challenging problem, because occlusion happens frequently among different pedestrians. In this paper, we propose an effective and efficient detection network to hunt pedestrians in crowded scenes. The proposed method, namely PedHunter, introduces strong occlusion handling ability to existing region-based detection networks without adding extra computation in the inference stage. Specifically, we design a mask-guided module to leverage the head information to enhance the feature representation learning of the backbone network. Moreover, we develop a strict classification criterion by improving the quality of positive samples during training to eliminate common false positives of pedestrian detection in crowded scenes. Besides, we present an occlusion-simulated data augmentation to enrich the pattern and quantity of occlusion samples to improve the occlusion robustness. As a result, we achieve state-of-the-art results on three pedestrian detection datasets including CityPersons, Caltech-USA and CrowdHuman. To facilitate further studies on occluded pedestrian detection in surveillance scenes, we release a new pedestrian dataset, called SUR-PED, with a total of over 162k high-quality manually labeled instances in 10k images. The proposed dataset, source code and trained models are available at https://github.com/ChiCheng123/PedHunter.

【Keywords】:

1304. Relational Learning for Joint Head and Human Detection.

Paper Link】 【Pages】:10647-10654

【Authors】: Cheng Chi ; Shifeng Zhang ; Junliang Xing ; Zhen Lei ; Stan Z. Li ; Xudong Zou

【Abstract】: Head and human detection have been rapidly improved with the development of deep convolutional neural networks. However, these two tasks are often studied separately without considering their inherent correlation, with the result that 1) head detection often produces more false positives, and 2) the performance of the human detector frequently drops dramatically in crowded scenes. To handle these two issues, we present a novel joint head and human detection network, namely JointDet, which effectively detects heads and human bodies simultaneously. Moreover, we design a head-body relationship discriminating module to perform relational learning between heads and human bodies, and leverage this learned relationship to regain the suppressed human detections and reduce head false positives. To verify the effectiveness of the proposed method, we annotate head bounding boxes of the CityPersons and Caltech-USA datasets, and conduct extensive experiments on the CrowdHuman, CityPersons and Caltech-USA datasets. As a consequence, the proposed JointDet detector achieves state-of-the-art performance on these three benchmarks. To facilitate further studies on the head and human detection problem, all new annotations, source code and trained models are available at https://github.com/ChiCheng123/JointDet.

【Keywords】:

1305. Visual Domain Adaptation by Consensus-Based Transfer to Intermediate Domain.

Paper Link】 【Pages】:10655-10662

【Authors】: Jongwon Choi ; Youngjoon Choi ; Jihoon Kim ; Jin-Yeop Chang ; Ilhwan Kwon ; Youngjune Gwon ; Seungjai Min

【Abstract】: We describe an unsupervised domain adaptation framework for images based on a transform to an abstract intermediate domain and ensemble classifiers seeking a consensus. The intermediate domain can be thought of as a latent domain to which both the source and target domains can be transferred easily. The proposed framework aligns both domains to the intermediate domain, which greatly improves the adaptation performance when the source and target domains are notably dissimilar. In addition, we propose an ensemble model trained by confusing multiple classifiers and letting them reach a consensus alternately to enhance the adaptation performance for ambiguous samples. To estimate the hidden intermediate domain and the unknown labels of the target domain simultaneously, we develop a training algorithm using a double-structured architecture. We validate the proposed framework in hard adaptation scenarios with real-world datasets from simple synthetic domains to complex real-world domains. The proposed algorithm outperforms the previous state-of-the-art algorithms in various environments.

【Keywords】:

1306. Channel Attention Is All You Need for Video Frame Interpolation.

Paper Link】 【Pages】:10663-10671

【Authors】: Myungsub Choi ; Heewon Kim ; Bohyung Han ; Ning Xu ; Kyoung Mu Lee

【Abstract】: Prevailing video frame interpolation techniques rely heavily on optical flow estimation and require additional model complexity and computational cost; they are also susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate these limitations, we propose a simple but effective deep neural network for video frame interpolation, which is end-to-end trainable and is free from a motion estimation network component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, with channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map into multiple channels and extract motion information by attending to the channels for pixel-level frame synthesis. The model given by this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation.
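
One way to read "distribute the information in a feature map into multiple channels" is as a space-to-depth rearrangement, in which each r x r spatial block is moved into r*r separate channels. The sketch below shows that rearrangement on nested Python lists. The direction of the shuffle, the channel ordering, and the name `space_to_depth` are assumptions for illustration and are not taken from the paper.

```python
def space_to_depth(feature, r):
    """Rearrange a C x H x W nested list into (C*r*r) x (H/r) x (W/r).

    Each output channel collects the pixels that share the same offset
    (dy, dx) inside every r x r spatial block of one input channel.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    if height % r or width % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for c in range(channels):
        for dy in range(r):
            for dx in range(r):
                out.append([
                    [feature[c][y * r + dy][x * r + dx] for x in range(width // r)]
                    for y in range(height // r)
                ])
    return out
```

After such a rearrangement, spatially neighboring pixels sit in different channels at the same location, so a channel attention module can weight them against each other without an explicit flow field.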

【Keywords】:

1307. DASOT: A Unified Framework Integrating Data Association and Single Object Tracking for Online Multi-Object Tracking.

Paper Link】 【Pages】:10672-10679

【Authors】: Qi Chu ; Wanli Ouyang ; Bin Liu ; Feng Zhu ; Nenghai Yu

【Abstract】: In this paper, we propose an online multi-object tracking (MOT) approach that integrates data association and single object tracking (SOT) with a unified convolutional network (ConvNet), named DASOTNet. The intuition behind integrating data association and SOT is that they can complement each other. Following Siamese network architecture, DASOTNet consists of the shared feature ConvNet, the data association branch and the SOT branch. Data association is treated as a special re-identification task and solved by learning discriminative features for different targets in the data association branch. To handle the problem that the computational cost of SOT grows intolerably as the number of tracked objects increases, we propose an efficient two-stage tracking method in the SOT branch, which utilizes the merits of correlation features and can simultaneously track all the existing targets within one forward propagation. With feature sharing and the interaction between them, data association branch and the SOT branch learn to better complement each other. Using a multi-task objective, the whole network can be trained end-to-end. Compared with state-of-the-art online MOT methods, our method is much faster while maintaining a comparable performance.

【Keywords】:

1308. Towards Ghost-Free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN.

Paper Link】 【Pages】:10680-10687

【Authors】: Xiaodong Cun ; Chi-Man Pun ; Cheng Shi

【Abstract】: Shadow removal is an essential task for scene understanding. Many studies consider only matching the image contents, which often causes two types of ghosts: color inconsistencies in shadow regions or artifacts on shadow boundaries (as shown in Figure 1). In this paper, we tackle these issues in two ways. First, to carefully learn the border artifact-free image, we propose a novel network structure named the dual hierarchical aggregation network (DHAN). It contains a series of growth dilated convolutions as the backbone without any down-sampling, and we hierarchically aggregate multi-context features for attention and prediction, respectively. Second, we argue that training on a limited dataset restricts the textural understanding of the network, which leads to the shadow region color inconsistencies. Currently, the largest dataset contains 2k+ shadow/shadow-free image pairs. However, it has only 0.1k+ unique scenes since many samples share exactly the same background with different shadow positions. Thus, we design a shadow matting generative adversarial network (SMGAN) to synthesize realistic shadow mattings from a given shadow mask and shadow-free image. With the help of novel masks or scenes, we enhance the current datasets using synthesized shadow images. Experiments show that our DHAN can erase the shadows and produce high-quality ghost-free images. After training on the synthesized and real datasets, our network outperforms other state-of-the-art methods by a large margin. The code is available at http://github.com/vinthony/ghost-free-shadow-removal/

【Keywords】:

1309. The Missing Data Encoder: Cross-Channel Image Completion with Hide-and-Seek Adversarial Network.

Paper Link】 【Pages】:10688-10695

【Authors】: Arnaud Dapogny ; Matthieu Cord ; Patrick Pérez

【Abstract】: Image completion is the problem of generating whole images from fragments only. It encompasses inpainting (generating a patch given its surrounding), reverse inpainting/extrapolation (generating the periphery given the central patch) as well as colorization (generating one or several channels given other ones). In this paper, we employ a deep network to perform image completion, with adversarial training as well as perceptual and completion losses, and call it the “missing data encoder” (MDE). We consider several configurations based on how the seed fragments are chosen. We show that training MDE for “random extrapolation and colorization” (MDE-REC), i.e. using random channel-independent fragments, allows a better capture of the image semantics and geometry. MDE training makes use of a novel “hide-and-seek” adversarial loss, where the discriminator seeks the original non-masked regions, while the generator tries to hide them. We validate our models qualitatively and quantitatively on several datasets, showing their interest for image completion, representation learning as well as face occlusion handling.

【Keywords】:

1310. Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement.

Paper Link】 【Pages】:10696-10703

【Authors】: Jianing Deng ; Li Wang ; Shiliang Pu ; Cheng Zhuo

【Abstract】: Recent years have witnessed the remarkable success of deep learning methods in quality enhancement for compressed video. To better explore temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video could be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, thereby resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which is computationally expensive and inefficient. In this paper, we propose a fast yet effective method for compressed video quality enhancement by incorporating a novel Spatio-Temporal Deformable Fusion (STDF) scheme to aggregate temporal information. Specifically, the proposed STDF takes a target frame along with its neighboring reference frames as input to jointly predict an offset field to deform the spatio-temporal sampling positions of convolution. As a result, complementary information from both target and reference frames can be fused within a single Spatio-Temporal Deformable Convolution (STDC) operation. Extensive experiments show that our method achieves state-of-the-art performance in compressed video quality enhancement in terms of both accuracy and efficiency.

【Keywords】:

1311. Zero Shot Learning with the Isoperimetric Loss.

【Paper Link】 【Pages】:10704-10712

【Authors】: Shay Deutsch ; Andrea L. Bertozzi ; Stefano Soatto

【Abstract】: We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pre-trained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure defined by the sample points to regularize the estimates of the manifolds by inferring the graph connectivity using a generalization of the isoperimetric inequalities from Riemannian geometry to graphs. Surprisingly, this regularization alone, paired with the simplest baseline model, outperforms the state-of-the-art among fully automated methods in zero-shot learning benchmarks such as AwA and CUB. This improvement is achieved solely by learning the structure of the underlying spaces by imposing regularity.

【Keywords】:

1312. Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow.

【Paper Link】 【Pages】:10713-10720

【Authors】: Mingyu Ding ; Zhe Wang ; Bolei Zhou ; Jianping Shi ; Zhiwu Lu ; Ping Luo

【Abstract】: A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of a video clip is annotated, which makes most supervised methods fail to utilize information from the rest of the frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flows, which encode the temporal consistency to improve the video segmentation. However, the video segmentation and optical flow estimation are still considered as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference. Extensive experiments show that the proposed model makes the video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods under the same settings in both tasks.

【Keywords】:

1313. Cycle-CNN for Colorization towards Real Monochrome-Color Camera Systems.

【Paper Link】 【Pages】:10721-10728

【Authors】: Xuan Dong ; Weixin Li ; Xiaojie Wang ; Yunhong Wang

【Abstract】: Colorization in monochrome-color camera systems aims to colorize the gray image I_G from the monochrome camera using the color image R_C from the color camera as reference. Since monochrome cameras have better imaging quality than color cameras, the colorization can help obtain higher quality color images. Related learning-based methods usually simulate the monochrome-color camera systems to generate synthesized data for training, due to the lack of ground-truth color information of the gray image in the real data. However, methods trained on synthesized data may get poor results when colorizing real data, because the synthesized data may deviate from the real data. We present a new CNN model, named cycle CNN, which can directly use the real data from monochrome-color camera systems for training. In detail, we use the colorization CNN model to do the colorization twice. First, we colorize I_G using R_C as reference to obtain the first-time colorization result I_C. Second, we colorize the de-colored map of R_C, i.e. R_G, using the first-time colorization result I_C as reference to obtain the second-time colorization result R′_C. In this way, for the second-time colorization result R′_C, we use the original color map R_C as ground-truth and introduce the cycle consistency loss to push R′_C ≈ R_C. Also, for the first-time colorization result I_C, we propose a structure similarity loss to encourage the luminance maps of I_G and I_C to have similar structures. In addition, we introduce a spatial smoothness loss within the colorization CNN model to encourage spatial smoothness of the colorization result. Combining all these losses, we can train the colorization CNN model using the real data in the absence of the ground-truth color information of I_G. Experimental results show that we outperform related methods by a large margin when colorizing real data.

【Keywords】:

1314. FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing.

【Paper Link】 【Pages】:10729-10736

【Authors】: Yu Dong ; Yihao Liu ; He Zhang ; Shifeng Chen ; Yu Qiao

【Abstract】: Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much attention in research. Most existing learning-based dehazing methods are not fully end-to-end and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recover the haze-free image based on the atmospheric scattering model. However, in practice, due to lack of priors and constraints, it is hard to precisely estimate these intermediate parameters. Inaccurate estimation further degrades the performance of dehazing, resulting in artifacts, color distortion and insufficient haze removal. To address this, we propose a fully end-to-end Generative Adversarial Network with Fusion-discriminator (FD-GAN) for image dehazing. With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts. Moreover, we synthesize a large-scale training dataset including various indoor and outdoor hazy images to boost the performance, and we reveal that for learning-based dehazing methods, the performance is strictly influenced by the training data. Experiments have shown that our method reaches state-of-the-art performance on both public synthetic datasets and real-world images with more visually pleasing dehazed results.

【Keywords】:

1315. Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition.

【Paper Link】 【Pages】:10737-10744

【Authors】: Mohammed Haroon Dupty ; Zhen Zhang ; Wee Sun Lee

【Abstract】: We address the problem of Visual Relationship Detection (VRD), which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object). We observe that given a pair of bounding box proposals, objects often participate in multiple relations, implying that the distribution of triplets is multimodal. We leverage the strong correlations within triplets to learn the joint distribution of triplet variables conditioned on the image and the bounding box proposals, doing away with the hitherto used independent distribution of triplets. To make learning the triplet joint distribution feasible, we introduce a novel technique of learning conditional triplet distributions in the form of their normalized low rank non-negative tensor decompositions. Normalized tensor decompositions take the form of mixture distributions of discrete variables and thus are able to capture multimodality. This allows us to efficiently learn higher order discrete multimodal distributions and at the same time keep the parameter size manageable. We further model the probability of selecting an object proposal pair and include a relation triplet prior in our model. We show that each part of the model improves performance and the combination outperforms the state-of-the-art score on the Visual Genome (VG) and Visual Relationship Detection (VRD) datasets.

【Keywords】:
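
The link between a normalized non-negative low-rank decomposition and a mixture distribution, as described in the abstract above, can be illustrated with a small sketch. This is not the authors' model: the rank, the toy vocabulary sizes, and the helper names `normalize` and `mixture_tensor` are assumptions made for illustration only. Each rank-one component is a product of three categorical distributions, and the non-negative mixing weights sum to one, so the reconstructed tensor is a valid joint distribution over (subject, predicate, object).

```python
import itertools


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (a categorical distribution)."""
    total = sum(weights)
    return [w / total for w in weights]


def mixture_tensor(mix, subj, pred, obj):
    """Build P[s][p][o] = sum_r mix[r] * subj[r][s] * pred[r][p] * obj[r][o].

    Every factor is a normalized non-negative vector, so each rank-one term is
    a product distribution and the sum is a mixture of such distributions.
    """
    rank = len(mix)
    n_s, n_p, n_o = len(subj[0]), len(pred[0]), len(obj[0])
    return [[[sum(mix[r] * subj[r][s] * pred[r][p] * obj[r][o]
                  for r in range(rank))
              for o in range(n_o)]
             for p in range(n_p)]
            for s in range(n_s)]


# Toy rank-2 example: 2 subjects, 3 predicates, 2 objects (all values hypothetical).
mix = normalize([3.0, 1.0])
subj = [normalize([4.0, 1.0]), normalize([1.0, 4.0])]
pred = [normalize([1.0, 1.0, 6.0]), normalize([5.0, 2.0, 1.0])]
obj = [normalize([1.0, 3.0]), normalize([2.0, 2.0])]

joint = mixture_tensor(mix, subj, pred, obj)
total_mass = sum(joint[s][p][o]
                 for s, p, o in itertools.product(range(2), range(3), range(2)))

# Parameters stored by the factorization vs. the full table.
factored_params = len(mix) * (1 + 2 + 3 + 2)
full_params = 2 * 3 * 2
```

With realistic vocabulary sizes the full table grows as the product of the three dimensions while the factorization grows as their sum times the rank, which is the sense in which the parameter size stays manageable; on this toy example the table is too small for the saving to show.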

1316. SubSpace Capsule Network.

【Paper Link】 【Pages】:10745-10753

【Authors】: Marzieh Edraki ; Nazanin Rahnavard ; Mubarak Shah

【Abstract】: Convolutional neural networks (CNNs) have become a key asset to most fields in AI. Despite their successful performance, CNNs suffer from a major drawback. They fail to capture the hierarchy of spatial relations among different parts of an entity. As a remedy to this problem, the idea of capsules was proposed by Hinton. In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly-defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules. A capsule is created by projecting an input feature vector from a lower layer onto the capsule subspace using a learnable transformation. This transformation finds the degree of alignment of the input with the properties modeled by the capsule subspace. We show that SCN is a general capsule network that can successfully be applied to both discriminative and generative models without incurring computational overhead compared to CNN during test time. Effectiveness of SCN is evaluated through a comprehensive set of experiments on supervised image classification, semi-supervised image classification and high-resolution image generation tasks using the generative adversarial network (GAN) framework. SCN significantly improves the performance of the baseline models in all 3 tasks.

【Keywords】:

1317. Person Tube Retrieval via Language Description.

【Paper Link】 【Pages】:10754-10761

【Authors】: Hehe Fan ; Yi Yang

【Abstract】: This paper focuses on the problem of person tube (a sequence of bounding boxes which encloses a person in a video) retrieval using a natural language query. Different from images in person re-identification (re-ID) or person search, besides appearance, person tube contains abundant action and information. We exploit 2D and 3D residual networks (ResNets) to extract the appearance and action representations, respectively. To transform tubes and descriptions into a shared latent space where data from the two different modalities can be compared directly, we propose a Multi-Scale Structure Preservation (MSSP) approach. MSSP splits a person tube into several element-tubes on average, whose features are extracted by the two ResNets. Any number of consecutive element-tubes forms a sub-tube. MSSP considers the following constraints for sub-tubes and descriptions in the shared space. 1) Bidirectional ranking. Matching sub-tubes (resp. descriptions) should get ranked higher than incorrect ones for each description (resp. sub-tube). 2) External structure preservation. Sub-tubes (resp. descriptions) from different persons should stay away from each other. 3) Internal structure preservation. Sub-tubes (resp. descriptions) from the same person should be close to each other. Experimental results on person tube retrieval via language description and two other related tasks demonstrate the efficacy of MSSP.

【Keywords】:
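
The bidirectional ranking constraint in the abstract above can be sketched as a generic hinge-style ranking loss over a similarity matrix. This is an illustrative formulation, not necessarily the one used in MSSP: the function name `bidirectional_ranking_loss`, the margin value, and the convention that matching sub-tube/description pairs sit on the diagonal of `sim` are all assumptions.

```python
def bidirectional_ranking_loss(sim, margin=0.2):
    """Hinge ranking loss in both retrieval directions.

    sim[i][j] is the similarity between sub-tube i and description j in the
    shared space; pair (i, i) is assumed to be the matching pair.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        positive = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Sub-tube i should prefer its own description over description j.
            loss += max(0.0, margin - positive + sim[i][j])
            # Description i should prefer its own sub-tube over sub-tube j.
            loss += max(0.0, margin - positive + sim[j][i])
    return loss


# Well-separated pairs incur no loss; a confusable pair is penalized.
separated = bidirectional_ranking_loss([[1.0, 0.0], [0.0, 1.0]])
confused = bidirectional_ranking_loss([[0.5, 0.6], [0.1, 0.5]])
```

The loss is zero only when every matching pair beats every mismatched pair by at least the margin in both directions, which is the ranking property the first constraint asks for.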

1318. CIAN: Cross-Image Affinity Net for Weakly Supervised Semantic Segmentation.

【Paper Link】 【Pages】:10762-10769

【Authors】: Junsong Fan ; Zhaoxiang Zhang ; Tieniu Tan ; Chunfeng Song ; Jun Xiao

【Abstract】: Weakly supervised semantic segmentation with only image-level labels saves considerable human effort otherwise spent annotating pixel-level labels. Cutting-edge approaches rely on various innovative constraints and heuristic rules to generate the masks for every single image. Although great progress has been achieved by these methods, they treat each image independently and do not take account of the relationships across different images. In this paper, however, we argue that the cross-image relationship is vital for weakly supervised segmentation, because it connects related regions across images, where supplementary representations can be propagated to obtain more consistent and integral regions. To leverage this information, we propose an end-to-end cross-image affinity module, which exploits pixel-level cross-image relationships with only image-level labels. By means of this, our approach achieves 64.3% and 65.3% mIoU on the Pascal VOC 2012 validation and test sets respectively, which is a new state-of-the-art result using only image-level labels for weakly supervised semantic segmentation, demonstrating the superiority of our approach.

【Keywords】:

1319. Scale-Wise Convolution for Image Restoration.

【Paper Link】 【Pages】:10770-10777

【Authors】: Yuchen Fan ; Jiahui Yu ; Ding Liu ; Thomas S. Huang

【Abstract】: While scale-invariant modeling has substantially boosted the performance of visual recognition tasks, it remains largely under-explored in deep-network-based image restoration. Naively applying those scale-invariant techniques (e.g., multi-scale testing, random-scale data augmentation) to image restoration tasks usually leads to inferior performance. In this paper, we show that properly modeling scale-invariance into neural networks can bring significant benefits to image restoration performance. Inspired by spatial-wise convolution for shift-invariance, “scale-wise convolution” is proposed to convolve across multiple scales for scale-invariance. In our scale-wise convolutional network (SCN), we first map the input image to the feature space and then build a feature pyramid representation via progressive bi-linear down-scaling. The feature pyramid is then passed to a residual network with scale-wise convolutions. The proposed scale-wise convolution learns to dynamically activate and aggregate features from different input scales in each residual building block, in order to exploit contextual information on multiple scales. In experiments, we compare the restoration accuracy and parameter efficiency among our model and many different variants of multi-scale neural networks. The proposed network with scale-wise convolution achieves superior performance in multiple image restoration tasks including image super-resolution, image denoising and image compression artifacts removal. Code and models are available at: https://github.com/ychfan/scn_sr.

【Keywords】:

1320. EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement.

【Paper Link】 【Pages】:10778-10785

【Authors】: Linpu Fang ; Hang Xu ; Zhili Liu ; Sarah Parisot ; Zhenguo Li

【Abstract】: Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) a CAM-RPN module that aims at finding foreground proposals guided by a class activation heat-map; 2) a hybrid-supervised cascade module that further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method, and it achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We will release the code and the trained models.

【Keywords】:

1321. Adversarial Attack on Deep Product Quantization Network for Image Retrieval.

【Paper Link】 【Pages】:10786-10793

【Authors】: Yan Feng ; Bin Chen ; Tao Dai ; Shu-Tao Xia

【Abstract】: Deep product quantization network (DPQN) has recently received much attention in fast image retrieval tasks due to its efficiency in encoding high-dimensional visual features, especially when dealing with large-scale datasets. Recent studies show that deep neural networks (DNNs) are vulnerable to input with small and maliciously designed perturbations (a.k.a., adversarial examples). This phenomenon raises the concern of security issues for DPQN in the testing/deploying stage as well. However, little effort has been devoted to investigating how adversarial examples affect DPQN. To this end, we propose product quantization adversarial generation (PQ-AG), a simple yet effective method to generate adversarial examples for product quantization based retrieval systems. PQ-AG aims to generate imperceptible adversarial perturbations for query images to form adversarial queries, whose nearest neighbors from a targeted product quantization model are not semantically related to those from the original queries. Extensive experiments show that our PQ-AG successfully creates adversarial examples to mislead targeted product quantization retrieval models. Besides, we found that our PQ-AG significantly degrades retrieval performance in both white-box and black-box settings.

【Keywords】:

1322. Dynamic Sampling Network for Semantic Segmentation.

【Paper Link】 【Pages】:10794-10801

【Authors】: Bin Fu ; Junjun He ; Zhengfu Zhang ; Yu Qiao

【Abstract】: Sampling is a basic operation of modern convolutional neural networks (CNNs), since down-sampling operators are employed to enlarge the receptive field while up-sampling operators are adopted to increase resolution. Most existing deep segmentation networks employ regular grid sampling operators, which can be suboptimal for the semantic segmentation task due to large shape and scale variance. To address this problem, this paper proposes a Context Guided Dynamic Sampling (CGDS) module to obtain an effective representation with rich shape and scale information by adaptively sampling useful segmentation information in spatial space. Moreover, we utilize the multi-scale contextual representations to guide the sampling process. Therefore, our CGDS can adaptively capture shape and scale information according to not only the input feature map but also the multi-scale semantic context. CGDS provides a plug-and-play module which can be easily incorporated in deep segmentation networks. We incorporate our proposed CGDS module into Dynamic Sampling Network (DSNet) and perform extensive experiments on segmentation datasets. Experimental results show that our CGDS significantly improves semantic segmentation performance and achieves state-of-the-art performance on the PASCAL VOC 2012 and ADE20K datasets. Our model achieves 85.2% mIOU on the PASCAL VOC 2012 test set without pre-training on the MS COCO dataset and 46.4% on the ADE20K validation set. The code will become publicly available after publication.

【Keywords】:

1323. Ultrafast Video Attention Prediction with Coupled Knowledge Distillation.

【Paper Link】 【Pages】:10802-10809

【Authors】: Kui Fu ; Peipei Shi ; Yafei Song ; Shiming Ge ; Xiangju Lu ; Jia Li

【Abstract】: Large convolutional neural network models have recently demonstrated impressive performance on video attention prediction. Conventionally, these models require intensive computation and large memory. To address these issues, we design an extremely light-weight network with ultrafast speed, named UVA-Net. The network is constructed based on depth-wise convolutions and takes low-resolution images as input. However, this straightforward acceleration method decreases performance dramatically. To this end, we propose a coupled knowledge distillation strategy to augment and train the network effectively. With this strategy, the model can further automatically discover and emphasize implicit useful cues contained in the data. Both spatial and temporal knowledge learned by the high-resolution complex teacher networks can also be distilled and transferred into the proposed low-resolution light-weight spatiotemporal network. Experimental results show that the performance of our model is comparable to 11 state-of-the-art models in video attention prediction, while it has a memory footprint of only 0.68 MB and runs at about 10,106 FPS on GPU and 404 FPS on CPU, which is 206 times faster than previous models.

【Keywords】:

1324. Accurate Temporal Action Proposal Generation with Relation-Aware Pyramid Network.

【Paper Link】 【Pages】:10810-10817

【Authors】: Jialin Gao ; Zhixiang Shi ; Guanshuo Wang ; Jiani Li ; Yufeng Yuan ; Shiming Ge ; Xi Zhou

【Abstract】: Accurate temporal action proposals play an important role in detecting actions from untrimmed videos. The existing approaches have difficulties in capturing global contextual information and simultaneously localizing actions with different durations. To this end, we propose a Relation-aware pyramid Network (RapNet) to generate highly accurate temporal action proposals. In RapNet, a novel relation-aware module is introduced to exploit bi-directional long-range relations between local features for context distilling. This embedded module enhances RapNet in terms of its multi-granularity temporal proposal generation ability, given predefined anchor boxes. We further introduce a two-stage adjustment scheme to refine the proposal boundaries and measure their confidence in containing an action with snippet-level actionness. Extensive experiments on the challenging ActivityNet and THUMOS14 benchmarks demonstrate that our RapNet generates more accurate proposals than the existing state-of-the-art methods.

【Keywords】:

1325. Channel Interaction Networks for Fine-Grained Image Categorization.

【Paper Link】 【Pages】:10818-10825

【Authors】: Yu Gao ; Xintong Han ; Xun Wang ; Weilin Huang ; Matthew Scott

【Abstract】: Fine-grained image categorization is challenging due to the subtle inter-class differences. We posit that exploiting the rich relationships between channels can help capture such differences since different channels correspond to different semantics. In this paper, we propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. For a single image, a self-channel interaction (SCI) module is proposed to explore channel-wise correlation within the image. This allows the model to learn the complementary features from the correlated channels, yielding stronger fine-grained features. Furthermore, given an image pair, we introduce a contrastive channel interaction (CCI) module to model the cross-sample channel interaction with a metric learning framework, allowing the CIN to distinguish the subtle visual differences between images. Our model can be trained efficiently in an end-to-end fashion without the need for multi-stage training and testing. Finally, comprehensive experiments are conducted on three publicly available benchmarks, where the proposed method consistently outperforms the state-of-the-art approaches, such as DFL-CNN (Wang, Morariu, and Davis 2018) and NTS (Yang et al. 2018).

【Keywords】:

1326. KnowIT VQA: Answering Knowledge-Based Questions about Videos.

【Paper Link】 【Pages】:10826-10834

【Authors】: Noa Garcia ; Mayu Otani ; Chenhui Chu ; Yuta Nakashima

【Abstract】: We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which can only be answered with the experience obtained from viewing the series. Second, we propose a video understanding model by combining the visual and textual video content with specific knowledge about the show. Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations.

【Keywords】:

1327. Deep Reinforcement Learning for Active Human Pose Estimation.

【Paper Link】 【Pages】:10835-10844

【Authors】: Erik Gärtner ; Aleksis Pirinen ; Cristian Sminchisescu

【Abstract】: Most 3D human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.

【Keywords】:

1328. Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition.

【Paper Link】 【Pages】:10845-10852

【Authors】: Shiming Ge ; Kangkai Zhang ; Haolin Liu ; Yingying Hua ; Shengwei Zhao ; Xin Jin ; Hao Wen

【Abstract】: In spite of the great success achieved by recent deep models in many image recognition tasks, directly applying them to recognize low-resolution images may suffer from low accuracy due to the loss of informative details during resolution degradation. However, these images are still recognizable for subjects who are familiar with the corresponding high-resolution ones. Inspired by that, we propose a teacher-student learning approach to facilitate low-resolution image recognition via hybrid order relational knowledge distillation. The approach involves three streams: the teacher stream is pretrained to recognize high-resolution images with high accuracy, the student stream learns to identify low-resolution images by mimicking the teacher's behaviors, and the extra assistant stream is introduced as a bridge to help knowledge transfer from the teacher to the student. To extract sufficient knowledge for reducing the loss in accuracy, the learning of the student is supervised with multiple losses, which preserve the similarities in various order relational structures. In this way, the capability of recovering missing details of familiar low-resolution images can be effectively enhanced, leading to a better knowledge transfer. Extensive experiments on metric learning, low-resolution image classification and low-resolution face recognition tasks show the effectiveness of our approach, while using reduced models.

【Keywords】:

1329. Symmetrical Synthesis for Deep Metric Learning.

【Paper Link】 【Pages】:10853-10860

【Authors】: Geonmo Gu ; ByungSoo Ko

【Abstract】: Deep metric learning aims to learn embeddings that contain semantic similarity information among data points. To learn better embeddings, methods to generate synthetic hard samples have been proposed. Existing methods of synthetic hard sample generation adopt autoencoders or generative adversarial networks, but this leads to more hyper-parameters, harder optimization, and slower training speed. In this paper, we address these problems by proposing a novel method of synthetic hard sample generation called symmetrical synthesis. Given two original feature points from the same class, the proposed method first generates synthetic points with each other as an axis of symmetry. Second, it performs hard negative pair mining within the original and synthetic points to select a more informative negative pair for computing the metric learning loss. Our proposed method is hyper-parameter free and plug-and-play for existing metric learning losses without network modification. We demonstrate the superiority of our proposed method over existing methods for a variety of loss functions on clustering and image retrieval tasks.

【Keywords】:
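
The first step described in the abstract above, generating synthetic points with each other as an axis of symmetry, can be sketched as a vector reflection. The abstract does not give the formula, so this is one plausible reading under stated assumptions: the axis is taken to be the line through the origin and the other feature point, and the helper names `reflect` and `symmetrical_pair` are hypothetical.

```python
import math


def reflect(point, axis):
    """Mirror `point` across the line through the origin spanned by `axis`.

    The mirror image is 2 * proj_axis(point) - point, which keeps the norm
    of `point` unchanged.
    """
    dot = sum(p * a for p, a in zip(point, axis))
    norm_sq = sum(a * a for a in axis)
    alpha = 2.0 * dot / norm_sq
    return [alpha * a - p for p, a in zip(point, axis)]


def symmetrical_pair(x_k, x_l):
    """Return the two synthetic points obtained by reflecting each feature
    point across the axis defined by the other one."""
    return reflect(x_k, x_l), reflect(x_l, x_k)


def norm(vector):
    return math.sqrt(sum(v * v for v in vector))


# Two toy 2-D feature points assumed to come from the same class.
x_k, x_l = [1.0, 0.0], [1.0, 1.0]
synth_k, synth_l = symmetrical_pair(x_k, x_l)
```

Because a reflection needs no tunable quantity, this reading is consistent with the abstract's claim that the method is hyper-parameter free; the subsequent hard negative pair mining step is not shown here.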

1330. FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis.

【Paper Link】 【Pages】:10861-10868

【Authors】: Kuangxiao Gu ; Yuqian Zhou ; Thomas S. Huang

【Abstract】: Talking face synthesis has been widely studied in either appearance-based or warping-based methods. Previous works mostly utilize a single face image as a source, and generate novel facial animations by merging another person's facial features. However, some facial regions like eyes or teeth, which may be hidden in the source image, cannot be synthesized faithfully and stably. In this paper, we present a landmark driven two-stream network to generate faithful talking facial animation, in which more facial details are created, preserved and transferred from multiple source images instead of a single one. Specifically, we propose a network consisting of a learning and a fetching stream. The fetching sub-net directly learns to attentively warp and merge facial regions from five source images of distinctive landmarks, while the learning pipeline renders facial organs from the training face space to compensate. Compared to baseline algorithms, extensive experiments demonstrate that the proposed method achieves higher performance both quantitatively and qualitatively. Code is available at https://github.com/kgu3/FLNet_AAAI2020.

【Keywords】:

1331. Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection.

【Paper Link】 【Pages】:10869-10876

【Authors】: Yuchao Gu ; Lijuan Wang ; Ziqin Wang ; Yun Liu ; Ming-Ming Cheng ; Shao-Ping Lu

【Abstract】: Spatiotemporal information is essential for video salient object detection (VSOD) because object motion strongly attracts human attention. Previous VSOD methods usually use Long Short-Term Memory (LSTM) or 3D ConvNet (C3D), which can only encode motion information through step-by-step propagation in the temporal domain. Recently, the non-local mechanism was proposed to capture long-range dependencies directly. However, it is not straightforward to apply the non-local mechanism to VSOD, because i) it fails to capture motion cues and tends to learn motion-independent global contexts; ii) its computation and memory costs are prohibitive for video dense prediction tasks such as VSOD. To address the above problems, we design a Constrained Self-Attention (CSA) operation to capture motion cues, based on the prior that objects always move in a continuous trajectory. We group a set of CSA operations in Pyramid structures (PCSA) to capture objects at various scales and speeds. Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. Our code is available at https://github.com/guyuchao/PyramidCSA.

【Keywords】:

1332. Constructing Multiple Tasks for Augmentation: Improving Neural Image Classification with K-Means Features.

【Paper Link】 【Pages】:10877-10884

【Authors】: Tao Gui ; Lizhi Qing ; Qi Zhang ; Jiacheng Ye ; Hang Yan ; Zichu Fei ; Xuanjing Huang

【Abstract】: Multi-task learning (MTL) has received considerable attention, and numerous deep learning applications benefit from MTL with multiple objectives. However, constructing multiple related tasks is difficult, and sometimes only a single task is available for training in a dataset. To tackle this problem, we explored the idea of using unsupervised clustering to construct a variety of auxiliary tasks from unlabeled data or existing labeled data. We found that some of these newly constructed tasks could exhibit semantic meanings corresponding to certain human-specific attributes, but some were non-ideal. In order to effectively reduce the impact of non-ideal auxiliary tasks on the main task, we further proposed a novel meta-learning-based multi-task learning approach, which trained the shared hidden layers on auxiliary tasks, while the meta-optimization objective was to minimize the loss on the main task, ensuring that the optimizing direction led to an improvement on the main task. Experimental results across five image datasets demonstrated that the proposed method significantly outperformed existing single task learning, semi-supervised learning, and some data augmentation methods, including an improvement of more than 9% on the Omniglot dataset.

【Keywords】:

1333. Channel Pruning Guided by Classification Loss and Feature Importance.

Paper Link】 【Pages】:10885-10892

【Authors】: Jinyang Guo ; Wanli Ouyang ; Dong Xu

【Abstract】: In this work, we propose a new layer-by-layer channel pruning method called Channel Pruning guided by classification Loss and feature Importance (CPLI). In contrast to the existing layer-by-layer channel pruning approaches that only consider how to reconstruct the features from the next layer, our approach additionally takes the classification loss into account in the channel pruning process. We also observe that some reconstructed features will be removed at the next pruning stage, so it is unnecessary to reconstruct these features. To this end, we propose a new strategy to suppress the influence of unimportant features (i.e., the features that will be removed at the next pruning stage). Our comprehensive experiments on three benchmark datasets, i.e., CIFAR-10, ImageNet, and UCF-101, demonstrate the effectiveness of our CPLI method.

【Keywords】:

1334. MarioNETte: Few-Shot Face Reenactment Preserving Identity of Unseen Targets.

Paper Link】 【Pages】:10893-10900

【Authors】: Sungjoo Ha ; Martin Kersner ; Beomsu Kim ; Seokjun Seo ; Dongyoung Kim

【Abstract】: When there is a mismatch between the target identity and the driver identity, face reenactment suffers severe degradation in the quality of the result, especially in a few-shot setting. The identity preservation problem, where the model loses the detailed information of the target leading to a defective output, is the most common failure mode. The problem has several potential sources such as the identity of the driver leaking due to the identity mismatch, or dealing with unseen large poses. To overcome such problems, we introduce components that address the mentioned problem: image attention block, target feature alignment, and landmark transformer. Through attending and warping the relevant features, the proposed architecture, called MarioNETte, produces high-quality reenactments of unseen identities in a few-shot setting. In addition, the landmark transformer dramatically alleviates the identity preservation problem by isolating the expression geometry through landmark disentanglement. Comprehensive experiments are performed to verify that the proposed framework can generate highly realistic faces, outperforming all other baselines, even under a significant mismatch of facial characteristics between the target and the driver.

【Keywords】:

1335. SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications.

Paper Link】 【Pages】:10901-10908

【Authors】: Abdullah Hamdi ; Matthias Mueller ; Bernard Ghanem

【Abstract】: One major factor impeding more widespread adoption of deep neural networks (DNNs) is their lack of robustness, which is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focus on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN on three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent.

【Keywords】:

1336. Robust Conditional GAN from Uncertainty-Aware Pairwise Comparisons.

Paper Link】 【Pages】:10909-10916

【Authors】: Ligong Han ; Ruijiang Gao ; Mun Kim ; Xin Tao ; Bo Liu ; Dimitris N. Metaxas

【Abstract】: Conditional generative adversarial networks have shown exceptional generation performance over the past few years. However, they require large numbers of annotations. To address this problem, we propose a novel generative adversarial network utilizing weak supervision in the form of pairwise comparisons (PC-GAN) for image attribute editing. In the light of Bayesian uncertainty estimation and noise-tolerant adversarial training, PC-GAN can estimate attribute ratings efficiently and demonstrates robust performance in noise resistance. Through extensive experiments, we show both qualitatively and quantitatively that PC-GAN performs comparably with fully-supervised methods and outperforms unsupervised baselines. Code and supplementary material can be found on the project website.

【Keywords】:

1337. Complementary-View Multiple Human Tracking.

Paper Link】 【Pages】:10917-10924

【Authors】: Ruize Han ; Wei Feng ; Jiewen Zhao ; Zicheng Niu ; Yujun Zhang ; Liang Wan ; Song Wang

【Abstract】: The global trajectories of targets on the ground can be well captured from a top view at a high altitude, e.g., by a drone-mounted camera, while their local detailed appearances can be better recorded from horizontal views, e.g., by a helmet camera worn by a person. This paper studies a new problem of multiple human tracking from a pair of top- and horizontal-view videos taken at the same time. Our goal is to track the humans in both views and identify the same person across the two complementary views frame by frame, which is very challenging due to the very large field-of-view difference. In this paper, we model the data similarity in each view using appearance and motion reasoning and across views using appearance and spatial reasoning. Combining them, we formulate the proposed multiple human tracking as a joint optimization problem, which can be solved by constrained integer programming. We collect a new dataset consisting of top- and horizontal-view video pairs for performance evaluation, and the experimental results show the effectiveness of the proposed method.

【Keywords】:

1338. Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling.

Paper Link】 【Pages】:10925-10932

【Authors】: Wenkai Han ; Chenglu Wen ; Cheng Wang ; Xin Li ; Qing Li

【Abstract】: Fully exploring correlation among points in point clouds is essential for their feature modeling. This paper presents a novel end-to-end graph model, named Point2Node, to represent a given point cloud. Point2Node can dynamically explore correlation among all graph nodes from different levels, and adaptively aggregate the learned features. Specifically, first, to fully explore the spatial correlation among points for enhanced feature description, in a high-dimensional node graph, we dynamically integrate the node's correlation with self, local, and non-local nodes. Second, to more effectively integrate learned features, we design a data-aware gate mechanism to self-adaptively aggregate features at the channel level. Extensive experiments on various point cloud benchmarks demonstrate that our method outperforms the state-of-the-art.

【Keywords】:

1339. Tensor FISTA-Net for Real-Time Snapshot Compressive Imaging.

Paper Link】 【Pages】:10933-10940

【Authors】: Xiaochen Han ; Bo Wu ; Zheng Shou ; Xiao-Yang Liu ; Yimeng Zhang ; Linghe Kong

【Abstract】: Snapshot compressive imaging (SCI) cameras capture high-speed videos by compressing multiple video frames into a measurement frame. However, reconstructing video frames from the compressed measurement frame is challenging. The existing state-of-the-art reconstruction algorithms suffer from low reconstruction quality or heavy time consumption, making them unsuitable for real-time applications. In this paper, exploiting the powerful learning ability of deep neural networks (DNN), we propose a novel Tensor Fast Iterative Shrinkage-Thresholding Algorithm Net (Tensor FISTA-Net) as a decoder for SCI video cameras. Tensor FISTA-Net not only learns the sparsest representation of the video frames through convolution layers, but also reduces the reconstruction time significantly through tensor calculations. Experimental results on synthetic datasets show that the proposed Tensor FISTA-Net achieves an average PSNR improvement of 1.63 to 3.89 dB over the state-of-the-art algorithms. Moreover, Tensor FISTA-Net requires less than 2 seconds of running time and a 12 MB memory footprint, making it practical for real-time IoT applications.

【Keywords】:

1340. Temporal Context Enhanced Feature Aggregation for Video Object Detection.

Paper Link】 【Pages】:10941-10948

【Authors】: Fei He ; Naiyu Gao ; Qiaozhe Li ; Senyao Du ; Xin Zhao ; Kaiqi Huang

【Abstract】: Video object detection is a challenging task because of the presence of appearance deterioration in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which are critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) to exploit temporal context information by temporal aggregation for video object detection. To handle the displacement of the objects in videos, a novel DeformAlign module is proposed to align the spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which facilitates exploiting variable temporal information and requires fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells and whistles, our TCENet achieves 80.3% mAP by only aggregating 3 frames.

【Keywords】:

1341. Grapy-ML: Graph Pyramid Mutual Learning for Cross-Dataset Human Parsing.

Paper Link】 【Pages】:10949-10956

【Authors】: Haoyu He ; Jing Zhang ; Qiming Zhang ; Dacheng Tao

【Abstract】: Human parsing, or human body part semantic segmentation, has been an active research topic due to its wide potential applications. In this paper, we propose a novel GRAph PYramid Mutual Learning (Grapy-ML) method to address the cross-dataset human parsing problem, where the annotations are at different granularities. Starting from the prior knowledge of the human body hierarchical structure, we devise a graph pyramid module (GPM) by stacking three levels of graph structures successively from coarse granularity to fine granularity. At each level, GPM utilizes the self-attention mechanism to model the correlations between context nodes. Then, it adopts a top-down mechanism to progressively refine the hierarchical features through all the levels. GPM also enables efficient mutual learning. Specifically, the network weights of the first two levels are shared to exchange the learned coarse-granularity information across different datasets. By making use of the multi-granularity labels, Grapy-ML learns a more discriminative feature representation and achieves state-of-the-art performance, which is demonstrated by extensive experiments on three popular benchmarks, e.g., the CIHP dataset. The source code is publicly available at https://github.com/Charleshhy/Grapy-ML.

【Keywords】:

1342. Softmax Dissection: Towards Understanding Intra- and Inter-Class Objective for Embedding Learning.

Paper Link】 【Pages】:10957-10964

【Authors】: Lanqing He ; Zhongdao Wang ; Yali Li ; Shengjin Wang

【Abstract】: The softmax loss and its variants are widely used as objectives for embedding learning applications like face recognition. However, the intra- and inter-class objectives in Softmax are entangled; therefore, a well-optimized inter-class objective leads to relaxation on the intra-class objective, and vice versa. In this paper, we propose to dissect Softmax into independent intra- and inter-class objectives (D-Softmax) with a clear understanding. It is straightforward to tune each part to the best state with D-Softmax as the objective. Furthermore, we find the computation of the inter-class part is redundant and propose sampling-based variants of D-Softmax to reduce the computation cost. The face recognition experiments on regular-scale data show D-Softmax is favorably comparable to existing losses such as SphereFace and ArcFace. Experiments on massive-scale data show the fast variants significantly accelerate the training process (such as 64×) with only a minor sacrifice in performance, outperforming existing acceleration methods of Softmax in terms of both performance and efficiency.

【Keywords】:

1343. RoadTagger: Robust Road Attribute Inference with Graph Neural Networks.

Paper Link】 【Pages】:10965-10972

【Authors】: Songtao He ; Favyen Bastani ; Satvat Jagwani ; Edward Park ; Sofiane Abbar ; Mohammad Alizadeh ; Hari Balakrishnan ; Sanjay Chawla ; Samuel Madden ; Mohammad Amin Sadeghi

【Abstract】: Inferring road attributes such as lane count and road type from satellite imagery is challenging. Often, due to the occlusion in satellite imagery and the spatial correlation of road attributes, a road attribute at one position on a road may only be apparent when considering far-away segments of the road. Thus, to robustly infer road attributes, the model must integrate scattered information and capture the spatial correlation of features along roads. Existing solutions that rely on image classifiers fail to capture this correlation, resulting in poor accuracy. We find this failure is caused by a fundamental limitation: the limited effective receptive field of image classifiers. To overcome this limitation, we propose RoadTagger, an end-to-end architecture which combines both Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) to infer road attributes. Using a GNN allows information to propagate on the road network graph and eliminates the receptive field limitation of image classifiers. We evaluate RoadTagger on both a large real-world dataset covering a 688 km² area in 20 U.S. cities and a synthesized dataset. In the evaluation, RoadTagger improves inference accuracy over the CNN image classifier based approaches. In addition, RoadTagger is robust to disruptions in the satellite imagery and is able to learn complicated inductive rules for aggregating scattered information along the road network.

【Keywords】:

1344. Joint Commonsense and Relation Reasoning for Image and Video Captioning.

Paper Link】 【Pages】:10973-10980

【Authors】: Jingyi Hou ; Xinxiao Wu ; Xiaoxun Zhang ; Yayun Qi ; Yunde Jia ; Jiebo Luo

【Abstract】: Exploiting relationships between objects for image and video captioning has received increasing attention. Most existing methods depend heavily on pre-trained detectors of objects and their relationships, and thus may not work well when facing detection challenges such as heavy occlusion, tiny-size objects, and long-tail classes. In this paper, we propose a joint commonsense and relation reasoning method that exploits prior knowledge for image and video captioning without relying on any detectors. The prior knowledge provides semantic correlations and constraints between objects, serving as guidance to build semantic graphs that summarize object relationships, some of which cannot be directly perceived from images or videos. Particularly, our method is implemented by an iterative learning algorithm that alternates between 1) commonsense reasoning for embedding visual regions into the semantic space to build a semantic graph and 2) relation reasoning for encoding semantic graphs to generate sentences. Experiments on several benchmark datasets validate the effectiveness of our prior knowledge-based approach.

【Keywords】:

1345. Hierarchical Modes Exploring in Generative Adversarial Networks.

Paper Link】 【Pages】:10981-10988

【Authors】: Mengxiao Hu ; Jinlong Li ; Maolin Hu ; Tao Hu

【Abstract】: In conditional Generative Adversarial Networks (cGANs), when two different initial noises are concatenated with the same conditional information, the distance between their outputs is relatively small, which makes minor modes likely to collapse into large modes. To prevent this from happening, we propose a hierarchical mode exploring method to alleviate mode collapse in cGANs by introducing a diversity measurement into the objective function as the regularization term. We also introduce the Expected Ratios of Expansion (ERE) into the regularization term; by minimizing the sum of differences between the real change of distance and ERE, we can control the diversity of generated images w.r.t. specific-level features. We validate the proposed algorithm on four conditional image synthesis tasks including categorical generation, paired and unpaired image translation, and text-to-image generation. Both qualitative and quantitative results show that the proposed method is effective in alleviating the mode collapse problem in cGANs, and can control the diversity of output images w.r.t. specific-level features.

【Keywords】:

1346. SPSTracker: Sub-Peak Suppression of Response Map for Robust Object Tracking.

Paper Link】 【Pages】:10989-10996

【Authors】: Qintao Hu ; Lijun Zhou ; Xiaoxiao Wang ; Yao Mao ; Jianlin Zhang ; Qixiang Ye

【Abstract】: Modern visual trackers usually construct online learning models under the assumption that the feature response has a Gaussian distribution with a target-centered peak response. Nevertheless, such an assumption is implausible when there is progressive interference from other targets and/or background noise, which produces sub-peaks on the tracking response map and causes model drift. In this paper, we propose a rectified online learning approach for sub-peak response suppression and peak response enforcement, aiming to handle progressive interference in a systematic way. Our approach, referred to as SPSTracker, applies simple-yet-efficient Peak Response Pooling (PRP) to aggregate and align discriminative features, and leverages Boundary Response Truncation (BRT) to reduce the variance of the feature response. By fusing with multi-scale features, SPSTracker aggregates the response distribution of multiple sub-peaks to a single maximum peak, which enforces the discriminative capability of features for robust object tracking. Experiments on the OTB, NFS and VOT2018 benchmarks demonstrate that SPSTracker outperforms the state-of-the-art real-time trackers by significant margins.

【Keywords】:

1347. 3D Shape Completion with Multi-View Consistent Inference.

Paper Link】 【Pages】:10997-11004

【Authors】: Tao Hu ; Zhizhong Han ; Matthias Zwicker

【Abstract】: 3D shape completion is important to enable machines to perceive the complete geometry of objects from partial observations. To address this problem, view-based methods have been presented. These methods represent shapes as multiple depth images, which can be back-projected to yield corresponding 3D point clouds, and they perform shape completion by learning to complete each depth image using neural networks. While view-based methods lead to state-of-the-art results, they currently do not enforce geometric consistency among the completed views during the inference stage. To resolve this issue, we propose a multi-view consistent inference technique for 3D shape completion, which we express as an energy minimization problem including a data term and a regularization term. We formulate the regularization term as a consistency loss that encourages geometric consistency among multiple views, while the data term guarantees that the optimized views do not drift away too much from a learned shape descriptor. Experimental results demonstrate that our method completes shapes more accurately than previous techniques.

【Keywords】:

1348. GTC: Guided Training of CTC towards Efficient and Accurate Scene Text Recognition.

Paper Link】 【Pages】:11005-11012

【Authors】: Wenyang Hu ; Xiaocong Cai ; Jun Hou ; Shuai Yi ; Zhiping Lin

【Abstract】: Connectionist Temporal Classification (CTC) and the attention mechanism are the two main approaches used in recent scene text recognition works. Compared with attention-based methods, the CTC decoder has a much shorter inference time, yet a lower accuracy. To design an efficient and effective model, we propose the guided training of CTC (GTC), where the CTC model learns a better alignment and feature representations from a more powerful attentional guidance. With the benefit of guided training, the CTC model achieves robust and accurate prediction for both regular and irregular scene text while maintaining a fast inference speed. Moreover, to further leverage the potential of the CTC decoder, a graph convolutional network (GCN) is proposed to learn the local correlations of extracted features. Extensive experiments on standard benchmarks demonstrate that our end-to-end model achieves a new state-of-the-art for regular and irregular scene text recognition and needs 6 times shorter inference time than attention-based methods.

【Keywords】:

1349. Coarse-to-Fine Hyper-Prior Modeling for Learned Image Compression.

Paper Link】 【Pages】:11013-11020

【Authors】: Yueyu Hu ; Wenhan Yang ; Jiaying Liu

【Abstract】: Approaches to image compression with machine learning now achieve superior performance on the compression rate compared to existing hybrid codecs. The conventional learning-based methods for image compression exploit hyper-prior and spatial context models to facilitate probability estimations. Such models have limitations in modeling long-term dependency and do not fully squeeze out the spatial redundancy in images. In this paper, we propose a coarse-to-fine framework with hierarchical layers of hyper-priors to conduct comprehensive analysis of the image and more effectively reduce spatial redundancy, which improves the rate-distortion performance of image compression significantly. Signal Preserving Hyper Transforms are designed to achieve an in-depth analysis of the latent representation, and the Information Aggregation Reconstruction sub-network is proposed to maximally utilize side-information for reconstruction. Experimental results show the effectiveness of the proposed network in efficiently reducing the redundancies in images and improving the rate-distortion performance, especially for high-resolution images. Our project is publicly available at https://huzi96.github.io/coarse-to-fine-compression.html.

【Keywords】:

1350. Location-Aware Graph Convolutional Networks for Video Question Answering.

Paper Link】 【Pages】:11021-11028

【Authors】: Deng Huang ; Peihao Chen ; Runhao Zeng ; Qing Du ; Mingkui Tan ; Chuang Gan

【Abstract】: We address the challenging task of video question answering, which requires machines to answer questions about videos in a natural language form. Previous state-of-the-art methods attempt to apply a spatio-temporal attention mechanism on video frame features without explicitly modeling the locations of, and relations among, the object interactions occurring in videos. However, the relations between object interactions and their location information are critical for both action recognition and question reasoning. In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction. Here, each node is associated with an object represented by its appearance and location features. Based on the constructed graph, we propose to use graph convolution to infer both the category and temporal locations of an action. As the graph is built on objects, our method is able to focus on the foreground action contents for better video question answering. Lastly, we leverage an attention mechanism to combine the output of graph convolution and encoded question features for final answer reasoning. Extensive experiments demonstrate the effectiveness of the proposed methods. Specifically, our method significantly outperforms state-of-the-art methods on TGIF-QA, Youtube2Text-QA and MSVD-QA datasets.

【Keywords】:

1351. Unsupervised Deep Learning via Affinity Diffusion.

Paper Link】 【Pages】:11029-11036

【Authors】: Jiabo Huang ; Qi Dong ; Shaogang Gong ; Xiatian Zhu

【Abstract】: Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning with the need for massive labelled training data, limiting dramatically their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over the state-of-the-art unsupervised learning models using six common image recognition benchmarks including MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.

【Keywords】:

1352. GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking.

Paper Link】 【Pages】:11037-11044

【Authors】: Lianghua Huang ; Xin Zhao ; Kaiqi Huang

【Abstract】: A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, currently there is a lack of such a strong baseline for global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption on the temporal consistency of the target's positions and scales. GlobalTrack is developed based on two-stage object detectors, and it is able to perform full-image and multi-scale search of arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no punishment on position or scale changes, no scale smoothing and no trajectory refinement, our pure global instance search based tracker achieves comparable, sometimes much better performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on TrackingNet), compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., any type of temporary tracking failures will not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will be a strong baseline for long-term tracking and will stimulate future works in this area.

【Keywords】:

1353. Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition.

Paper Link】 【Pages】:11045-11052

【Authors】: Linjiang Huang ; Yan Huang ; Wanli Ouyang ; Liang Wang

【Abstract】: Recently, graph convolutional networks have achieved remarkable performance for skeleton-based action recognition. In this work, we identify a problem posed by the GCNs for skeleton-based action recognition, namely part-level action modeling. To address this problem, a novel Part-Level Graph Convolutional Network (PL-GCN) is proposed to capture part-level information of skeletons. Different from previous methods, the partition of body parts is learnable rather than manually defined. We propose two part-level blocks, namely Part Relation block (PR block) and Part Attention block (PA block), which are achieved by two differentiable operations, namely graph pooling operation and graph unpooling operation. The PR block aims at learning high-level relations between body parts while the PA block aims at highlighting the important body parts in the action. Integrating the original GCN with the two blocks, the PL-GCN can learn both part-level and joint-level information of the action. Extensive experiments on two benchmark datasets show the state-of-the-art performance on skeleton-based action recognition and demonstrate the effectiveness of the proposed method.

【Keywords】:

1354. Relational Prototypical Network for Weakly Supervised Temporal Action Localization.

Paper Link】 【Pages】:11053-11060

【Authors】: Linjiang Huang ; Yan Huang ; Wanli Ouyang ; Liang Wang

【Abstract】: In this paper, we propose a weakly supervised temporal action localization method on untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To achieve this, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. Besides, a similarity weighting module is devised to further separate actions from backgrounds. To effectively identify actions, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.

【Keywords】:

1355. AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation.

Paper Link】 【Pages】:11061-11068

【Authors】: Weiting Huang ; Pengfei Ren ; Jingyu Wang ; Qi Qi ; Haifeng Sun

【Abstract】: In this paper, we propose an adaptive weighting regression (AWR) method to leverage the advantages of both detection-based and regression-based methods. Hand joint coordinates are estimated as a discrete integration of all pixels in a dense representation, guided by adaptive weight maps. This learnable aggregation process introduces both dense and joint supervision that allows end-to-end training and brings adaptability to weight maps, making the network more accurate and robust. Comprehensive exploration experiments are conducted to validate the effectiveness and generality of AWR under various experimental settings, especially its usefulness for different types of dense representation and input modality. Our method outperforms other state-of-the-art methods on four publicly available datasets, including the NYU, ICVL, MSRA and HANDS 2017 datasets.

【Keywords】:
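
Illustrative note on entry 1355: the "discrete integration of all pixels ... guided by adaptive weight maps" described in the abstract can be read as a weighted average of per-pixel joint estimates. The following is only a minimal sketch under that reading, not the authors' implementation; the function name, the softmax normalization of the weights, and the toy inputs are all assumptions made for illustration.

```python
import math

def weighted_joint_estimate(pixel_estimates, weight_logits):
    """Aggregate per-pixel estimates of one joint into a single coordinate.

    pixel_estimates: list of (x, y, z) tuples, one estimate per pixel (hypothetical input).
    weight_logits:   list of unnormalized weights, one per pixel (hypothetical input).
    The softmax normalization is an assumption, not taken from the abstract.
    """
    # Numerically stable softmax over the weight logits.
    peak = max(weight_logits)
    exps = [math.exp(w - peak) for w in weight_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Discrete integration: weighted sum of the per-pixel estimates.
    dims = len(pixel_estimates[0])
    return tuple(
        sum(w * p[d] for w, p in zip(weights, pixel_estimates))
        for d in range(dims)
    )

# Toy example: three pixels voting for one joint position.
estimates = [(10.0, 20.0, 300.0), (12.0, 22.0, 310.0), (11.0, 21.0, 305.0)]
uniform = weighted_joint_estimate(estimates, [0.0, 0.0, 0.0])
peaked = weighted_joint_estimate(estimates, [100.0, 0.0, 0.0])
```

With equal logits the result is the plain mean of the estimates; a strongly peaked weight map makes the result follow a single pixel, which is how such an aggregation can interpolate between regression-like and detection-like behaviour.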

1356. Domain Adaptive Attention Learning for Unsupervised Person Re-Identification.

Paper Link】 【Pages】:11069-11076

【Authors】: Yangru Huang ; Peixi Peng ; Yi Jin ; Yidong Li ; Junliang Xing

【Abstract】: Person re-identification (Re-ID) across multiple datasets is a challenging task due to two main reasons: the presence of large cross-dataset distinctions and the absence of annotated target instances. To address these two issues, this paper proposes a domain adaptive attention learning approach to reliably transfer discriminative representation from the labeled source domain to the unlabeled target domain. In this approach, a domain adaptive attention model is learned to separate the feature map into a domain-shared part and a domain-specific part. In this manner, the domain-shared part is used to capture transferable cues that can compensate for cross-dataset distinctions and give positive contributions to the target task, while the domain-specific part aims to model the noisy information to avoid the negative transfer caused by domain diversity. A soft label loss is further employed to make full use of unlabeled target data by estimating pseudo labels. Extensive experiments on the Market-1501, DukeMTMC-reID and MSMT17 benchmarks demonstrate that the proposed approach outperforms the state of the art.

【Keywords】:

1357. Weakly-Supervised Video Re-Localization with Multiscale Attention Model.

Paper Link】 【Pages】:11077-11084

【Authors】: Yung-Han Huang ; Kuang-Jui Hsu ; Shyh-Kang Jeng ; Yen-Yu Lin

【Abstract】: Video re-localization aims to localize a sub-sequence, called target segment, in an untrimmed reference video that is similar to a given query video. In this work, we propose an attention-based model to accomplish this task in a weakly supervised setting. Namely, we derive our CNN-based model without using the annotated locations of the target segments in reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are then used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame in the reference video is consistent with the query video. The resultant CNN model is derived based on the proposed co-attention loss which discriminatively separates the target segment from the reference video. This loss maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can be modified to fully supervised re-localization. Our method is evaluated on a public dataset and achieves the state-of-the-art performance under both weakly supervised and fully supervised settings.

【Keywords】:
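
The objective described in the video re-localization abstract above (raise the similarity between the query and the target segment, lower the similarity between the target segment and the rest of the reference video) can be sketched as follows. This is a minimal illustration and not the paper's co-attention loss: the names `segment_contrast_loss`, `cosine`, and `weighted_mean`, the use of cosine similarity, and the per-frame attention values in [0, 1] are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def weighted_mean(frames, weights):
    """Attention-weighted average of per-frame feature vectors."""
    total = sum(weights) + 1e-8
    dim = len(frames[0])
    return [sum(w * f[d] for f, w in zip(frames, weights)) / total
            for d in range(dim)]


def segment_contrast_loss(query, frames, attention):
    """Lower when attended frames match the query and differ from the rest."""
    target = weighted_mean(frames, attention)
    rest = weighted_mean(frames, [1.0 - a for a in attention])
    return cosine(target, rest) - cosine(query, target)
```

Note that no segment boundaries appear in the loss; only the query feature and the reference frames are needed, which is consistent with the weakly supervised setting the abstract describes.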

1358. SGAP-Net: Semantic-Guided Attentive Prototypes Network for Few-Shot Human-Object Interaction Recognition.

Paper Link】 【Pages】:11085-11092

【Authors】: Zhong Ji ; Xiyao Liu ; Yanwei Pang ; Xuelong Li

【Abstract】: Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning that learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate the above challenges. Because the intrinsic characteristic of HOI is diverse and interactive, we propose a Semantic-Guided Attentive Prototypes Network (SGAP-Net) to learn a semantic-guided metric space where HOI recognition can be performed by computing distances to attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same class in HOI. In addition, we design a novel decision method to alleviate the biases produced by different patterns of the same action in HOI. Finally, in order to realize the task of few-shot HOI, we reorganize two HOI benchmark datasets, i.e., HICO-FS and TUHOI-FS. Extensive experimental results on both datasets have demonstrated the effectiveness of our proposed SGAP-Net approach.

【Keywords】:
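
The phrase "computing distances to ... prototypes of each class" in the SGAP-Net abstract above refers to the general prototype-based few-shot scheme, which can be sketched as below. This is a plain nearest-prototype classifier, not SGAP-Net itself: prototypes here are unweighted means, whereas the paper describes semantic-guided attentive prototypes. The names `prototype`, `squared_distance`, and `classify` and the toy labels in the usage note are hypothetical.

```python
def prototype(features):
    """Mean of the support feature vectors of one class."""
    count = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / count for d in range(dim)]


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(query, support):
    """Return the label whose prototype is closest to the query.

    support maps each class label to a list of support feature vectors.
    """
    prototypes = {label: prototype(feats) for label, feats in support.items()}
    return min(prototypes,
               key=lambda label: squared_distance(query, prototypes[label]))
```

For example, with `support = {"ride bike": [[0, 0], [0, 2]], "hold cup": [[10, 10], [12, 10]]}`, a query at `[1, 1]` is assigned to `"ride bike"` because it lies nearest to that class mean.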

1359. ElixirNet: Relation-Aware Network Architecture Adaptation for Medical Lesion Detection.

Paper Link】 【Pages】:11093-11100

【Authors】: Chenhan Jiang ; Shaoju Wang ; Xiaodan Liang ; Hang Xu ; Nong Xiao

【Abstract】: Most advances in medical lesion detection networks are limited to subtle modifications of the conventional detection network designed for natural images. However, there exists a vast domain gap between medical images and natural images, where medical image detection often suffers from several domain-specific challenges, such as high lesion/background similarity, dominant tiny lesions, and severe class imbalance. Is a hand-crafted detection network tailored for natural images undoubtedly good enough over a discrepant medical lesion domain? Are there more powerful operations, filters, and sub-networks that better fit the medical lesion detection problem to be discovered? In this paper, we introduce a novel ElixirNet that includes three components: 1) TruncatedRPN balances positive and negative data for false positive reduction; 2) Auto-lesion Block is automatically customized for medical images to incorporate relation-aware operations among region proposals, and leads to more suitable and efficient classification and localization; 3) Relation transfer module incorporates the semantic relationship and transfers the relevant contextual information with an interpretable graph, thus alleviating the problem of lack of annotations for all types of lesions. Experiments on DeepLesion and Kits19 prove the effectiveness of ElixirNet, achieving improvement of both sensitivity and precision over FPN with fewer parameters.

【Keywords】:

1360. Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering.

Paper Link】 【Pages】:11101-11108

【Authors】: Jianwen Jiang ; Ziqiang Chen ; Haojie Lin ; Xibin Zhao ; Yue Gao

【Abstract】: Understanding questions and finding clues for answers are the key to video question answering. Compared with image question answering, video question answering (Video QA) requires finding the clues accurately in both the spatial and temporal dimensions simultaneously, and thus is more challenging. However, the relationship between spatio-temporal information and the question still has not been well utilized in most existing methods for Video QA. To tackle this problem, we propose a Question-Guided Spatio-Temporal Contextual Attention Network (QueST) method. In QueST, we divide the semantic features generated from the question into two separate parts: the spatial part and the temporal part, respectively guiding the process of constructing the contextual attention on the spatial and temporal dimensions. Under the guidance of the corresponding contextual attention, visual features can be better exploited on both spatial and temporal dimensions. To evaluate the effectiveness of the proposed method, experiments are conducted on the TGIF-QA, MSRVTT-QA and MSVD-QA datasets. Experimental results and comparisons with the state-of-the-art methods have shown that our method can achieve superior performance.

【Keywords】:

1361. Reasoning with Heterogeneous Graph Alignment for Video Question Answering.

Paper Link】 【Pages】:11109-11116

【Authors】: Pin Jiang ; Yahong Han

【Abstract】: The dominant video question answering methods are based on fine-grained representation or model-specific attention mechanisms. They usually process video and question separately, then feed the representations of different modalities into subsequent late fusion networks. Although these methods use information of one modality to boost the other, they neglect to integrate correlations of both inter- and intra-modality in a uniform module. We propose a deep heterogeneous graph alignment network over the video shots and question words. Furthermore, we explore the network architecture in four steps: representation, fusion, alignment, and reasoning. Within our network, the inter- and intra-modality information can be aligned and interacted simultaneously over the heterogeneous graph and used for cross-modal reasoning. We evaluate our method on three benchmark datasets and conduct an extensive ablation study to verify the effectiveness of the network architecture. Experiments show the network to be superior in quality.

【Keywords】:

1362. Recurrent Nested Model for Sequence Generation.

Paper Link】 【Pages】:11117-11124

【Authors】: Wenhao Jiang ; Lin Ma ; Wei Lu

【Abstract】: Depth has been shown beneficial to neural network models. In this paper, we make an attempt to make the encoder-decoder model deeper for sequence generation. We propose a module that can be plugged into the middle between the encoder and decoder to increase the depth of the whole model. The proposed module follows a nested structure, which is divided into blocks with each block containing several recurrent transition steps. To reduce the training difficulty and preserve the necessary information for the decoder during transitions, inter-block connections and intra-block connections are constructed in our model. The inter-block connections provide the thought vectors from the current block to all the subsequent blocks. The intra-block connections connect all the hidden states entering the current block to the current transition step. The advantages of our model are illustrated on the image captioning and code captioning tasks.

【Keywords】:

1363. DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue.

Paper Link】 【Pages】:11125-11132

【Authors】: Xiaoze Jiang ; Jing Yu ; Zengchang Qin ; Yingying Zhuang ; Xingxing Zhang ; Yue Hu ; Qi Wu

【Abstract】: Different from the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects, relationships or semantics. The key challenge in the Visual Dialogue task is thus to learn a more comprehensive and semantic-rich image representation which may have adaptive attentions on the image for variant questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps capture the appearance-level information, including objects and their relationships, while the semantic view enables the agent to understand high-level visual semantics from the whole image to the local regions. Furthermore, on top of such multi-view image features, we propose a feature selection framework which is able to adaptively capture question-relevant information hierarchically at a fine-grained level. The proposed method achieved state-of-the-art results on benchmark Visual Dialogue datasets. More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values. It gives us insights into the understanding of human cognition in Visual Dialogue.

【Keywords】:

1364. Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect.

Paper Link】 【Pages】:11133-11140

【Authors】: Xinyang Jiang ; Yifei Gong ; Xiaowei Guo ; Qize Yang ; Feiyue Huang ; Wei-Shi Zheng ; Feng Zheng ; Xing Sun

【Abstract】: Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video. However, existing video-based ReID methods do not consider the semantic difference brought by the outputs of different network stages, which potentially compromises the information richness of the person features. Furthermore, traditional methods ignore important relationships among frames, which causes information redundancy in fusion along the time axis. To address these issues, we propose a novel general temporal fusion framework to aggregate frame features on both the semantic aspect and the time aspect. As for the semantic aspect, a multi-stage fusion network is explored to fuse richer frame features at multiple semantic levels, which can effectively reduce the information loss caused by the traditional single-stage fusion. For the time aspect, the existing intra-frame attention method is improved by adding a novel inter-frame attention module, which effectively reduces the information redundancy in temporal fusion by taking the relationship among frames into consideration. The experimental results show that our approach can effectively improve the video-based re-identification accuracy, achieving state-of-the-art performance.

【Keywords】:

1365. Learning Light Field Angular Super-Resolution via a Geometry-Aware Network.

Paper Link】 【Pages】:11141-11148

【Authors】: Jing Jin ; Junhui Hou ; Hui Yuan ; Sam Kwong

【Abstract】: The acquisition of light field images with high angular resolution is costly. Although many methods have been proposed to improve the angular resolution of a sparsely-sampled light field, they always focus on the light field with a small baseline, which is captured by a consumer light field camera. By making full use of the intrinsic geometry information of light fields, in this paper we propose an end-to-end learning-based approach aiming at angularly super-resolving a sparsely-sampled light field with a large baseline. Our model consists of two learnable modules and a physically-based module. Specifically, it includes a depth estimation module for explicitly modeling the scene geometry, a physically-based warping for novel view synthesis, and a light field blending module specifically designed for light field reconstruction. Moreover, we introduce a novel loss function to promote the preservation of the light field parallax structure. Experimental results over various light field datasets including large baseline light field images demonstrate the significant superiority of our method when compared with state-of-the-art ones, i.e., our method improves on the PSNR of the second best method by up to 2 dB on average, while reducing the execution time by 48×. In addition, our method preserves the light field parallax structure better.

【Keywords】:

1366. EAC-Net: Efficient and Accurate Convolutional Network for Video Recognition.

Paper Link】 【Pages】:11149-11156

【Authors】: Bowei Jin ; Zhuo Xu

【Abstract】: Research on computation-efficient video understanding is of great importance to real-world deployment. However, most high-performance approaches are too computationally expensive for practical application. Though several efficiency-oriented works have been proposed, they inevitably suffer degradation of performance in terms of accuracy. In this paper, we explore a new architecture, EAC-Net, enjoying both high efficiency and high performance. Specifically, we propose Motion Guided Temporal Encode (MGTE) blocks for temporal modeling, which exploit motion information and temporal relations among neighboring frames. EAC-Net is then constructed by inserting multiple MGTE blocks into common 2D CNNs. Furthermore, we propose an Atrous Temporal Encode (ATE) block for capturing long-term temporal relations at multiple time scales to further enhance the representation power of EAC-Net. Through experiments on Kinetics, our EAC-Nets achieved better results than TSM models with fewer FLOPs. With the same 2D backbones, EAC-Nets outperformed Non-Local I3D counterparts by achieving higher accuracy with only about 7× fewer FLOPs. On the Something-Something-V1 dataset, EAC-Net achieved 47% top-1 accuracy with 70G FLOPs, which is 0.9% more accurate and 8× fewer FLOPs than Non-Local I3D+GCN.

【Keywords】:

1367. SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation.

Paper Link】 【Pages】:11157-11164

【Authors】: Sheng Jin ; Shangchen Zhou ; Yao Liu ; Chao Chen ; Xiaoshuai Sun ; Hongxun Yao ; Xian-Sheng Hua

【Abstract】: Deep hashing methods have been proved to be effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practical cases. The current solutions to this issue utilize Generative Adversarial Network (GAN) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generations and hashing learning as two isolated processes, leading to generation ineffectiveness. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generative images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations which compete against the learning of accurate hashing codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled ones, we propose a semi-supervised consistent loss. The experimental results show that our method can significantly improve state-of-the-art models on both the widely-used hashing datasets and fine-grained datasets.

【Keywords】:

1368. Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification.

Paper Link】 【Pages】:11165-11172

【Authors】: Xin Jin ; Cuiling Lan ; Wenjun Zeng ; Zhibo Chen

【Abstract】: Object re-identification (re-id) aims to identify a specific object across times or camera views, with the person re-id and vehicle re-id as the most widely studied applications. Re-id is challenging because of the variations in viewpoints, (human) poses, and occlusions. Multi-shots of the same object can cover diverse viewpoints/poses and thus provide more comprehensive information. In this paper, we propose exploiting the multi-shots of the same identity to guide the feature learning of each individual image. Specifically, we design an Uncertainty-aware Multi-shot Teacher-Student (UMTS) Network. It consists of a teacher network (T-net) that learns the comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input. In particular, we take into account the data dependent heteroscedastic uncertainty for effectively transferring the knowledge from the T-net to S-net. To the best of our knowledge, we are the first to make use of multi-shots of an object in a teacher-student learning manner for effectively boosting the single image based re-id. We validate the effectiveness of our approach on the popular vehicle re-id and person re-id datasets. In inference, the S-net alone significantly outperforms the baselines and achieves the state-of-the-art performance.

【Keywords】:

1369. Semantics-Aligned Representation Learning for Person Re-Identification.

Paper Link】 【Pages】:11173-11180

【Authors】: Xin Jin ; Cuiling Lan ; Wenjun Zeng ; Guoqiang Wei ; Zhibo Chen

【Abstract】: Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn semantics-aligned feature representation through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN) which consists of a base network as encoder (SA-Enc) for re-ID, and a decoder (SA-Dec) for reconstructing/regressing the densely semantics aligned full texture image. We jointly train the SAN under the supervisions of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as the perceptual losses. The decoder is discarded in the inference and thus our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve the state-of-the-art performances on the benchmark datasets CUHK03, Market1501, MSMT17, and the partial person reID dataset Partial REID.

【Keywords】:

1370. Overcoming Language Priors in VQA via Decomposed Linguistic Representations.

Paper Link】 【Pages】:11181-11188

【Authors】: Chenchen Jing ; Yuwei Wu ; Xiaoxun Zhang ; Yunde Jia ; Qi Wu

【Abstract】: Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: type representation, object representation, and concept representation. We use the type representation to identify the question type and the possible answer set (yes/no or specific concepts such as colors or numbers), and the object representation to focus on the relevant region of an image. The concept representation is verified with the attended region to infer the final answer. The proposed method decouples the language-based concept discovery and vision-based concept verification in the process of answer inference to prevent language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.

【Keywords】:

1371. Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search.

Paper Link】 【Pages】:11189-11196

【Authors】: Ya Jing ; Chenyang Si ; Junbo Wang ; Wei Wang ; Liang Wang ; Tieniu Tan

【Abstract】: Text-based person search aims to retrieve the corresponding person images in an image database by virtue of a sentence describing the person, which has great potential for various applications such as video surveillance. Extracting visual contents corresponding to the human description is the key to this cross-modal matching problem. Moreover, correlated images and descriptions involve different granularities of semantic relevance, which is usually ignored in previous methods. To exploit the multilevel corresponding visual contents, we propose a pose-guided multi-granularity attention network (PMA). Firstly, we propose a coarse alignment network (CA) to select the image regions related to the global description by a similarity-based attention. To further capture the phrase-related visual body part, a fine-grained alignment network (FA) is proposed, which employs pose information to learn latent semantic alignment between visual body part and textual noun phrase. To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES), which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15% in terms of the top-1 metric.

【Keywords】:

1372. Associative Variational Auto-Encoder with Distributed Latent Spaces and Associators.

Paper Link】 【Pages】:11197-11204

【Authors】: Dae Ung Jo ; Byeongju Lee ; Jongwon Choi ; Haanju Yoo ; Jin Young Choi

【Abstract】: In this paper, we propose a novel structure for a multi-modal data association referred to as Associative Variational Auto-Encoder (AVAE). In contrast to the existing models using a shared latent space among modalities, our structure adopts distributed latent spaces for multi-modalities which are connected through cross-modal associators. The proposed structure successfully associates even heterogeneous modality data and easily incorporates the additional modality to the entire network via the associator. Furthermore, in our structure, only a small amount of supervised (paired) data is enough to train associators after training auto-encoders in an unsupervised manner. Through experiments, the effectiveness of the proposed structure is validated on various datasets including visual and auditory data.

【Keywords】:

1373. Real-Time Object Tracking via Meta-Learning: Efficient Model Adaptation and One-Shot Channel Pruning.

Paper Link】 【Pages】:11205-11212

【Authors】: Ilchae Jung ; Kihyun You ; Hyeonwoo Noh ; Minsu Cho ; Bohyung Han

【Abstract】: We propose a novel meta-learning framework for real-time object tracking with efficient model adaptation and channel pruning. Given an object tracker, our framework learns to fine-tune its model parameters in only a few gradient-descent iterations during tracking while pruning its network channels using the target ground-truth at the first frame. Such a learning problem is formulated as a meta-learning task, where a meta-tracker is trained by updating its meta-parameters for initial weights, learning rates, and pruning masks through carefully designed tracking simulations. The integrated meta-tracker greatly improves tracking performance by accelerating the convergence of online learning and reducing the cost of feature computation. Experimental evaluation on the standard datasets demonstrates its outstanding accuracy and speed compared to the state-of-the-art methods.

【Keywords】:

1374. Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling.

Paper Link】 【Pages】:11213-11220

【Authors】: Yunjae Jung ; Dahun Kim ; Sanghyun Woo ; Kyungsu Kim ; Sungjin Kim ; In So Kweon

【Abstract】: Visual storytelling is the task of creating a short story based on photo streams. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions, but also human-like narration and semantics. However, the VIST dataset consists only of a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full plausible story even with missing photo(s). Furthermore, we propose for visual storytelling a hide-and-tell model, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and the network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods in automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.

【Keywords】:
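
The training-time step described in the Hide-and-Tell abstract above (randomly omitting one or more photos from the input stack) can be sketched as follows. This is an illustration under stated assumptions: the function name `hide_photos`, replacing a hidden photo with a placeholder rather than removing it, and the caller-supplied number of hidden photos are all hypothetical choices, since the abstract does not specify them.

```python
import random


def hide_photos(photos, num_hidden, rng, placeholder=None):
    """Replace num_hidden randomly chosen photos with a placeholder.

    Returns the masked sequence (same length as the input) and the sorted
    indices that were hidden. rng is a random.Random instance so that the
    choice is reproducible.
    """
    hidden = set(rng.sample(range(len(photos)), num_hidden))
    masked = [placeholder if i in hidden else photo
              for i, photo in enumerate(photos)]
    return masked, sorted(hidden)
```

Keeping the sequence length fixed preserves each photo's position in the stream, so a model trained on the masked input still has to produce one story segment per original photo.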

1375. Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild.

Paper Link】 【Pages】:11221-11228

【Authors】: Yueying Kao ; Weiming Li ; Qiang Wang ; Zhouchen Lin ; Wooshik Kim ; Sunghoon Hong

【Abstract】: Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, which makes them difficult to use in various applications. In this paper, we aim at extracting RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns distributions from synthetic to real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset in all the metrics, which substantiates the superiority of our method and the above modules.

【Keywords】:

1376. Group-Wise Dynamic Dropout Based on Latent Semantic Variations.

Paper Link】 【Pages】:11229-11236

【Authors】: Zhiwei Ke ; Zhiwei Wen ; Weicheng Xie ; Yi Wang ; Linlin Shen

【Abstract】: Dropout regularization has been widely used in various deep neural networks to combat overfitting. It works by training a network to be more robust on information-degraded data points for better generalization. Conventional dropout and variants are often applied to individual hidden units in a layer to break up co-adaptations of feature detectors. In this paper, we propose an adaptive dropout to reduce the co-adaptations in a group-wise manner by coarse semantic information to improve feature discriminability. In particular, we showed that adjusting the dropout probability based on local feature densities can not only improve the classification performance significantly but also enhance the network robustness against adversarial examples in some cases. The proposed approach was evaluated in comparison with the baseline and several state-of-the-art adaptive dropouts over four public datasets of Fashion-MNIST, CIFAR-10, CIFAR-100 and SVHN.

【Keywords】:
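
The abstract above describes dropout whose probability is adjusted per group rather than being a single global rate. A minimal sketch of dropout with group-specific probabilities is given below. How the paper forms groups and derives probabilities from local feature densities is not stated in the abstract, so here both the group assignment and the per-group probabilities are simply supplied by the caller; the name `groupwise_dropout` and its parameters are hypothetical.

```python
import random


def groupwise_dropout(activations, groups, drop_prob, rng):
    """Apply inverted dropout using a separate drop probability per group.

    activations: list of floats.
    groups: list of group ids, one per activation.
    drop_prob: dict mapping group id to a drop probability in [0, 1].
    rng: random.Random instance for reproducibility.
    """
    output = []
    for value, group in zip(activations, groups):
        p = drop_prob[group]
        # rng.random() is in [0, 1), so p == 1.0 always drops the unit
        # and the division below is never reached with a zero denominator.
        if rng.random() >= p:
            output.append(value / (1.0 - p))  # rescale kept units
        else:
            output.append(0.0)
    return output
```

Rescaling the kept units by `1 / (1 - p)` keeps the expected activation of each group unchanged, which is the usual inverted-dropout convention.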

1377. Deep Generative Probabilistic Graph Neural Networks for Scene Graph Generation.

Paper Link】 【Pages】:11237-11245

【Authors】: Mahmoud Khademi ; Oliver Schulte

【Abstract】: We propose a new algorithm, called Deep Generative Probabilistic Graph Neural Networks (DG-PGNN), to generate a scene graph for an image. The input to DG-PGNN is an image, together with a set of region-grounded captions and object bounding-box proposals for the image. To generate the scene graph, DG-PGNN constructs and updates a new model, called a Probabilistic Graph Network (PGN). A PGN can be thought of as a scene graph with uncertainty: it represents each node and each edge by a CNN feature vector and defines a probability mass function (PMF) for node-type (object category) of each node and edge-type (predicate class) of each edge. The DG-PGNN sequentially adds a new node to the current PGN by learning the optimal ordering in a Deep Q-learning framework, where states are partial PGNs, actions choose a new node, and rewards are defined based on the ground-truth. After adding a node, DG-PGNN uses message passing to update the feature vectors of the current PGN by leveraging contextual relationship information, object co-occurrences, and language priors from captions. The updated features are then used to fine-tune the PMFs. Our experiments show that the proposed algorithm significantly outperforms the state-of-the-art results on the Visual Genome dataset for scene graph generation. We also show that the scene graphs constructed by DG-PGNN improve performance on the visual question answering task, for questions that need reasoning about objects and their interactions in the scene context.

【Keywords】:

1378. Tell Me What They're Holding: Weakly-Supervised Object Detection with Transferable Knowledge from Human-Object Interaction.

Paper Link】 【Pages】:11246-11253

【Authors】: Daesik Kim ; Gyujeong Lee ; Jisoo Jeong ; Nojun Kwak

【Abstract】: In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm to detect objects belonging to rare classes that have not many examples using transferable knowledge from human-object interactions (HOI). While WSOD shows lower performance than full supervision, we mainly focus on HOI as the main context which can strongly supervise complex semantics in images. Therefore, we propose a novel module called RRPN (relational region proposal network) which outputs an object-localizing attention map only with human poses and action verbs. In the source domain, we fully train an object detector and the RRPN with full supervision of HOI. With transferred knowledge about localization map from the trained RRPN, a new object detector can learn unseen objects with weak verbal supervision of HOI without bounding box annotations in the target domain. Because the RRPN is designed as an add-on type, we can apply it not only to the object detection but also to other domains such as semantic segmentation. The experimental results on HICO-DET dataset show the possibility that the proposed method can be a cheap alternative for the current supervised object detection paradigm. Moreover, qualitative results demonstrate that our model can properly localize unseen objects on HICO-DET and V-COCO datasets.

【Keywords】:

1379. MULE: Multimodal Universal Language Embedding.

Paper Link】 【Pages】:11254-11261

【Authors】: Donghyun Kim ; Kuniaki Saito ; Kate Saenko ; Stan Sclaroff ; Bryan A. Plummer

【Abstract】: Existing vision-language methods typically support two languages at a time at most. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabling our approach to easily be adapted to many vision-language methods and tasks. Since MULE learns a single language branch in the multimodal model, we can also scale to support many languages, and languages with fewer annotations can take advantage of the good representation learned from other (more abundant) language data. We demonstrate the effectiveness of our embeddings on the bidirectional image-sentence retrieval task, supporting up to four languages in a single model. In addition, we show that Machine Translation can be used for data augmentation in multilingual learning, which, combined with MULE, improves mean recall by up to 20.2% on a single language compared to prior work, with the most significant gains seen on languages with relatively few annotations. Our code is publicly available.

【Keywords】:

1380. REST: Performance Improvement of a Black Box Model via RL-Based Spatial Transformation.

Paper Link】 【Pages】:11262-11269

【Authors】: Jae-Myung Kim ; Hyungjin Kim ; Chanwoo Park ; Jungwoo Lee

【Abstract】: In recent years, deep neural networks (DNNs) have become a highly active area of research, and have shown remarkable achievements on a variety of computer vision tasks. DNNs, however, are known to often make overconfident yet incorrect predictions on out-of-distribution samples, which can be a major obstacle to real-world deployments because the training dataset is always limited compared to diverse real-world samples. Thus, it is fundamental to provide guarantees of robustness to the distribution shift between training and test time when we construct DNN models in practice. Moreover, in many cases, the deep learning models are deployed as black boxes and the performance has already been optimized for a training dataset, thus changing the black box itself can lead to performance degradation. We here study the robustness to geometric transformations in a specific condition where the black-box image classifier is given. We propose an additional learner, REinforcement Spatial Transform learner (REST), that transforms the warped input data into samples regarded as in-distribution by the black-box models. Our work aims to improve the robustness by adding a REST module in front of any black box and training only the REST module without retraining the original black-box model in an end-to-end manner, i.e., we try to convert real-world data into the training distribution for which the black-box model performs best. We use a confidence score that is obtained from the black-box model to determine whether the transformed input is drawn from in-distribution. We empirically show that our method has an advantage in generalization to geometric transformations and sample efficiency.

【Keywords】:

1381. Spiking-YOLO: Spiking Neural Network for Energy-Efficient Object Detection.

Paper Link】 【Pages】:11270-11277

【Authors】: Sei Joon Kim ; Seongsik Park ; Byunggook Na ; Sungroh Yoon

【Abstract】: Over the past decade, deep neural networks (DNNs) have demonstrated remarkable performance in a variety of applications. As we try to solve more advanced problems, increasing demands for computing and power resources have become inevitable. Spiking neural networks (SNNs) have attracted widespread interest as the third generation of neural networks due to their event-driven and low-powered nature. SNNs, however, are difficult to train, mainly owing to their complex dynamics of neurons and non-differentiable spike operations. Furthermore, their applications have been limited to relatively simple tasks such as image classification. In this study, we investigate the performance degradation of SNNs in a more challenging regression problem (i.e., object detection). Through our in-depth analysis, we introduce two novel methods: channel-wise normalization and signed neuron with imbalanced threshold, both of which provide fast and accurate information transmission for deep SNNs. Consequently, we present the first spike-based object detection model, called Spiking-YOLO. Our experiments show that Spiking-YOLO achieves remarkable results that are comparable (up to 98%) to those of Tiny YOLO on non-trivial datasets, PASCAL VOC and MS COCO. Furthermore, Spiking-YOLO on a neuromorphic chip consumes approximately 280 times less energy than Tiny YOLO and converges 2.3 to 4 times faster than previous SNN conversion methods.
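
The "signed neuron with imbalanced threshold" idea can be illustrated with a minimal sketch. It assumes an integrate-and-fire neuron that may emit positive or negative spikes, and it assumes the negative threshold is the positive one divided by a leaky-ReLU-style slope; neither detail is stated in the abstract, and the class name `SignedIFNeuron` and parameter `leak_slope` are hypothetical.

```python
class SignedIFNeuron:
    """Toy integrate-and-fire neuron with separate positive/negative thresholds.

    Assumption (not from the abstract): the negative threshold is
    -v_th / leak_slope, so negative inputs need proportionally more
    charge before a negative spike is emitted.
    """

    def __init__(self, v_th=1.0, leak_slope=0.1):
        self.v_th_pos = v_th
        self.v_th_neg = -v_th / leak_slope
        self.v = 0.0  # membrane potential

    def step(self, input_current):
        """Integrate one time step; return +1, -1, or 0 (no spike)."""
        self.v += input_current
        if self.v >= self.v_th_pos:
            self.v -= self.v_th_pos  # reset by subtraction
            return 1
        if self.v <= self.v_th_neg:
            self.v -= self.v_th_neg
            return -1
        return 0


pos_neuron = SignedIFNeuron(v_th=1.0, leak_slope=0.1)
pos_spikes = [pos_neuron.step(0.5) for _ in range(4)]

neg_neuron = SignedIFNeuron(v_th=1.0, leak_slope=0.1)
neg_spikes = [neg_neuron.step(-5.0) for _ in range(4)]
```

In this toy setting, a negative input ten times larger in magnitude is needed to reach the same firing rate as a positive input, which mirrors how a leaky activation attenuates negative values.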

【Keywords】:

1382. FISR: Deep Joint Frame Interpolation and Super-Resolution with a Multi-Scale Temporal Loss.

Paper Link】 【Pages】:11278-11286

【Authors】: Soo Ye Kim ; Jihyong Oh ; Munchurl Kim

【Abstract】: Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g. UHD TVs). However, it becomes easier for humans to notice motion artifacts (e.g. motion judder) in HR videos being rendered on larger-sized display devices. Thus, broadcasting standards support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR only is insufficient to produce genuine high quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is necessitated. In this paper, we first propose a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence, which can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.

【Keywords】:

1383. JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video.

Paper Link】 【Pages】:11287-11295

【Authors】: Soo Ye Kim ; Jihyong Oh ; Munchurl Kim

【Abstract】: Joint learning of super-resolution (SR) and inverse tone-mapping (ITM) has been explored recently, to convert legacy low resolution (LR) standard dynamic range (SDR) videos to high resolution (HR) high dynamic range (HDR) videos for the growing need of UHD HDR TV/broadcasting applications. However, previous CNN-based methods directly reconstruct the HR HDR frames from LR SDR frames, and are only trained with a simple L2 loss. In this paper, we take a divide-and-conquer approach in designing a novel GAN-based joint SR-ITM network, called JSI-GAN, which is composed of three task-specific subnets: an image reconstruction subnet, a detail restoration (DR) subnet and a local contrast enhancement (LCE) subnet. We delicately design these subnets so that they are appropriately trained for the intended purpose, learning a pair of pixel-wise 1D separable filters via the DR subnet for detail restoration and a pixel-wise 2D local filter by the LCE subnet for contrast enhancement. Moreover, to train the JSI-GAN effectively, we propose a novel detail GAN loss alongside the conventional GAN loss, which helps enhance both local details and contrasts to reconstruct high quality HR HDR results. When all subnets are jointly trained well, the predicted HR HDR results of higher quality are obtained with at least 0.41 dB gain in PSNR over those generated by the previous methods. The official TensorFlow code is available at https://github.com/JihyongOh/JSI-GAN.

【Keywords】:

1384. Unpaired Image Enhancement Featuring Reinforcement-Learning-Controlled Image Editing Software.

Paper Link】 【Pages】:11296-11303

【Authors】: Satoshi Kosugi ; Toshihiko Yamasaki

【Abstract】: This paper tackles unpaired image enhancement, a task of learning a mapping function which transforms input images into enhanced images in the absence of input-output image pairs. Our method is based on generative adversarial networks (GANs), but instead of simply generating images with a neural network, we enhance images utilizing image editing software such as Adobe® Photoshop® for the following three benefits: enhanced images have no artifacts, the same enhancement can be applied to larger images, and the enhancement is interpretable. To incorporate image editing software into a GAN, we propose a reinforcement learning framework where the generator works as the agent that selects the software's parameters and is rewarded when it fools the discriminator. Our framework can use high-quality non-differentiable filters present in image editing software, which enables image enhancement with high performance. We apply the proposed method to two unpaired image enhancement tasks: photo enhancement and face beautification. Our experimental results demonstrate that the proposed method achieves better performance, compared to the performances of the state-of-the-art methods based on unpaired learning.

【Keywords】:

1385. Adversary for Social Good: Protecting Familial Privacy through Joint Adversarial Attacks.

Paper Link】 【Pages】:11304-11311

【Authors】: Chetan Kumar ; Riazat Ryan ; Ming Shao

【Abstract】: Social media is widely used by billions of people, with a dramatic influx of new users every day. Among them, social networks maintain the basic social characters and host a huge amount of personal data. While protecting sensitive user data is an obvious and pressing need, information leakage due to adversarial attacks is somewhat unavoidable, yet hard to detect. For example, implicit social relations such as family information may be exposed simply by network structure and hosted face images through off-the-shelf graph neural networks (GNN), which will be empirically shown in this paper. To address this issue, we propose a novel adversarial attack algorithm for social good. First, we start from the conventional visual family understanding problem, and demonstrate that familial information can easily be exposed to attackers by connecting sneak shots to social networks. Second, to protect family privacy on social networks, we propose a novel adversarial attack algorithm that produces both adversarial features and graph under a given budget. Specifically, both features on the nodes and edges between nodes will be perturbed gradually such that the probe images and their family information cannot be identified correctly through conventional GNN. Extensive experiments on a popular visual social dataset have demonstrated that our defense strategy can significantly mitigate the impacts of family information leakage.

【Keywords】:

1386. Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation.

Paper Link】 【Pages】:11312-11319

【Authors】: Jogendra Nath Kundu ; Siddharth Seth ; Rahul M. V. ; Mugalodi Rakesh ; Venkatesh Babu Radhakrishnan ; Anirban Chakraborty

【Abstract】: Estimation of 3D human pose from a monocular image has gained considerable attention, as a key step to several human-centric applications. However, generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable, as these models often perform unsatisfactorily on unseen in-the-wild environments. Though weakly-supervised models have been proposed to address this shortcoming, performance of such models relies on availability of paired supervision on some related task, such as 2D pose or multi-view image pairs. In contrast, we propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our pose estimation framework relies on a minimal set of prior knowledge that defines the underlying kinematic 3D structure, such as skeletal joint connectivity information with bone-length ratios in a fixed canonical scale. The proposed model employs three consecutive differentiable transformations, namely forward-kinematics, camera-projection and spatial-map transformation. This design not only acts as a suitable bottleneck stimulating effective pose disentanglement, but also yields interpretable latent pose representations, avoiding training of an explicit latent-embedding-to-pose mapper. Furthermore, devoid of an unstable adversarial setup, we re-utilize the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings. Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments further establish our superior generalization ability.

【Keywords】:

1387. Background Suppression Network for Weakly-Supervised Temporal Action Localization.

Paper Link】 【Pages】:11320-11327

【Authors】: Pilhyeon Lee ; Youngjung Uh ; Hyeran Byun

【Abstract】: Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage while the only hint is video-level labels: whether each video contains action frames of interest. Previous methods aggregate frame-level class scores to produce video-level prediction and learn from video-level action labels. This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately. In this paper, we design Background Suppression Network (BaS-Net) which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames to improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over the state-of-the-art methods on the most popular benchmarks – THUMOS'14 and ActivityNet. Our code and the trained model are available at https://github.com/Pilhyeon/BaSNet-pytorch.
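
The aggregation step mentioned above (frame-level class scores pooled into a video-level prediction) can be sketched as follows. This is a minimal illustration that assumes top-k mean pooling as the aggregation rule and places the auxiliary background class in the last column; the abstract specifies neither choice, and the function name `video_level_scores` is hypothetical.

```python
def video_level_scores(frame_scores, k):
    """Average the k highest frame scores for each class.

    frame_scores: list of per-frame score lists, one score per class.
    Top-k mean pooling is an assumed aggregation rule, not taken from
    the abstract.
    """
    num_classes = len(frame_scores[0])
    video_scores = []
    for c in range(num_classes):
        per_frame = sorted((frame[c] for frame in frame_scores), reverse=True)
        top = per_frame[:k]
        video_scores.append(sum(top) / len(top))
    return video_scores


# 4 frames; columns: action A, action B, auxiliary background class.
frame_scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]
scores = video_level_scores(frame_scores, k=2)
```

With an explicit background column, the two background frames contribute to the background score instead of being pushed toward an action class.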

【Keywords】:

1388. Multi-Question Learning for Visual Question Answering.

Paper Link】 【Pages】:11328-11335

【Authors】: Chenyi Lei ; Lei Wu ; Dong Liu ; Zhao Li ; Guoxin Wang ; Haihong Tang ; Houqiang Li

【Abstract】: Visual Question Answering (VQA) raises a great challenge for computer vision and natural language processing communities. Most of the existing approaches consider video-question pairs individually during training. However, we observe that there are usually multiple (either sequentially generated or not) questions for the target video in a VQA task, and the questions themselves have abundant semantic relations. To explore these relations, we propose a new paradigm for VQA termed Multi-Question Learning (MQL). Inspired by multi-task learning, MQL learns from multiple questions jointly together with their corresponding answers for a target video sequence. The learned representations of video-question pairs are then more general and can be transferred to new questions. We further propose an effective VQA framework and design a training procedure for MQL, where the specifically designed attention network models the relation between input video and corresponding questions, enabling multiple video-question pairs to be co-trained. Experimental results on public datasets show the favorable performance of the proposed MQL-VQA framework compared to state-of-the-art methods.

【Keywords】:

1389. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training.

Paper Link】 【Pages】:11336-11344

【Authors】: Gen Li ; Nan Duan ; Yuejian Fang ; Ming Gong ; Daxin Jiang

【Abstract】: We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM (Lample and Conneau 2019) and Unicoder (Huang et al. 2019), both visual and linguistic contents are fed into a multi-layer Transformer (Vaswani et al. 2017) for the cross-modal pre-training, where three pre-training tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC) and Visual-linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on linguistic and visual contents jointly. The last task tries to predict whether an image and a text describe each other. After pre-training on large-scale image-caption pairs, we transfer Unicoder-VL to caption-based image-text retrieval and visual commonsense reasoning, with just one additional output layer. We achieve state-of-the-art or comparable results on both tasks and show the powerful ability of the cross-modal pre-training.

【Keywords】:

1390. Multi-Spectral Vehicle Re-Identification: A Challenge.

Paper Link】 【Pages】:11345-11353

【Authors】: Hongchao Li ; Chenglong Li ; Xianpeng Zhu ; Aihua Zheng ; Bin Luo

【Abstract】: Vehicle re-identification (Re-ID) is a crucial task in smart city and intelligent transportation, aiming to match vehicle images across non-overlapping surveillance camera views. Currently, most works focus on RGB-based vehicle Re-ID, which limits its capability in real-life applications in adverse environments such as dark environments and bad weather. IR (Infrared) spectrum imaging offers complementary information to relieve the illumination issue in computer vision tasks. Furthermore, vehicle Re-ID faces the major challenge of diverse appearance across different views, for example with trucks. In this work, we address the RGB and IR vehicle Re-ID problem and contribute a multi-spectral vehicle Re-ID benchmark named RGBN300, including RGB and NIR (Near Infrared) vehicle images of 300 identities from 8 camera views, giving in total 50125 RGB images and 50125 NIR images. In addition, we have acquired additional TIR (Thermal Infrared) data for 100 vehicles from RGBN300 to form another dataset for three-spectral vehicle Re-ID. Furthermore, we propose a Heterogeneity-collaboration Aware Multi-stream convolutional Network (HAMNet) towards automatically fusing different spectrum features in an end-to-end learning framework. Comprehensive experiments on prevalent networks show that our HAMNet can effectively integrate multi-spectral data for robust vehicle Re-ID in day and night. Our work provides a benchmark dataset for RGB-NIR and RGB-NIR-TIR multi-spectral vehicle Re-ID and a baseline network for both research and industrial communities. The dataset and baseline codes are available at: https://github.com/ttaalle/multi-modal-vehicle-Re-ID.

【Keywords】:

1391. Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation.

Paper Link】 【Pages】:11354-11361

【Authors】: Jia Li ; Wen Su ; Zengfu Wang

【Abstract】: We rethink a well-known bottom-up approach for multi-person pose estimation and propose an improved one. The improved approach surpasses the baseline significantly thanks to (1) an intuitive yet more sensible representation, which we refer to as body parts, to encode the connection information between keypoints, (2) an improved stacked hourglass network with attention mechanisms, (3) a novel focal L2 loss which is dedicated to “hard” keypoint and keypoint association (body part) mining, and (4) a robust greedy keypoint assignment algorithm for grouping the detected keypoints into individual poses. Our approach not only works straightforwardly but also outperforms the baseline by about 15% in average precision and is comparable to the state of the art on the MS-COCO test-dev dataset. The code and pre-trained models are publicly available on our project page.

【Keywords】:

1392. Learning Part Generation and Assembly for Structure-Aware Shape Synthesis.

Paper Link】 【Pages】:11362-11369

【Authors】: Jun Li ; Chengjie Niu ; Kai Xu

【Abstract】: Learning powerful deep generative models for 3D shape synthesis is largely hindered by the difficulty in ensuring plausibility encompassing correct topology and reasonable geometry. Indeed, learning the distribution of plausible 3D shapes seems a daunting task for the holistic approaches, given the significant topological variations of 3D objects even within the same category. Enlightened by the fact that 3D shape structure is characterized as part composition and placement, we propose to model 3D shape variations with a part-aware deep generative network, coined as PAGENet. The network is composed of an array of per-part VAE-GANs, generating semantic parts composing a complete shape, followed by a part assembly module that estimates a transformation for each part to correlate and assemble them into a plausible structure. Through delegating the learning of part composition and part placement into separate networks, the difficulty of modeling structural variations of 3D shapes is greatly reduced. We demonstrate through both qualitative and quantitative evaluations that PAGENet generates 3D shapes with plausible, diverse and detailed structure, and show two applications, i.e., semantic shape segmentation and part-based shape editing.

【Keywords】:

1393. Hierarchical Knowledge Squeezed Adversarial Network Compression.

Paper Link】 【Pages】:11370-11377

【Authors】: Peng Li ; Chang Shu ; Yuan Xie ; Yan Qu ; Hui Kong

【Abstract】: Deep network compression has achieved notable progress via knowledge distillation, where a teacher-student learning scheme is adopted using a predetermined loss. Recently, more attention has shifted to employing adversarial training to minimize the discrepancy between the output distributions of the two networks. However, these methods emphasize result-oriented learning while neglecting process-oriented learning, leading to the loss of rich information contained in the whole network pipeline. In other (non-GAN-based) process-oriented methods, by contrast, the knowledge has usually been transferred in a redundant manner. Observing that a small network cannot perfectly mimic a large one due to the huge gap in network scale, we propose a knowledge transfer method, involving effective intermediate supervision, under the adversarial training framework to learn the student network. Different from other intermediate supervision methods, we design the knowledge representation in a compact form by introducing a task-driven attention mechanism. Meanwhile, to improve the representation capability of the attention-based method, a hierarchical structure is utilized so that powerful but highly squeezed knowledge is realized and the knowledge from the teacher network can accommodate the size of the student network. Extensive experimental results on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves highly superior performance against state-of-the-art methods.

【Keywords】:

1394. Age Progression and Regression with Spatial Attention Modules.

Paper Link】 【Pages】:11378-11385

【Authors】: Qi Li ; Yunfan Liu ; Zhenan Sun

【Abstract】: Age progression and regression refer to aesthetically rendering a given face image to present effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, there are two major problems: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper, we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. Particularly, since face aging and rejuvenation are largely different in terms of image translation patterns, we model these two processes using two separate generators, each dedicated to one age changing process. In addition, we exploit spatial attention mechanisms to limit image modifications to regions closely related to age changes, so that images with high visual fidelity can be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model to synthesize lifelike face images at desired ages with personalized features well preserved, while keeping age-irrelevant regions unchanged.

【Keywords】:

1395. Domain Conditioned Adaptation Network.

Paper Link】 【Pages】:11386-11393

【Authors】: Shuang Li ; Chi Harold Liu ; Qiuxia Lin ; Binhui Xie ; Zhengming Ding ; Gao Huang ; Jian Tang

【Abstract】: Tremendous research efforts have been made to advance deep domain adaptation (DA) by seeking domain-invariant features. Most existing deep DA models only focus on aligning feature representations of task-specific layers across domains while integrating a totally shared convolutional architecture for source and target. However, we argue that such strongly-shared convolutional layers might be harmful for domain-specific feature learning when source and target data distributions differ to a large extent. In this paper, we relax a shared-convnets assumption made by previous DA methods and propose a Domain Conditioned Adaptation Network (DCAN), which aims to excite distinct convolutional channels with a domain conditioned channel attention mechanism. As a result, the critical low-level domain-dependent knowledge can be explored appropriately. As far as we know, this is the first work to explore the domain-wise convolutional channel activation for deep DA networks. Moreover, to effectively align high-level feature distributions across two domains, we further deploy domain conditioned feature correction blocks after task-specific layers, which explicitly correct the domain discrepancy. Extensive experiments on three cross-domain benchmarks demonstrate that the proposed approach outperforms existing methods by a large margin, especially on very tough cross-domain learning tasks.

【Keywords】:

1396. Appearance and Motion Enhancement for Video-Based Person Re-Identification.

Paper Link】 【Pages】:11394-11401

【Authors】: Shuzhao Li ; Huimin Yu ; Haoji Hu

【Abstract】: In this paper, we propose an Appearance and Motion Enhancement Model (AMEM) for video-based person re-identification to enrich the two kinds of information contained in the backbone network in a more interpretable way. Concretely, human attribute recognition under the supervision of pseudo labels is exploited in an Appearance Enhancement Module (AEM) to help enrich the appearance and semantic information. A Motion Enhancement Module (MEM) is designed to capture the identity-discriminative walking patterns through predicting future frames. Despite a complex model with several auxiliary modules during training, only the backbone model plus two small branches are kept for similarity evaluation which constitute a simple but effective final model. Extensive experiments conducted on three popular video-based person ReID benchmarks demonstrate the effectiveness of our proposed model and the state-of-the-art performance compared with existing methods.

【Keywords】:

1397. Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion.

Paper Link】 【Pages】:11402-11409

【Authors】: Siqi Li ; Changqing Zou ; Yipeng Li ; Xibin Zhao ; Yue Gao

【Abstract】: This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously via leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in spatial dimension. It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, and the results show that our method achieves gains of 2.5% and 2.6%, respectively, against the state-of-the-art method.

【Keywords】:

1398. OVL: One-View Learning for Human Retrieval.

Paper Link】 【Pages】:11410-11417

【Authors】: Wenjing Li ; ZhongCheng Wu

【Abstract】: This paper considers a novel problem, named One-View Learning (OVL), in human retrieval, a.k.a. person re-identification (re-ID). Unlike fully-supervised learning, OVL requires only a low annotation cost: labeled training images are only provided from one camera view (source view/domain), while the annotations of training images from other camera views (target views/domains) are not available. OVL is a problem of multi-target open set domain adaptation that is difficult for existing domain adaptation methods to handle. This is because 1) unlabeled samples are drawn from multiple target views in different distributions, and 2) the target views may contain samples of “unknown identity” that are not shared by the source view. To address this problem, this work introduces a novel one-view learning framework for person re-ID. This is achieved by adversarial multi-view learning (AMVL) and adversarial unknown rejection learning (AURL). The former learns a multi-view discriminator by adversarial learning to align the feature distributions between all views. The latter is designed to reject unknown samples from target views through adversarial learning with two unknown identity classifiers. Extensive experiments on three large-scale datasets demonstrate the advantage of the proposed method over state-of-the-art domain adaptation and semi-supervised methods.

【Keywords】:

1399. Gated Fully Fusion for Semantic Segmentation.

Paper Link】 【Pages】:11418-11425

【Authors】: Xiangtai Li ; Houlong Zhao ; Lei Han ; Yunhai Tong ; Shaohua Tan ; Kuiyuan Yang

【Abstract】: Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel. High-level features from Deep Convolutional Neural Networks already demonstrate their effectiveness in semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small/thin objects where detailed information is important. It is natural to consider importing low-level features to compensate for the lost detailed information in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information, which significantly reduces the noise during fusion. We achieve state-of-the-art results on four challenging scene parsing datasets including Cityscapes, Pascal Context, COCO-stuff and ADE20K.
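
One way to picture gate-controlled fusion across levels is the scalar toy below. The specific rule used here, where each level keeps `(1 + g_l)` of its own feature and receives `(1 - g_l)` of the gated features from all other levels, is an assumption chosen for illustration; the abstract does not give the formula, and the function name `gated_fully_fuse` is hypothetical. Real features would be spatial maps rather than single numbers.

```python
def gated_fully_fuse(features, gates):
    """Fuse per-level features with gates in a fully connected way.

    features: one scalar feature per level (stand-in for a feature map).
    gates:    one gate value in [0, 1] per level.
    Assumed rule: fused_l = (1 + g_l) * x_l + (1 - g_l) * sum_{i != l} g_i * x_i
    """
    fused = []
    for l, (x_l, g_l) in enumerate(zip(features, gates)):
        # Information offered by every other level, weighted by that level's gate.
        others = sum(
            g_i * x_i
            for i, (x_i, g_i) in enumerate(zip(features, gates))
            if i != l
        )
        fused.append((1.0 + g_l) * x_l + (1.0 - g_l) * others)
    return fused


# Level 0 is confident (gate 1.0); level 1 is not (gate 0.0).
fused = gated_fully_fuse(features=[1.0, 2.0], gates=[1.0, 0.0])
```

Under this rule a level with a high gate sends its feature to others but accepts little in return, while a level with a low gate mostly receives.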

【Keywords】:

1400. ScaleNet - Improve CNNs through Recursively Rescaling Objects.

Paper Link】 【Pages】:11426-11433

【Authors】: Xingyi Li ; Zhongang Qi ; Xiaoli Z. Fern ; Fuxin Li

【Abstract】: Deep networks are often not scale-invariant hence their performance can vary wildly if recognizable objects are at an unseen scale occurring only at testing time. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective to predict the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects in the scales that are not present in their training sets. By recursively calling ScaleNet, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of our proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on MNIST, CIFAR-10, and MS COCO datasets and results reveal that our proposed framework significantly boosts the performances of deep networks.

【Keywords】:

1401. Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification.

Paper Link】 【Pages】:11434-11441

【Authors】: Xingze Li ; Wengang Zhou ; Yun Zhou ; Houqiang Li

【Abstract】: Video-based person re-identification has received considerable attention in recent years due to its significant application in video surveillance. Compared with image-based person re-identification, video-based person re-identification is characterized by a much richer context, which raises the significance of identifying informative regions and fusing the temporal information across frames. In this paper, we propose two relation-guided modules to learn reinforced feature representations for effective re-identification. First, a relation-guided spatial attention (RGSA) module is designed to explore the discriminative regions globally. The weight at each position is determined by its feature as well as the relation features from other positions, revealing the dependence between local and global contents. Based on the adaptively weighted frame-level feature, then, a relation-guided temporal refinement (RGTR) module is proposed to further refine the feature representations across frames. The learned relation information via the RGTR module enables the individual frames to complement each other in an aggregation manner, leading to robust video-level feature representations. Extensive experiments on four prevalent benchmarks verify the state-of-the-art performance of the proposed method.

【Keywords】:

1402. Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation.

Paper Link】 【Pages】:11442-11449

【Authors】: Yang Li ; Kan Li ; Shuai Jiang ; Ziyue Zhang ; Congzhentao Huang ; Richard Yi Da Xu

【Abstract】: The neural network based approach for 3D human pose estimation from monocular images has attracted growing interest. However, annotating 3D poses is a labor-intensive and expensive process. In this paper, we propose a novel self-supervised approach to avoid the need of manual annotations. Different from existing weakly/self-supervised methods that require extra unpaired 3D ground-truth data to alleviate the depth ambiguity problem, our method trains the network only relying on geometric knowledge without any additional 3D pose annotations. The proposed method follows the two-stage pipeline: 2D pose estimation and 2D-to-3D pose lifting. We design the transform re-projection loss that is an effective way to explore multi-view consistency for training the 2D-to-3D lifting network. Besides, we adopt the confidences of 2D joints to integrate losses from different views to alleviate the influence of noises caused by the self-occlusion problem. Finally, we design a two-branch training architecture, which helps to preserve the scale information of re-projected 2D poses during training, resulting in accurate 3D pose predictions. We demonstrate the effectiveness of our method on two popular 3D human pose datasets, Human3.6M and MPI-INF-3DHP. The results show that our method significantly outperforms recent weakly/self-supervised approaches.

【Keywords】:

1403. Natural Image Matting via Guided Contextual Attention.

Paper Link】 【Pages】:11450-11457

【Authors】: Yaoyi Li ; Hongtao Lu

【Abstract】: Over the last few years, deep learning based approaches have achieved outstanding improvements in natural image matting. Many of these methods can generate visually plausible alpha estimations, but typically yield blurry structures or textures in the semitransparent area. This is due to the local ambiguity of transparent objects. One possible solution is to leverage the far-surrounding information to estimate the local opacity. Traditional affinity-based methods often suffer from high computational complexity, which makes them unsuitable for high-resolution alpha estimation. Inspired by affinity-based methods and the successes of contextual attention in inpainting, we develop a novel end-to-end approach for natural image matting with a guided contextual attention module, which is specifically designed for image matting. The guided contextual attention module directly propagates high-level opacity information globally based on the learned low-level affinity. The proposed method can mimic the information flow of affinity-based methods and utilize rich features learned by deep neural networks simultaneously. Experimental results on the Composition-1k testing set and the alphamatting.com benchmark dataset demonstrate that our method outperforms state-of-the-art approaches in natural image matting. Code and models are available at https://github.com/Yaoyi-Li/GCA-Matting.

【Keywords】:

1404. Learning Transferable Adversarial Examples via Ghost Networks.

Paper Link】 【Pages】:11458-11465

【Authors】: Yingwei Li ; Song Bai ; Yuyin Zhou ; Cihang Xie ; Zhishuai Zhang ; Alan L. Yuille

【Abstract】: Recent development of adversarial attacks has proven that ensemble-based methods outperform traditional, non-ensemble ones in black-box attack. However, as it is computationally prohibitive to acquire a family of diverse models, these methods achieve inferior performance constrained by the limited number of models to be ensembled. In this paper, we propose Ghost Networks to improve the transferability of adversarial examples. The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models. After that, models are subsequently fused by longitudinal ensemble. Extensive experimental results suggest that the number of networks is essential for improving the transferability of adversarial examples, but it is less necessary to independently train different networks and ensemble them in an intensive aggregation way. Instead, our work can be used as a computationally cheap and easily applied plug-in to improve adversarial approaches both in single-model and multi-model attack, compatible with residual and non-residual networks. By reproducing the NeurIPS 2017 adversarial competition, our method outperforms the No.1 attack submission by a large margin, demonstrating its effectiveness and efficiency. Code is available at https://github.com/LiYingwei/ghost-network.

【Keywords】:

1405. Finding Action Tubes with a Sparse-to-Dense Framework.

Paper Link】 【Pages】:11466-11473

【Authors】: Yuxi Li ; Weiyao Lin ; Tao Wang ; John See ; Rui Qian ; Ning Xu ; Limin Wang ; Shugong Xu

【Abstract】: The task of spatial-temporal action detection has attracted a growing number of researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) Both long-term and short-term sampled information are explicitly utilized in our spatio-temporal network, (2) A new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy renders our framework about 7.6 times more efficient than the nearest competitor.

【Keywords】:

1406. Real-Time Scene Text Detection with Differentiable Binarization.

Paper Link】 【Pages】:11474-11481

【Authors】: Minghui Liao ; Zhaoyi Wan ; Cong Yao ; Kai Chen ; Xiang Bai

【Abstract】: Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.

【Keywords】:
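
Editorial sketch for the Differentiable Binarization entry above: the abstract says binarization is performed inside the segmentation network with adaptively set thresholds. One common way to make a hard threshold trainable is to replace the step function with a steep sigmoid of the difference between the probability map and a threshold map. The snippet below illustrates that substitution with the standard library only. The function names, the sigmoid form, and the steepness value `k = 50.0` are assumptions made for illustration; the abstract itself does not state them.

```python
import math
from typing import List

Grid = List[List[float]]


def hard_binarization(prob: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Conventional post-processing: a fixed, non-differentiable step function."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


def soft_binarization(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """Steep sigmoid of (prob - thresh): a smooth stand-in for the step function.

    `thresh` is a per-pixel threshold map (in a network it would be predicted);
    `k` controls how closely the sigmoid approximates a hard step.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob, thresh)
    ]


if __name__ == "__main__":
    probability_map = [[0.9, 0.1, 0.5]]
    threshold_map = [[0.5, 0.5, 0.5]]
    print(hard_binarization(probability_map))
    print(soft_binarization(probability_map, threshold_map))
```

Because the soft version is smooth in both the probability and the threshold, gradients can flow to whatever produces the threshold map, which is what allows thresholds to be learned rather than hand-tuned.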

1407. Object Instance Mining for Weakly Supervised Object Detection.

Paper Link】 【Pages】:11482-11489

【Authors】: Chenhao Lin ; Siwen Wang ; Dongqi Xu ; Yu Lu ; Wayne Zhang

【Abstract】: Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because such a mechanism tends to learn from the most discriminative object in an image for each category. Therefore, these methods suffer from missing object instances, which degrades the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances existing in each image by introducing information propagation on the spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training. In addition, we design an object instance reweighted loss to learn a larger portion of each object instance to further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.

【Keywords】:

1408. Multimodal Structure-Consistent Image-to-Image Translation.

Paper Link】 【Pages】:11490-11498

【Authors】: Che-Tsung Lin ; Yen-Yi Wu ; Po-Hao Hsu ; Shang-Hong Lai

【Abstract】: Unpaired image-to-image translation is proven quite effective in boosting a CNN-based object detector for a different domain by means of data augmentation that can well preserve the image-objects in the translated images. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further boost the detector accuracy by generating a diverse collection of images in the target domain, given only a single/labelled image in the source domain. However, images generated by multimodal GANs would achieve even worse detection accuracy than the ones by a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse and structure-preserved translated images across complex domains, such as between day and night, for object detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparisons, we evaluate other competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models and prove that our model achieves significant improvement and outperforms other methods on the detection accuracies and the FCN scores. Also, we demonstrate that our model could provide more diverse object appearances in the target domain through comparison on the perceptual distance metric.

【Keywords】:

1409. Fast Learning of Temporal Action Proposal via Dense Boundary Generator.

Paper Link】 【Pages】:11499-11506

【Authors】: Chuming Lin ; Jian Li ; Yabiao Wang ; Ying Tai ; Donghao Luo ; Zhipeng Cui ; Chengjie Wang ; Jilin Li ; Feiyue Huang ; Rongrong Ji

【Abstract】: Generating temporal action proposals remains a very challenging problem, where the main issue lies in predicting precise temporal proposal boundaries and reliable action confidence in long and untrimmed real-world videos. In this paper, we propose an efficient and unified framework to generate temporal action proposals named Dense Boundary Generator (DBG), which draws inspiration from boundary-sensitive methods and implements boundary classification and action completeness regression for densely distributed proposals. In particular, the DBG consists of two modules: Temporal boundary classification (TBC) and Action-aware completeness regression (ACR). The TBC aims to provide two temporal boundary confidence maps by low-level two-stream features, while the ACR is designed to generate an action completeness score map by high-level action-aware features. Moreover, we introduce a dual stream BaseNet (DSB) to encode RGB and optical flow information, which helps to capture discriminative boundary and actionness features. Extensive experiments on popular benchmarks ActivityNet-1.3 and THUMOS14 demonstrate the superiority of DBG over the state-of-the-art proposal generator (e.g., MGG and BMN).

【Keywords】:

1410. Learning to Transfer: Unsupervised Domain Translation via Meta-Learning.

Paper Link】 【Pages】:11507-11514

【Authors】: Jianxin Lin ; Yijun Wang ; Zhibo Chen ; Tianyu He

【Abstract】: Unsupervised domain translation has recently achieved impressive performance with Generative Adversarial Network (GAN) and sufficient (unpaired) training data. However, existing domain translation frameworks form in a disposable way where the learning experiences are ignored and the obtained model cannot be adapted to a new coming domain. In this work, we take on unsupervised domain translation problems from a meta-learning perspective. We propose a model called Meta-Translation GAN (MT-GAN) to find good initialization of translation models. In the meta-training procedure, MT-GAN is explicitly trained with a primary translation task and a synthesized dual translation task. A cycle-consistency meta-optimization objective is designed to ensure the generalization ability. We demonstrate effectiveness of our model on ten diverse two-domain translation tasks and multiple face identity translation tasks. We show that our proposed approach significantly outperforms the existing domain translation methods when each domain contains no more than ten training samples.

【Keywords】:

1411. Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval.

Paper Link】 【Pages】:11515-11522

【Authors】: Kaiyi Lin ; Xing Xu ; Lianli Gao ; Zheng Wang ; Heng Tao Shen

【Abstract】: Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging not only because of the heterogeneous distributions across different modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as common semantic space, and using generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as strengthen relations between input data and semantic space to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN based methods for ZS-CMR. Rather than using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings by modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in latent space, which benefits retrieval efficiency and enables the knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.

【Keywords】:

1412. Learning to Deblur Face Images via Sketch Synthesis.

Paper Link】 【Pages】:11523-11530

【Authors】: Songnan Lin ; Jiawei Zhang ; Jinshan Pan ; Yicun Liu ; Yongtian Wang ; Jing S. J. Chen ; Jimmy S. Ren

【Abstract】: The success of existing face deblurring methods based on deep neural networks is mainly due to the large model capacity. Few algorithms have been specially designed according to the domain knowledge of face images and the physical properties of the deblurring process. In this paper, we propose an effective face deblurring algorithm based on deep convolutional neural networks (CNNs). Motivated by the conventional deblurring process which usually involves the motion blur estimation and the latent clear image restoration, the proposed algorithm first estimates motion blur by a deep CNN and then restores latent clear images with the estimated motion blur. However, estimating motion blur from blurry face images is difficult as the textures of the blurry face images are scarce. As most face images share some common global structures which can be modeled well by sketch information, we propose to learn face sketches by a deep CNN so that the sketches can help the motion blur estimation. With the estimated motion blur, we then develop an effective latent image restoration algorithm based on a deep CNN. Although involving the several components, the proposed algorithm is trained in an end-to-end fashion. We analyze the effectiveness of each component on face image deblurring and show that the proposed algorithm is able to deblur face images with favorable performance against state-of-the-art methods.

【Keywords】:

1413. Self-Attention ConvLSTM for Spatiotemporal Prediction.

Paper Link】 【Pages】:11531-11538

【Authors】: Zhihui Lin ; Maomao Li ; Zhuobin Zheng ; Yangyang Cheng ; Chun Yuan

【Abstract】: Spatiotemporal prediction is challenging due to the complex dynamic motion and appearance changes. Existing work concentrates on embedding additional cells into the standard ConvLSTM to memorize spatial appearances during the prediction. These models always rely on the convolution layers to capture the spatial dependence, which are local and inefficient. However, long-range spatial dependencies are significant for spatial applications. To extract spatial features with both global and local dependencies, we introduce the self-attention mechanism into ConvLSTM. Specifically, a novel self-attention memory (SAM) is proposed to memorize features with long-range dependencies in terms of spatial and temporal domains. Based on the self-attention, SAM can produce features by aggregating features across all positions of both the input itself and memory features with pair-wise similarity scores. Moreover, the additional memory is updated by a gating mechanism on aggregated features and an established highway with the memory of the previous time step. Therefore, through SAM, we can extract features with long-range spatiotemporal dependencies. Furthermore, we embed the SAM into a standard ConvLSTM to construct a self-attention ConvLSTM (SA-ConvLSTM) for the spatiotemporal prediction. In experiments, we apply the SA-ConvLSTM to perform frame prediction on the MovingMNIST and KTH datasets and traffic flow prediction on the TaxiBJ dataset. Our SA-ConvLSTM achieves state-of-the-art results on both datasets with fewer parameters and higher time efficiency than the previous state-of-the-art method.

【Keywords】:
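
Editorial sketch for the Self-Attention ConvLSTM entry above: the abstract describes producing features "by aggregating features across all positions ... with pair-wise similarity scores". The snippet below shows that aggregation step in its generic form: dot-product similarities, a softmax over positions, then a weighted sum. The function name `attend` and the use of plain dot products are illustrative assumptions; how the paper builds queries, keys, and values from the input and the memory, and how the gated memory update works, is not covered here.

```python
import math
from typing import List

Vector = List[float]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """For each query position, return a similarity-weighted average of all values."""
    outputs = []
    for q in queries:
        # Pair-wise similarity between this position and every other position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Numerically stable softmax over positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[2.0], [4.0]]
    # A query equally similar to both positions averages their values.
    print(attend([[0.0, 0.0]], keys, values))
```

Every output position can draw on every input position in one step, which is the long-range property that purely local convolutions lack.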

1414. Weakly-Supervised Video Moment Retrieval via Semantic Completion Network.

Paper Link】 【Pages】:11539-11546

【Authors】: Zhijie Lin ; Zhou Zhao ; Zhu Zhang ; Qi Wang ; Huasheng Liu

【Abstract】: Video moment retrieval aims to find the moment that is most relevant to a given natural language query. Existing methods are mostly trained in a fully-supervised setting, which requires full annotations of the temporal boundary for each query. However, manually labeling these annotations is time-consuming and expensive. In this paper, we propose a novel weakly-supervised moment retrieval framework requiring only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates the context information to generate and score all candidate proposals in one single pass. We then devise an algorithm that considers both exploitation and exploration to select top-K proposals. Next, we build a semantic completion module to measure the semantic similarity between the selected proposals and the query, compute the reward, and provide feedback to the proposal generation module for scoring refinement. Experiments on the ActivityCaptions and Charades-STA datasets demonstrate the effectiveness of our proposed method.

【Keywords】:

1415. Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature.

Paper Link】 【Pages】:11547-11554

【Authors】: Bo Liu ; Qiulei Dong ; Zhanyi Hu

【Abstract】: Recently, many zero-shot learning (ZSL) methods have focused on learning discriminative object features in an embedding feature space. However, the distributions of the unseen-class features learned by these methods tend to partly overlap, resulting in inaccurate object recognition. To address this problem, we propose a novel adversarial network to synthesize compact semantic visual features for ZSL, consisting of a residual generator, a prototype predictor, and a discriminator. The residual generator generates the visual feature residual, which is integrated with a visual prototype predicted via the prototype predictor to synthesize the visual feature. The discriminator distinguishes the synthetic visual features from the real ones extracted from an existing categorization CNN. Since the generated residuals are generally numerically much smaller than the distances among all the prototypes, the distributions of the unseen-class features synthesized by the proposed network overlap less. In addition, considering that the visual features from categorization CNNs are generally inconsistent with their semantic features, a simple feature selection strategy is introduced for extracting more compact semantic visual features. Extensive experimental results on six benchmark datasets demonstrate that our method could achieve a significantly better performance than existing state-of-the-art methods by ∼1.2-13.2% in most cases.

【Keywords】:

1416. Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization.

Paper Link】 【Pages】:11555-11562

【Authors】: Chuanbin Liu ; Hongtao Xie ; Zheng-Jun Zha ; Lingfeng Ma ; Lingyun Yu ; Yongdong Zhang

【Abstract】: Delicate attention to the discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most of the existing attention models perform poorly in FGVC, due to pivotal limitations in discriminative region proposal and region-based feature learning. 1) The discriminative regions are predominantly located based on the filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor as a one-hot classification task individually, while neglecting the knowledge from the entire object. To address the above issues, in this paper, we propose a novel “Filtration and Distillation Learning” (FDL) model to enhance the region attention of discriminative parts for FGVC. Firstly, a Filtration Learning (FL) method is put forward for discriminative part region proposal based on the matchability between proposing and predicting. Specifically, we utilize the proposing-predicting matchability as the performance metric of the Region Proposal Network (RPN), thus enabling a direct optimization of the RPN to select the most discriminative regions. Secondly, the object-based feature learning and region-based feature learning are formulated as “teacher” and “student”, respectively, which can furnish better supervision for region-based feature learning. Accordingly, our FDL can enhance the region attention effectively, and the overall framework can be trained end-to-end without either object or part annotations. Extensive experiments verify that FDL yields state-of-the-art performance under the same backbone as the most competitive approaches on several FGVC tasks.

【Keywords】:

1417. HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs.

Paper Link】 【Pages】:11563-11571

【Authors】: Fangyu Liu ; Rongtian Ye ; Xun Wang ; Shuaipeng Li

【Abstract】: The hubness problem widely exists in high-dimensional embedding space and is a fundamental source of error for cross-modal matching tasks. In this work, we study the emergence of hubs in Visual Semantic Embeddings (VSE) with application to text-image matching. We analyze the pros and cons of two widely adopted optimization objectives for training VSE and propose a novel hubness-aware loss function (Hal) that addresses previous methods' defects. Unlike (Faghri et al. 2018) which simply takes the hardest sample within a mini-batch, Hal takes all samples into account, using both local and global statistics to scale up the weights of “hubs”. We evaluate our method with various configurations of model architectures and datasets. The method exhibits exceptionally good robustness and brings consistent improvement on the task of text-image matching across all settings. Specifically, under the same model architectures as (Faghri et al. 2018) and (Lee et al. 2018), by switching only the learning objective, we report a maximum R@1 improvement of 7.4% on MS-COCO and 8.3% on Flickr30k.

【Keywords】:
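
Editorial sketch for the HAL entry above: the abstract reports gains in R@1, i.e. recall at rank 1, the fraction of queries whose correct match is ranked first. The snippet below computes recall at rank K from a similarity matrix and shows how a single "hub" candidate that is close to many queries lowers the score. The function name `recall_at_k` and the one-to-one pairing (query i matches candidate i) are simplifying assumptions; benchmark datasets may pair one image with several captions.

```python
from typing import List


def recall_at_k(similarity: List[List[float]], k: int = 1) -> float:
    """Fraction of queries whose true match (same index) is among the top-k candidates.

    similarity[i][j] is the score between query i and candidate j.
    """
    hits = 0
    for query_index, row in enumerate(similarity):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query_index in ranked[:k]:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    # Candidate 0 acts as a hub: it is the nearest neighbour of both queries,
    # so query 1 retrieves the wrong item at rank 1.
    scores = [
        [0.9, 0.1],
        [0.8, 0.2],
    ]
    print(recall_at_k(scores, k=1))
    print(recall_at_k(scores, k=2))
```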

1418. Federated Learning for Vision-and-Language Grounding Problems.

Paper Link】 【Pages】:11572-11579

【Authors】: Fenglin Liu ; Xian Wu ; Shen Ge ; Wei Fan ; Yuexian Zou

【Abstract】: Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academic and industrial worlds. However, given the similarity of these tasks, the efforts to obtain better results by combining the merits of their algorithms are not well studied. Inspired by the recent success of federated learning, we propose a federated learning framework to obtain various types of image representations from different tasks, which are then fused together to form fine-grained image representations. The representations merge useful features from different vision-and-language grounding problems, and are thus much more powerful than the original representations alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). The aimNet is validated on three federated learning settings, which include horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we are able to get 14% and 13% relative gain on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we could also boost the performance of strong baselines by up to 3%.

【Keywords】:

1419. Learned Video Compression via Joint Spatial-Temporal Correlation Exploration.

Paper Link】 【Pages】:11580-11587

【Authors】: Haojie Liu ; Han Shen ; Lichao Huang ; Ming Lu ; Tong Chen ; Zhan Ma

【Abstract】: Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency. Efficient temporal information representation plays a key role in video coding. Thus, in this paper, we propose to exploit the temporal correlation using both first-order optical flow and second-order flow prediction. We suggest a one-stage learning approach to encapsulate flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors to exploit second-order correlations. Joint priors are embedded in autoregressive spatial neighbors, co-located hyper elements and temporal neighbors using ConvLSTM recurrently. We evaluate our approach for the low-delay scenario with High-Efficiency Video Coding (H.265/HEVC), H.264/AVC and another learned video compression method, following the common test settings. Our work offers state-of-the-art performance, with consistent gains across all popular test sequences.

【Keywords】:

1420. Interactive Dual Generative Adversarial Networks for Image Captioning.

Paper Link】 【Pages】:11588-11595

【Authors】: Junhao Liu ; Kai Wang ; Chunpu Xu ; Zhou Zhao ; Ruifeng Xu ; Ying Shen ; Min Yang

【Abstract】: Image captioning is usually built on either generation-based or retrieval-based approaches. Both ways have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which mutually combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit from each other's complementary targets that are learned from two dual adversarial discriminators. Specifically, the generation- and retrieval-based generators provide improved synthetic and retrieved candidate captions with informative feedback signals from the two respective discriminators that are trained to distinguish the generated captions from the true captions and assign top rankings to true captions respectively, thus featuring the merits of both retrieval-based and generation-based approaches. Extensive experiments on MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.

【Keywords】:

1421. Morphing and Sampling Network for Dense Point Cloud Completion.

Paper Link】 【Pages】:11596-11603

【Authors】: Minghua Liu ; Lu Sheng ; Sheng Yang ; Jing Shao ; Shi-Min Hu

【Abstract】: 3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community. For acquiring high-fidelity dense point clouds and avoiding uneven distribution, blurred details, or structural loss of existing methods' results, we propose a novel approach to complete the partial point cloud in two stages. Specifically, in the first stage, the approach predicts a complete but coarse-grained point cloud with a collection of parametric surface elements. Then, in the second stage, it merges the coarse-grained prediction with the input point cloud by a novel sampling algorithm. Our method utilizes a joint loss function to guide the distribution of the points. Extensive experiments verify the effectiveness of our method and demonstrate that it outperforms the existing methods in both the Earth Mover's Distance (EMD) and the Chamfer Distance (CD).

【Keywords】:

1422. Multi-Task Driven Feature Models for Thermal Infrared Tracking.

Paper Link】 【Pages】:11604-11611

【Authors】: Qiao Liu ; Xin Li ; Zhenyu He ; Nana Fan ; Di Yuan ; Wei Liu ; Yongsheng Liang

【Abstract】: Existing deep Thermal InfraRed (TIR) trackers usually use the feature models of RGB trackers for representation. However, these feature models, learned on RGB images, are neither effective in representing TIR objects nor able to take fine-grained TIR information into consideration. To this end, we develop a multi-task framework to learn the TIR-specific discriminative features and fine-grained correlation features for TIR tracking. Specifically, we first use an auxiliary classification network to guide the generation of TIR-specific discriminative features for distinguishing the TIR objects belonging to different classes. Second, we design a fine-grained aware module to capture more subtle information for distinguishing the TIR objects belonging to the same class. These two kinds of features complement each other and recognize TIR objects at the inter-class and intra-class levels, respectively. These two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task. In addition, we develop a large-scale TIR training dataset to train the network for adapting the model to the TIR domain. Extensive experimental results on three benchmarks show that the proposed algorithm achieves a relative gain of 10% over the baseline and performs favorably against the state-of-the-art methods. Code and the proposed TIR dataset are available at https://github.com/QiaoLiuHit/MMNet.

【Keywords】:

1423. Progressive Boundary Refinement Network for Temporal Action Detection.

Paper Link】 【Pages】:11612-11619

【Authors】: Qinying Liu ; Zilei Wang

【Abstract】: Temporal action detection is a challenging task due to the vagueness of action boundaries. To tackle this issue, we propose an end-to-end progressive boundary refinement network (PBRNet) in this paper. PBRNet belongs to the family of one-stage detectors and is equipped with three cascaded detection modules for localizing action boundaries more and more precisely. Specifically, PBRNet mainly consists of coarse pyramidal detection, refined pyramidal detection, and fine-grained detection. The first two modules build two feature pyramids to perform the anchor-based detection, and the third one explores the frame-level features to refine the boundaries of each action instance. In the fine-grained detection module, three frame-level classification branches are proposed to augment the frame-level features and update the confidence scores of action instances. PBRNet thus integrates the anchor-based and frame-level methods. We experimentally evaluate the proposed PBRNet and comprehensively investigate the effect of the main components. The results show that PBRNet achieves state-of-the-art detection performance on two popular benchmarks, THUMOS'14 and ActivityNet, while maintaining a high inference speed.

【Keywords】:

1424. A Generalized Framework for Edge-Preserving and Structure-Preserving Image Smoothing.

Paper Link】 【Pages】:11620-11628

【Authors】: Wei Liu ; Pingping Zhang ; Yinjie Lei ; Xiaolin Huang ; Jie Yang ; Ian D. Reid

【Abstract】: Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictory among different tasks. Nevertheless, the inherent smoothing nature of one smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed to achieve diverse smoothing natures where even contradictory smoothing behaviors can be achieved. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. When combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform the state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.
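
The abstract names a truncated Huber penalty but does not give its formula. The sketch below shows one plausible form, assumed here purely for illustration: a standard Huber penalty with threshold `a`, capped at a constant once the input magnitude exceeds `b`. The parameter names `a` and `b` and the exact scaling are assumptions, not taken from the listing.

```python
def huber(x, a):
    """Huber penalty with threshold a: quadratic near zero, linear beyond a."""
    ax = abs(x)
    return x * x / (2.0 * a) if ax < a else ax - a / 2.0


def truncated_huber(x, a, b):
    """Huber penalty capped at its value at |x| = b (assumes 0 < a <= b).

    Small differences are smoothed (quadratic region), moderate ones are
    penalised linearly, and anything larger than b costs the same constant,
    so large edges are no longer penalised in proportion to their size.
    """
    if not 0 < a <= b:
        raise ValueError("expected 0 < a <= b")
    return huber(x, a) if abs(x) <= b else b - a / 2.0


# Hypothetical parameter choice: the three regimes of the penalty.
small = truncated_huber(0.05, a=0.1, b=1.0)   # quadratic region
medium = truncated_huber(0.5, a=0.1, b=1.0)   # linear region
large = truncated_huber(5.0, a=0.1, b=1.0)    # truncated (constant) region
```

The cap is what makes such a penalty non-convex: beyond `b` the cost stops growing, which is consistent with the abstract's description of a non-convex, non-smooth framework.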

【Keywords】:

1425. Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training.

Paper Link】 【Pages】:11629-11636

【Authors】: Xiaofeng Liu ; Yuzhuo Han ; Song Bai ; Yi Ge ; Tianxing Wang ; Xu Han ; Site Li ; Jane You ; Jun Lu

【Abstract】: Semantic segmentation (SS) is an important perception task for self-driving cars and robotics, which classifies each pixel into a pre-determined class. The widely used cross entropy (CE) loss-based deep networks have achieved significant progress w.r.t. the mean Intersection-over-Union (mIoU). However, the cross entropy loss cannot take the different importance of each class in a self-driving system into account. For example, pedestrians in the image should be much more important than the surrounding buildings when making driving decisions, so their segmentation results are expected to be as accurate as possible. In this paper, we propose to incorporate the importance-aware inter-class correlation in a Wasserstein training framework by configuring its ground distance matrix. The ground distance matrix can be pre-defined following a priori knowledge of a specific task, and the previous importance-ignored methods can be treated as special cases. From an optimization perspective, we also extend our ground metric to a linear, convex or concave increasing function w.r.t. the pre-defined ground distance. We evaluate our method on the CamVid and Cityscapes datasets with different backbones (SegNet, ENet, FCN and Deeplab) in a plug-and-play fashion. In our extensive experiments, the Wasserstein loss demonstrates superior segmentation performance on the predefined critical classes for safe driving.
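
To make the role of the ground distance matrix concrete, the sketch below uses a standard fact: when the target is a one-hot label, all predicted probability mass must be transported to the true class, so the discrete Wasserstein distance reduces to the expected ground distance. The class names, the matrix values, and the function name are hypothetical and are not taken from the paper.

```python
def wasserstein_to_one_hot(probs, true_class, ground_distance):
    """Discrete Wasserstein distance from a softmax output to a one-hot label.

    With a one-hot target the transport plan is forced (every class sends its
    mass to true_class), so the cost is sum_i probs[i] * D[i][true_class].
    """
    return sum(p * ground_distance[i][true_class] for i, p in enumerate(probs))


# Hypothetical classes: 0 = building, 1 = road, 2 = pedestrian.
# Hypothetical ground distances: confusing anything with a pedestrian is
# four times as costly as confusing building with road.
D = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 4.0],
    [4.0, 4.0, 0.0],
]

# Same confidence in the true class (0.7), but different per-pixel losses.
loss_pedestrian = wasserstein_to_one_hot([0.2, 0.1, 0.7], true_class=2, ground_distance=D)
loss_building = wasserstein_to_one_hot([0.7, 0.1, 0.2], true_class=0, ground_distance=D)
```

Setting every off-diagonal entry of `D` to the same value removes the importance weighting, which illustrates how an importance-ignoring loss can arise as a special case of the matrix configuration.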

【Keywords】:

1426. A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing.

Paper Link】 【Pages】:11637-11644

【Authors】: Yinglu Liu ; Hailin Shi ; Hao Shen ; Yue Si ; Xiaobo Wang ; Tao Mei

【Abstract】: Face parsing has recently attracted increasing interest due to its numerous potential applications, such as facial makeup and facial image generation. In this paper, we contribute to the face parsing task in two respects. First, we develop a high-efficiency framework for pixel-level face parsing annotation and construct a new large-scale Landmark guided face Parsing dataset (LaPa). It consists of more than 22,000 facial images with abundant variations in expression, pose and occlusion, and each image of LaPa is provided with an 11-category pixel-level label map and 106-point landmarks. The dataset is publicly accessible to the community to advance face parsing. Second, a simple yet effective Boundary-Attention Semantic Segmentation (BASS) method is proposed for face parsing, which contains a three-branch network with elaborately developed loss functions to fully exploit the boundary information. Extensive experiments on our LaPa benchmark and the public Helen dataset show the superiority of our proposed method.

【Keywords】:

1427. Learning Cross-Modal Context Graph for Visual Grounding.

Paper Link】 【Pages】:11645-11652

【Authors】: Yongfei Liu ; Bo Wan ; Xiaodan Zhu ; Xuming He

【Abstract】: Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in visual and linguistic features of grounding entities, strong context effects and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals respectively via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.

【Keywords】:

1428. CBNet: A Novel Composite Backbone Network Architecture for Object Detection.

Paper Link】 【Pages】:11653-11660

【Authors】: Yudong Liu ; Yongtao Wang ; Siwei Wang ; Tingting Liang ; Qijie Zhao ; Zhi Tang ; Haibin Ling

【Abstract】: In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of the detectors highly depends on it. In this paper, we aim to achieve better detection performance by building a more powerful backbone from existing ones like ResNet and ResNeXt. Specifically, we propose a novel strategy for assembling multiple identical backbones by composite connections between the adjacent backbones, to form a more powerful backbone named Composite Backbone Network (CBNet). In this way, CBNet iteratively feeds the output features of the previous backbone, namely high-level features, as part of the input features to the succeeding backbone, in a stage-by-stage fashion, and finally the feature maps of the last backbone (named Lead Backbone) are used for object detection. We show that CBNet can be very easily integrated into most state-of-the-art detectors and significantly improves their performance. For example, it boosts the mAP of FPN, Mask R-CNN and Cascade R-CNN on the COCO dataset by about 1.5 to 3.0 points. Moreover, experimental results show that the instance segmentation results can be improved as well. Specifically, by simply integrating the proposed CBNet into the baseline detector Cascade Mask R-CNN, we achieve a new state-of-the-art result on the COCO dataset (mAP of 53.3) with a single model, which demonstrates the effectiveness of the proposed CBNet architecture. Code will be made available at https://github.com/PKUbahuangliuhe/CBNet.

【Keywords】:

1429. Separate in Latent Space: Unsupervised Single Image Layer Separation.

Paper Link】 【Pages】:11661-11668

【Authors】: Yunfei Liu ; Feng Lu

【Abstract】: Many real-world vision tasks, such as reflection removal from a transparent surface and intrinsic image decomposition, can be modeled as single image layer separation. However, this problem is highly ill-posed, and training CNN models for it requires accurately aligned, hard-to-collect triplet data. To address this problem, this paper proposes an unsupervised method that requires no ground-truth data triplets in training. At the core of the method are two assumptions about data distributions in the latent spaces of different layers, based on which a novel unsupervised layer separation pipeline can be derived. The method is then constructed on the GAN framework with self-supervision and cycle consistency constraints. Experimental results demonstrate that it outperforms existing unsupervised methods in both synthetic and real-world tasks. The method also shows its ability to solve a more challenging multi-layer separation task.

【Keywords】:

1430. TEINet: Towards an Efficient Architecture for Video Recognition.

Paper Link】 【Pages】:11669-11676

【Authors】: Zhaoyang Liu ; Donghao Luo ; Yabiao Wang ; Limin Wang ; Ying Tai ; Chengjie Wang ; Jilin Li ; Feiyue Huang ; Tong Lu

【Abstract】: Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have witnessed remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often introduce a large number of parameters and incur high computational cost. To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet). The TEI module presents a different paradigm to learn temporal features by decoupling the modeling of channel correlation and temporal interaction. First, it contains a Motion Enhanced Module (MEM), which enhances the motion-related features while suppressing irrelevant information (e.g., background). Then, it introduces a Temporal Interaction Module (TIM), which supplements the temporal contextual information in a channel-wise manner. This two-stage modeling scheme is not only able to capture temporal structure flexibly and effectively, but is also efficient for model inference. We conduct extensive experiments to verify the effectiveness of TEINet on several benchmarks (e.g., Something-Something V1&V2, Kinetics, UCF101 and HMDB51). Our proposed TEINet achieves good recognition accuracy on these datasets while preserving high efficiency.

【Keywords】:

1431. TANet: Robust 3D Object Detection from Point Clouds with Triple Attention.

Paper Link】 【Pages】:11677-11684

【Authors】: Zhe Liu ; Xin Zhao ; Tengteng Huang ; Ruolan Hu ; Yu Zhou ; Xiang Bai

【Abstract】: In this paper, we focus on exploring the robustness of 3D object detection in point clouds, which has rarely been discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy of hard objects, e.g., Pedestrians, is unsatisfactory, and 2) when noise points are added, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By considering the channel-wise, point-wise and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppressing the unstable cloud points. Besides, the novel stacked TA further exploits the multi-level feature attention. In addition, the CFR module boosts the accuracy of localization without excessive computation cost. Experimental results on the validation set of the KITTI dataset demonstrate that, in the challenging noisy cases, i.e., adding random noisy points around each object, the presented approach goes far beyond state-of-the-art approaches. Furthermore, for the 3D object detection task of the KITTI benchmark, our approach ranks first on the Pedestrian class, using point clouds as the only input. The running speed is around 29 frames per second.

【Keywords】:

1432. Training-Time-Friendly Network for Real-Time Object Detection.

Paper Link】 【Pages】:11685-11692

【Authors】: Zili Liu ; Tu Zheng ; Guodong Xu ; Zheng Yang ; Haifeng Liu ; Deng Cai

【Abstract】: Modern object detectors can rarely achieve short training time, fast inference speed, and high accuracy at the same time. To strike a balance among them, we propose the Training-Time-Friendly Network (TTFNet). In this work, we start with light-head, single-stage, and anchor-free designs, which enable fast inference speed. Then, we focus on shortening training time. We notice that encoding more training samples from annotated boxes plays a similar role to increasing the batch size, which helps enlarge the learning rate and accelerate the training process. To this end, we introduce a novel approach using Gaussian kernels to encode training samples. Besides, we design initiative sample weights for better information utilization. Experiments on MS COCO show that our TTFNet has great advantages in balancing training time, inference speed, and accuracy. It reduces training time by more than seven times compared to previous real-time detectors while maintaining state-of-the-art performance. In addition, our super-fast versions of TTFNet-18 and TTFNet-53 can outperform SSD300 and YOLOv3, respectively, using less than one-tenth of their training time. The code has been made available at https://github.com/ZJULearning/ttfnet.
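
The idea of using Gaussian kernels to turn one annotated box into many weighted training samples can be sketched roughly as follows. This is an illustrative guess at the general technique, not the released TTFNet code: the function name, the `alpha` scaling factor, and the choice to zero out pixels outside the box are all assumptions.

```python
import math


def gaussian_box_heatmap(height, width, box, alpha=0.5):
    """Encode one box (x1, y1, x2, y2) as a Gaussian-weighted sample map.

    Assumptions: the kernel is centred on the box centre, its standard
    deviations scale with the box half-width and half-height via a
    hypothetical factor alpha, and only pixels inside the box get a weight.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sigma_x = max(alpha * (x2 - x1) / 2.0, 1e-6)
    sigma_y = max(alpha * (y2 - y1) / 2.0, 1e-6)

    heat = [[0.0] * width for _ in range(height)]
    # Clamp the box to the grid, then weight each pixel inside it.
    for y in range(max(0, int(y1)), min(height - 1, int(y2)) + 1):
        for x in range(max(0, int(x1)), min(width - 1, int(x2)) + 1):
            dx, dy = x - cx, y - cy
            heat[y][x] = math.exp(
                -(dx * dx / (2 * sigma_x ** 2) + dy * dy / (2 * sigma_y ** 2))
            )
    return heat


# Hypothetical 9x9 feature map with a single 4x4 box centred at (4, 4).
heat = gaussian_box_heatmap(9, 9, box=(2, 2, 6, 6))
num_samples = sum(1 for row in heat for v in row if v > 0.0)
```

Compared with labelling only the centre pixel as positive, every pixel inside the box now contributes a sample, with weight decaying away from the centre.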

【Keywords】:

1433. Hybrid Graph Neural Networks for Crowd Counting.

Paper Link】 【Pages】:11693-11700

【Authors】: Ao Luo ; Fan Yang ; Xin Li ; Dong Nie ; Zhicheng Jiao ; Shangchen Zhou ; Hong Cheng

【Abstract】: Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem by interweaving the multi-scale features for crowd density and for its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.

【Keywords】:

1434. Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning.

Paper Link】 【Pages】:11701-11708

【Authors】: Dezhao Luo ; Chang Liu ; Yu Zhou ; Dongbao Yang ; Can Ma ; Qixiang Ye ; Weiping Wang

【Abstract】: We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.

【Keywords】:

1435. Context-Aware Zero-Shot Recognition.

Paper Link】 【Pages】:11709-11716

【Authors】: Ruotian Luo ; Ning Zhang ; Bohyung Han ; Linjie Yang

【Abstract】: We present a novel problem setting in zero-shot learning: zero-shot object recognition and detection in context. In contrast to traditional zero-shot learning methods, which simply infer unseen categories by transferring knowledge from objects belonging to semantically similar seen categories, we aim to understand the identity of novel objects in an image surrounded by known objects, using the inter-object relation prior. Specifically, we leverage the visual context and the geometric relationships between all pairs of objects in a single image, and capture the information useful to infer unseen categories. We integrate our context-aware zero-shot learning framework into the traditional zero-shot learning techniques seamlessly using a Conditional Random Field (CRF). The proposed algorithm is evaluated on both zero-shot region classification and zero-shot detection tasks. The results on the Visual Genome (VG) dataset show that our model significantly boosts performance with the additional visual context compared to traditional methods.

【Keywords】:

1436. Learning Saliency-Free Model with Generic Features for Weakly-Supervised Semantic Segmentation.

Paper Link】 【Pages】:11717-11724

【Authors】: Wenfeng Luo ; Meng Yang

【Abstract】: Current weakly-supervised semantic segmentation methods often estimate initial supervision from class activation maps (CAM), which produce sparse discriminative object seeds and rely on image saliency to provide background cues when only class labels are used. To eliminate the demand for extra data to train a saliency detector, we propose to discover the class patterns inherent in the lower-layer convolution features, which are scarcely explored in previous CAM methods. Specifically, we first project the convolution features into a low-dimensional space and then decide on a decision boundary to generate class-agnostic maps for each semantic category that exists in the image. Features from lower layers are more generic and thus capable of generating proxy ground truth with more accurate and integral objects. Experiments on the PASCAL VOC 2012 dataset show that the proposed saliency-free method outperforms the previous approaches under the same weakly-supervised setting and achieves superior segmentation results: 64.5% mIoU on the validation set and 64.6% mIoU on the test set.
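
Several abstracts in this track, including this one, report results in mean Intersection-over-Union (mIoU). As a minimal sketch of the metric on a single flattened label map (benchmarks typically accumulate intersections and unions over the whole dataset before averaging, which this toy version does not do), with hypothetical function and variable names:

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped (an assumption; some
    evaluation protocols handle absent classes differently).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical six-pixel example with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5 even though four of six pixels are labelled correctly, which shows how mIoU weights every class equally regardless of its pixel count.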

【Keywords】:

1437. An Integrated Enhancement Solution for 24-Hour Colorful Imaging.

Paper Link】 【Pages】:11725-11732

【Authors】: Feifan Lv ; Yinqiang Zheng ; Yicheng Li ; Feng Lu

【Abstract】: The current industry practice for 24-hour outdoor imaging is to use a silicon camera supplemented with near-infrared (NIR) illumination. This will result in color images with poor contrast at daytime and absence of chrominance at nighttime. For this dilemma, all existing solutions try to capture RGB and NIR images separately. However, they need additional hardware support and suffer from various drawbacks, including short service life, high price, specific usage scenario, etc. In this paper, we propose a novel and integrated enhancement solution that produces clear color images, whether at abundant sunlight daytime or extremely low-light nighttime. Our key idea is to separate the VIS and NIR information from mixed signals, and enhance the VIS signal adaptively with the NIR signal as assistance. To this end, we build an optical system to collect a new VIS-NIR-MIX dataset and present a physically meaningful image processing algorithm based on CNN. Extensive experiments show outstanding results, which demonstrate the effectiveness of our solution.

【Keywords】:

1438. A Variational Autoencoder with Deep Embedding Model for Generalized Zero-Shot Learning.

Paper Link】 【Pages】:11733-11740

【Authors】: Peirong Ma ; Xiao Hu

【Abstract】: Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes unavailable during training, but also seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes will produce an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) to learn a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs DE to learn the mapping from the semantic space to the visual feature space, and then utilizes VAE to transform both the original visual features and the features obtained by the mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms the state of the art.

【Keywords】:

1439. Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network.

Paper Link】 【Pages】:11741-11748

【Authors】: Zhe Ma ; Jianfeng Dong ; Zhongzi Long ; Yao Zhang ; Yuan He ; Hui Xue ; Shouling Ji

【Abstract】: This paper strives to learn fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute among fashion items, which has potential value in many fashion-related applications such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) to jointly learn multiple attribute-specific embeddings in an end-to-end manner, and thus measure the fine-grained similarity in the corresponding space. With two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, ASEN is able to locate the related regions and capture the essential patterns under the guidance of the specified attribute, thus making the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on four fashion-related datasets show the effectiveness of ASEN for fine-grained fashion similarity learning and its potential for fashion reranking. Code and data are available at https://github.com/Maryeon/asen.

【Keywords】:

1440. Domain Generalization Using a Mixture of Multiple Latent Domains.

Paper Link】 【Pages】:11749-11756

【Authors】: Toshihiko Matsuura ; Tatsuya Harada

【Abstract】: When domains, which represent underlying data distributions, vary during training and testing processes, deep neural networks suffer a drop in their performance. Domain generalization allows improvements in the generalization performance for unseen target domains by using multiple source domains. Conventional methods assume that the domain to which each sample belongs is known in training. However, many datasets, such as those collected via web crawling, contain a mixture of multiple latent domains, in which the domain of each sample is unknown. This paper introduces domain generalization using a mixture of multiple latent domains as a novel and more realistic scenario, where we try to train a domain-generalized model without using domain labels. To address this scenario, we propose a method that iteratively divides samples into latent domains via clustering, and which trains the domain-invariant feature extractor shared among the divided latent domains via adversarial learning. We assume that the latent domain of images is reflected in their style, and thus, utilize style features for clustering. By using these features, our proposed method successfully discovers latent domains and achieves domain generalization even if the domain labels are not given. Experiments show that our proposed method can train a domain-generalized model without using domain labels. Moreover, it outperforms conventional domain generalization methods, including those that utilize domain labels.

【Keywords】:

1441. High-Order Residual Network for Light Field Super-Resolution.

Paper Link】 【Pages】:11757-11764

【Authors】: Nan Meng ; Xiaofei Wu ; Jianzhuang Liu ; Edmund Y. Lam

【Abstract】: Plenoptic cameras usually sacrifice the spatial resolution of their sub-aperture images (SAIs) to acquire geometry information from different viewpoints. Several methods have been proposed to mitigate this spatio-angular trade-off, but they seldom make efficient use of the structural properties of the light field (LF) data. In this paper, we propose a novel high-order residual network to learn the geometric features hierarchically from the LF for reconstruction. An important component in the proposed network is the high-order residual block (HRB), which learns the local geometric features by considering the information from all input views. After fully obtaining the local features learned from each HRB, our model extracts the representative geometric features for spatio-angular upsampling through global residual learning. Additionally, a refinement network follows to further enhance the spatial details by minimizing a perceptual loss. Compared with previous work, our model is tailored to the rich structure inherent in the LF, and therefore can reduce the artifacts near non-Lambertian and occlusion regions. Experimental results show that our approach enables high-quality reconstruction even in challenging regions and outperforms state-of-the-art single image or LF reconstruction methods in both quantitative measurements and visual evaluation.

【Keywords】:

1442. Shallow Feature Based Dense Attention Network for Crowd Counting.

Paper Link】 【Pages】:11765-11772

【Authors】: Yunqi Miao ; Zijia Lin ; Guiguang Ding ; Jungong Han

【Abstract】: While the performance of crowd counting via deep learning has improved dramatically in recent years, it remains an ingrained problem due to cluttered backgrounds and varying scales of people within an image. In this paper, we propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images, which diminishes the impact of backgrounds via a shallow feature based attention model and, meanwhile, captures multi-scale information by densely connecting hierarchical image features. Specifically, inspired by the observation that backgrounds and human crowds generally have noticeably different responses in shallow features, we build our attention model upon shallow-feature maps, which results in accurate background-pixel detection. Moreover, considering that the most representative features of people across different scales can appear in different layers of a feature extraction network, to better keep them all, we propose to densely connect hierarchical image features of different layers and subsequently encode them for estimating crowd density. Experimental results on three benchmark datasets clearly demonstrate the superiority of SDANet when dealing with different scenarios. In particular, on the challenging UCF_CC_50 dataset, our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% drop in Mean Absolute Error (MAE) for our SDANet.

【Keywords】:

1443. Learning to Follow Directions in Street View.

Paper Link】 【Pages】:11773-11781

【Authors】: Karl Moritz Hermann ; Mateusz Malinowski ; Piotr Mirowski ; Andras Banki-Horvath ; Keith Anderson ; Raia Hadsell

【Abstract】: Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents.

【Keywords】:

1444. Pyramid Attention Aggregation Network for Semantic Segmentation of Surgical Instruments.

Paper Link】 【Pages】:11782-11790

【Authors】: Zhen-Liang Ni ; Gui-Bin Bian ; Guan'an Wang ; Xiao-Hu Zhou ; Zeng-Guang Hou ; Hua-Bin Chen ; Xiao-Liang Xie

【Abstract】: Semantic segmentation of surgical instruments plays a critical role in computer-assisted surgery. However, specular reflection and scale variation of instruments are likely to occur in the surgical environment, undesirably altering visual features of instruments, such as color and shape. These issues make semantic segmentation of surgical instruments more challenging. In this paper, a novel network, Pyramid Attention Aggregation Network, is proposed to aggregate multi-scale attentive features for surgical instruments. It contains two critical modules: Double Attention Module and Pyramid Upsampling Module. Specifically, the Double Attention Module includes two attention blocks (i.e., position attention block and channel attention block), which model semantic dependencies between positions and channels by capturing joint semantic information and global contexts, respectively. The attentive features generated by the Double Attention Module can distinguish target regions, contributing to solving the specular reflection issue. Moreover, the Pyramid Upsampling Module extracts local details and global contexts by aggregating multi-scale attentive features. It learns the shape and size features of surgical instruments in different receptive fields and thus addresses the scale variation issue. The proposed network achieves state-of-the-art performance on various datasets. It achieves a new record of 97.10% mean IOU on Cata7. Besides, it comes first in the MICCAI EndoVis Challenge 2017 with 9.90% increase on mean IOU.

【Keywords】:

1445. Spatial-Temporal Gaussian Scale Mixture Modeling for Foreground Estimation.

Paper Link】 【Pages】:11791-11798

【Authors】: Qian Ning ; Weisheng Dong ; Fangfang Wu ; Jinjian Wu ; Jie Lin ; Guangming Shi

【Abstract】: Subtracting the backgrounds from the video frames is an important step for many video analysis applications. Assuming that the backgrounds are low-rank and the foregrounds are sparse, robust principal component analysis (RPCA)-based methods have shown promising results. However, RPCA-based methods suffer from the scale issue, i.e., the ℓ1-sparsity regularizer fails to model the varying sparsity of the moving objects. While several efforts have been made to address this issue with advanced sparse models, previous methods cannot fully exploit the spatial-temporal correlations among the foregrounds. In this paper, we propose a novel spatial-temporal Gaussian scale mixture (STGSM) model for foreground estimation. In the proposed STGSM model, a temporal consistency constraint is imposed over the estimated foregrounds through nonzero-mean Gaussian models. Specifically, the estimates of the foregrounds obtained in the previous frame are used as the prior for those of the current frame, and nonzero-mean Gaussian scale mixture (GSM) models are developed. To better characterize the temporal correlations, optical flow is used to model the correspondences between foreground pixels in adjacent frames. The spatial correlations are also exploited by requiring that locally correlated pixels be characterized by the same STGSM model, leading to further performance improvements. Experimental results on real video datasets show that the proposed method performs comparably to or even better than current state-of-the-art background subtraction methods.

【Keywords】:
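
For readers unfamiliar with the RPCA baseline that the abstract of entry 1445 starts from, the standard convex formulation (often called principal component pursuit) can be sketched as below. The symbols D (matrix whose columns are vectorized video frames), L (low-rank background), S (sparse foreground) and λ (trade-off weight) are notation assumed here for illustration; the paper's own objective may differ.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{subject to} \quad D = L + S
```

Here \|L\|_{*} is the nuclear norm (sum of singular values) and \|S\|_{1} is the entrywise ℓ1 norm. A single global λ penalizes every foreground entry equally, which is one way to read the "scale issue" mentioned in the abstract.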

1446. Crowd Counting with Decomposed Uncertainty.

Paper Link】 【Pages】:11799-11806

【Authors】: Min-hwan Oh ; Peder A. Olsen ; Karthikeyan Natesan Ramamurthy

【Abstract】: Research in neural networks in the field of computer vision has achieved remarkable accuracy for point estimation. However, the uncertainty in the estimation is rarely addressed. Uncertainty quantification accompanied by point estimation can lead to a more informed decision, and even improve the prediction quality. In this work, we focus on uncertainty estimation in the domain of crowd counting. With increasing occurrences of heavily crowded events such as political rallies, protests, concerts, etc., automated crowd analysis is becoming an increasingly crucial task. The stakes can be very high in many of these real-world applications. We propose a scalable neural network framework with quantification of decomposed uncertainty using a bootstrap ensemble. We demonstrate that the proposed uncertainty quantification method provides additional insight to the crowd counting problem and is simple to implement. We also show that our proposed method exhibits state-of-the-art performances in many benchmark crowd counting datasets.

【Keywords】:
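
The abstract of entry 1446 describes decomposed uncertainty obtained from a bootstrap ensemble but gives no formulas. The following is a minimal sketch of one common decomposition, assuming each ensemble member outputs a predicted count and a predicted variance; the function names `decompose_uncertainty` and `bootstrap_indices` are hypothetical and are not taken from the paper.

```python
import numpy as np

def bootstrap_indices(n_samples, n_members, seed=0):
    """Draw one resampled index set (with replacement) per ensemble member."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_samples, size=(n_members, n_samples))

def decompose_uncertainty(means, variances):
    """Split predictive variance following the law of total variance.

    means, variances: arrays of shape (K,) or (K, N), one row per ensemble
    member (assumed outputs of K independently trained networks).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prediction = means.mean(axis=0)      # ensemble point estimate
    epistemic = means.var(axis=0)        # disagreement between members
    aleatoric = variances.mean(axis=0)   # average noise predicted by members
    return prediction, epistemic, aleatoric, epistemic + aleatoric

# Three hypothetical members predicting the count for one image.
pred, epi, ale, total = decompose_uncertainty([100.0, 110.0, 90.0], [4.0, 6.0, 5.0])
```

In this sketch the epistemic term shrinks as members agree, while the aleatoric term reflects noise the members themselves predict; whether the paper uses exactly this split is an assumption.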

1447. Image Formation Model Guided Deep Image Super-Resolution.

Paper Link】 【Pages】:11807-11814

【Authors】: Jinshan Pan ; Yang Liu ; Deqing Sun ; Jimmy S. J. Ren ; Ming-Ming Cheng ; Jian Yang ; Jinhui Tang

【Abstract】: We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.

【Keywords】:
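
The pixel substitution step described in the abstract of entry 1447 (blur the intermediate high-resolution estimate, then overwrite the pixels that survive decimation with the observed low-resolution values) can be sketched as follows. This is an illustration only: it assumes decimation keeps every `scale`-th pixel starting at position (0, 0), uses a plain NumPy blur with edge padding, and the names `blur` and `pixel_substitution` are hypothetical.

```python
import numpy as np

def blur(image, kernel):
    """Filter a 2D image with an odd-sized kernel using edge padding.

    Implemented as correlation, which equals convolution for symmetric kernels.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def pixel_substitution(hr_estimate, kernel, lr_image, scale):
    """Blur the HR estimate, then copy LR pixels into the sampled positions."""
    blurred = blur(np.asarray(hr_estimate, dtype=float),
                   np.asarray(kernel, dtype=float))
    out = blurred.copy()
    # Positions kept by decimation now match the observation exactly.
    out[::scale, ::scale] = lr_image
    return out

hr = np.arange(16.0).reshape(4, 4)        # stand-in for a network output
box = np.full((3, 3), 1.0 / 9.0)          # stand-in for a known blur kernel
lr = np.array([[100.0, 200.0], [300.0, 400.0]])
result = pixel_substitution(hr, box, lr, scale=2)
```

After this step, decimating `result` reproduces `lr`, which is the sense in which the output satisfies the image formation model under the stated assumptions.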

1448. Adversarial Cross-Domain Action Recognition with Co-Attention.

Paper Link】 【Pages】:11815-11822

【Authors】: Boxiao Pan ; Zhangjie Cao ; Ehsan Adeli ; Juan Carlos Niebles

【Abstract】: Action recognition has been a widely studied topic with a heavy focus on supervised learning involving sufficient labeled videos. However, the problem of cross-domain action recognition, where training and testing videos are drawn from different underlying distributions, remains largely under-explored. Previous methods directly employ techniques for cross-domain image recognition, which tend to suffer from the severe temporal misalignment problem. This paper proposes a Temporal Co-attention Network (TCoN), which matches the distributions of temporally aligned action features between source and target domains using a novel cross-domain co-attention mechanism. Experimental results on three cross-domain action recognition datasets demonstrate that TCoN improves both previous single-domain and cross-domain methods significantly under the cross-domain setting.

【Keywords】:

1449. Further Understanding Videos through Adverbs: A New Video Task.

Paper Link】 【Pages】:11823-11830

【Authors】: Bo Pang ; Kaiwen Zha ; Yifan Zhang ; Cewu Lu

【Abstract】: Video understanding is a research hotspot in computer vision, and significant progress has recently been made on video action recognition. However, the semantic information contained in actions is not rich enough to build powerful video understanding models. This paper first introduces a new kind of video semantics: the Behavior Adverb (BA), which is more expressive and more difficult, covering subtle and inherent characteristics of human action behavior. To exhaustively decode these semantics, we construct the Videos with Action and Adverb Dataset (VAAD), a large-scale dataset with a semantically complete set of BAs. The dataset will be released to the public with this paper. We benchmark several representative video understanding methods (originally for action recognition) on BA and action recognition. The results show that the BA recognition task is more challenging than conventional action recognition. Accordingly, we propose the BA Understanding Network (BAUN) to solve this problem, and the experiments reveal that our BAUN is more suitable for BA recognition (11% better than I3D). Furthermore, we find that these two kinds of semantics (action and BA) can propel each other forward to better performance: action recognition results improve by 3.4% on average on three standard action recognition datasets (UCF-101, HMDB-51, Kinetics).

【Keywords】:

1450. Visual Dialogue State Tracking for Question Generation.

Paper Link】 【Pages】:11831-11838

【Authors】: Wei Pang ; Xiaojie Wang

【Abstract】: GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object that the oracle has in mind in an image by asking a sequence of Yes/No questions. Asking proper questions as the dialogue progresses is vital for achieving a successful final guess. As a result, the progress of the dialogue should be properly represented and tracked. Previous models for question generation pay less attention to the representation and tracking of dialogue states, and are therefore prone to asking low-quality questions such as repeated questions. This paper proposes a visual dialogue state tracking (VDST) based method for question generation. A visual dialogue state is defined as the distribution over objects in the image together with the representations of those objects. Representations of objects are updated as the distribution over objects changes. An object-difference based attention is used to decode a new question. The distribution over objects is updated by comparing the question-answer pair with the objects. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms existing methods and achieves new state-of-the-art performance. It is also notable that our model reduces the rate of repeated questions from more than 50% to 21.9% compared with previous state-of-the-art methods.

【Keywords】:

1451. Relation Network for Person Re-Identification.

Paper Link】 【Pages】:11839-11847

【Authors】: Hyunjong Park ; Bumsub Ham

【Abstract】: Person re-identification (reID) aims at retrieving an image of the person of interest from a set of images typically captured by multiple cameras. Recent reID methods have shown that exploiting local features describing body parts, together with a global feature of a person image itself, gives robust feature representations, even in the case of missing body parts. However, using the individual part-level features directly, without considering relations between body parts, confuses differentiating identities of different persons having similar attributes in corresponding parts. To address this issue, we propose a new relation network for person reID that considers relations between individual body parts and the rest of them. Our model makes a single part-level feature incorporate partial information of other body parts as well, supporting it to be more discriminative. We also introduce a global contrastive pooling (GCP) method to obtain a global feature of a person image. We propose to use contrastive features for GCP to complement conventional max and averaging pooling techniques. We show that our model outperforms the state of the art on the Market1501, DukeMTMC-reID and CUHK03 datasets, demonstrating the effectiveness of our approach on discriminative person representations.

【Keywords】:

1452. Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA.

Paper Link】 【Pages】:11848-11855

【Authors】: Badri N. Patro ; Anupriy ; Vinay Namboodiri

【Abstract】: In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is challenging to provide supervision for attention. An observation we make is that visual explanations as obtained through class activation mappings (specifically Grad-CAM) that are meant to explain the performance of various networks could form a means of supervision. However, as the distributions of attention maps and that of Grad-CAMs differ, it would not be suitable to directly use these as a form of supervision. Rather, we propose the use of a discriminator that aims to distinguish samples of visual explanation and attention maps. The use of adversarial training of the attention regions as a two-player game between attention and explanation serves to bring the distributions of attention maps and visual explanations closer. Significantly, we observe that providing such a means of supervision also results in attention maps that are more closely related to human attention resulting in a substantial improvement over baseline stacked attention network (SAN) models. It also results in a good improvement in rank correlation metric on the VQA task. This method can also be combined with recent MCB based methods and results in consistent improvement. We also provide comparisons with other means for learning distributions such as based on Correlation Alignment (Coral), Maximum Mean Discrepancy (MMD) and Mean Square Error (MSE) losses and observe that the adversarial loss outperforms the other forms of learning the attention maps. Visualization of the results also confirms our hypothesis that attention maps improve using this form of supervision.

【Keywords】:

1453. LCD: Learned Cross-Domain Descriptors for 2D-3D Matching.

Paper Link】 【Pages】:11856-11864

【Authors】: Quang-Hieu Pham ; Mikaela Angelina Uy ; Binh-Son Hua ; Duc Thanh Nguyen ; Gemma Roig ; Sai-Kit Yeung

【Abstract】: In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting ≈ 1.4 million 2D-3D correspondences with various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance, not only in solving cross-domain tasks but also in generalizing to single-domain 2D and 3D tasks. Our dataset and code are released publicly at https://hkust-vgd.github.io/lcd.

【Keywords】:

1454. Exploit and Replace: An Asymmetrical Two-Stream Architecture for Versatile Light Field Saliency Detection.

Paper Link】 【Pages】:11865-11873

【Authors】: Yongri Piao ; Zhengkun Rong ; Miao Zhang ; Huchuan Lu

【Abstract】: Light field saliency detection has attracted increasing interest in recent years due to the significant improvements in challenging scenes obtained by using abundant light field cues. However, the high dimensionality of light field data poses computation-intensive and memory-intensive challenges, and light field data are far less ubiquitous than RGB data. These may severely impede practical applications of light field saliency detection. In this paper, we introduce an asymmetrical two-stream architecture inspired by knowledge distillation to confront these challenges. First, we design a teacher network to learn to exploit focal slices for higher requirements on desktop computers and meanwhile transfer comprehensive focusness knowledge to the student network. Our teacher network relies on two tailor-made modules, namely the multi-focusness recruiting module (MFRM) and the multi-focusness screening module (MFSM). Second, we propose two distillation schemes to train a student network towards memory and computation efficiency while ensuring the performance. The proposed distillation schemes ensure better absorption of focusness knowledge and enable the student to replace the focal slices with a single RGB image in a user-friendly way. We conduct experiments on three benchmark datasets and demonstrate that our teacher network achieves state-of-the-art performance and that our student network (ResNet18) achieves Top-1 accuracies on the HFUT-LFSD dataset and Top-4 on DUT-LFSD, while reducing the model size by 56% and boosting the frames per second (FPS) by 159%, compared with the best performing method.

【Keywords】:

1455. Differentiable Grammars for Videos.

Paper Link】 【Pages】:11874-11881

【Authors】: A. J. Piergiovanni ; Anelia Angelova ; Michael S. Ryoo

【Abstract】: This paper proposes a novel algorithm which learns a formal regular grammar from real-world continuous data, such as videos. Learning latent terminals, non-terminals, and production rules directly from continuous data allows the construction of a generative model capturing sequential structures with multiple possibilities. Our model is fully differentiable, and provides easily interpretable results which are important in order to understand the learned structures. It outperforms the state-of-the-art on several challenging datasets and is more accurate for forecasting future activities in videos. We plan to open-source the code.

【Keywords】:

1456. Region-Adaptive Dense Network for Efficient Motion Deblurring.

Paper Link】 【Pages】:11882-11889

【Authors】: Kuldeep Purohit ; A. N. Rajagopalan

【Abstract】: In this paper, we address the problem of dynamic scene deblurring in the presence of motion blur. Restoration of images affected by severe blur necessitates a network design with a large receptive field, which existing networks attempt to achieve through simple increment in the number of generic convolution layers, kernel-size, or the scales at which the image is processed. However, these techniques ignore the non-uniform nature of blur, and they come at the expense of an increase in model size and inference time. We present a new architecture composed of region adaptive dense deformable modules that implicitly discover the spatially varying shifts responsible for non-uniform blur in the input image and learn to modulate the filters. This capability is complemented by a self-attentive module which captures non-local spatial relationships among the intermediate features and enhances the spatially varying processing capability. We incorporate these modules into a densely connected encoder-decoder design which utilizes pre-trained Densenet filters to further improve the performance. Our network facilitates interpretable modeling of the spatially-varying deblurring process while dispensing with multi-scale processing and large filters entirely. Extensive comparisons with prior art on benchmark dynamic scene deblurring datasets clearly demonstrate the superiority of the proposed networks via significant improvements in accuracy and speed, enabling almost real-time deblurring.

【Keywords】:

1457. Visualizing Deep Networks by Optimizing with Integrated Gradients.

Paper Link】 【Pages】:11890-11898

【Authors】: Zhongang Qi ; Saeed Khorram ; Fuxin Li

【Abstract】: Understanding and interpreting the decisions made by deep learning models is valuable in many domains. In computer vision, computing heatmaps from a deep network is a popular approach for visualizing and understanding deep networks. However, heatmaps that do not correlate with the network may mislead humans; hence the performance of heatmaps in providing a faithful explanation of the underlying deep network is crucial. In this paper, we propose I-GOS, which optimizes for a heatmap so that the classification scores on the masked image would maximally decrease. The main novelty of the approach is to compute descent directions based on the integrated gradients instead of the normal gradient, which avoids local optima and speeds up convergence. Compared with previous approaches, our method can flexibly compute heatmaps at any resolution for different user needs. Extensive experiments on several benchmark datasets show that the heatmaps produced by our approach are more correlated with the decision of the underlying deep network, in comparison with other state-of-the-art approaches.

【Keywords】:
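
The abstract of entry 1457 relies on integrated gradients in place of the ordinary gradient. As a rough illustration of that building block only (not of I-GOS's mask optimization), the sketch below approximates integrated gradients with a midpoint Riemann sum along the straight path from a baseline to the input. The callback `grad_fn` and the toy model are assumptions made for this example.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - b_i) * integral_0^1 df/dx_i(b + a (x - b)) da.

    grad_fn: hypothetical callable returning the gradient of the model output
             with respect to its input, evaluated at a given point.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps   # midpoints of [0, 1] subintervals
    avg_grad = np.zeros_like(x)
    for alpha in alphas:
        avg_grad += grad_fn(baseline + alpha * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad

# Toy model: f(x) = sum(x ** 2), whose gradient is 2 * x.
def toy_grad(x):
    return 2.0 * x

attributions = integrated_gradients(toy_grad, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For this toy model the attributions sum to f(x) - f(baseline), the completeness property that makes integrated gradients less sensitive to local flat regions than a single gradient evaluation.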

1458. Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting.

Paper Link】 【Pages】:11899-11907

【Authors】: Liang Qiao ; Sanli Tang ; Zhanzhan Cheng ; Yunlu Xu ; Yi Niu ; Shiliang Pu ; Fei Wu

【Abstract】: Many approaches have recently been proposed to detect irregular scene text and achieved promising results. However, their localization results may not well satisfy the following text recognition part mainly because of two reasons: 1) recognizing arbitrary shaped text is still a challenging task, and 2) prevalent non-trainable pipeline strategies between text detection and text recognition will lead to suboptimal performances. To handle this incompatibility problem, in this paper we propose an end-to-end trainable text spotting approach named Text Perceptron. Concretely, Text Perceptron first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies without extra parameters. It unites text detection and the following recognition part into a whole framework, and helps the whole network achieve global optimization. Experiments show that our method achieves competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and ICDAR 2015, and also obviously outperforms existing methods on irregular text benchmarks SCUT-CTW1500 and Total-Text.

【Keywords】:

1459. FFA-Net: Feature Fusion Attention Network for Single Image Dehazing.

Paper Link】 【Pages】:11908-11915

【Authors】: Xu Qin ; Zhilin Wang ; Yuanchao Bai ; Xiaodong Xie ; Huizhu Jia

【Abstract】: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module combines Channel Attention with a Pixel Attention mechanism, considering that different channel-wise features contain totally different weighted information and that haze distribution is uneven across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information, expanding the representational ability of CNNs. 2) A basic block structure consists of Local Residual Learning and Feature Attention; Local Residual Learning allows less important information, such as thin-haze regions or low-frequency content, to be bypassed through multiple local residual connections, letting the main network architecture focus on more effective information. 3) An attention-based Feature Fusion (FFA) structure across different levels, in which the feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure can also retain the information of shallow layers and pass it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available at GitHub.

【Keywords】:

1460. Learning Meta Model for Zero- and Few-Shot Face Anti-Spoofing.

Paper Link】 【Pages】:11916-11923

【Authors】: Yunxiao Qin ; Chenxu Zhao ; Xiangyu Zhu ; Zezheng Wang ; Zitong Yu ; Tianyu Fu ; Feng Zhou ; Jingping Shi ; Zhen Lei

【Abstract】: Face anti-spoofing is crucial to the security of face recognition systems. Most previous methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks, which requires large-scale training data to cover as many attacks as possible. However, the trained model tends to overfit to several common attacks and is still vulnerable to unseen attacks. To overcome this challenge, the detector should: 1) learn discriminative features that can generalize to unseen spoofing types from predefined presentation attacks; 2) quickly adapt to new spoofing types by learning from both the predefined attacks and a few examples of the new spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot learning problem. In this paper, we propose a novel Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) method to tackle this problem through meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task of detecting unseen spoofing types by learning from predefined living and spoofing faces and a few examples of new attacks. To assess the proposed approach, we propose several benchmarks for zero- and few-shot FAS. Experiments show its superior performance on the presented benchmarks compared with existing methods in existing zero-shot FAS protocols.

【Keywords】:

1461. DGCN: Dynamic Graph Convolutional Network for Efficient Multi-Person Pose Estimation.

Paper Link】 【Pages】:11924-11931

【Authors】: Zhongwei Qiu ; Kai Qiu ; Jianlong Fu ; Dongmei Fu

【Abstract】: Multi-person pose estimation aims to detect human keypoints from images with multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention, owing to the good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoints localization and grouping, where relations between keypoints are the keys to group keypoints. These relations spontaneously construct a graph of keypoints, where the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, while omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoints graph. Specifically, we take into account all relations (all edges of the graph) and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and learn structural relations from multi-level features. Our network with single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on COCO keypoints and MPII dataset, respectively.

【Keywords】:

1462. Improved Visual-Semantic Alignment for Zero-Shot Object Detection.

Paper Link】 【Pages】:11932-11939

【Authors】: Shafin Rahman ; Salman H. Khan ; Nick Barnes

【Abstract】: Zero-shot object detection is an emerging research topic that aims to recognize and localize previously ‘unseen’ objects. This setting gives rise to several unique challenges, e.g., highly imbalanced positive vs. negative instance ratio, proper alignment between visual and semantic concepts and the ambiguity between background and unseen classes. Here, we propose an end-to-end deep learning framework underpinned by a novel loss function that handles class-imbalance and seeks to properly align the visual and semantic cues for improved zero-shot learning. We call our objective the ‘Polarity loss’ because it explicitly maximizes the gap between positive and negative predictions. Such a margin maximizing formulation is not only important for visual-semantic alignment but it also resolves the ambiguity between background and unseen objects. Further, the semantic representations of objects are noisy, thus complicating the alignment between visual and semantic domains. To this end, we perform metric learning using a ‘Semantic vocabulary’ of related concepts that refines the noisy semantic embeddings and establishes a better synergy between visual and semantic domains. Our approach is inspired by the embodiment theories in cognitive science, that claim human semantic understanding to be grounded in past experiences (seen objects), related linguistic concepts (word vocabulary) and the visual perception (seen/unseen object images). Our extensive results on MS-COCO and Pascal VOC datasets show significant improvements over state of the art.

【Keywords】:

1463. Dynamic Graph Representation for Occlusion Handling in Biometrics.

Paper Link】 【Pages】:11940-11947

【Authors】: Min Ren ; Yunlong Wang ; Zhenan Sun ; Tieniu Tan

【Abstract】: The generalization ability of convolutional neural networks (CNNs) for biometrics drops greatly due to the adverse effects of various occlusions. To this end, we propose a novel unified framework, called Dynamic Graph Representation (DGR), that integrates the merits of both CNNs and graphical models to learn dynamic graph representations for occlusion problems in biometrics. Convolutional features on certain regions are re-crafted by a graph generator to establish the connections among the spatial parts of biometrics and build Feature Graphs based on these node representations. Each node of a Feature Graph corresponds to a specific part of the input image, and the edges express the spatial relationships between parts. By analyzing the similarities between the nodes, the framework is able to adaptively remove the nodes representing the occluded parts. During dynamic graph matching, we propose a novel strategy to measure the distances of both nodes and adjacency matrices. In this way, the proposed method is more convincing than CNN-based methods because the dynamic graph method implies a more illustrative and reasonable inference of the biometrics decision. Experiments conducted on iris and face demonstrate the superiority of the proposed framework, which boosts the accuracy of occluded biometrics recognition by a large margin compared with baseline methods.

【Keywords】:

1464. Conquering the CNN Over-Parameterization Dilemma: A Volterra Filtering Approach for Action Recognition.

Paper Link】 【Pages】:11948-11956

【Authors】: Siddharth Roheda ; Hamid Krim

【Abstract】: The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra Filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state of the art CNN approaches.

【Keywords】:

1465. Hidden Trigger Backdoor Attacks.

Paper Link】 【Pages】:11957-11965

【Authors】: Aniruddha Saha ; Akshayvarun Subramanya ; Hamed Pirsiavash

【Abstract】: With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at the test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that is possible to identify by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack where poisoned data look natural with correct labels and also more importantly, the attacker hides the trigger in the poisoned data and keeps the trigger secret until the test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images although the model performs well on clean data. We also show that our proposed attack cannot be easily defended using a state-of-the-art defense algorithm for backdoor attacks.

【Keywords】:

1466. Temporal Interlacing Network.

Paper Link】 【Pages】:11966-11973

【Authors】: Hao Shao ; Shengju Qian ; Yu Liu

【Abstract】: For a long time, the vision community has tried to learn the spatio-temporal representation by combining convolutional neural networks with various temporal models, such as the families of Markov chain, optical flow, RNN and temporal convolution. However, these pipelines consume enormous computing resources due to the alternating learning process for spatial and temporal information. One natural question is whether we can embed the temporal information into the spatial one so the information in the two domains can be jointly learned once-only. In this work, we answer this question by presenting a simple yet powerful operator – temporal interlacing network (TIN). Instead of learning the temporal features, TIN fuses the two kinds of information by interlacing spatial representations from the past to the future, and vice versa. A differentiable interlacing target can be learned to control the interlacing process. In this way, a heavy temporal model is replaced by a simple interlacing operator. We theoretically prove that with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on 6 challenging benchmarks. These results push the state-of-the-art performances of video understanding by a considerable margin. Not surprisingly, the ensemble model of the proposed TIN won the 1st place in the ICCV19 - Multi Moments in Time challenge. Code is made available to facilitate further research.
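
The interlacing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes a learned, differentiable interlacing target, whereas the hypothetical `interlace` function below assumes fixed integer offsets per channel and zero-fills frames that fall outside the clip.

```python
def interlace(features, offsets):
    """Shift each channel along the time axis by a fixed integer offset.

    features: list over time of per-frame channel values, shape [T][C].
    offsets:  one integer per channel (an assumption of this sketch);
              a positive offset pulls values from the past, a negative
              offset pulls values from the future.
    Frames that would be read from outside the clip are zero-filled.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            src = t - offsets[c]  # frame index the value is taken from
            if 0 <= src < num_frames:
                out[t][c] = features[src][c]
    return out


# Three frames, three channels: channel 0 looks one frame back,
# channel 1 stays in place, channel 2 looks one frame ahead.
frames = [[1.0, 10.0, 100.0],
          [2.0, 20.0, 200.0],
          [3.0, 30.0, 300.0]]
shifted = interlace(frames, offsets=[1, 0, -1])
```

After the shift, every frame holds a mix of past, present and future channel values, so an ordinary per-frame (spatial) operator applied afterwards sees temporal context without a separate temporal model.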

【Keywords】:

1467. Regularized Fine-Grained Meta Face Anti-Spoofing.

Paper Link】 【Pages】:11974-11981

【Authors】: Rui Shao ; Xiangyuan Lan ; Pong C. Yuen

【Abstract】: Face presentation attacks have become an increasingly critical concern when face recognition is widely applied. Many face anti-spoofing methods have been proposed, but most of them ignore the generalization ability to unseen attacks. To overcome the limitation, this work casts face anti-spoofing as a domain generalization (DG) problem, and attempts to address this problem by developing a new meta-learning framework called Regularized Fine-grained Meta-learning. To let our face anti-spoofing model generalize well to unseen attacks, the proposed framework trains our model to perform well in the simulated domain shift scenarios, which is achieved by finding generalized learning directions in the meta-learning process. Specifically, the proposed framework incorporates the domain knowledge of face anti-spoofing as the regularization so that meta-learning is conducted in the feature space regularized by the supervision of domain knowledge. This enables our model more likely to find generalized learning directions with the regularized meta-learning for face anti-spoofing task. Besides, to further enhance the generalization ability of our model, the proposed framework adopts a fine-grained learning strategy that simultaneously conducts meta-learning in a variety of domain shift scenarios in each iteration. Extensive experiments on four public datasets validate the effectiveness of the proposed method.

【Keywords】:

1468. Multimodal Interaction-Aware Trajectory Prediction in Crowded Space.

Paper Link】 【Pages】:11982-11989

【Authors】: Xiaodan Shi ; Xiaowei Shao ; Zipei Fan ; Renhe Jiang ; Haoran Zhang ; Zhiling Guo ; Guangming Wu ; Wei Yuan ; Ryosuke Shibasaki

【Abstract】: Accurate human path forecasting in complex and crowded scenarios is critical for collision avoidance in autonomous driving and social robot navigation. It remains a challenging problem because of dynamic human interaction and the intrinsic multimodality of human motion. Given the observation, there is a rich set of plausible ways for an agent to walk through the circumstance. To address those issues, we propose a spatio-temporal model that can aggregate the information from socially interacting agents and capture the multimodality of the motion patterns. We use mixture density functions to describe the human path and predict the distribution of future paths with explicit density. To integrate more factors to model interacting people, we further introduce a coordinate transformation to represent the relative motion between people. Extensive experiments over several trajectory prediction benchmarks demonstrate that our method is able to forecast various plausible futures in complex scenarios and achieves state-of-the-art performance.
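
To make "mixture density functions ... with explicit density" concrete, here is a minimal sketch that evaluates a two-dimensional Gaussian mixture at a candidate future position. The diagonal covariance, the `mixture_density` name and the example component values are assumptions for illustration; the abstract does not specify the exact parameterisation used in the paper.

```python
import math


def mixture_density(point, components):
    """Evaluate a 2-D Gaussian mixture with diagonal covariance.

    point:      (x, y) position to score.
    components: list of (weight, (mu_x, mu_y), (sigma_x, sigma_y));
                weights are expected to sum to 1.
    """
    x, y = point
    density = 0.0
    for weight, (mu_x, mu_y), (sigma_x, sigma_y) in components:
        zx = (x - mu_x) / sigma_x
        zy = (y - mu_y) / sigma_y
        norm = 2.0 * math.pi * sigma_x * sigma_y
        density += weight * math.exp(-0.5 * (zx * zx + zy * zy)) / norm
    return density


# Hypothetical prediction with two equally likely modes:
# the pedestrian passes an obstacle on the left or on the right.
modes = [(0.5, (-1.0, 2.0), (0.5, 0.5)),
         (0.5, (1.0, 2.0), (0.5, 0.5))]
left = mixture_density((-1.0, 2.0), modes)
right = mixture_density((1.0, 2.0), modes)
middle = mixture_density((0.0, 2.0), modes)
```

Both modes receive the same density while the point between them scores lower, which is the behaviour a single Gaussian (one averaged path) cannot express.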

【Keywords】:

1469. Optimal Feature Transport for Cross-View Image Geo-Localization.

Paper Link】 【Pages】:11990-11997

【Authors】: Yujiao Shi ; Xin Yu ; Liu Liu ; Tong Zhang ; Hongdong Li

【Abstract】: This paper addresses the problem of cross-view image geo-localization, where the geographic location of a ground-level street-view query image is estimated by matching it against a large scale aerial map (e.g., a high-resolution satellite image). State-of-the-art deep-learning based methods tackle this problem as deep metric learning which aims to learn global feature representations of the scene seen by the two different views. Although promising results are obtained by such deep metric learning methods, they fail to exploit a crucial cue relevant for localization, namely, the spatial layout of local features. Moreover, little attention is paid to the obvious domain gap (between aerial view and ground view) in the context of cross-view localization. This paper proposes a novel Cross-View Feature Transport (CVFT) technique to explicitly establish cross-view domain transfer that facilitates feature alignment between ground and aerial images. Specifically, we implement the CVFT as network layers, which transport features from one domain to the other, leading to more meaningful feature similarity comparison. Our model is differentiable and can be learned end-to-end. Experiments on large-scale datasets have demonstrated that our method has remarkably boosted the state-of-the-art cross-view localization performance, e.g., on the CVUSA dataset, with significant improvements for top-1 recall from 40.79% to 61.43%, and for top-10 from 76.36% to 90.49%. We expect the key insight of the paper (i.e., explicitly handling domain difference via domain transport) will prove to be useful for other similar problems in computer vision as well.

【Keywords】:

1470. Identifying Model Weakness with Adversarial Examiner.

Paper Link】 【Pages】:11998-12006

【Authors】: Michelle Shu ; Chenxi Liu ; Weichao Qiu ; Alan L. Yuille

【Abstract】: Machine learning models are usually evaluated according to the average case performance on the test set. However, this is not always ideal, because in some sensitive domains (e.g. autonomous driving), it is the worst case performance that matters more. In this paper, we are interested in systematic exploration of the input data space to identify the weakness of the model to be evaluated. We propose to use an adversarial examiner in the testing stage. Different from the existing strategy to always give the same (distribution of) test data, the adversarial examiner will dynamically select the next test data to hand out based on the testing history so far, with the goal being to undermine the model's performance. This sequence of test data not only helps us understand the current model, but also serves as constructive feedback to help improve the model in the next iteration. We conduct experiments on ShapeNet object classification. We show that our adversarial examiner can successfully put more emphasis on the weakness of the model, preventing performance estimates from being overly optimistic.

【Keywords】:

Paper Link】 【Pages】:12007-12014

【Authors】: Dehua Song ; Chang Xu ; Xu Jia ; Yiyi Chen ; Chunjing Xu ; Yunhe Wang

【Abstract】: Although remarkable progress has been made on single image super-resolution due to the revival of deep convolutional neural networks, deep learning methods are confronted with the challenges of computation and memory consumption in practice, especially for mobile devices. Focusing on this issue, we propose an efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight and accurate networks for image super-resolution. Firstly, to accelerate super-resolution network, we exploit the variation of feature scale adequately with the proposed efficient residual dense blocks. In the proposed evolutionary algorithm, the locations of pooling and upsampling operator are searched automatically. Secondly, network architecture is evolved with the guidance of block credits to acquire accurate super-resolution network. The block credit reflects the effect of current block and is earned during model evaluation process. It guides the evolution by weighing the sampling probability of mutation to favor admirable blocks. Extensive experimental results demonstrate the effectiveness of the proposed searching method and the found efficient super-resolution models achieve better performance than the state-of-the-art methods with limited number of parameters and FLOPs.

【Keywords】:

1472. KPNet: Towards Minimal Face Detector.

Paper Link】 【Pages】:12015-12022

【Authors】: Guanglu Song ; Yu Liu ; Yuhang Zang ; Xiaogang Wang ; Biao Leng ; Qingsheng Yuan

【Abstract】: The small receptive field and capacity of minimal neural networks limit their performance when using them as the backbone of detectors. In this work, we find that the appearance feature of a generic face is discriminative enough for a tiny and shallow neural network to verify from the background. And the essential barriers behind us are 1) the vague definition of the face bounding box and 2) tricky design of anchor-boxes or receptive field. Unlike most top-down methods for joint face detection and alignment, the proposed KPNet detects small facial keypoints instead of the whole face in a bottom-up manner. It first predicts the facial landmarks from a low-resolution image via the well-designed fine-grained scale approximation and scale adaptive soft-argmax operator. Finally, the precise face bounding boxes, no matter how we define them, can be inferred from the keypoints. Without any complex head architecture or meticulous network designing, the KPNet achieves state-of-the-art accuracy on generic face detection and alignment benchmarks with only ∼ 1M parameters, which runs at 1000fps on GPU and is easy to perform real-time on most modern front-end chips.
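
For readers unfamiliar with the soft-argmax operator mentioned here, the following minimal sketch shows the standard (not scale adaptive) form: a softmax over a keypoint heatmap followed by the expected coordinate. The `soft_argmax_2d` name and the `beta` sharpness parameter are assumptions of this sketch; the abstract does not describe how the paper's scale adaptive variant modifies it.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) location under a softmax of the heatmap.

    heatmap: list of rows of raw scores.
    beta:    sharpness (hypothetical parameter); larger values move the
             result closer to the hard argmax.
    """
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# A 3x3 heatmap with a single peak in the centre cell.
center_x, center_y = soft_argmax_2d([[0.0, 0.0, 0.0],
                                     [0.0, 5.0, 0.0],
                                     [0.0, 0.0, 0.0]])
```

Unlike a hard argmax, the result is a differentiable function of the scores and can fall between pixels, which is why the operator is useful for locating keypoints on low-resolution maps.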

【Keywords】:

1473. Multi-Spectral Salient Object Detection by Adversarial Domain Adaptation.

Paper Link】 【Pages】:12023-12030

【Authors】: Shaoyue Song ; Hongkai Yu ; Zhenjiang Miao ; Jianwu Fang ; Kang Zheng ; Cong Ma ; Song Wang

【Abstract】: Although there are many existing research works on salient object detection (SOD) in RGB images, there are still many complex situations in which regular RGB images cannot provide enough cues for accurate SOD, such as the shadow effect, similar appearance between background and foreground, strong or insufficient illumination, etc. Because of the success of near-infrared spectrum in many computer vision tasks, we explore the multi-spectral SOD in the synchronized RGB images and near-infrared (NIR) images for both simple and complex situations. We assume that the RGB SOD in the existing RGB image datasets could provide references for the multi-spectral SOD problem. In this paper, we first collect and will publicize a large multi-spectral dataset including 780 synchronized RGB and NIR image pairs for the multi-spectral SOD problem in the simple and complex situations. We model this research problem as an adversarial domain adaptation from the existing RGB image dataset (source domain) to the collected multi-spectral dataset (target domain). Experimental results show the effectiveness and accuracy of the proposed adversarial domain adaptation for the multi-spectral SOD.

【Keywords】:

1474. Stereoscopic Image Super-Resolution with Stereo Consistent Feature.

Paper Link】 【Pages】:12031-12038

【Authors】: Wonil Song ; Sungil Choi ; Somi Jeong ; Kwanghoon Sohn

【Abstract】: We present a first attempt for stereoscopic image super-resolution (SR) for recovering high-resolution details while preserving stereo-consistency between stereoscopic image pair. The most challenging issue in the stereoscopic SR is that the texture details should be consistent for corresponding pixels in stereoscopic SR image pair. However, existing stereo SR methods cannot maintain the stereo-consistency, thus causing 3D fatigue to the viewers. To address this issue, in this paper, we propose a self and parallax attention mechanism (SPAM) to aggregate the information from its own image and the counterpart stereo image simultaneously, thus reconstructing high-quality stereoscopic SR image pairs. Moreover, we design an efficient network architecture and effective loss functions to enforce stereo-consistency constraint. Finally, experimental results demonstrate the superiority of our method over state-of-the-art SR methods in terms of both quantitative metrics and qualitative visual quality while maintaining stereo-consistency between stereoscopic image pair.

【Keywords】:

1475. An Efficient Framework for Dense Video Captioning.

Paper Link】 【Pages】:12039-12046

【Authors】: Maitreya Suin ; A. N. Rajagopalan

【Abstract】: Dense video captioning is an extremely challenging task since an accurate and faithful description of events in a video requires a holistic knowledge of the video contents as well as contextual reasoning of individual events. Most existing approaches handle this problem by first proposing event boundaries from a video and then captioning on a subset of the proposals. Generation of dense temporal annotations and corresponding captions from long videos can be dramatically resource-consuming. In this paper, we focus on the task of generating a dense description of temporally untrimmed videos and aim to significantly reduce the computational cost by processing fewer frames while maintaining accuracy. Existing video captioning methods sample frames with a predefined frequency over the entire video or use all the frames. Instead, we propose a deep reinforcement-based approach which enables an agent to describe multiple events in a video by watching a portion of the frames. The agent needs to watch more frames when it is processing an informative part of the video, and skip frames when there is redundancy. The agent is trained using actor-critic algorithm, where the actor determines the frames to be watched from a video and the critic assesses the optimality of the decisions taken by the actor. Such an efficient frame selection simplifies the event proposal task considerably. This has the added effect of reducing the occurrence of unwanted proposals. The encoded state representation of the frame selection agent is further utilized for guiding event proposal and caption generation tasks. We also leverage the idea of knowledge distillation to improve the accuracy. We conduct extensive evaluations on ActivityNet captions dataset to validate our method.

【Keywords】:

1476. Fine-Grained Recognition: Accounting for Subtle Differences between Similar Classes.

Paper Link】 【Pages】:12047-12054

【Authors】: Guolei Sun ; Hisham Cholakkal ; Salman Khan ; Fahad H. Khan ; Ling Shao

【Abstract】: The main requisite for fine-grained recognition task is to focus on subtle discriminative details that make the subordinate classes different from each other. We note that existing methods implicitly address this requirement and leave it to a data-driven pipeline to figure out what makes a subordinate class different from the others. This results in two major limitations: First, the network focuses on the most obvious distinctions between classes and overlooks more subtle inter-class variations. Second, the chance of misclassifying a given sample in any of the negative classes is considered equal, while in fact, confusions generally occur among only the most similar classes. Here, we propose to explicitly force the network to find the subtle differences among closely related classes. In this pursuit, we introduce two key novelties that can be easily plugged into existing end-to-end deep learning pipelines. On one hand, we introduce “diversification block” which masks the most salient features for an input to force the network to use more subtle cues for its correct classification. Concurrently, we introduce a “gradient-boosting” loss function that focuses only on the confusing classes for each sample and therefore moves swiftly along the direction on the loss surface that seeks to resolve these ambiguities. The synergy between these two blocks helps the network to learn more effective feature representations. Comprehensive experiments are performed on five challenging datasets. Our approach outperforms existing methods using similar experimental setting on all five datasets.
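
The masking idea behind the "diversification block" can be sketched as follows. This is only an illustration under stated assumptions: the hypothetical `suppress_peak` function deterministically scales down the single strongest response of an activation map, whereas the abstract does not say how the paper selects or suppresses salient locations.

```python
def suppress_peak(activation_map, scale=0.0):
    """Scale down the strongest response(s) in a 2-D activation map.

    activation_map: list of rows of non-negative responses.
    scale:          factor applied to peak cells (hypothetical parameter);
                    0.0 removes the peak entirely.
    All other cells are returned unchanged.
    """
    peak = max(v for row in activation_map for v in row)
    return [[v * scale if v == peak else v for v in row]
            for row in activation_map]


# The most salient cell (0.9) is masked; the subtler cues remain.
masked = suppress_peak([[0.1, 0.9],
                        [0.4, 0.2]])
```

With the dominant cue removed during training, a classifier has to rely on the remaining, weaker responses, which matches the stated goal of forcing attention onto subtle differences.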

【Keywords】:

1477. Relation-Aware Pedestrian Attribute Recognition with Graph Convolutional Networks.

Paper Link】 【Pages】:12055-12062

【Authors】: Zichang Tan ; Yang Yang ; Jun Wan ; Guodong Guo ; Stan Z. Li

【Abstract】: In this paper, we propose a new end-to-end network, named Joint Learning of Attribute and Contextual relations (JLAC), to solve the task of pedestrian attribute recognition. It includes two novel modules: Attribute Relation Module (ARM) and Contextual Relation Module (CRM). For ARM, we construct an attribute graph with attribute-specific features which are learned by the constrained losses, and further use Graph Convolutional Network (GCN) to explore the correlations among multiple attributes. For CRM, we first propose a graph projection scheme to project the 2-D feature map into a set of nodes from different image regions, and then employ GCN to explore the contextual relations among those regions. Since the relation information in the above two modules is correlated and complementary, we incorporate them into a unified framework to learn both together. Experiments on three benchmarks, including PA-100K, RAP, PETA attribute datasets, demonstrate the effectiveness of the proposed JLAC.

【Keywords】:

1478. R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features.

Paper Link】 【Pages】:12063-12070

【Authors】: Chang Tang ; Xinwang Liu ; Xinzhong Zhu ; En Zhu ; Kun Sun ; Pichao Wang ; Lizhe Wang ; Albert Y. Zomaya

【Abstract】: Defocus blur detection aims to separate the in-focus and out-of-focus regions in an image. Although attracting more and more attention due to its remarkable potential applications, there are still several challenges for accurate defocus blur detection, such as the interference of background clutter, sensitivity to scales and missing boundary details of defocus blur regions. In order to address these issues, we propose a deep neural network which Recurrently Refines Multi-scale Residual Features (R2MRF) for defocus blur detection. We firstly extract multi-scale deep features by utilizing a fully convolutional network. For each layer, we design a novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) to more accurately detect blur regions from the input image. Considering that the features from bottom layers are able to capture rich low-level features for details preservation while the features from top layers are capable of characterizing the semantic information for locating blur regions, we aggregate the deep features from different layers to learn the residual between the intermediate prediction and the ground truth for each recurrent step in each residual refinement branch. Since the defocus degree is sensitive to image scales, we finally fuse the side output of each branch to obtain the final blur detection map. We evaluate the proposed network on two commonly used defocus blur detection benchmark datasets by comparing it with other 11 state-of-the-art methods. Extensive experimental results with ablation studies demonstrate that R2MRF consistently and significantly outperforms the competitors in terms of both efficiency and accuracy.

【Keywords】:

1479. V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices.

Paper Link】 【Pages】:12071-12078

【Authors】: Damien Teney ; Peng Wang ; Jiewei Cao ; Lingqiao Liu ; Chunhua Shen ; Anton van den Hengel

【Abstract】: Advances in machine learning have generated increasing enthusiasm for tasks that require high-level reasoning on top of perceptual capabilities, particularly over visual data. Such tasks include, for example, image captioning, visual question answering, and visual navigation. Their evaluation is however hindered by task-specific confounding factors and dataset biases. In parallel, the existing benchmarks for abstract reasoning are limited to synthetic stimuli (e.g. images of simple shapes) and do not capture the challenges of real-world data. We propose a new large-scale benchmark to evaluate abstract reasoning over real visual data. The test involves visual questions that require operations fundamental to many high-level vision tasks, such as comparisons of counts and logical operations on complex visual properties. The benchmark measures a method's ability to infer high-level relationships and to generalise them over image-based concepts. We provide multiple training/test splits that require controlled levels of generalization. We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement.

【Keywords】:

1480. End-to-End Thorough Body Perception for Person Search.

Paper Link】 【Pages】:12079-12086

【Authors】: Kun Tian ; Houjing Huang ; Yun Ye ; Shiyu Li ; Jinbin Lin ; Guan Huang

【Abstract】: In this paper, we propose an improved end-to-end multi-branch person search network to jointly optimize person detection, re-identification, instance segmentation, and keypoint detection. First, we build a better and faster base model to extract non-highly correlated feature expression; Second, a foreground feature enhance module is used to alleviate undesirable background noise in person feature maps; Third, we design an algorithm to learn the part-aligned representation for person search. Extensive experiments with ablation analysis show the effectiveness of our proposed end-to-end multi-task model, and we demonstrate its superiority over the state-of-the-art methods on two benchmark datasets including CUHK-SYSU and PRW.

【Keywords】:

1481. Differentiable Meta-Learning Model for Few-Shot Semantic Segmentation.

Paper Link】 【Pages】:12087-12094

【Authors】: Pinzhuo Tian ; Zhangkai Wu ; Lei Qi ; Lei Wang ; Yinghuan Shi ; Yang Gao

【Abstract】: To address the annotation scarcity issue in some cases of semantic segmentation, there have been a few attempts to develop the segmentation model in the few-shot learning paradigm. However, most existing methods only focus on the traditional 1-way segmentation setting (i.e., one image only contains a single object). This is far away from practical semantic segmentation tasks where the K-way setting (K > 1) is usually required by performing the accurate multi-object segmentation. To deal with this issue, we formulate the few-shot semantic segmentation task as a learning-based pixel classification problem, and propose a novel framework called MetaSegNet based on meta-learning. In MetaSegNet, an architecture of embedding module consisting of the global and local feature branches is developed to extract the appropriate meta-knowledge for the few-shot segmentation. Moreover, we incorporate a linear model into MetaSegNet as a base learner to directly predict the label of each pixel for the multi-object segmentation. Furthermore, our MetaSegNet can be trained by the episodic training mechanism in an end-to-end manner from scratch. Experiments on two popular semantic segmentation datasets, i.e., PASCAL VOC and COCO, reveal the effectiveness of the proposed MetaSegNet in the K-way few-shot semantic segmentation task.

【Keywords】:

1482. Attention-Based View Selection Networks for Light-Field Disparity Estimation.

Paper Link】 【Pages】:12095-12103

【Authors】: Yu-Ju Tsai ; Yu-Lun Liu ; Ming Ouhyoung ; Yung-Yu Chuang

【Abstract】: This paper introduces a novel deep network for estimating depth maps from a light field image. For utilizing the views more effectively and reducing redundancy within views, we propose a view selection module that generates an attention map indicating the importance of each view and its potential for contributing to accurate depth estimation. By exploring the symmetric property of light field views, we enforce symmetry in the attention map and further improve accuracy. With the attention map, our architecture utilizes all views more effectively and efficiently. Experiments show that the proposed method achieves state-of-the-art performance in terms of accuracy and ranks the first on a popular benchmark for disparity estimation for light field images.

【Keywords】:

1483. Image Cropping with Composition and Saliency Aware Aesthetic Score Map.

Paper Link】 【Pages】:12104-12111

【Authors】: Yi Tu ; Li Niu ; Weijie Zhao ; Dawei Cheng ; Liqing Zhang

【Abstract】: Aesthetic image cropping is a practical but challenging task which aims at finding the best crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they did not reveal the intrinsic mechanism of aesthetic evaluation. In this paper, we propose an interpretable image cropping model to unveil the mystery. For each image, we use a fully convolutional network to produce an aesthetic score map, which is shared among all candidate crops during crop-level aesthetic evaluation. Then, we require the aesthetic score map to be both composition-aware and saliency-aware. In particular, the same region is assigned with different aesthetic scores based on its relative positions in different crops. Moreover, a visually salient region is supposed to have more sensitive aesthetic scores so that our network can learn to place salient objects at more proper positions. Such an aesthetic score map can be used to localize aesthetically important regions in an image, which sheds light on the composition rules learned by our model. We show the competitive performance of our model in the image cropping task on several benchmark datasets, and also demonstrate its generality in real-world applications.

【Keywords】:

1484. Optical Flow in Deep Visual Tracking.

Paper Link】 【Pages】:12112-12119

【Authors】: Mikko Vihlman ; Arto Visala

【Abstract】: Single-target tracking of generic objects is a difficult task since a trained tracker is given information present only in the first frame of a video. In recent years, increasingly many trackers have been based on deep neural networks that learn generic features relevant for tracking. This paper argues that deep architectures are often fit to learn implicit representations of optical flow. Optical flow is intuitively useful for tracking, but most deep trackers must learn it implicitly. This paper is among the first to study the role of optical flow in deep visual tracking. The architecture of a typical tracker is modified to reveal the presence of implicit representations of optical flow and to assess the effect of using the flow information more explicitly. The results show that the considered network learns implicitly an effective representation of optical flow. The implicit representation can be replaced by an explicit flow input without a notable effect on performance. Using the implicit and explicit representations at the same time does not improve tracking accuracy. The explicit flow input could allow constructing lighter networks for tracking.

【Keywords】:

1485. TextScanner: Reading Characters in Order for Robust Scene Text Recognition.

Paper Link】 【Pages】:12120-12127

【Authors】: Zhaoyi Wan ; Minghang He ; Haoran Chen ; Xiang Bai ; Cong Yao

【Abstract】: Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in recent years. Formerly, RNN-attention-based methods have dominated this field, but suffer from the problem of attention drift in certain situations. Lately, semantic segmentation based algorithms have proven effective at recognizing text of different forms (horizontal, oriented and curved). However, these methods may produce spurious characters or miss genuine characters, as they rely heavily on a thresholding procedure operated on segmentation maps. To tackle these challenges, we propose in this paper an alternative approach, called TextScanner, for scene text recognition. TextScanner bears three characteristics: (1) Basically, it belongs to the semantic segmentation family, as it generates pixel-wise, multi-channel segmentation maps for character class, position and order; (2) Meanwhile, akin to RNN-attention-based methods, it also adopts RNN for context modeling; (3) Moreover, it performs paralleled prediction for character position and class, and ensures that characters are transcripted in the correct order. The experiments on standard benchmark datasets demonstrate that TextScanner outperforms the state-of-the-art methods. Moreover, TextScanner shows its superiority in recognizing more difficult text such as Chinese transcripts and aligning with target characters.

【Keywords】:

1486. Progressive Feature Polishing Network for Salient Object Detection.

Paper Link】 【Pages】:12128-12135

【Authors】: Bo Wang ; Quan Chen ; Min Zhou ; Zhiqiang Zhang ; Xiaogang Jin ; Kun Gai

【Abstract】: Feature matters for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and filter out cluttered features. We present Progressive Feature Polishing Network (PFPN), a simple yet effective framework to progressively polish the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with fine details without any post-processing. A FPM parallelly updates the features of each level by directly incorporating all higher level context information. Moreover, it can keep the dimensions and hierarchical structures of the feature maps, which makes it flexible to be integrated with any CNN-based models. Empirical experiments show that our results are monotonically getting better with increasing number of FPMs. Without bells and whistles, PFPN outperforms the state-of-the-art methods significantly on five benchmark datasets under various evaluation metrics. Our code is available at: https://github.com/chenquan-cq/PFPN.

【Keywords】:

1487. Region-Based Global Reasoning Networks.

Paper Link】 【Pages】:12136-12143

【Authors】: Chuanming Wang ; Huiyuan Fu ; Charles X. Ling ; Peilun Du ; Huadong Ma

【Abstract】: Global reasoning plays a significant role in many computer vision tasks which need to capture long-distance relationships. However, most current studies on global reasoning focus on exploring the relationship between pixels and ignore the critical role of the regions. In this paper, we propose a novel approach that explores the relationship between regions which have richer semantics than pixels. Specifically, we design a region aggregation method that can gather regional features automatically into a uniform shape, and adjust their positions adaptively for better alignment. To achieve the best performance of global reasoning, we propose various relationship exploration methods and apply them on the regional features. Our region-based global reasoning module, named ReGr, is end-to-end and can be inserted into existing visual understanding models without extra supervision. To evaluate our approach, we apply ReGr to fine-grained classification and action recognition benchmark tasks, and the experimental results demonstrate the effectiveness of our approach.

【Keywords】:

1488. Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification.

Paper Link】 【Pages】:12144-12151

【Authors】: Guan'an Wang ; Tianzhu Zhang ; Yang Yang ; Jian Cheng ; Jianlong Chang ; Xu Liang ; Zeng-Guang Hou

【Abstract】: RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. This set-level alignment may lead to misalignment of some instances, which limits the performance of RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired-images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing the distances of every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of 9.2% and 7.7% in terms of Rank-1 and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.

【Keywords】:

1489. Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries.

Paper Link】 【Pages】:12152-12159

【Authors】: Hao Wang ; Cheng Deng ; Fan Ma ; Yi Yang

【Abstract】: Actor and action video segmentation with language queries aims to segment out the objects referred to by the expression in the video. This process requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly leverage dynamic convolutional networks to match visual and semantic representations. However, dynamic convolution neglects spatial context when processing each region in the frame and thus struggles to segment similar objects in complex scenarios. To address this limitation, we construct a context modulated dynamic convolutional network. Specifically, we propose a context modulated dynamic convolutional operation in the proposed framework. The kernels for a specific region are generated from both language sentences and surrounding context features. Moreover, we devise a temporal encoder to incorporate motions into the visual features to further match the query descriptions. Extensive experiments on two benchmark datasets, Actor-Action Dataset Sentences (A2D Sentences) and J-HMDB Sentences, demonstrate that our proposed approach notably outperforms state-of-the-art methods.

【Keywords】:

1490. All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting.

Paper Link】 【Pages】:12160-12167

【Authors】: Hao Wang ; Pu Lu ; Hui Zhang ; Mingkun Yang ; Xiang Bai ; Yongchao Xu ; Mengchao He ; Yongpan Wang ; Wenyu Liu

【Abstract】: Recently, end-to-end text spotting, which aims to detect and recognize text from cluttered images simultaneously, has received growing interest in computer vision. Different from the existing approaches that formulate text detection as bounding box extraction or instance segmentation, we localize a set of points on the boundary of each text instance. With the representation of such boundary points, we establish a simple yet effective scheme for end-to-end text spotting, which can read text of arbitrary shapes. Experiments on three challenging datasets, including ICDAR2015, TotalText and COCO-Text, demonstrate that the proposed method consistently surpasses the state-of-the-art in both scene text detection and end-to-end text recognition tasks.

【Keywords】:

1491. Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction.

Paper Link】 【Pages】:12168-12175

【Authors】: Jingwen Wang ; Lin Ma ; Wenhao Jiang

【Abstract】: The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language query (sentence). It requires models to simultaneously perform visual and linguistic understanding. Previous work predominantly ignores the precision of segment localization. Sliding window based methods use predefined search window sizes, which suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets.

【Keywords】:

1492. Show, Recall, and Tell: Image Captioning with Recall Mechanism.

Paper Link】 【Pages】:12176-12183

【Authors】: Li Wang ; Zechen Bai ; Yonghua Zhang ; Hongtao Lu

【Abstract】: Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of recalled words. The SG branch generates a recalled context, which guides the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.

【Keywords】:
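
The soft switch mentioned in the abstract above can be illustrated with a small sketch. It assumes a pointer-style convex combination of two word distributions; the abstract does not give the exact formula, so the function name `soft_switch`, the scalar `switch`, and the toy probabilities are all hypothetical.

```python
def soft_switch(p_generate, p_copy, switch):
    """Mix two word distributions with a scalar gate in [0, 1].

    p_generate: word -> probability from a generation branch (assumed role of SG).
    p_copy:     word -> probability from a copy branch (assumed role of RWS).
    switch:     weight on the generation branch; (1 - switch) goes to copying.
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    vocab = set(p_generate) | set(p_copy)
    return {
        word: switch * p_generate.get(word, 0.0)
        + (1.0 - switch) * p_copy.get(word, 0.0)
        for word in vocab
    }


# Toy distributions, invented for illustration only.
p_sg = {"a": 0.5, "dog": 0.3, "runs": 0.2}
p_rws = {"dog": 0.25, "frisbee": 0.75}
mixed = soft_switch(p_sg, p_rws, switch=0.5)
```

Because both inputs sum to one and the weights sum to one, the mixture is again a valid distribution, and a recalled word such as "frisbee" can receive probability even when the generation branch assigns it none.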

1493. POST: POlicy-Based Switch Tracking.

Paper Link】 【Pages】:12184-12191

【Authors】: Ning Wang ; Wengang Zhou ; Guojun Qi ; Houqiang Li

【Abstract】: In visual object tracking, by reasonably fusing multiple experts, an ensemble framework typically achieves superior performance compared to the individual experts. However, the need to run all the experts in parallel in most existing ensemble frameworks heavily limits their efficiency. In this paper, we propose POST, a POlicy-based Switch Tracker for robust and efficient visual tracking. The proposed POST tracker consists of multiple weak but complementary experts (trackers) and adaptively assigns one suitable expert for tracking in each frame. By formulating this expert switch in consecutive frames as a decision-making problem, we learn an agent via reinforcement learning to directly decide which expert should handle the current frame without running the others. In this way, the proposed POST tracker maintains the performance merit of multiple diverse models while favorably ensuring the tracking efficiency. Extensive ablation studies and experimental comparisons against state-of-the-art trackers on 5 prevalent benchmarks verify the effectiveness of the proposed method.

【Keywords】:
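
The efficiency argument in the abstract above (one expert per frame instead of the whole ensemble) can be sketched as follows. This is only an illustration of the control flow: the learned reinforcement-learning agent is replaced by a stand-in `policy` function, and `track_with_switch` and the toy experts are hypothetical names.

```python
def track_with_switch(frames, experts, policy):
    """Run exactly one expert per frame, chosen by `policy`.

    Returns the per-frame outputs and how often each expert was invoked.
    """
    outputs = []
    usage = [0] * len(experts)
    for frame in frames:
        index = policy(frame)  # stand-in for the learned agent's decision
        usage[index] += 1
        outputs.append(experts[index](frame))
    return outputs, usage


# Toy experts that just tag the frame; real trackers would return boxes.
experts = [
    lambda frame: ("expert_a", frame),
    lambda frame: ("expert_b", frame),
]
# Placeholder policy: alternate experts by frame index.
policy = lambda frame: frame % 2

outputs, usage = track_with_switch(range(5), experts, policy)
```

With five frames and two experts, this loop makes five expert calls in total, whereas running every expert on every frame would make ten.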

1494. Sparsity-Inducing Binarized Neural Networks.

Paper Link】 【Pages】:12192-12199

【Authors】: Peisong Wang ; Xiangyu He ; Gang Li ; Tianli Zhao ; Jian Cheng

【Abstract】: Binarization of feature representation is critical for Binarized Neural Networks (BNNs). Currently, the sign function is the commonly used method for feature binarization. Although it works well on small datasets, the performance on ImageNet remains unsatisfactory. Previous methods mainly focus on minimizing quantization error, improving the training strategies and decomposing each convolution layer into several binary convolution modules. However, whether sign is the only option for binarization has been largely overlooked. In this work, we propose the Sparsity-inducing Binarized Neural Network (Si-BNN), which quantizes the activations to be either 0 or +1 and thereby introduces sparsity into the binary representation. We further introduce trainable thresholds into the backward function of binarization to guide the gradient propagation. Our method dramatically outperforms the current state of the art, narrowing the performance gap between full-precision networks and BNNs on mainstream architectures and achieving new state-of-the-art results on binarized AlexNet (Top-1 50.5%), ResNet-18 (Top-1 59.7%), and VGG-Net (Top-1 63.2%). At inference time, Si-BNN still enjoys the high efficiency of exclusive-not-or (xnor) operations.

【Keywords】:
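
The contrast drawn in the abstract above between sign binarization and 0/+1 binarization can be shown with a minimal forward-pass sketch. The function names and the fixed `threshold` argument are assumptions for illustration; the trainable thresholds in the backward function described in the abstract are not modeled here.

```python
def sign_binarize(xs):
    """Conventional sign binarization: every activation becomes -1 or +1."""
    return [1 if x >= 0 else -1 for x in xs]


def sparse_binarize(xs, threshold=0.0):
    """0/+1 binarization: activations above `threshold` fire, the rest are zero."""
    return [1 if x > threshold else 0 for x in xs]


def sparsity(bits):
    """Fraction of entries that are exactly zero."""
    return sum(1 for b in bits if b == 0) / len(bits)


# Toy activations, invented for illustration only.
activations = [-1.2, -0.1, 0.3, 0.8, -0.5, 1.5]
signed = sign_binarize(activations)
sparse = sparse_binarize(activations, threshold=0.5)
```

Sign binarization never produces a zero, so its output has no sparsity, while the 0/+1 variant zeroes every activation at or below the threshold.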

1495. Multi-Speaker Video Dialog with Frame-Level Temporal Localization.

Paper Link】 【Pages】:12200-12207

【Authors】: Qiang Wang ; Pin Jiang ; Zhiyi Guo ; Yahong Han ; Zhou Zhao

【Abstract】: To simulate human interaction in real life, dialog systems are introduced to generate a response to previous chat utterances. There have been several studies on two-speaker video dialogs in the form of question answering. However, more informative semantic cues might be exploited via multi-round chatting or discussion about the video among multiple speakers, so multi-speaker video dialogs are more applicable in real life. Besides, speakers often chat about a sub-segment of the long video fragment for a period of time. Current video dialog systems need to be given the relevant video sub-segment that the speakers are chatting about directly. However, it is hard to accurately spot the corresponding video sub-segment in practical applications. In this paper, we introduce a novel task of Multi-Speaker Video Dialog with frame-level Temporal Localization (MSVD-TL) to make video dialog systems more applicable. Given a long video fragment and a set of chat history utterances, MSVD-TL aims to predict the following response and simultaneously localize the relevant video sub-segment at the frame level. We develop a new multi-task model with a response prediction module and a frame-level temporal localization module. Besides, we focus on the characteristics of the video dialog generation process and exploit the relation among the video fragment, the chat history, and the following response to refine their representations. We evaluate our approach on both the Multi-Speaker Video Dialog without frame-level temporal localization (MSVD w/o TL) task and the MSVD-TL task. The experimental results further demonstrate that MSVD-TL enhances the applicability of video dialog in real life.

【Keywords】:

1496. RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation.

Paper Link】 【Pages】:12208-12215

【Authors】: Shaoru Wang ; Yongchao Gong ; Junliang Xing ; Lichao Huang ; Chang Huang ; Weiming Hu

【Abstract】: Object detection and instance segmentation are two fundamental computer vision tasks. They are closely correlated but their relations have not yet been fully explored in most previous work. This paper presents RDSNet, a novel deep architecture for reciprocal object detection and instance segmentation. To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly. Within this structure, information from the two streams is fused alternately, namely information on the object level introduces the awareness of instance and translation variance to the pixel level, and information on the pixel level refines the localization accuracy of objects on the object level in return. Specifically, a correlation module and a cropping module are proposed to yield instance masks, as well as a mask based boundary refinement module for more accurate bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. The source code is available at https://github.com/wangsr126/RDSNet.

【Keywords】:

1497. Decoupled Attention Network for Text Recognition.

Paper Link】 【Pages】:12216-12224

【Authors】: Tianwei Wang ; Yuanzhi Zhu ; Lianwen Jin ; Canjie Luo ; Xiaoxue Chen ; Yaqiang Wu ; Qianying Wang ; Mingxiang Cai

【Abstract】: Text recognition has attracted considerable research interest because of its various applications. The cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from serious alignment problems due to their recurrent alignment operation, where the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from using historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer, which consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. Code will be released.

【Keywords】:

1498. One-Shot Learning for Long-Tail Visual Relation Detection.

Paper Link】 【Pages】:12225-12232

【Authors】: Weitao Wang ; Meng Wang ; Sen Wang ; Guodong Long ; Lina Yao ; Guilin Qi ; Yang Chen

【Abstract】: The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in < object-predicate-object > form; for example, < person-lean on-wall > . This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples is hampering the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to center on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly-constructed datasets show that our model significantly improved the performance of two tasks PredCls and SGCls from 2.8% to 12.2% compared with state-of-the-art baselines.

【Keywords】:

1499. Consistent Video Style Transfer via Compound Regularization.

Paper Link】 【Pages】:12233-12240

【Authors】: Wenjing Wang ; Jizheng Xu ; Li Zhang ; Yue Wang ; Jiaying Liu

【Abstract】: Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, either using a significant amount of video data with optical flows or introducing single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization can better balance the spatial and temporal performance, which supports our modeling. Combined with the new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module to dynamically adjust inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.

【Keywords】:

1500. Mis-Classified Vector Guided Softmax Loss for Face Recognition.

Paper Link】 【Pages】:12241-12248

【Authors】: Xiaobo Wang ; Shifeng Zhang ; Shuo Wang ; Tianyu Fu ; Hailin Shi ; Tao Mei

【Abstract】: Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve feature discrimination. To this end, several margin-based (e.g., angular, additive and additive angular margins) softmax loss functions have been proposed to increase the feature margin between different classes. However, although great achievements have been made, they mainly suffer from three issues: 1) they ignore the importance of informative feature mining for discriminative learning; 2) they encourage the feature margin only from the ground truth class, without realizing the discriminability from other non-ground truth classes; 3) the feature margin between different classes is set to be the same and fixed, which may not adapt well to different situations. To cope with these issues, this paper develops a novel loss function, which adaptively emphasizes the mis-classified feature vectors to guide the discriminative feature learning. Thus we can address all the above issues and achieve more discriminative face features. To the best of our knowledge, this is the first attempt to inherit the advantages of feature margin and feature mining in a unified loss function. Experimental results on several benchmarks have demonstrated the effectiveness of our method over state-of-the-art alternatives. Our code is available at http://www.cbsr.ia.ac.cn/users/xiaobowang/.

【Keywords】:

1501. Symbiotic Attention with Privileged Information for Egocentric Action Recognition.

Paper Link】 【Pages】:12249-12256

【Authors】: Xiaohan Wang ; Yu Wu ; Linchao Zhu ; Yi Yang

【Abstract】: Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification. However, the correlation between the verb and the noun branches has been largely ignored. Besides, the two branches fail to exploit local features due to the absence of a position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate the understanding of the actor's interaction with the object. We introduce these features in action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features, but also exploits implicit guidance about the spatio-temporal position of an on-going action. We introduce a novel symbiotic attention (SA) to enable effective communication. It first normalizes the detection guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of the most action-relevant information. It identifies the most valuable and discriminative feature for classification. We validate the effectiveness of our SAP quantitatively and qualitatively. Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.

【Keywords】:

1502. Task-Aware Monocular Depth Estimation for 3D Object Detection.

Paper Link】 【Pages】:12257-12264

【Authors】: Xinlong Wang ; Wei Yin ; Tao Kong ; Yuning Jiang ; Lei Li ; Chunhua Shen

【Abstract】: Monocular depth estimation enables 3D perception from a single 2D image, thus attracting much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal. Depth of foreground objects plays a crucial role in 3D object recognition and localization. To date how to boost the depth prediction accuracy of foreground objects is rarely discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among other monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.

【Keywords】:

1503. Multi-Label Classification with Label Graph Superimposing.

Paper Link】 【Pages】:12265-12272

【Authors】: Ya Wang ; Dongliang He ; Fu Li ; Xiang Long ; Zhichao Zhou ; Jinwen Ma ; Shilei Wen

【Abstract】: Images or videos always contain multiple objects or actions. Multi-label recognition has achieved good performance thanks to the rapid development of deep learning technologies. Recently, graph convolution networks (GCNs) have been leveraged to boost the performance of multi-label recognition. However, what is the best way to model label correlations and how feature learning can be improved with label system awareness are still unclear. In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects. Firstly, we model the label correlations by superimposing a label graph built from statistical co-occurrence information onto the graph constructed from knowledge priors of labels, and then multi-layer graph convolutions are applied on the final superimposed graph for label embedding abstraction. Secondly, we propose to leverage the embedding of the whole label system for better representation learning. In detail, lateral connections between GCN and CNN are added at shallow, middle and deep layers to inject information about the label system into the backbone CNN for label-awareness in the feature learning process. Extensive experiments are carried out on the MS-COCO and Charades datasets, showing that our proposed solution can greatly improve the recognition performance and achieves new state-of-the-art recognition performance.

【Keywords】:

1504. Pruning from Scratch.

Paper Link】 【Pages】:12273-12280

【Authors】: Yulong Wang ; Xiaolu Zhang ; Lingxi Xie ; Jun Zhou ; Hang Su ; Bo Zhang ; Xiaolin Hu

【Abstract】: Network pruning is an important research field aiming at reducing the computational costs of neural networks. Conventional approaches follow a fixed paradigm which first trains a large and redundant network, and then determines which units (e.g., channels) are less important and thus can be removed. In this work, we find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. In fact, a fully-trained over-parameterized model will reduce the search space for the pruned structure. We empirically show that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance. Therefore, we propose a novel network pruning pipeline which allows pruning from scratch with little training overhead. In the experiments for compressing classification models on the CIFAR10 and ImageNet datasets, our approach not only greatly reduces the pre-training burden of traditional pruning methods, but also achieves similar or even higher accuracy under the same computation budgets. Our results encourage the community to rethink the effectiveness of existing techniques used for network pruning.

【Keywords】:

1505. Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions.

Paper Link】 【Pages】:12281-12288

【Authors】: Zhenyi Wang ; Ping Yu ; Yang Zhao ; Ruiyi Zhang ; Yufan Zhou ; Junsong Yuan ; Changyou Chen

【Abstract】: Human-motion generation is a long-standing challenging task due to the requirement of accurately modeling complex and diverse dynamic patterns. Most existing methods adopt sequence models such as RNN to directly model transitions in the original action space. Due to high dimensionality and potential noise, such modeling of action transitions is particularly challenging. In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality. Conditioned on a latent sequence, actions are generated by a frame-wise decoder shared by all latent action-poses. Specifically, an implicit RNN is defined to model smooth latent sequences, whose randomness (diversity) is controlled by noise from the input. Different from standard action-prediction methods, our model can generate action sequences from pure noise without any conditional action poses. Remarkably, it can also generate unseen actions from mixed classes during training. Our model is learned with a bi-directional generative-adversarial-net framework, which can not only generate diverse action sequences of a particular class or mix classes, but also learns to classify action sequences within the same model. Experimental results show the superiority of our method in both diverse action-sequence generation and classification, relative to existing methods.

【Keywords】:

1506. Graph-Propagation Based Correlation Learning for Weakly Supervised Fine-Grained Image Classification.

Paper Link】 【Pages】:12289-12296

【Authors】: Zhuhui Wang ; Shijie Wang ; Haojie Li ; Zhi Dou ; Jianjun Li

【Abstract】: The key to Weakly Supervised Fine-grained Image Classification (WFGIC) is how to pick out the discriminative regions and learn the discriminative features from them. However, most recent WFGIC methods pick out the discriminative regions independently and utilize their features directly, while neglecting the facts that regions' features are mutually semantically correlated and that region groups can be more discriminative. To address these issues, we propose an end-to-end Graph-propagation based Correlation Learning (GCL) model to fully mine and exploit the discriminative potential of region correlations for WFGIC. Specifically, in the discriminative region localization phase, a Criss-cross Graph Propagation (CGP) sub-network is proposed to learn region correlations, which establishes correlation between regions and then enhances each region by weighted aggregation of other regions in a criss-cross way. By this means, each region's representation encodes the global image-level context and local spatial context simultaneously, and thus the network is guided to implicitly discover the more powerful discriminative region groups for WFGIC. In the discriminative feature representation phase, the Correlation Feature Strengthening (CFS) sub-network is proposed to explore the internal semantic correlation among discriminative patches' feature vectors, to improve their discriminative power by iteratively enhancing informative elements while suppressing the useless ones. Extensive experiments demonstrate the effectiveness of the proposed CGP and CFS sub-networks, and show that the GCL model achieves better performance in both accuracy and efficiency.

【Keywords】:

1507. Localize, Assemble, and Predicate: Contextual Object Proposal Embedding for Visual Relation Detection.

Paper Link】 【Pages】:12297-12304

【Authors】: Ruihai Wu ; Kehan Xu ; Chenchen Liu ; Nan Zhuang ; Yadong Mu

【Abstract】: Visual relation detection (VRD) aims to describe all interacting objects in an image using subject-predicate-object triplets. Critically, the number of valid relations grows combinatorially as O(C²R) for C object categories and R relationships. The frequencies of relation triplets exhibit a long-tailed distribution, which inevitably leads to bias towards popular visual relations in the learned VRD model. To address this problem, we propose the localize-assemble-predicate network (LAP-Net), which decomposes VRD into three sub-tasks: localizing individual objects, assembling and predicting the subject-object pairs. In the first stage of LAP-Net, a Region Proposal Network (RPN) is used to generate a few class-agnostic object proposals. Next, these proposals are assembled to form subject-object pairs via a second Pair Proposal Network (PPN), in which we propose a novel contextual embedding scheme. The inner product between embedded representations faithfully reflects the compatibility between a pair of proposals, without estimating object and subject class. Top-ranked pairs from stage two are fed into a third sub-network, which precisely estimates the relationship. The whole pipeline except for the last stage is object-category-agnostic in localizing relationships in an image, alleviating the bias in popular relations induced by training data. Our LAP-Net can be trained in an end-to-end fashion. We demonstrate that LAP-Net achieves state-of-the-art performance on the VRD benchmark while maintaining high speed in inference.

【Keywords】:
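
Two ideas from the abstract above lend themselves to a short sketch: the C·C·R growth of the triplet space, and ranking candidate subject-object pairs by the inner product of their embeddings. The helper names, the separate subject/object embedding lists, and the toy vectors are assumptions made for illustration and are not taken from the paper.

```python
def num_triplets(num_categories, num_relations):
    """Size of the subject-predicate-object space: C * C * R."""
    return num_categories * num_categories * num_relations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_pairs(subject_embeddings, object_embeddings, top_k):
    """Score every ordered pair (i, j), i != j, of proposals by inner product.

    Returns the top_k (subject_index, object_index, score) tuples; ties are
    broken by index so the ordering is deterministic.
    """
    scored = [
        (i, j, dot(s, o))
        for i, s in enumerate(subject_embeddings)
        for j, o in enumerate(object_embeddings)
        if i != j
    ]
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored[:top_k]


# Toy embeddings for three proposals, invented for illustration only.
subject_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
object_embeddings = [[0.0, 2.0], [1.0, 0.0], [0.5, 0.5]]
top_pairs = rank_pairs(subject_embeddings, object_embeddings, top_k=2)
triplet_space = num_triplets(100, 70)
```

Note that the ranking step never looks at category labels, which is the sense in which pair assembly can stay category-agnostic; only the pairs that survive it would be passed on to a relationship classifier.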

1508. EFANet: Exchangeable Feature Alignment Network for Arbitrary Style Transfer.

Paper Link】 【Pages】:12305-12312

【Authors】: Zhijie Wu ; Chunjin Song ; Yang Zhou ; Minglun Gong ; Hui Huang

【Abstract】: Style transfer has been an important topic in both computer vision and graphics. Since the seminal work of Gatys et al. first demonstrated the power of stylization through optimization in the deep feature space, quite a few approaches have achieved real-time arbitrary style transfer with straightforward statistic matching techniques. In this work, our key observation is that only considering features in the input style image for the global deep feature statistic matching or local patch swap may not always ensure a satisfactory style transfer; see e.g., Figure 1. Instead, we propose a novel transfer framework, EFANet, that aims to jointly analyze and better align exchangeable features extracted from the content and style image pair. In this way, the style feature from the style image seeks the best compatibility with the content information in the content image, leading to more structured stylization results. In addition, a new whitening loss is developed for purifying the computed content features and for better fusion with styles in feature space. Qualitative and quantitative experiments demonstrate the advantages of our approach.

【Keywords】:

1509. Adaptive Cross-Modal Embeddings for Image-Text Alignment.

Paper Link】 【Pages】:12313-12320

【Authors】: Jonatas Wehrmann ; Camila Kolling ; Rodrigo C. Barros

【Abstract】:

【Keywords】:

1510. F³Net: Fusion, Feedback and Focus for Salient Object Detection.

Paper Link】 【Pages】:12321-12328

【Authors】: Jun Wei ; Shuhui Wang ; Qingming Huang

【Abstract】: Most existing salient object detection models have achieved great progress by aggregating multi-level features extracted from convolutional neural networks. However, because of the different receptive fields of different convolutional layers, there exist large differences between the features generated by these layers. Common feature fusion strategies (addition or concatenation) ignore these differences and may cause suboptimal solutions. In this paper, we propose F3Net to solve the above problem, which mainly consists of a cross feature module (CFM) and a cascaded feedback decoder (CFD) trained by minimizing a new pixel position aware loss (PPA). Specifically, CFM aims to selectively aggregate multi-level features. Different from addition and concatenation, CFM adaptively selects complementary components from input features before fusion, which can effectively avoid introducing too much redundant information that may destroy the original features. Besides, CFD adopts a multi-stage feedback mechanism, where features close to supervision are introduced to the output of previous layers to supplement them and eliminate the differences between features. These refined features go through multiple similar iterations before generating the final saliency maps. Furthermore, different from binary cross entropy, the proposed PPA loss doesn't treat pixels equally; it synthesizes the local structure information of a pixel to guide the network to focus more on local details. Hard pixels from boundaries or error-prone parts will be given more attention to emphasize their importance. F3Net is able to segment salient object regions accurately and provide clear local details. Comprehensive experiments on five benchmark datasets demonstrate that F3Net outperforms state-of-the-art approaches on six evaluation metrics. Code will be released at https://github.com/weijun88/F3Net.

【Keywords】:

1511. 3D Single-Person Concurrent Activity Detection Using Stacked Relation Network.

Paper Link】 【Pages】:12329-12337

【Authors】: Yi Wei ; Wenbo Li ; Yanbo Fan ; Linghan Xu ; Ming-Ching Chang ; Siwei Lyu

【Abstract】: We aim to detect real-world concurrent activities performed by a single person from a streaming 3D skeleton sequence. Different from most existing works that deal with concurrent activities performed by multiple persons that are seldom correlated, we focus on concurrent activities that are spatio-temporally or causally correlated and performed by a single person. For the sake of generalization, we propose an approach based on a decompositional design to learn a dedicated feature representation for each activity class. To address the scalability issue, we further extend the class-level decompositional design to the postural-primitive level, such that each class-wise representation does not need to be extracted by independent backbones, but through a dedicated weighted aggregation of a shared pool of postural primitives. There are multiple interdependent instances deriving from each decomposition. Thus, we propose Stacked Relation Networks (SRN), with a specialized relation network for each decomposition, so as to enhance the expressiveness of instance-wise representations via the inter-instance relationship modeling. SRN achieves state-of-the-art performance on a public dataset and a newly collected dataset. The relation weights within SRN are interpretable among the activity contexts. The new dataset and code are available at https://github.com/weiyi1991/UA_Concurrent/

【Keywords】:

1512. Heuristic Black-Box Adversarial Attacks on Video Recognition Models.

Paper Link】 【Pages】:12338-12345

【Authors】: Zhipeng Wei ; Jingjing Chen ; Xingxing Wei ; Linxi Jiang ; Tat-Seng Chua ; Fengfeng Zhou ; Yu-Gang Jiang

【Abstract】: We study the problem of attacking video recognition models in the black-box setting, where the model information is unknown and the adversary can only make queries to detect the predicted top-1 class and its probability. Compared with the black-box attack on images, attacking videos is more challenging as the computation cost for searching the adversarial perturbations on a video is much higher due to its high dimensionality. To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions. More specifically, a heuristic-based algorithm is proposed to measure the importance of each frame in the video towards generating the adversarial examples. Based on the frames' importance, the proposed algorithm heuristically searches a subset of frames where the generated adversarial example has strong adversarial attack ability while keeping the perturbations lower than the given bound. Besides, to further boost the attack efficiency, we propose to generate the perturbations only on the salient regions of the selected frames. In this way, the generated perturbations are sparse in both temporal and spatial domains. Experimental results of attacking two mainstream video recognition methods on the UCF-101 dataset and the HMDB-51 dataset demonstrate that the proposed heuristic black-box adversarial attack method can significantly reduce the computation cost and lead to more than 28% reduction in query numbers for the untargeted attack on both datasets.

【Keywords】:

1513. Efficient Querying from Weighted Binary Codes.

Paper Link】 【Pages】:12346-12353

【Authors】: Zhenyu Weng ; Yuesheng Zhu

【Abstract】: Binary codes are widely used to represent the data due to their small storage and efficient computation. However, there exists an ambiguity problem that lots of binary codes share the same Hamming distance to a query. To alleviate the ambiguity problem, weighted binary codes assign different weights to each bit of binary codes and compare the binary codes by the weighted Hamming distance. Till now, performing the querying from the weighted binary codes efficiently is still an open issue. In this paper, we propose a new method to rank the weighted binary codes and return the nearest weighted binary codes of the query efficiently. In our method, based on the multi-index hash tables, two algorithms, the table bucket finding algorithm and the table merging algorithm, are proposed to select the nearest weighted binary codes of the query in a non-exhaustive and accurate way. The proposed algorithms are justified by proving their theoretic properties. The experiments on three large-scale datasets validate both the search efficiency and the search accuracy of our method. Especially for the number of weighted binary codes up to one billion, our method shows a great improvement of more than 1000 times faster than the linear scan.
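
The ambiguity problem and the weighted Hamming distance mentioned above can be illustrated with a small brute-force example. This sketch is the exhaustive linear scan that the paper uses as its baseline, not the proposed multi-index table algorithms; the per-bit weights and codes are made-up values, and `weighted_hamming` is a hypothetical helper name.

```python
import numpy as np

def weighted_hamming(query, codes, weights):
    """Weighted Hamming distance from one query code to every database code.

    query: (bits,) array of 0/1; codes: (n, bits) array of 0/1;
    weights: (bits,) non-negative weight per bit position (assumed fixed).
    """
    mismatches = codes != query            # True where a bit differs
    return mismatches.astype(float) @ weights

query = np.array([1, 0, 1, 1])
codes = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 1],
                  [1, 1, 1, 1]])
weights = np.array([0.9, 0.5, 0.3, 0.1])

# Plain Hamming distance: every code differs in exactly one bit, so all tie.
plain = (codes != query).sum(axis=1)
# Weighted distance breaks the tie according to which bit differs.
weighted = weighted_hamming(query, codes, weights)
ranking = np.argsort(weighted)
```

The scan touches every code, so its cost grows linearly with the database size, which is the cost the non-exhaustive method in the abstract aims to avoid.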

【Keywords】:

1514. Online Hashing with Efficient Updating of Binary Codes.

Paper Link】 【Pages】:12354-12361

【Authors】: Zhenyu Weng ; Yuesheng Zhu

【Abstract】: Online hashing methods are efficient in learning the hash functions from the streaming data. However, when the hash functions change, the binary codes for the database have to be recomputed to guarantee the retrieval accuracy. Recomputing the binary codes by accumulating the whole database brings a timeliness challenge to the online retrieval process. In this paper, we propose a novel online hashing framework to update the binary codes efficiently without accumulating the whole database. In our framework, the hash functions are fixed and the projection functions are introduced to learn online from the streaming data. Therefore, inefficient updating of the binary codes by accumulating the whole database can be transformed to efficient updating of the binary codes by projecting the binary codes into another binary space. The queries and the binary code database are projected asymmetrically to further improve the retrieval accuracy. The experiments on two multi-label image databases demonstrate the effectiveness and the efficiency of our method for multi-label image retrieval.

【Keywords】:

1515. Tracklet Self-Supervised Learning for Unsupervised Person Re-Identification.

Paper Link】 【Pages】:12362-12369

【Authors】: Guile Wu ; Xiatian Zhu ; Shaogang Gong

【Abstract】: Existing unsupervised person re-identification (re-id) methods mainly focus on cross-domain adaptation or one-shot learning. Although they are more scalable than the supervised learning counterparts, relying on a relevant labelled source domain or one labelled tracklet per person initialisation still restricts their scalability in real-world deployments. To alleviate these problems, some recent studies develop unsupervised tracklet association and bottom-up image clustering methods, but they still rely on explicit camera annotation or merely utilise suboptimal global clustering. In this work, we formulate a novel tracklet self-supervised learning (TSSL) method, which is capable of capitalising directly on abundant unlabelled tracklet data, to optimise a feature embedding space for both video and image unsupervised re-id. This is achieved by designing a comprehensive unsupervised learning objective that accounts for tracklet frame coherence, tracklet neighbourhood compactness, and tracklet cluster structure in a unified formulation. As a pure unsupervised learning re-id model, TSSL is end-to-end trainable in the absence of source data annotation, person identity labels, and camera prior knowledge. Extensive experiments demonstrate the superiority of TSSL over a wide variety of the state-of-the-art alternative methods on four large-scale person re-id benchmarks, including Market-1501, DukeMTMC-ReID, MARS and DukeMTMC-VideoReID.

【Keywords】:

1516. CircleNet for Hip Landmark Detection.

Paper Link】 【Pages】:12370-12377

【Authors】: Hai Wu ; Hongtao Xie ; Chuanbin Liu ; Zheng-Jun Zha ; Jun Sun ; Yongdong Zhang

【Abstract】: Landmark detection plays a critical role in diagnosis of Developmental Dysplasia of the Hip (DDH). Heatmap and anchor-based object detection techniques could obtain reasonable results. However, they have limitations in both robustness and precision given the complexities and inhomogeneity of hip X-ray images. In this paper, we propose a much simpler and more efficient framework called CircleNet to improve the accuracy of landmark detection by predicting landmark and corresponding radius. Using the CircleNet, we not only constrain the relationship between landmarks but also integrate landmark detection and object detection into an end-to-end framework. In order to capture the effective information of the long-range dependency of landmarks in the DDH image, here we propose a new context modeling framework, named the Local Non-Local (LNL) block. The LNL block has the benefits of both non-local block and lightweight computation. We construct a professional DDH dataset for the first time and evaluate our CircleNet on it. The dataset has the largest number of DDH X-ray images in the world to our knowledge. Our results show that the CircleNet can achieve the state-of-the-art results for landmark detection on the dataset with a large margin of 1.8 average pixels compared to current methods. The dataset and source code will be publicly available.

【Keywords】:

1517. 3D Human Pose Estimation via Explicit Compositional Depth Maps.

Paper Link】 【Pages】:12378-12385

【Authors】: Haiping Wu ; Bin Xiao

【Abstract】: In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image. First, we propose to use densely-generated limb depth maps to ease the learning of body joints depth, which are well aligned with image cues. Then, we design a lifting module from 2D pixel coordinates to 3D camera coordinates which explicitly takes the depth values as inputs, and is aligned with camera perspective projection model. We show our method achieves superior performance on large-scale 3D pose datasets Human3.6M and MPI-INF-3DHP, and sets the new state-of-the-art.
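
The relationship between 2D pixel coordinates, depth, and 3D camera coordinates that the lifting module is aligned with can be sketched using the standard pinhole camera model. This is not the authors' learned module; the intrinsics (`fx`, `fy`, `cx`, `cy`) and joint values below are assumed purely for illustration.

```python
import numpy as np

def lift_to_camera(uv, depth, fx, fy, cx, cy):
    """Back-project pixel coordinates and depths to 3D camera coordinates.

    Pinhole model: u = fx * X / Z + cx and v = fy * Y / Z + cy, solved for X, Y.
    """
    uv = np.asarray(uv, dtype=float)
    depth = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) * depth / fx
    y = (uv[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)

def project(points, fx, fy, cx, cy):
    """Perspective-project 3D camera coordinates back to pixel coordinates."""
    points = np.asarray(points, dtype=float)
    u = fx * points[:, 0] / points[:, 2] + cx
    v = fy * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=1)

# Two hypothetical joints: pixel locations plus per-joint depth in metres.
intrinsics = dict(fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
uv = [[500.0, 500.0], [600.0, 400.0]]
depth = [4.0, 5.0]
joints_3d = lift_to_camera(uv, depth, **intrinsics)
reprojected = project(joints_3d, **intrinsics)
```

Because the lift is the exact inverse of the projection, any error in the 3D output comes from the 2D location or the depth estimate, which is one reason an explicit depth input is attractive.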

【Keywords】:

1518. Tree-Structured Policy Based Progressive Reinforcement Learning for Temporally Language Grounding in Video.

Paper Link】 【Pages】:12386-12393

【Authors】: Jie Wu ; Guanbin Li ; Si Liu ; Liang Lin

【Abstract】: Temporally language grounding in untrimmed videos is a newly-raised task in video understanding. Most of the existing methods suffer from inferior efficiency, lacking interpretability, and deviating from the human perception mechanism. Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning (TSP-PRL) framework to sequentially regulate the temporal boundary by an iterative refinement process. The semantic concepts are explicitly represented as the branches in the policy, which contributes to efficiently decomposing complex policies into an interpretable primitive action. Progressive reinforcement learning provides correct credit assignment via two task-oriented rewards that encourage mutual promotion within the tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and ActivityNet datasets, and experimental results show that TSP-PRL achieves competitive performance over existing state-of-the-art methods.

【Keywords】:

1519. Distraction-Aware Feature Learning for Human Attribute Recognition via Coarse-to-Fine Attention Mechanism.

Paper Link】 【Pages】:12394-12401

【Authors】: Mingda Wu ; Di Huang ; Yuanfang Guo ; Yunhong Wang

【Abstract】: Recently, Human Attribute Recognition (HAR) has become a hot topic due to its scientific challenges and application potentials, where localizing attributes is a crucial stage but not well handled. In this paper, we propose a novel deep learning approach to HAR, namely Distraction-aware HAR (Da-HAR). It enhances deep CNN feature learning by improving attribute localization through a coarse-to-fine attention mechanism. At the coarse step, a self-mask block is built to roughly discriminate and reduce distractions, while at the fine step, a masked attention branch is applied to further eliminate irrelevant regions. Thanks to this mechanism, feature learning is more accurate, especially when heavy occlusions and complex backgrounds exist. Extensive experiments are conducted on the WIDER-Attribute and RAP databases, and state-of-the-art results are achieved, demonstrating the effectiveness of the proposed approach.

【Keywords】:

1520. Patch Proposal Network for Fast Semantic Segmentation of High-Resolution Images.

Paper Link】 【Pages】:12402-12409

【Authors】: Tong Wu ; Zhenzhen Lei ; Bingqian Lin ; Cuihua Li ; Yanyun Qu ; Yuan Xie

【Abstract】: Despite recent progress on the segmentation of high-resolution images, there exists an unsolved problem, i.e., the trade-off among segmentation accuracy, memory resources and inference speed. So far, GLNet has been introduced for high or ultra-resolution image segmentation, which has reduced the computational memory of the segmentation network. However, it ignores the importance of different cropped patches and treats tiled patches equally for fusion with the whole image, resulting in high computational cost. To solve this problem, we introduce a patch proposal network (PPN) in this paper, which adaptively distinguishes the critical patches from the trivial ones to fuse with the whole image for refining segmentation. PPN is a classification network which alleviates network training burden and improves segmentation accuracy. We further embed PPN in a global-local segmentation network, instructing global branch and refinement branch to work collaboratively. We implement our method on four image datasets: DeepGlobe, ISIC, CRAG and Cityscapes; the first two are ultra-resolution image datasets and the last two are high-resolution image datasets. The experimental results show that our method achieves almost the best segmentation performance compared with the state-of-the-art segmentation methods and the inference speed is 12.9 fps on DeepGlobe and 10 fps on ISIC. Moreover, we embed PPN with the general semantic segmentation network, and the experimental results on Cityscapes, which contains more object classes, demonstrate the generalization ability on general semantic segmentation.

【Keywords】:

1521. SalSAC: A Video Saliency Prediction Model with Shuffled Attentions and Correlation-Based ConvLSTM.

Paper Link】 【Pages】:12410-12417

【Authors】: Xinyi Wu ; Zhenyao Wu ; Jinglin Zhang ; Lili Ju ; Song Wang

【Abstract】: The performance of predicting human fixations in videos has been much enhanced by the development of convolutional neural networks (CNN). In this paper, we propose a novel end-to-end neural network “SalSAC” for video saliency prediction, which uses the CNN-LSTM-Attention as the basic architecture and utilizes the information from both static and dynamic aspects. To better represent the static information of each frame, we first extract multi-level features of the same size from different layers of the encoder CNN and calculate the corresponding multi-level attentions; then we randomly shuffle these attention maps among levels and multiply them to the extracted multi-level features respectively. In this way, we leverage the attention consistency across different layers to improve the robustness of the network. On the dynamic aspect, we propose a correlation-based ConvLSTM to appropriately balance the influence of the current and preceding frames to the prediction. Experimental results on the DHF1K, Hollywood2 and UCF-sports datasets show that SalSAC outperforms many existing state-of-the-art methods.

【Keywords】:

1522. Recognizing Instagram Filtered Images with Feature De-Stylization.

Paper Link】 【Pages】:12418-12425

【Authors】: Zhe Wu ; Zuxuan Wu ; Bharat Singh ; Larry S. Davis

【Abstract】: Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters. This paper presents a study on how popular pretrained models are affected by commonly used Instagram filters. To this end, we introduce ImageNet-Instagram, a filtered version of ImageNet, where 20 popular Instagram filters are applied to each image in ImageNet. Our analysis suggests that simple structure preserving filters which only alter the global appearance of an image can lead to large differences in the convolutional feature space. To improve generalization, we introduce a lightweight de-stylization module that predicts parameters used for scaling and shifting feature maps to “undo” the changes incurred by filters, inverting the process of style transfer tasks. We further demonstrate the module can be readily plugged into modern CNN architectures together with skip connections. We conduct extensive studies on ImageNet-Instagram, and show quantitatively and qualitatively, that the proposed module, among other things, can effectively improve generalization by simply learning normalization parameters without retraining the entire network, thus recovering the alterations in the feature space caused by the filters.

【Keywords】:

1523. Convolutional Hierarchical Attention Network for Query-Focused Video Summarization.

Paper Link】 【Pages】:12426-12433

【Authors】: Shuwen Xiao ; Zhou Zhao ; Zijian Zhang ; Xiaohui Yan ; Min Yang

【Abstract】: Previous approaches for video summarization mainly concentrate on finding the most diverse and representative visual contents as video summary without considering the user's preference. This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs and aims to generate a query-focused video summary. In this paper, we consider the task as a problem of computing similarity between video shots and query. To this end, we propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module. In the encoding network, we employ a convolutional network with local self-attention mechanism and query-aware global attention mechanism to learn visual information of each shot. The encoded features will be sent to query-relevance computing module to generate query-focused video summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance and show the effectiveness of our approach.

【Keywords】:

1524. Adversarial Learning of Privacy-Preserving and Task-Oriented Representations.

Paper Link】 【Pages】:12434-12441

【Authors】: Taihong Xiao ; Yi-Hsuan Tsai ; Kihyuk Sohn ; Manmohan Chandraker ; Ming-Hsuan Yang

【Abstract】: Data privacy has emerged as an important issue as data-driven deep learning has been an essential component of modern machine learning systems. For instance, there could be a potential privacy risk of machine learning systems via the model inversion attack, whose goal is to reconstruct the input data from the latent representation of deep networks. Our work aims at learning a privacy-preserving and task-oriented representation to defend against such model inversion attacks. Specifically, we propose an adversarial reconstruction learning framework that prevents the latent representations from being decoded into the original input data. By simulating the expected behavior of an adversary, our framework is realized by minimizing the negative pixel reconstruction loss or the negative feature reconstruction (i.e., perceptual distance) loss. We validate the proposed method on face attribute prediction, showing that our method allows protecting visual privacy with a small decrease in utility performance. In addition, we show the utility-privacy trade-off with different choices of hyperparameter for negative perceptual distance loss at training, allowing service providers to determine the right level of privacy-protection with a certain utility performance. Moreover, we provide an extensive study with different selections of features, tasks, and the data to further analyze their influence on privacy protection.

【Keywords】:

1525. Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns.

Paper Link】 【Pages】:12442-12451

【Authors】: Jianwen Xie ; Ruiqi Gao ; Zilong Zheng ; Song-Chun Zhu ; Ying Nian Wu

【Abstract】: Dynamic patterns are characterized by complex spatial and motion patterns. Understanding dynamic patterns requires a disentangled representational model that separates the factorial components. A commonly used model for dynamic patterns is the state space model, where the state evolves over time according to a transition model and the state generates the observed image frames according to an emission model. To model the motions explicitly, it is natural for the model to be based on the motions or the displacement fields of the pixels. Thus in the emission model, we let the hidden state generate the displacement field, which warps the trackable component in the previous image frame to generate the next frame while adding a simultaneously emitted residual image to account for the change that cannot be explained by the deformation. The warping of the previous image is about the trackable part of the change of image frame, while the residual image is about the intrackable part of the image. We use a maximum likelihood algorithm to learn the model parameters that iterates between inferring latent noise vectors that drive the transition model and updating the parameters given the inferred latent vectors. Meanwhile we adopt a regularization term to penalize the norms of the residual images to encourage the model to explain the change of image frames by trackable motion. Unlike existing methods on dynamic patterns, we learn our model in an unsupervised setting without ground truth displacement fields or optical flows. In addition, our model defines a notion of intrackability by the separation of warped component and residual component in each image frame. We show that our method can synthesize realistic dynamic patterns and disentangle appearance, trackable and intrackable motions. The learned models can be useful for motion transfer, and it is natural to adopt them to define and measure intrackability of a dynamic pattern.

【Keywords】:

1526. Segmenting Medical MRI via Recurrent Decoding Cell.

Paper Link】 【Pages】:12452-12459

【Authors】: Ying Wen ; Kai Xie ; Lianghua He

【Abstract】: Encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion. However, the expanding path for feature decoding and spatial recovery does not consider the long-term dependency when fusing feature maps from different layers, and the universal encoder-decoder network does not make full use of the multi-modality information to improve the network robustness, especially for segmenting medical MRI. In this paper, we propose a novel feature fusion unit called Recurrent Decoding Cell (RDC) which leverages convolutional RNNs to memorize the long-term context information from the previous layers in the decoding phase. An encoder-decoder network, named Convolutional Recurrent Decoding Network (CRDN), is also proposed based on RDC for segmenting multi-modality medical MRI. CRDN adopts a CNN backbone to encode image features and decodes them hierarchically through a chain of RDCs to obtain the final high-resolution score map. The evaluation experiments on BrainWeb, MRBrainS and HVSMR datasets demonstrate that the introduction of RDC effectively improves the segmentation accuracy as well as reduces the model size, and that the proposed CRDN is robust to image noise and intensity non-uniformity in medical MRI.

【Keywords】:

1527. PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module.

Paper Link】 【Pages】:12460-12467

【Authors】: Liang Xie ; Chao Xiang ; Zhengxu Yu ; Guodong Xu ; Zheng Yang ; Deng Cai ; Xiaofei He

【Abstract】: LIDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, their fusion methods based on Bird's Eye View (BEV) or voxel format are not accurate. In this paper, we propose a novel fusion approach named Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. In addition to continuous convolution, we add a Point-Pooling and an Attentive Aggregation to make the fused features more expressive. Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the powerful PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN achieves a large improvement in 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, and our method achieves state-of-the-art results on the metric of 3D AP.

【Keywords】:

1528. Video Face Super-Resolution with Motion-Adaptive Feedback Cell.

Paper Link】 【Pages】:12468-12475

【Authors】: Jingwei Xin ; Nannan Wang ; Jie Li ; Xinbo Gao ; Zhifeng Li

【Abstract】: Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNN). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low resolution (LR) frames is utilized to generate a single high resolution (HR) frame, and running a sliding window to select LR frames over the entire video yields a series of HR frames. However, due to the complex temporal dependency between frames, as the number of LR input frames increases, the performance of the reconstructed HR frames becomes worse. The reason is that these methods lack the ability to model complex temporal dependencies and struggle to give an accurate motion estimation and compensation for the VSR process, which makes the performance degrade drastically when the motion in frames is complex. In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way. Our approach efficiently utilizes the information of the inter-frame motion, so the dependence of the network on a motion estimation and compensation method can be avoided. In addition, benefiting from the excellent nature of MAFC, the network can achieve better performance in the case of extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms the state-of-the-art methods.

【Keywords】:

1529. Facial Attribute Capsules for Noise Face Super Resolution.

Paper Link】 【Pages】:12476-12483

【Authors】: Jingwei Xin ; Nannan Wang ; Xinrui Jiang ; Jie Li ; Xinbo Gao ; Zhifeng Li

【Abstract】: Existing face super-resolution (SR) methods mainly assume the input image to be noise-free. Their performance degrades drastically when applied to real-world scenarios where the input image is always contaminated by noise. In this paper, we propose a Facial Attribute Capsules Network (FACN) to deal with the problem of high-scale super-resolution of noisy face images. A capsule is a group of neurons whose activity vector models different properties of the same entity. Inspired by the concept of the capsule, we propose an integrated representation model of facial information, named Facial Attribute Capsule (FAC). In the SR processing, we first generate a group of FACs from the input LR face, and then reconstruct the HR face from this group of FACs. Aiming to effectively improve the robustness of FAC to noise, we generate FAC in semantic, probabilistic and facial attributes manners by means of an integrated learning strategy. Each FAC can be divided into two sub-capsules: Semantic Capsule (SC) and Probabilistic Capsule (PC). They describe an explicit facial attribute in detail from two aspects of semantic representation and probability distribution. The group of FACs models an image as a combination of facial attribute information in the semantic space and probabilistic space in an attribute-disentangling way. The diverse FACs could better combine the face prior information to generate the face images with fine-grained semantic attributes. Extensive benchmark experiments show that our method achieves superior hallucination results and outperforms state-of-the-art for very low resolution (LR) noise face image super resolution.

【Keywords】:

1530. FusionDN: A Unified Densely Connected Network for Image Fusion.

Paper Link】 【Pages】:12484-12491

【Authors】: Han Xu ; Jiayi Ma ; Zhuliang Le ; Junjun Jiang ; Xiaojie Guo

【Abstract】: In this paper, we present a new unsupervised and unified densely connected network for different types of image fusion tasks, termed as FusionDN. In our method, the densely connected network is trained to generate the fused image conditioned on source images. Meanwhile, a weight block is applied to obtain two data-driven weights as the retention degrees of features in different source images, which are the measurement of the quality and the amount of information in them. Losses of similarities based on these weights are applied for unsupervised learning. In addition, we obtain a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what has been learned from previous tasks when training multiple tasks sequentially, rather than train individual models for every fusion task or jointly train tasks roughly. Qualitative and quantitative results demonstrate the advantages of FusionDN compared with state-of-the-art methods in different fusion tasks.

【Keywords】:
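
The FusionDN abstract above mentions elastic weight consolidation (EWC) as the way to keep earlier fusion tasks from being forgotten during sequential training. As a rough illustration only, the usual EWC penalty can be sketched as below. The function names, the flat-list parameter layout, and the weighting `lam` are assumptions made for this sketch and are not taken from the paper.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameters (flat list of floats)
    anchor_params -- parameters learned on the previous task (theta*)
    fisher        -- per-parameter importance (diagonal Fisher estimate)
    lam           -- hypothetical trade-off weight, not from the paper
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(task_loss, params, anchor_params, fisher, lam=1.0):
    """Loss on the new task plus the penalty that protects the old task."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)
```

Parameters with a large importance value are pulled strongly towards their previous values, while unimportant ones remain free to adapt to the new task.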

1531. Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN.

Paper Link】 【Pages】:12492-12499

【Authors】: Hang Xu ; Linpu Fang ; Xiaodan Liang ; Wenxiong Kang ; Zhenguo Li

【Abstract】: The dominant object detection approaches treat each dataset separately and fit towards a specific domain, and thus cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predicts all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as category hierarchy, visual similarity, and linguistic relationship. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating the high-level semantic representations of all the categories. Then an Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset, guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending to and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).

【Keywords】:

1532. Geometry Sharing Network for 3D Point Cloud Classification and Segmentation.

Paper Link】 【Pages】:12500-12507

【Authors】: Mingye Xu ; Zhipeng Zhou ; Yu Qiao

【Abstract】: In spite of the recent progress in classifying 3D point clouds with deep CNNs, large geometric transformations like rotation and translation remain a challenging problem and harm the final classification performance. To address this challenge, we propose the Geometry Sharing Network (GS-Net), which effectively learns point descriptors with holistic context to enhance the robustness to geometric transformations. Compared with previous 3D point CNNs, which perform convolution on nearby points, GS-Net can aggregate point features in a more global way. Specifically, GS-Net consists of Geometry Similarity Connection (GSC) modules, which exploit an Eigen-Graph to group distant points with similar and relevant geometric information, and aggregate features from nearest neighbors in both Euclidean space and Eigenvalue space. This design allows GS-Net to efficiently capture both local and holistic geometric features such as symmetry, curvature, convexity and connectivity. Theoretically, we show that the nearest neighbors of each point in Eigenvalue space are invariant to rotation and translation. We conduct extensive experiments on the public datasets ModelNet40 and ShapeNet Part. Experiments demonstrate that GS-Net achieves state-of-the-art performance on major datasets (93.3% on ModelNet40) and is more robust to geometric transformations.

【Keywords】:

1533. Learning Inverse Depth Regression for Multi-View Stereo with Correlation Cost Volume.

Paper Link】 【Pages】:12508-12515

【Authors】: Qingshan Xu ; Wenbing Tao

【Abstract】: Deep learning has been shown to be effective for depth inference in multi-view stereo (MVS). However, scalability and accuracy still remain an open problem in this domain. This can be attributed to the memory-consuming cost volume representation and inappropriate depth inference. Inspired by the group-wise correlation in stereo matching, we propose an average group-wise correlation similarity measure to construct a lightweight cost volume. This not only reduces the memory consumption but also reduces the computational burden in the cost volume filtering. Based on our effective cost volume representation, we propose a cascade 3D U-Net module to regularize the cost volume to further boost the performance. Unlike previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast multi-view depth inference as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and be applicable to large-scale scenes. Through extensive experiments on the DTU dataset and the Tanks and Temples dataset, we show that our proposed network with Correlation cost volume and Inverse DEpth Regression (CIDER) achieves state-of-the-art results, demonstrating its superior performance in scalability and accuracy.

【Keywords】:
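
The abstract above recasts multi-view depth inference as inverse depth regression. One plausible reading, sketched here under stated assumptions, is to sample hypotheses uniformly in inverse depth, turn per-hypothesis scores into probabilities with a softmax, and take the expected inverse depth. The helper names and the softmax-expectation form are assumptions for illustration; the abstract does not spell out the exact formulation.

```python
import math


def inverse_depth_hypotheses(d_min, d_max, num):
    """Sample `num` hypotheses uniformly in inverse depth, from 1/d_max to 1/d_min."""
    inv_far, inv_near = 1.0 / d_max, 1.0 / d_min
    step = (inv_near - inv_far) / (num - 1)
    return [inv_far + i * step for i in range(num)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regress_depth(scores, d_min, d_max):
    """Expected inverse depth under the softmax distribution, converted back to depth."""
    hyps = inverse_depth_hypotheses(d_min, d_max, len(scores))
    probs = softmax(scores)
    expected_inverse = sum(p * h for p, h in zip(probs, hyps))
    return 1.0 / expected_inverse
```

Because the output is a weighted average rather than the index of the best hypothesis, the estimate can fall between sampled hypotheses, which is one way to obtain sub-pixel precision.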

1534. Planar Prior Assisted PatchMatch Multi-View Stereo.

Paper Link】 【Pages】:12516-12523

【Authors】: Qingshan Xu ; Wenbing Tao

【Abstract】: The completeness of 3D models is still a challenging problem in multi-view stereo (MVS) due to the unreliable photometric consistency in low-textured areas. Since low-textured areas usually exhibit strong planarity, planar models are advantageous for the depth estimation of low-textured areas. On the other hand, PatchMatch multi-view stereo is very efficient owing to its sampling and propagation scheme. By taking advantage of planar models and PatchMatch multi-view stereo, we propose a planar prior assisted PatchMatch multi-view stereo framework in this paper. In detail, we utilize a probabilistic graphical model to embed planar models into PatchMatch multi-view stereo and contribute a novel multi-view aggregated matching cost. This novel cost takes both photometric consistency and planar compatibility into consideration, making it suited for the depth estimation of both non-planar and planar regions. Experimental results demonstrate that our method can efficiently recover the depth information of extremely low-textured areas, thus obtaining highly complete 3D models and achieving state-of-the-art performance.

【Keywords】:

1535. A Proposal-Based Approach for Activity Image-to-Video Retrieval.

Paper Link】 【Pages】:12524-12531

【Authors】: Ruicong Xu ; Li Niu ; Jianfu Zhang ; Liqing Zhang

【Abstract】: The activity image-to-video retrieval task aims to retrieve videos containing an activity similar to that in the query image, which is challenging because videos generally have many background segments irrelevant to the activity. In this paper, we utilize the R-C3D model to represent a video by a bag of activity proposals, which can filter out background segments to some extent. However, there are still noisy proposals in each bag. Thus, we propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with classification loss, adversarial loss, and triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely-used datasets verify the effectiveness of our approach.

【Keywords】:

1536. GDFace: Gated Deformation for Multi-View Face Image Synthesis.

Paper Link】 【Pages】:12532-12540

【Authors】: Xuemiao Xu ; Keke Li ; Cheng Xu ; Shengfeng He

【Abstract】: Photorealistic multi-view face synthesis from a single image is an important but challenging problem. Existing methods mainly learn a texture mapping model from the source face to the target face. However, they fail to consider the internal deformation caused by the change of poses, leading to unsatisfactory synthesized results for large pose variations. In this paper, we propose a Gated Deformable Face Synthesis Network to model the deformation of faces, which aids the synthesis of the target face image. Specifically, we propose a dual network that consists of two modules. The first module estimates the deformation between two views in the form of convolution offsets according to the input and target poses. The second one leverages the predicted deformation offsets to create the target face image. In this way, pose changes are explicitly modeled in the face generator to cope with geometric transformation, by adaptively focusing on pertinent regions of the source image. To compensate for offset estimation errors, we introduce a soft-gating mechanism that enables adaptive fusion between deformable features and primitive features. Extensive experimental results on five widely-used benchmarks show that our approach performs favorably against state-of-the-art methods on multi-view face synthesis, especially for large pose changes.

【Keywords】:

1537. CF-LSTM: Cascaded Feature-Based Long Short-Term Networks for Predicting Pedestrian Trajectory.

Paper Link】 【Pages】:12541-12548

【Authors】: Yi Xu ; Jing Yang ; Shaoyi Du

【Abstract】: Pedestrian trajectory prediction is an important but difficult task in the self-driving and autonomous mobile robot fields because there are complex, unpredictable human-human interactions in crowded scenarios. There have been a large number of studies that attempt to understand humans' social behavior. However, most of these studies extract location features from only the previous time step while neglecting the vital velocity features. In order to address this issue, we propose a novel feature-cascaded framework for long short-term networks (CF-LSTM) without extra artificial settings or social rules. In this framework, feature information from the previous two time steps is first extracted and then integrated as a cascaded feature fed to the LSTM, which is able to capture the previous location information and dynamic velocity information simultaneously. In addition, this scene-agnostic cascaded feature is the external manifestation of complex human-human interactions, and can also effectively capture dynamic interaction information in different scenes without any other pedestrians' information. Experiments on public benchmark datasets indicate that our model achieves better performance than the state-of-the-art methods and that this feature-cascaded framework has the ability to implicitly learn human-human interactions.

【Keywords】:

1538. SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines.

Paper Link】 【Pages】:12549-12556

【Authors】: Yinda Xu ; Zeyu Wang ; Zuoxin Li ; Ye Yuan ; Gang Yu

【Abstract】: The visual tracking problem demands efficiently performing robust classification and accurate target state estimation for a given target at the same time. Previous methods have proposed various ways of target state estimation, yet few of them took the particularity of the visual tracking problem itself into consideration. Based on a careful analysis, we propose a set of practical guidelines of target state estimation for high-performance generic object tracker design. Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branches (G1), classification score without ambiguity (G2), tracking without prior knowledge (G3), and estimation quality score (G4). Extensive analysis and ablation studies demonstrate the effectiveness of our proposed guidelines. Without bells and whistles, our SiamFC++ tracker achieves state-of-the-art performance on five challenging benchmarks (OTB2015, VOT2018, LaSOT, GOT-10k, TrackingNet), which proves both the tracking and generalization ability of the tracker. In particular, on the large-scale TrackingNet dataset, SiamFC++ achieves a previously unseen AUC score of 75.4 while running at over 90 FPS, which is far above the real-time requirement.

【Keywords】:

1539. ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection.

Paper Link】 【Pages】:12557-12564

【Authors】: Zhenbo Xu ; Wei Zhang ; Xiaoqing Ye ; Xiao Tan ; Wei Yang ; Shilei Wen ; Errui Ding ; Ajin Meng ; Liusheng Huang

【Abstract】: 3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve the resistance against occlusion, and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.

【Keywords】:
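
The ZoomNet abstract above describes resizing each instance box to a unified resolution while adjusting the camera intrinsics accordingly. The standard pinhole-camera bookkeeping for a crop followed by a resize can be sketched as below. This is a minimal sketch and not the authors' code: the function names and the box format `(x0, y0, w, h)` are assumptions, and half-pixel offset conventions are ignored.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of the image obtained by cropping `box` and resizing it to `out_size`.

    box      -- (x0, y0, w, h) of the 2D instance box in the original image
    out_size -- (W, H) of the unified output resolution
    """
    x0, y0, w, h = box
    out_w, out_h = out_size
    sx, sy = out_w / w, out_h / h
    # Cropping shifts the principal point; resizing scales focal length and principal point.
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy
```

With the adjusted intrinsics, a 3D point projects to the same location in the zoomed crop as its original pixel would land after the crop and resize, so geometry recovered from the resized box remains consistent with the original camera.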

1540. Shape-Aware Organ Segmentation by Predicting Signed Distance Maps.

Paper Link】 【Pages】:12565-12572

【Authors】: Yuan Xue ; Hui Tang ; Zhi Qiao ; Guanzhong Gong ; Yong Yin ; Zhen Qian ; Chao Huang ; Wei Fan ; Xiaolei Huang

【Abstract】: In this work, we propose to resolve the issue existing in current deep learning based organ segmentation systems that they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we exploit the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and has better smoothness and continuity in shape. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models by conducting extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to current state-of-the-arts, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, thus proving the effectiveness of our method.

【Keywords】:
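
The abstract above relies on the mapping between a signed distance map (SDM) and a binary segmentation map, and on an approximated Heaviside function to connect the two during training. A one-dimensional toy version is sketched below. The sign convention (positive inside the object), the steepness `k`, and all function names are assumptions for illustration and may differ from the paper.

```python
import math


def signed_distance_1d(mask):
    """Signed distance of each cell to the nearest cell with the opposite label.

    Positive inside the object, negative outside (an assumed convention).
    The mask must contain both labels, otherwise min() raises ValueError.
    """
    sdm = []
    for i, label in enumerate(mask):
        dist = min(abs(i - j) for j, other in enumerate(mask) if other != label)
        sdm.append(float(dist) if label else -float(dist))
    return sdm


def smooth_heaviside(z, k=10.0):
    """Sigmoid approximation of the step function, written to avoid overflow."""
    x = k * z
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sdm_to_mask(sdm, k=10.0):
    """Recover a binary mask by thresholding the smoothed SDM at 0.5."""
    return [1 if smooth_heaviside(z, k) > 0.5 else 0 for z in sdm]
```

Unlike a hard threshold, the smooth step is differentiable, which is what allows a predicted SDM and a segmentation map to be supervised together.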

1541. FAS-Net: Construct Effective Features Adaptively for Multi-Scale Object Detection.

Paper Link】 【Pages】:12573-12580

【Authors】: Jiangqiao Yan ; Yue Zhang ; Zhonghan Chang ; Tengfei Zhang ; Menglong Yan ; Wenhui Diao ; Hongqi Wang ; Xian Sun

【Abstract】: The feature pyramid is the mainstream method for multi-scale object detection. In most detectors with a feature pyramid, each proposal is predicted based on feature grids pooled from only one feature level, which is assigned heuristically. Recent studies report that the feature representation extracted using this method is sub-optimal, since it ignores the valid information that exists on the other, unselected layers of the feature pyramid. To address this issue, researchers have proposed fusing valid information across all feature levels. However, these methods can be further improved: the feature fusion strategies, which use common operations (element-wise max or sum) in most detectors, should be replaced by a more flexible approach. In this work, a novel method called feature adaptive selection subnetwork (FAS-Net) is proposed to construct effective features for detecting objects of different scales. In particular, its adaptation consists of two levels: global attention and local adaptive selection. First, we model the global context of each feature map with a global attention based feature selection module (GAFSM), which can adaptively strengthen the effective features across each layer. Then we extract the features of each region of interest (RoI) on the entire feature pyramid to construct a RoI feature pyramid. Finally, the RoI feature pyramid is sent to the feature adaptive selection module (FASM) to integrate the strengthened features adaptively according to the input. Our FAS-Net can be easily extended to other two-stage object detectors with a feature pyramid, and supports quantitative analysis of the importance of different feature levels for multi-scale objects. Besides, FAS-Net can also be applied to the instance segmentation task and obtains consistent improvements. Experiments on PASCAL07/12 and MSCOCO17 demonstrate the effectiveness and generalization of the proposed method.

【Keywords】:

1542. Gated Convolutional Networks with Hybrid Connectivity for Image Classification.

Paper Link】 【Pages】:12581-12588

【Authors】: Chuanguang Yang ; Zhulin An ; Hui Zhu ; Xiaolong Hu ; Kun Zhang ; Kaiqiang Xu ; Chao Li ; Yongjun Xu

【Abstract】: We propose a simple yet effective method to reduce the redundancy of DenseNet by substantially decreasing the number of stacked modules, replacing the original bottleneck with our SMG module, which is augmented by a local residual. Furthermore, the SMG module is equipped with an efficient two-stage pipeline aimed at DenseNet-like architectures that need to integrate all previous outputs: it gradually squeezes the incoming informative but redundant features by hierarchical convolutions in an hourglass shape and then excites them by multi-kernel depthwise convolutions, the output of which is compact and holds more informative multi-scale features. We further develop a forget gate and an update gate by introducing popular attention modules to implement effective fusion, instead of a simple addition, between reused and new features. Owing to the Hybrid Connectivity (a nested combination of global dense and local residual connections) and Gated mechanisms, we call our network HCGNet. Experimental results on the CIFAR and ImageNet datasets show that HCGNet is markedly more efficient than DenseNet, and can also significantly outperform state-of-the-art networks with less complexity. Moreover, HCGNet also shows remarkable interpretability and robustness under network dissection and adversarial defense, respectively. On MS-COCO, HCGNet can consistently learn better features than popular backbones.

【Keywords】:

1543. Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval.

Paper Link】 【Pages】:12589-12596

【Authors】: Fan Yang ; Zheng Wang ; Jing Xiao ; Shin'ichi Satoh

【Abstract】: Most recent approaches to zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a large improvement in the performance of the thermal vs. visible image retrieval task. The code for this paper is available at https://github.com/fyang93/cross-modal-retrieval

【Keywords】:

1544. Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification.

Paper Link】 【Pages】:12597-12604

【Authors】: Fengxiang Yang ; Ke Li ; Zhun Zhong ; Zhiming Luo ; Xing Sun ; Hao Cheng ; Xiaowei Guo ; Feiyue Huang ; Rongrong Ji ; Shaozi Li

【Abstract】: Person re-identification (re-ID) is a challenging task due to the high variance within identity samples and imaging conditions. Although recent advances in deep learning have achieved remarkable accuracy in settled scenes, i.e., the source domain, few works can generalize well to the unseen target domain. One popular solution is assigning pseudo labels to unlabeled target images by clustering, and then retraining the model. However, clustering methods tend to introduce noisy labels and discard low-confidence samples as outliers, which may hinder the retraining process and thus limit the generalization ability. In this study, we argue that by explicitly adding a sample filtering procedure after the clustering, the mined examples can be used much more efficiently. To this end, we design an asymmetric co-teaching framework, which resists noisy labels by having two models cooperate to select data with possibly clean labels for each other. Meanwhile, one of the models receives samples that are as pure as possible, while the other takes in samples that are as diverse as possible. This procedure encourages the selected training samples to be both clean and diverse, and allows the two models to improve each other iteratively. Extensive experiments show that the proposed framework can consistently benefit most clustering-based methods, and boost the state-of-the-art adaptation accuracy. Our code is available at https://github.com/FlyingRoastDuck/ACT_AAAI20.

【Keywords】:
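
The abstract above builds on co-teaching, in which two models select likely-clean samples for each other. The sketch below shows only the generic small-loss exchange commonly used in co-teaching; it does not implement the asymmetric "pure vs. diverse" selection that the abstract describes. The function names and `keep_ratio` are hypothetical.

```python
def select_small_loss(losses, keep_ratio):
    """Indices of the `keep_ratio` fraction of samples with the smallest loss."""
    k = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:k])


def co_teaching_exchange(losses_a, losses_b, keep_ratio):
    """Each model picks its low-loss samples, which are then used to train its peer.

    Returns (indices used to update model A, indices used to update model B).
    """
    chosen_by_a = select_small_loss(losses_a, keep_ratio)
    chosen_by_b = select_small_loss(losses_b, keep_ratio)
    return chosen_by_b, chosen_by_a
```

The intuition is that samples with noisy pseudo labels tend to have larger loss, so exchanging low-loss subsets keeps each model from reinforcing its own labeling mistakes.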

1545. Learning to Incorporate Structure Knowledge for Image Inpainting.

Paper Link】 【Pages】:12605-12612

【Authors】: Jie Yang ; Zhiquan Qi ; Yong Shi

【Abstract】: This paper develops a multi-task learning framework that attempts to incorporate image structure knowledge to assist image inpainting, which is not well explored in previous works. The primary idea is to train a shared generator to simultaneously complete the corrupted image and the corresponding structures (edge and gradient), thus implicitly encouraging the generator to exploit relevant structure knowledge while inpainting. In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process, thus providing possible preconditions for image completion. Specifically, a novel pyramid structure loss is proposed to supervise structure learning and embedding. Moreover, an attention mechanism is developed to further exploit the recurrent structures and patterns in the image to refine the generated structures and contents. Through multi-task learning, structure embedding, and attention, our framework takes advantage of the structure knowledge and outperforms several state-of-the-art methods on benchmark datasets both quantitatively and qualitatively.

【Keywords】:

1546. An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation.

Paper Link】 【Pages】:12613-12620

【Authors】: Jihan Yang ; Ruijia Xu ; Ruiyu Li ; Xiaojuan Qi ; Xiaoyong Shen ; Guanbin Li ; Liang Lin

【Abstract】: We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails to adapt the representations of tail classes or small objects for semantic segmentation, since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise adversarial perturbations in feature space. Specifically, we first perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) at each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with an attack on the domain discriminator. Our approach achieves state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 → Cityscapes and SYNTHIA → Cityscapes.

【Keywords】:

1547. FAN-Face: a Simple Orthogonal Improvement to Deep Face Recognition.

Paper Link】 【Pages】:12621-12628

【Authors】: Jing Yang ; Adrian Bulat ; Georgios Tzimiropoulos

【Abstract】: It is known that facial landmarks provide pose, expression and shape information. In addition, when matching, for example, a profile and/or expressive face to a frontal one, knowledge of these landmarks is useful for establishing correspondence which can help improve recognition. However, in prior work on face recognition, facial landmarks are only used for face cropping in order to remove scale, rotation and translation variations. This paper proposes a simple approach to face recognition which gradually integrates features from different layers of a facial landmark localization network into different layers of the recognition network. To this end, we propose an appropriate feature integration layer which makes the features compatible before integration. We show that such a simple approach systematically improves recognition on the most difficult face recognition datasets, setting a new state-of-the-art on IJB-B, IJB-C and MegaFace datasets.

【Keywords】:

1548. Towards Scale-Free Rain Streak Removal via Self-Supervised Fractal Band Learning.

Paper Link】 【Pages】:12629-12636

【Authors】: Wenhan Yang ; Shiqi Wang ; Dejia Xu ; Xiaodong Wang ; Jiaying Liu

【Abstract】: Data-driven rain streak removal methods, most of which rely on synthesized paired data, usually run into a generalization problem when applied to real cases. In this paper, we propose a novel deep-learning based rain streak removal method injected with self-supervision to improve the ability to remove rain streaks at various scales. To realize this goal, we make efforts in two aspects. First, considering that rain streak removal is highly correlated with texture characteristics, we create a fractal band learning (FBL) network based on frequency band recovery. It integrates commonly seen band feature operations with neural modules and effectively improves the capacity to capture discriminative features for deraining. Second, to further improve the generalization ability of FBL for rain streaks at various scales, we add cross-scale self-supervision to regularize the network training. The constraint forces the extracted features of inputs at different scales to be equivalent after rescaling. Therefore, FBL can offer similar responses based solely on image content without interference from scale, and is capable of removing rain streaks at various scales. Extensive experiments in quantitative and qualitative evaluations demonstrate the superiority of our FBL for rain streak removal, especially for real cases where very large rain streaks exist, and prove the effectiveness of each of its components. Our code will be publicly available at: https://github.com/flyywh/AAAI-2020-FBL-SS.

【Keywords】:

1549. SOGNet: Scene Overlap Graph Network for Panoptic Segmentation.

Paper Link】 【Pages】:12637-12644

【Authors】: Yibo Yang ; Hongyang Li ; Xia Li ; Qijie Zhao ; Jianlong Wu ; Zhouchen Lin

【Abstract】: The panoptic segmentation task requires a unified result from semantic and instance segmentation outputs that may contain overlaps. However, current studies widely ignore the modeling of overlaps. In this study, we aim to model overlap relations among instances and resolve them for panoptic segmentation. Inspired by scene graph representation, we formulate the overlapping problem as a simplified case, named the scene overlap graph. We leverage each object's category, geometry and appearance features to perform relational embedding, and output a relation matrix that encodes overlap relations. In order to overcome the lack of supervision, we introduce a differentiable module to resolve the overlap between any pair of instances. The mask logits after removing overlaps are fed into per-pixel instance id classification, which leverages the panoptic supervision to assist in the modeling of overlap relations. Besides, we generate an approximate ground truth of overlap relations as weak supervision, to quantify the accuracy of the overlap relations predicted by our method. Experiments on COCO and Cityscapes demonstrate that our method is able to accurately predict overlap relations, and outperforms the state of the art for panoptic segmentation. Our method also won the Innovation Award in the COCO 2019 challenge.

【Keywords】:

1550. Release the Power of Online-Training for Robust Visual Tracking.

Paper Link】 【Pages】:12645-12652

【Authors】: Yifan Yang ; Guorong Li ; Yuankai Qi ; Qingming Huang

【Abstract】: Convolutional neural networks (CNNs) have been widely adopted in the visual tracking community, significantly improving the state-of-the-art. However, most of them ignore the important cues lying in the distribution of training data and high-level features that are tightly coupled with the target/background classification. In this paper, we propose to improve the tracking accuracy via online training. On the one hand, we squeeze redundant training data by analyzing the dataset distribution in low-level feature space. On the other hand, we design statistic-based losses to increase the inter-class distance while decreasing the intra-class variance of high-level semantic features. We demonstrate the effectiveness on top of two high-performance tracking methods: MDNet and DAT. Experimental results on the challenging large-scale OTB2015 and UAVDT demonstrate the outstanding performance of our tracking method.

【Keywords】:

1551. Context-Transformer: Tackling Object Confusion for Few-Shot Detection.

Paper Link】 【Pages】:12653-12660

【Authors】: Ze Yang ; Yali Wang ; Xianyu Chen ; Jianzhuang Liu ; Yu Qiao

【Abstract】: Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to handle this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such a transferred detector often fails to recognize new objects in the target domain, due to the low data diversity of the training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance, and automatically exploit contexts from only a few training images in the target domain. Subsequently, it can adaptively integrate these relational clues to enhance the discriminative power of the detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer can be flexibly embedded in the popular SSD-style detectors, which makes it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer on the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that our framework outperforms recent state-of-the-art approaches.

【Keywords】:

1552. SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection.

Paper Link】 【Pages】:12661-12668

【Authors】: Lewei Yao ; Hang Xu ; Wei Zhang ; Xiaodan Liang ; Zhenguo Li

【Abstract】: The state-of-the-art object detection method is complicated with various modules such as backbone, RPN, feature fusion neck and RCNN head, where each module may have different designs and structures. How to leverage the computational cost and accuracy trade-off for the structural combination as well as the modular selection of multiple modules? Neural architecture search (NAS) has shown great potential in finding an optimal solution. Existing NAS works for object detection only focus on searching better design of a single module such as backbone or feature fusion neck, while neglecting the balance of the whole system. In this paper, we present a two-stage coarse-to-fine searching strategy named Structural-to-Modular NAS (SM-NAS) for searching a GPU-friendly design of both an efficient combination of modules and better modular-level architecture for object detection. Specifically, Structural-level searching stage first aims to find an efficient combination of different modules; Modular-level searching stage then evolves each specific module and pushes the Pareto front forward to a faster task-specific network. We consider a multi-objective search where the search space covers many popular designs of detection methods. We directly search a detection backbone without pre-trained models or any proxy task by exploring a fast training from scratch strategy. The resulting architectures dominate state-of-the-art object detection systems in both inference time and accuracy and demonstrate the effectiveness on multiple detection datasets, e.g. halving the inference time with additional 1% mAP improvement compared to FPN and reaching 46% mAP with the similar inference time of MaskRCNN.

【Keywords】:

1553. Deep Discriminative CNN with Temporal Ensembling for Ambiguously-Labeled Image Classification.

Paper Link】 【Pages】:12669-12676

【Authors】: Yao Yao ; Jiehui Deng ; Xiuhua Chen ; Chen Gong ; Jianxin Wu ; Jian Yang

【Abstract】: In this paper, we study the problem of image classification where training images are ambiguously annotated with multiple candidate labels, among which only one is correct but is not accessible during the training phase. Due to the adopted non-deep framework and improper disambiguation strategies, traditional approaches are usually short of the representation ability and discrimination ability, so their performances are still to be improved. To remedy these two shortcomings, this paper proposes a novel approach termed “Deep Discriminative CNN” (D2CNN) with temporal ensembling. Specifically, to improve the representation ability, we innovatively employ the deep convolutional neural networks for ambiguously-labeled image classification, in which the well-known ResNet is adopted as our backbone. To enhance the discrimination ability, we design an entropy-based regularizer to maximize the margin between the potentially correct label and the unlikely ones of each image. In addition, we utilize the temporally assembled predictions of different epochs to guide the training process so that the latent groundtruth label can be confidently highlighted. This is much superior to the traditional disambiguation operations which treat all candidate labels equally and identify the hidden groundtruth label via some heuristic ways. Thorough experimental results on multiple datasets firmly demonstrate the effectiveness of our proposed D2CNN when compared with other existing state-of-the-art approaches.

【Keywords】:

1554. Object-Guided Instance Segmentation for Biological Images.

Paper Link】 【Pages】:12677-12684

【Authors】: Jingru Yi ; Hui Tang ; Pengxiang Wu ; Bo Liu ; Daniel J. Hoeppner ; Dimitris N. Metaxas ; Lianyi Han ; Wei Fan

【Abstract】: Instance segmentation of biological images is essential for studying object behaviors and properties. The challenges, such as clustering, occlusion, and adhesion problems of the objects, make instance segmentation a non-trivial task. Current box-free instance segmentation methods typically rely on local pixel-level information. Due to a lack of global object view, these methods are prone to over- or under-segmentation. On the contrary, the box-based instance segmentation methods incorporate object detection into the segmentation, performing better in identifying the individual instances. In this paper, we propose a new box-based instance segmentation method. Mainly, we locate the object bounding boxes from their center points. The object features are subsequently reused in the segmentation branch as a guide to separate the clustered instances within an RoI patch. Along with the instance normalization, the model is able to recover the target object distribution and suppress the distribution of neighboring attached objects. Consequently, the proposed model performs excellently in segmenting the clustered objects while retaining the target object details. The proposed method achieves state-of-the-art performances on three biological datasets: cell nuclei, plant phenotyping dataset, and neural cells.

【Keywords】:

1555. Leveraging Multi-View Image Sets for Unsupervised Intrinsic Image Decomposition and Highlight Separation.

Paper Link】 【Pages】:12685-12692

【Authors】: Renjiao Yi ; Ping Tan ; Stephen Lin

【Abstract】: We present an unsupervised approach for factorizing object appearance into highlight, shading, and albedo layers, trained by multi-view real images. To do so, we construct a multi-view dataset by collecting numerous customer product photos online, which exhibit large illumination variations that make them suitable for training of reflectance separation and can facilitate object-level decomposition. The main contribution of our approach is a proposed image representation based on local color distributions that allows training to be insensitive to the local misalignments of multi-view images. In addition, we present a new guidance cue for unsupervised training that exploits synergy between highlight separation and intrinsic image decomposition. Over a broad range of objects, our technique is shown to yield state-of-the-art results for both of these tasks.

【Keywords】:

1556. Joint Super-Resolution and Alignment of Tiny Faces.

Paper Link】 【Pages】:12693-12700

【Authors】: Yu Yin ; Joseph P. Robinson ; Yulun Zhang ; Yun Fu

【Abstract】: Super-resolution (SR) and landmark localization of tiny faces are highly correlated tasks. On the one hand, landmark localization could obtain higher accuracy with faces of high-resolution (HR). On the other hand, face SR would benefit from prior knowledge of facial attributes such as landmarks. Thus, we propose a joint alignment and SR network to simultaneously detect facial landmarks and super-resolve tiny faces. More specifically, a shared deep encoder is applied to extract features for both tasks by leveraging complementary information. To exploit representative power of the hierarchical encoder, intermediate layers of a shared feature extraction module are fused to form efficient feature representations. The fused features are then fed to task-specific modules to detect landmarks and super-resolve face images in parallel. Extensive experiments demonstrate that the proposed model significantly outperforms the state-of-the-art in both landmark localization and SR of faces. We show a large improvement for landmark localization of tiny faces (i.e., 16 × 16). Furthermore, the proposed framework yields comparable results for landmark localization on low-resolution (LR) faces (i.e., 64 × 64) to existing methods on HR (i.e., 256 × 256). As for SR, the proposed method recovers sharper edges and more details from LR face images than other state-of-the-art methods, which we demonstrate qualitatively and quantitatively.

【Keywords】:

1557. Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution.

Paper Link】 【Pages】:12701-12708

【Authors】: Yingruo Fan ; Jacqueline C. K. Lam ; Victor On Kwok Li

【Abstract】: The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurring pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep and low resolution feature maps, and thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and the superior performance of our method on two benchmark datasets.

【Keywords】:

1558. Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification.

Paper Link】 【Pages】:12709-12716

【Authors】: Renchun You ; Zhiyao Guo ; Lei Cui ; Xiang Long ; Yingze Bao ; Shilei Wen

【Abstract】: Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and discovering the locations of discriminative features for each class. In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi-label classification. Based on the constructed label graph, we propose an adjacency-based similarity graph embedding method to learn semantic label embeddings, which explicitly exploit label relationships. Then our novel cross-modality attention maps are generated with the guidance of learned label embeddings. Experiments on two multi-label image classification datasets (MS-COCO and NUS-WIDE) show our method outperforms other existing state-of-the-arts. In addition, we validate our method on a large multi-label video classification dataset (YouTube-8M Segments) and the evaluation results demonstrate the generalization capability of our method.

【Keywords】:

1559. Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution.

Paper Link】 【Pages】:12717-12724

【Authors】: Yang You ; Yujing Lou ; Qi Liu ; Yu-Wing Tai ; Lizhuang Ma ; Cewu Lu ; Weiming Wang

【Abstract】: Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand new point-set learning framework PRIN, namely, Pointwise Rotation-Invariant Network, focusing on rotation-invariant feature extraction in point clouds analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space. In addition, we propose Spherical Voxel Convolution and Point Re-sampling to extract rotation-invariant features for each point. Our network can be applied to tasks ranging from object classification, part segmentation, to 3D feature matching and label alignment. We show that, on the dataset with randomly rotated point clouds, PRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide theoretical analysis for the rotation-invariance achieved by our methods.

【Keywords】:

1560. Cascading Convolutional Color Constancy.

Paper Link】 【Pages】:12725-12732

【Authors】: Huanglin Yu ; Ke Chen ; Kaiqi Wang ; Yanlin Qian ; Zhaoxiang Zhang ; Kui Jia

【Abstract】: Regressing the illumination of a scene from the representations of object appearances is popularly adopted in computational color constancy. However, it's still challenging due to intrinsic appearance and label ambiguities caused by unknown illuminants, diverse reflection properties of materials and extrinsic imaging factors (such as different camera sensors). In this paper, we introduce a novel algorithm – Cascading Convolutional Color Constancy (in short, C4) to improve robustness of regression learning and achieve stable generalization capability across datasets (different cameras and scenes) in a unique framework. The proposed C4 method ensembles a series of dependent illumination hypotheses from each cascade stage via introducing a weighted multiply-accumulate loss function, which can inherently capture different modes of illuminations and explicitly enforce coarse-to-fine network optimization. Experimental results on the public Color Checker and NUS 8-Camera benchmarks demonstrate superior performance of the proposed algorithm in comparison with the state-of-the-art methods, especially for more difficult scenes.

【Keywords】:

1561. Region Normalization for Image Inpainting.

Paper Link】 【Pages】:12733-12740

【Authors】: Tao Yu ; Zongyu Guo ; Xin Jin ; Shilin Wu ; Zhibo Chen ; Weiping Li ; Zhizheng Zhang ; Sen Liu

【Abstract】: Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.

【Keywords】:
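
The region-wise normalization described in the Region Normalization abstract above can be illustrated with a minimal sketch. It assumes a single feature channel flattened into a list and a binary mask (1 for corrupted pixels); the function name `region_normalize` and the `eps` stabilizer are illustrative choices, not taken from the paper, and the learnable variant (RN-L) and its affine transformation are not covered.

```python
import math


def region_normalize(features, mask, eps=1e-5):
    """Normalize one channel separately inside and outside a binary mask.

    features: flat list of floats (one channel, pixels flattened).
    mask: flat list of 0/1 flags, where 1 marks a corrupted pixel.
    eps: small constant for numerical stability (an assumption).
    """
    out = [0.0] * len(features)
    for region in (0, 1):
        idx = [i for i, m in enumerate(mask) if m == region]
        if not idx:
            continue  # this region is empty, nothing to normalize
        mean = sum(features[i] for i in idx) / len(idx)
        var = sum((features[i] - mean) ** 2 for i in idx) / len(idx)
        std = math.sqrt(var + eps)
        for i in idx:
            out[i] = (features[i] - mean) / std
    return out


# Corrupted pixels (mask == 1) are all zero; uncorrupted pixels are not.
features = [0.0, 0.0, 0.0, 4.0, 6.0, 8.0]
mask = [1, 1, 1, 0, 0, 0]
normalized = region_normalize(features, mask)
```

Because each region uses its own statistics, the zero-valued corrupted pixels do not shift the mean and variance used for the uncorrupted ones, which is the effect the abstract attributes to full-spatial normalization.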

1562. Patchy Image Structure Classification Using Multi-Orientation Region Transform.

Paper Link】 【Pages】:12741-12748

【Authors】: Xiaohan Yu ; Yang Zhao ; Yongsheng Gao ; Shengwu Xiong ; Xiaohui Yuan

【Abstract】: Exterior contour and interior structure are both vital features for classifying objects. However, most of the existing methods consider exterior contour feature and internal structure feature separately, and thus fail to function when classifying patchy image structures that have similar contours and flexible structures. To address above limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification. MORT is performed over multiple orientation regions at multiple scales to effectively integrate patchy features, and thus enables a better description of the shape in a coarse-to-fine manner. Moreover, the proposed MORT can be extended to combine with the deep convolutional neural network techniques, for further enhancement of classification accuracy. Very encouraging experimental results on the challenging ultra-fine-grained cultivar recognition task, insect wing recognition task, and large variation butterfly recognition task are obtained, which demonstrate the effectiveness and superiority of the proposed MORT over the state-of-the-art methods in classifying patchy image structures. Our code and three patchy image structure datasets are available at: https://github.com/XiaohanYu-GU/MReT2019.

【Keywords】:

1563. Human Synthesis and Scene Compositing.

Paper Link】 【Pages】:12749-12756

【Authors】: Mihai Zanfir ; Elisabeta Oneata ; Alin-Ionut Popa ; Andrei Zanfir ; Cristian Sminchisescu

【Abstract】: Generating good quality and geometrically plausible synthetic images of humans with the ability to control appearance, pose and shape parameters, has become increasingly important for a variety of tasks ranging from photo editing, fashion virtual try-on, to special effects and image compression. In this paper, we propose a HUSC (HUman Synthesis and Scene Compositing) framework for the realistic synthesis of humans with different appearance, in novel poses and scenes. Central to our formulation is 3d reasoning for both people and scenes, in order to produce realistic collages, by correctly modeling perspective effects and occlusion, by taking into account scene semantics and by adequately handling relative scales. Conceptually our framework consists of three components: (1) a human image synthesis model with controllable pose and appearance, based on a parametric representation, (2) a person insertion procedure that leverages the geometry and semantics of the 3d scene, and (3) an appearance compositing process to create a seamless blending between the colors of the scene and the generated human image, and avoid visual artifacts. The performance of our framework is supported by both qualitative and quantitative results, in particular state-of-the-art synthesis scores for the DeepFashion dataset.

【Keywords】:

1564. Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose.

Paper Link】 【Pages】:12757-12764

【Authors】: Xianfang Zeng ; Yusu Pan ; Mengmeng Wang ; Jiangning Zhang ; Yong Liu

【Abstract】: Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmark or boundary. To alleviate the demand for manual annotations, in this paper, we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact face naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in the conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for video-specific identity and the other for various poses. Inspired by that, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator allows for enhancing fine details and overall reality. It leverages the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on VoxCeleb1 and RaFD dataset. Experiment results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.

【Keywords】:

1565. Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach.

Paper Link】 【Pages】:12765-12772

【Authors】: Bingfeng Zhang ; Jimin Xiao ; Yunchao Wei ; Mingjie Sun ; Kaizhu Huang

【Abstract】: Weakly supervised semantic segmentation is a challenging task as it only takes image-level information as supervision for training but produces pixel-level predictions for testing. To address such a challenging task, most recent state-of-the-art approaches propose to adopt two-step solutions, i.e. 1) learn to generate pseudo pixel-level masks, and 2) engage FCNs to train the semantic segmentation networks with the pseudo masks. However, the two-step solutions usually employ many bells and whistles in producing high-quality pseudo masks, making this kind of methods complicated and inelegant. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network to learn to predict segmentation maps. Concretely, we firstly leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. Such reliable regions are then directly served as ground-truth labels for the parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with those two-step state-of-the-arts. By extending our one-step method to two-step, we get a new state-of-the-art performance on the Pascal VOC (val: 66.3, test: 66.5).

【Keywords】:

1566. Shape-Oriented Convolution Neural Network for Point Cloud Analysis.

Paper Link】 【Pages】:12773-12780

【Authors】: Chaoyi Zhang ; Yang Song ; Lina Yao ; Weidong Cai

【Abstract】: Point cloud is a principal data structure adopted for 3D geometric information encoding. Unlike other conventional visual data, such as images and videos, these irregular points describe the complex shape features of 3D objects, which makes shape feature learning an essential component of point cloud analysis. To this end, a shape-oriented message passing scheme dubbed ShapeConv is proposed to focus on the representation learning of the underlying shape formed by each local neighboring point. Despite this intra-shape relationship learning, ShapeConv is also designed to incorporate the contextual effects from the inter-shape relationship through capturing the long-ranged dependencies between local underlying shapes. This shape-oriented operator is stacked into our hierarchical learning architecture, namely Shape-Oriented Convolutional Neural Network (SOCNN), developed for point cloud analysis. Extensive experiments have been performed to evaluate its significance in the tasks of point cloud classification and part segmentation.

【Keywords】:

1567. Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification.

Paper Link】 【Pages】:12781-12788

【Authors】: Chuanyi Zhang ; Yazhou Yao ; Huafeng Liu ; Guo-Sen Xie ; Xiangbo Shu ; Tianfei Zhou ; Zheng Zhang ; Fumin Shen ; Zhenmin Tang

【Abstract】: Labeling objects at the subordinate level typically requires expert knowledge, which is not always available from a random annotator. Accordingly, learning directly from web images for fine-grained visual classification (FGVC) has attracted broad attention. However, the existence of noise in web images is a huge obstacle for training robust deep neural networks. In this paper, we propose a novel approach to remove irrelevant samples from the real-world web images during training, and only utilize useful images for updating the networks. Thus, our network can alleviate the harmful effects caused by irrelevant noisy web images to achieve better performance. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach is much superior to state-of-the-art webly supervised methods. The data and source code of this work have been made anonymously available at: https://github.com/z337-408/WSNFGVC.

【Keywords】:

1568. FDN: Feature Decoupling Network for Head Pose Estimation.

Paper Link】 【Pages】:12789-12796

【Authors】: Hao Zhang ; Mengmeng Wang ; Yong Liu ; Yi Yuan

【Abstract】: Head pose estimation from RGB images without depth information is a challenging task due to the loss of spatial information as well as large head pose variations in the wild. The performance of existing landmark-free methods remains unsatisfactory as the quality of estimated pose is inferior. In this paper, we propose a novel three-branch network architecture, termed as Feature Decoupling Network (FDN), a more powerful architecture for landmark-free head pose estimation from a single RGB image. In FDN, we first propose a feature decoupling (FD) module to explicitly learn the discriminative features for each pose angle by adaptively recalibrating its channel-wise responses. Besides, we introduce a cross-category center (CCC) loss to constrain the distribution of the latent variable subspaces and thus we can obtain more compact and distinct subspaces. Extensive experiments on both in-the-wild and controlled environment datasets demonstrate that the proposed method outperforms other state-of-the-art methods based on a single RGB image and behaves on par with approaches based on multimodal input resources.

【Keywords】:

1569. Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity.

Paper Link】 【Pages】:12797-12804

【Authors】: Hao Zhang ; Han Xu ; Yang Xiao ; Xiaojie Guo ; Jiayi Ma

【Abstract】: In this paper, we propose a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can end-to-end realize a variety of image fusion tasks, including infrared and visible image fusion, multi-exposure image fusion, medical image fusion, multi-focus image fusion and pan-sharpening. We unify the image fusion problem into the texture and intensity proportional maintenance problem of the source images. On the one hand, the network is divided into gradient path and intensity path for information extraction. We perform feature reuse in the same path to avoid loss of information due to convolution. At the same time, we introduce the pathwise transfer block to exchange information between different paths, which can not only pre-fuse the gradient information and intensity information, but also enhance the information to be processed later. On the other hand, we define a uniform form of loss function based on these two kinds of information, which can adapt to different fusion tasks. Experiments on publicly available datasets demonstrate the superiority of our PMGI over the state-of-the-art in terms of both visual effect and quantitative metric in a variety of fusion tasks. In addition, our method is faster compared with the state-of-the-art.

【Keywords】:

1570. Model Watermarking for Image Processing Networks.

Paper Link】 【Pages】:12805-12812

【Authors】: Jie Zhang ; Dongdong Chen ; Jing Liao ; Han Fang ; Weiming Zhang ; Wenbo Zhou ; Hao Cui ; Nenghai Yu

【Abstract】: Deep learning has achieved tremendous success in numerous industrial applications. As training a good model often needs massive high-quality data and computation resources, the learned models often have significant business values. However, these valuable deep models are exposed to a huge risk of infringements. For example, if the attacker has the full information of one target model including the network structure and weights, the model can be easily finetuned on new datasets. Even if the attacker can only access the output of the target model, he/she can still train another similar surrogate model by generating a large scale of input-output training pairs. How to protect the intellectual property of deep models is a very important but seriously under-researched problem. There are a few recent attempts at classification network protection only. In this paper, we propose the first model watermarking framework for protecting image processing models. To achieve this goal, we leverage the spatial invisible watermarking mechanism. Specifically, given a black-box target model, a unified and invisible watermark is hidden into its outputs, which can be regarded as a special task-agnostic barrier. In this way, when the attacker trains one surrogate model by using the input-output pairs of the target model, the hidden watermark will be learned and extracted afterward. To enable watermarks from binary bits to high-resolution images, both traditional and deep spatial invisible watermarking mechanisms are considered. Experiments demonstrate the robustness of the proposed watermarking mechanism, which can resist surrogate models learned with different network structures and objective functions. Besides deep models, the proposed method can also be easily extended to protect data and traditional image processing algorithms.

【Keywords】:

1571. Deep Object Co-Segmentation via Spatial-Semantic Network Modulation.

Paper Link】 【Pages】:12813-12820

【Authors】: Kaihua Zhang ; Jin Chen ; Bo Liu ; Qingshan Liu

【Abstract】: Object co-segmentation is to segment the shared objects in multiple relevant images, which has numerous applications in computer vision. This paper presents a spatial and semantic modulated deep network framework for object co-segmentation. A backbone network is adopted to extract multi-resolution image features. With the multi-resolution features of the relevant images as input, we design a spatial modulator to learn a mask for each image. The spatial modulator captures the correlations of image feature descriptors via unsupervised learning. The learned mask can roughly localize the shared foreground object while suppressing the background. For the semantic modulator, we model it as a supervised image classification task. We propose a hierarchical second-order pooling module to transform the image features for classification use. The outputs of the two modulators manipulate the multi-resolution features by a shift-and-scale operation so that the features focus on segmenting co-object regions. The proposed model is trained end-to-end without any intricate post-processing. Extensive experiments on four image co-segmentation benchmark datasets demonstrate the superior accuracy of the proposed method compared to state-of-the-art methods. The codes are available at http://kaihuazhang.net/.

【Keywords】:

1572. Pixel-Aware Deep Function-Mixture Network for Spectral Super-Resolution.

Paper Link】 【Pages】:12821-12828

【Authors】: Lei Zhang ; Zhiqiang Lang ; Peng Wang ; Wei Wei ; Shengcai Liao ; Ling Shao ; Yanning Zhang

【Abstract】: Spectral super-resolution (SSR) aims at generating a hyperspectral image (HSI) from a given RGB image. Recently, a promising direction is to learn a complicated mapping function from the RGB image to the HSI counterpart using a deep convolutional neural network. This essentially involves mapping the RGB context within a size-specific receptive field centered at each pixel to its spectrum in the HSI. The focus thereon is to appropriately determine the receptive field size and establish the mapping function from RGB context to the corresponding spectrum. Due to their differences in category or spatial position, pixels in HSIs often require different-sized receptive fields and distinct mapping functions. However, few efforts have been invested to explicitly exploit this prior. To address this problem, we propose a pixel-aware deep function-mixture network for SSR, which is composed of a new class of modules, termed function-mixture (FM) blocks. Each FM block is equipped with some basis functions, i.e., parallel subnets of different-sized receptive fields. Besides, it incorporates an extra subnet as a mixing function to generate pixel-wise weights, and then linearly mixes the outputs of all basis functions with those generated weights. This enables us to determine the receptive field size and the mapping function pixel-wise. Moreover, we stack several such FM blocks to further increase the flexibility of the network in learning the pixel-wise mapping. To encourage feature reuse, intermediate features generated by the FM blocks are fused at a late stage, which proves to be effective for boosting the SSR performance. Experimental results on three benchmark HSI datasets demonstrate the superiority of the proposed method.

【Keywords】:
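
The mixing step described in the abstract above (an extra subnet produces pixel-wise weights, which linearly combine the outputs of the basis functions) can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and normalizing the weights with a softmax is an assumption that the abstract does not state.

```python
import math


def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_pixel(basis_outputs, mixing_scores):
    """Linearly mix the basis-function outputs for one pixel.

    basis_outputs: list of K feature vectors, one per basis function.
    mixing_scores: list of K raw scores from a (hypothetical) mixing subnet.
    """
    weights = softmax(mixing_scores)  # assumption: softmax-normalized weights
    channels = len(basis_outputs[0])
    return [
        sum(w * out[c] for w, out in zip(weights, basis_outputs))
        for c in range(channels)
    ]


# Two basis functions (say, a small and a large receptive field), 3 channels.
outputs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mixed_equal = mix_pixel(outputs, [0.0, 0.0])      # equal weights: plain average
mixed_second = mix_pixel(outputs, [-50.0, 50.0])  # nearly all weight on basis 2
```

Because the scores differ per pixel, each pixel can in effect pick its own blend of receptive field sizes, which is the flexibility the abstract attributes to the FM block.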

1573. RIS-GAN: Explore Residual and Illumination with Generative Adversarial Networks for Shadow Removal.

Paper Link】 【Pages】:12829-12836

【Authors】: Ling Zhang ; Chengjiang Long ; Xiaolong Zhang ; Chunxia Xiao

【Abstract】: Residual images and illumination estimation have been proved very helpful in image enhancement. In this paper, we propose a general and novel framework RIS-GAN which explores residual and illumination with Generative Adversarial Networks for shadow removal. Combined with the coarse shadow-removal image, the estimated negative residual images and inverse illumination maps can be used to generate indirect shadow-removal images to refine the coarse shadow-removal result to the fine shadow-free image in a coarse-to-fine fashion. Three discriminators are designed to distinguish whether the predicted negative residual images, shadow-removal images, and the inverse illumination maps are real or fake jointly compared with the corresponding ground-truth information. To the best of our knowledge, we are the first to explore residual and illumination for shadow removal. We evaluate our proposed method on two benchmark datasets, i.e., SRD and ISTD, and the extensive experiments demonstrate that our proposed method achieves superior performance to state-of-the-art methods, although we have no particular shadow-aware components designed in our generators.

【Keywords】:

1574. 3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels.

Paper Link】 【Pages】:12837-12844

【Authors】: Qi Zhang ; Antoni B. Chan

【Abstract】: Crowd counting has been studied for decades and a lot of works have achieved good performance, especially the DNNs-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few works have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground-plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, the 3D fusion extracts more information of the people along z-dimension (height), which helps to solve the scale variations across multiple views. The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density. We also explore the projection consistency among the 3D prediction and the ground-truth in the 2D views to further enhance the counting performance. The proposed method is tested on 3 multi-view counting datasets and achieves better or comparable counting performance to the state-of-the-art.

【Keywords】:
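
The property mentioned in the abstract above, that a 3D density map still sums to the person count, can be illustrated with a small sketch. It assumes one isotropic 3D Gaussian per person, normalized over the voxel grid so that each person contributes exactly 1. The grid size, `sigma`, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math


def gaussian_density_3d(points, shape, sigma=1.0):
    """Build a 3D density map with one grid-normalized Gaussian per person.

    points: list of (x, y, z) voxel coordinates, one per person.
    shape:  (X, Y, Z) size of the voxel grid.
    """
    size_x, size_y, size_z = shape
    density = [[[0.0] * size_z for _ in range(size_y)] for _ in range(size_x)]
    for px, py, pz in points:
        kernel = {}
        for x in range(size_x):
            for y in range(size_y):
                for z in range(size_z):
                    d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
                    kernel[(x, y, z)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(kernel.values())
        # Normalize over the grid so this person adds exactly 1 to the sum.
        for (x, y, z), value in kernel.items():
            density[x][y][z] += value / total
    return density


def total_count(density):
    """Sum every voxel; by construction this equals the number of people."""
    return sum(v for plane in density for row in plane for v in row)


heads = [(2, 2, 1), (5, 4, 2), (6, 6, 1)]
density = gaussian_density_3d(heads, shape=(8, 8, 4))
count = total_count(density)
```

Summing the map along the z axis would give a 2D ground-plane map with the same total, which is why the 3D representation keeps the counting property while adding height information.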

1575. Deep Camouflage Images.

Paper Link】 【Pages】:12845-12852

【Authors】: Qing Zhang ; Gelin Yin ; Yongwei Nie ; Wei-Shi Zheng

【Abstract】: This paper addresses the problem of creating camouflage images. Such images typically contain one or more hidden objects embedded into a background image, so that viewers are required to consciously focus to discover them. Previous methods basically rely on hand-crafted features and texture synthesis to create camouflage images. However, due to a lack of reliable understanding of what essentially makes an object recognizable, they typically produce hidden objects that either stand out completely or are completely invisible. Moreover, they may fail to produce seamless and natural images because of the sensitivity to appearance differences. To overcome these limitations, we present a novel neural style transfer approach that adopts the visual perception mechanism to create camouflage images, which allows us to hide objects more effectively while producing natural-looking results. In particular, we design an attention-aware camouflage loss to adaptively mask out information that makes the hidden objects visually stand out, and also leave subtle yet sufficient feature clues for viewers to perceive the hidden objects. To remove the appearance discontinuities between the hidden objects and the background, we formulate a naturalness regularization to constrain the hidden objects to maintain the manifold structure of the covered background. Extensive experiments show the advantages of our approach over existing camouflage methods and state-of-the-art neural style transfer algorithms.

【Keywords】:

1576. AutoRemover: Automatic Object Removal for Autonomous Driving Videos.

Paper Link】 【Pages】:12853-12861

【Authors】: Rong Zhang ; Wei Li ; Peng Wang ; Chenye Guan ; Jin Fang ; Yuhang Song ; Jinhui Yu ; Baoquan Chen ; Weiwei Xu ; Ruigang Yang

【Abstract】: Motivated by the need for photo-realistic simulation in autonomous driving, in this paper we present a video inpainting algorithm AutoRemover, designed specifically for generating street-view videos without any moving objects. In our setup we have two challenges. The first is shadows, which are usually unlabeled but tightly coupled with the moving objects. The second is the large ego-motion in the videos. To deal with shadows, we build up an autonomous driving shadow dataset and design a deep neural network to detect shadows automatically. To deal with large ego-motion, we take advantage of the multi-source data, in particular the 3D data, in autonomous driving. More specifically, the geometric relationship between frames is incorporated into an inpainting deep neural network to produce high-quality structurally consistent video output. Experiments show that our method outperforms other state-of-the-art (SOTA) object removal algorithms, reducing the RMSE by over 19%.

【Keywords】:

1577. Knowledge Integration Networks for Action Recognition.

Paper Link】 【Pages】:12862-12869

【Authors】: Shiwen Zhang ; Sheng Guo ; Limin Wang ; Weilin Huang ; Matthew Scott

【Abstract】: In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance to identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode the knowledge of human and scene for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of human and scene for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that our KINet has strong capability by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.

【Keywords】:

1578. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language.

Paper Link】 【Pages】:12870-12877

【Authors】: Songyang Zhang ; Houwen Peng ; Jianlong Fu ; Jiebo Luo

【Abstract】: We address the problem of retrieving a specific moment from an untrimmed video by a query sentence. This is a challenging problem because a target moment may take place in relation to other temporal moments in the untrimmed video. Existing methods cannot tackle this challenge well since they consider temporal moments individually and neglect the temporal dependencies. In this paper, we model the temporal relations between video moments by a two-dimensional map, where one dimension indicates the starting time of a moment and the other indicates the end time. This 2D temporal map can cover diverse video moments with different lengths, while representing their adjacent relations. Based on the 2D map, we propose a Temporal Adjacent Network (2D-TAN), a single-shot framework for moment localization. It is capable of encoding the adjacent temporal relation, while learning discriminative features for matching video moments with referring expressions. We evaluate the proposed 2D-TAN on three challenging benchmarks, i.e., Charades-STA, ActivityNet Captions, and TACoS, where our 2D-TAN outperforms the state-of-the-art.

【Keywords】:
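
The two-dimensional map described in the abstract above, with one axis for a moment's start time and the other for its end time, can be sketched as follows. This is an illustrative data structure only: summarizing each moment by averaging its clip features is an assumption made here for simplicity, and the function name is hypothetical.

```python
def build_2d_temporal_map(clip_features):
    """Return an N x N map; entry [i][j] describes the moment from clip i to clip j.

    Entries with j < i stay None because a moment cannot end before it starts.
    Assumption: a moment is summarized by the mean of its clip features.
    """
    n = len(clip_features)
    dim = len(clip_features[0])
    temporal_map = [[None] * n for _ in range(n)]
    for start in range(n):
        running = [0.0] * dim  # running sum of features from `start` to `end`
        for end in range(start, n):
            running = [r + f for r, f in zip(running, clip_features[end])]
            length = end - start + 1
            temporal_map[start][end] = [r / length for r in running]
    return temporal_map


# Four clips with one-dimensional features, for readability.
clips = [[1.0], [3.0], [5.0], [7.0]]
tmap = build_2d_temporal_map(clips)
valid_moments = sum(cell is not None for row in tmap for cell in row)
```

Neighbouring cells in this map correspond to moments whose start or end differs by one clip, which is what allows a network operating on the map to consider adjacent candidate moments together rather than individually.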

1579. Single Camera Training for Person Re-Identification.

Paper Link】 【Pages】:12878-12885

【Authors】: Tianyu Zhang ; Lingxi Xie ; Longhui Wei ; Yongfei Zhang ; Bo Li ; Qi Tian

【Abstract】: Person re-identification (ReID) aims at finding the same person in different cameras. Training such systems usually requires a large amount of cross-camera pedestrians to be annotated from surveillance videos, which is labor-consuming especially when the number of cameras is large. Differently, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera. To the best of our knowledge, this setting was never studied before. SCT enjoys the advantage of low-cost data collection and annotation, and thus eases ReID systems to be trained in a brand new environment. However, it raises major challenges due to the lack of cross-camera person occurrences, which conventional approaches heavily rely on to extract discriminative features. The key to dealing with the challenges in the SCT setting lies in designing an effective mechanism to complement cross-camera annotation. We start with a regular deep network for feature extraction, upon which we propose a novel loss function named multi-camera negative loss (MCNL). This is a metric learning loss motivated by probability, suggesting that in a multi-camera system, one image is more likely to be closer to the most similar negative sample in other cameras than to the most similar negative sample in the same camera. In experiments, MCNL significantly boosts ReID accuracy in the SCT setting, which paves the way of fast deployment of ReID systems with good performance on new target scenes.

【Keywords】:

1580. Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos.

Paper Link】 【Pages】:12886-12893

【Authors】: Xiao-Yu Zhang ; Haichao Shi ; Changsheng Li ; Peng Li

【Abstract】: Weakly supervised action recognition and localization for untrimmed videos is a challenging problem with extensive applications. The overwhelming irrelevant background contents in untrimmed videos severely hamper effective identification of actions of interest. In this paper, we propose a novel multi-instance multi-label modeling network based on spatio-temporal pre-trimming to recognize actions and locate corresponding frames in untrimmed videos. Motivated by the fact that person is the key factor in a human action, we spatially and temporally segment each untrimmed video into person-centric clips with pose estimation and tracking techniques. Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem. The network is optimized iteratively with selective coarse-to-fine pre-trimming based on instance-label activation. After convergence, temporal localization is further achieved with local-global temporal class activation map. Extensive experiments are conducted on two benchmark datasets, i.e. THUMOS14 and ActivityNet1.3, and experimental results clearly corroborate the efficacy of our method when compared with the state-of-the-arts.

【Keywords】:

1581. FACT: Fused Attention for Clothing Transfer with Generative Adversarial Networks.

Paper Link】 【Pages】:12894-12901

【Authors】: Yicheng Zhang ; Lei Li ; Li Song ; Rong Xie ; Wenjun Zhang

【Abstract】: Clothing transfer is a challenging task in computer vision where the goal is to transfer the human clothing style in an input image conditioned on a given language description. However, existing approaches have limited ability in delicate colorization and texture synthesis with a conventional fully convolutional generator. To tackle this problem, we propose a novel semantic-based Fused Attention model for Clothing Transfer (FACT), which allows fine-grained synthesis, high global consistency and plausible hallucination in images. Towards this end, we incorporate two attention modules based on spatial levels: (i) soft attention that searches for the most related positions in sentences, and (ii) self-attention modeling long-range dependencies on feature maps. Furthermore, we also develop a stylized channel-wise attention module to capture correlations on feature levels. We effectively fuse these attention modules in the generator and achieve better performances than the state-of-the-art method on the DeepFashion dataset. Qualitative and quantitative comparisons against the baselines demonstrate the effectiveness of our approach.

【Keywords】:

1582. Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks.

Paper Link】 【Pages】:12902-12909

【Authors】: Yingying Zhang ; Junyu Gao ; Xiaoshan Yang ; Chang Liu ; Yan Li ; Changsheng Xu

【Abstract】: With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers moments of a user's major or special interest in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. Firstly, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Secondly, current state-of-the-art approaches often adopt the pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of objects within each frame, and a temporal graph to obtain an object-aware representation of each frame and capture the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight-probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets provide strong evidence that VH-GNN obtains significant performance gains compared with state-of-the-art methods.

【Keywords】:

1583. When Radiology Report Generation Meets Knowledge Graph.

Paper Link】 【Pages】:12910-12917

【Authors】: Yixiao Zhang ; Xiaosong Wang ; Ziyue Xu ; Qihang Yu ; Alan L. Yuille ; Daguang Xu

【Abstract】: Automatic radiology report generation has been an attractive research problem towards computer-aided diagnosis to alleviate the workload of doctors in recent years. Deep learning techniques for natural image captioning are successfully adapted to generating radiology reports. However, radiology image reporting is different from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting in comparison to the equivalent importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes instead of counting the occurrence of N-grams. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports in this work. The incorporation of the knowledge graph allows for dedicated feature learning for each disease finding and the relationship modeling between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on a publicly accessible dataset (IU-RR) of chest radiographs compared with previous approaches using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.

【Keywords】:

1584. Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition.

Paper Link】 【Pages】:12918-12925

【Authors】: Yiyi Zhang ; Li Niu ; Ziqi Pan ; Meichao Luo ; Jianfu Zhang ; Dawei Cheng ; Liqing Zhang

【Abstract】: Static image action recognition, which aims to recognize action based on a single image, usually relies on expensive human labeling effort such as adequate labeled action images and large-scale labeled image dataset. In contrast, abundant unlabeled videos can be economically obtained. Therefore, several works have explored using unlabeled videos to facilitate image action recognition, which can be categorized into the following two groups: (a) enhance visual representations of action images with a designed proxy task on unlabeled videos, which falls into the scope of self-supervised learning; (b) generate auxiliary representations for action images with the generator learned from unlabeled videos. In this paper, we integrate the above two strategies in a unified framework, which consists of Visual Representation Enhancement (VRE) module and Motion Representation Augmentation (MRA) module. Specifically, the VRE module includes a proxy task which imposes pseudo motion label constraint and temporal coherence constraint on unlabeled videos, while the MRA module could predict the motion information of a static action image by exploiting unlabeled videos. We demonstrate the superiority of our framework based on four benchmark human action datasets with limited labeled data.

【Keywords】:

1585. Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching.

Paper Link】 【Pages】:12926-12934

【Authors】: Youmin Zhang ; Yimin Chen ; Xiao Bai ; Suihanjin Yu ; Kun Yu ; Zhiwei Li ; Kuiyuan Yang

【Abstract】: State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where loss function is directly defined on true disparities and their estimated ones. However, disparity is just a byproduct of a matching process modeled by cost volume, while indirectly learning cost volume driven by disparity regression is prone to overfitting since the cost volume is under constrained. In this paper, we propose to directly add constraints to the cost volume by filtering cost volume with unimodal distribution peaked at true disparities. In addition, variances of the unimodal distributions for each pixel are estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked the 1st place of KITTI 2012 evaluation and the 4th place of KITTI 2015 evaluation (recorded on 2019.8.20). The codes of AcfNet are available at: https://github.com/youmi-zym/AcfNet.

【Keywords】:
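
The idea in the abstract above of constraining the cost volume with a unimodal distribution peaked at the true disparity, with a per-pixel variance expressing uncertainty, can be sketched for one pixel as below. The specific form used here, a softmax over the negative absolute distance to the true disparity divided by `sigma`, is an assumption for illustration; the abstract does not give the formula, and the function name is hypothetical.

```python
import math


def unimodal_target(true_disparity, max_disparity, sigma=1.0):
    """Distribution over candidate disparities 0 .. max_disparity - 1.

    Assumed form: softmax(-|d - d_gt| / sigma). A larger sigma gives a
    flatter distribution, i.e. it expresses more matching uncertainty.
    """
    logits = [-abs(d - true_disparity) / sigma for d in range(max_disparity)]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


sharp = unimodal_target(true_disparity=5.0, max_disparity=12, sigma=0.5)
flat = unimodal_target(true_disparity=5.0, max_disparity=12, sigma=3.0)
best_sharp = max(range(len(sharp)), key=lambda d: sharp[d])
```

A training loss could then compare the network's per-pixel distribution over disparities with such a target, which supervises the whole cost volume rather than only the regressed disparity value.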

1586. Fully Convolutional Network for Consistent Voxel-Wise Correspondence.

Paper Link】 【Pages】:12935-12942

【Authors】: Yungeng Zhang ; Yuru Pei ; Yuke Guo ; Gengyu Ma ; Tianmin Xu ; Hongbin Zha

【Abstract】: In this paper, we propose a fully convolutional network-based dense map from voxels to invertible pair of displacement vector fields regarding a template grid for the consistent voxel-wise correspondence. We parameterize the volumetric mapping using a convolutional network and train it in an unsupervised way by leveraging the spatial transformer to minimize the gap between the warped volumetric image and the template grid. Instead of learning the unidirectional map, we learn the nonlinear mapping functions for both forward and backward transformations. We introduce the combinational inverse constraints for the volumetric one-to-one maps, where the pairwise and triple constraints are utilized to learn the cycle-consistent correspondence maps between volumes. Experiments on both synthetic and clinically captured volumetric cone-beam CT (CBCT) images show that the proposed framework is effective and competitive against state-of-the-art deformable registration techniques.

【Keywords】:

1587. Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network.

Paper Link】 【Pages】:12943-12950

【Authors】: Zhaolong Zhang ; Yuejie Zhang ; Rui Feng ; Tao Zhang ; Weiguo Fan

【Abstract】: Zero-Shot Sketch-based Image Retrieval (ZS-SBIR) has been proposed recently, putting the traditional Sketch-based Image Retrieval (SBIR) under the setting of zero-shot learning. Dealing with both the challenges in SBIR and zero-shot learning makes it become a more difficult task. Previous works mainly focus on utilizing one kind of information, i.e., the visual information or the semantic information. In this paper, we propose a SketchGCN model utilizing the graph convolution network, which simultaneously considers both the visual information and the semantic information. Thus, our model can effectively narrow the domain gap and transfer the knowledge. Furthermore, we generate the semantic information from the visual information using a Conditional Variational Autoencoder rather than only map them back from the visual space to the semantic space, which enhances the generalization ability of our model. Besides, feature loss, classification loss, and semantic loss are introduced to optimize our proposed SketchGCN model. Our model gets a good performance on the challenging Sketchy and TU-Berlin datasets.

【Keywords】:

1588. JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds.

Paper Link】 【Pages】:12951-12958

【Authors】: Lin Zhao ; Wenbing Tao

【Abstract】: In this paper, we propose a novel joint instance and semantic segmentation approach, which is called JSNet, in order to address the instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the different layer features of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into instance embedding space, and then the transformed features are further fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying a simple mean-shift clustering on instance embeddings. As a result, we evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset S3DIS and a part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate our approach outperforms the state-of-the-art method in 3D instance segmentation with a significant improvement in 3D semantic prediction and our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.

【Keywords】:

1589. Spherical Criteria for Fast and Accurate 360° Object Detection.

Paper Link】 【Pages】:12959-12966

【Authors】: Pengyu Zhao ; Ansheng You ; Yuanxing Zhang ; Jiaying Liu ; Kaigui Bian ; Yunhai Tong

【Abstract】: With the advance of omnidirectional panoramic technology, 360° imagery has become increasingly popular in the past few years. To better understand 360° content, many works resort to 360° object detection, and various criteria have been proposed to bound the objects and compute the intersection-over-union (IoU) between bounding boxes based on the common equirectangular projection (ERP) or perspective projection (PSP). However, the existing 360° criteria are either inaccurate or inefficient for real-world scenarios. In this paper, we introduce novel spherical criteria for fast and accurate 360° object detection, including both spherical bounding boxes and spherical IoU (SphIoU). Based on the spherical criteria, we propose a novel two-stage 360° detector, i.e., Reprojection R-CNN, by combining the advantages of both ERP and PSP, yielding efficient and accurate 360° object detection. To validate the design of spherical criteria and Reprojection R-CNN, we construct two unbiased synthetic datasets for training and evaluation. Experimental results reveal that compared with the existing criteria, the two-stage detector with spherical criteria achieves the best mAP results under the same inference speed, demonstrating that the spherical criteria can be more suitable for 360° object detection. Moreover, Reprojection R-CNN outperforms the previous state-of-the-art methods by over 30% on mAP with competitive speed, which confirms the efficiency and accuracy of the design.

【Keywords】:

1590. GTNet: Generative Transfer Network for Zero-Shot Object Detection.

Paper Link】 【Pages】:12967-12974

【Authors】: Shizhen Zhao ; Changxin Gao ; Yuanjie Shao ; Lerenhan Li ; Changqian Yu ; Zhong Ji ; Nong Sang

【Abstract】: We propose a Generative Transfer Network (GTNet) for zero-shot object detection (ZSD). GTNet consists of an Object Detection Module and a Knowledge Transfer Module. The Object Detection Module can learn large-scale seen domain knowledge. The Knowledge Transfer Module leverages a feature synthesizer to generate unseen class features, which are applied to train a new classification layer for the Object Detection Module. In order to synthesize features for each unseen class with both the intra-class variance and the IoU variance, we design an IoU-Aware Generative Adversarial Network (IoUGAN) as the feature synthesizer, which can be easily integrated into GTNet. Specifically, IoUGAN consists of three unit models: Class Feature Generating Unit (CFU), Foreground Feature Generating Unit (FFU), and Background Feature Generating Unit (BFU). CFU generates unseen features with the intra-class variance conditioned on the class semantic embeddings. FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively. We evaluate our method on three public datasets and the results demonstrate that our method performs favorably against the state-of-the-art ZSD approaches.

【Keywords】:

1591. Multi-Source Distilling Domain Adaptation.

Paper Link】 【Pages】:12975-12983

【Authors】: Sicheng Zhao ; Guangzhi Wang ; Shanghang Zhang ; Yang Gu ; Yaxian Li ; Zhichao Song ; Pengfei Xu ; Runbo Hu ; Hua Chai ; Kurt Keutzer

【Abstract】: Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive application of the single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature by corresponding source classifier, and aggregate different predictions using respective domain weight, which corresponds to the discrepancy between each source and target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.

【Keywords】:
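
Stage (4) in the abstract above, aggregating the per-source predictions with a weight for each domain, can be sketched as a normalized weighted average. How the weights are derived from the source-target discrepancy is not specified in the abstract, so this sketch takes them as given inputs; the function name and example numbers are hypothetical.

```python
def aggregate_predictions(per_source_probs, domain_weights):
    """Weighted average of class-probability vectors from several source classifiers.

    per_source_probs: one probability vector per source classifier.
    domain_weights:   one non-negative weight per source (supplied by the
                      caller; how they are computed is outside this sketch).
    """
    total = sum(domain_weights)
    weights = [w / total for w in domain_weights]  # normalize to sum to 1
    num_classes = len(per_source_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, per_source_probs))
        for c in range(num_classes)
    ]


# Two source classifiers that disagree; the first source is weighted 3:1.
probs = [[0.9, 0.1], [0.2, 0.8]]
combined = aggregate_predictions(probs, domain_weights=[3.0, 1.0])
predicted_class = max(range(len(combined)), key=lambda c: combined[c])
```

Normalizing the weights keeps the combined vector a valid probability distribution, so the final prediction is simply its largest entry.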

1592. MemCap: Memorizing Style Knowledge for Image Captioning.

Paper Link】 【Pages】:12984-12992

【Authors】: Wentian Zhao ; Xinxiao Wu ; Xiaoxun Zhang

【Abstract】: Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.

【Keywords】:

1593. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression.

Paper Link】 【Pages】:12993-13000

【Authors】: Zhaohui Zheng ; Ping Wang ; Wei Liu ; Jinze Li ; Rongguang Ye ; Dongwei Ren

【Abstract】: Bounding box regression is the crucial step in object detection. In existing methods, while ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster R-CNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.

【Keywords】:
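
The DIoU loss described in the abstract above adds a normalized center-distance penalty to the IoU loss. A minimal sketch for axis-aligned boxes follows. Normalizing the squared center distance by the squared diagonal of the smallest enclosing box is the form commonly given for DIoU and is treated here as an assumption, since the abstract only says "normalized distance"; the function name is hypothetical.

```python
def diou_loss(box_a, box_b):
    """DIoU-style loss for boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Standard IoU term.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    center_dist2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + (
        (ay1 + ay2) / 2 - (by1 + by2) / 2
    ) ** 2

    # Squared diagonal of the smallest box enclosing both (assumed normalizer).
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    diagonal2 = enclose_w ** 2 + enclose_h ** 2

    return 1.0 - iou + center_dist2 / diagonal2


same = diou_loss((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0))
near = diou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
far = diou_loss((0.0, 0.0, 1.0, 1.0), (4.0, 0.0, 5.0, 1.0))
```

For the two non-overlapping cases the IoU is zero, so a plain IoU loss would be identical for both. The distance term still separates them, which is consistent with the faster convergence the abstract reports.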

1594. Random Erasing Data Augmentation.

Paper Link】 【Pages】:13001-13008

【Authors】: Zhun Zhong ; Liang Zheng ; Guoliang Kang ; Shaozi Li ; Yi Yang

【Abstract】: In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing.
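The procedure described above (pick a random rectangle, overwrite it with random values) can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the image is a grayscale list of lists, and the erasing probability `p`, the area range, and the aspect-ratio bound are hypothetical defaults chosen for the example.

```python
import random


def random_erase(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3, rng=random):
    """Return a copy of `image` with one random rectangle filled with noise.

    `image` is a list of rows of ints in [0, 255]. All default parameter
    values are assumptions made for this sketch.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]

    # With probability 1 - p the image is left untouched.
    if rng.random() >= p:
        return out

    # Retry a bounded number of times until the sampled rectangle fits.
    for _ in range(100):
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        erase_h = int(round((target_area * aspect) ** 0.5))
        erase_w = int(round((target_area / aspect) ** 0.5))
        if 0 < erase_h <= height and 0 < erase_w <= width:
            top = rng.randint(0, height - erase_h)
            left = rng.randint(0, width - erase_w)
            for y in range(top, top + erase_h):
                for x in range(left, left + erase_w):
                    out[y][x] = rng.randint(0, 255)
            return out
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when debugging a training pipeline.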

【Keywords】:

1595. Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition.

Paper Link】 【Pages】:13009-13016

【Authors】: Hao Zhou ; Wengang Zhou ; Yun Zhou ; Houqiang Li

【Abstract】: Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aims to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve the end-to-end sequence learning of the STMC network. To validate the effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.

【Keywords】:

1596. Discriminative and Robust Online Learning for Siamese Visual Tracking.

Paper Link】 【Pages】:13017-13024

【Authors】: Jinghao Zhou ; Peng Wang ; Haoyang Sun

【Abstract】: The problem of visual object tracking has traditionally been handled by different tracking paradigms, either learning a model of the object's appearance exclusively online or matching the object with the target in an offline-trained embedding space. Despite the recent success, each paradigm suffers from its intrinsic constraints. The online-only approaches suffer from a lack of generalization of the model they learn and thus are inferior in target regression, while the offline-only approaches (e.g., convolutional siamese trackers) lack the target-specific context information and thus are neither discriminative enough to handle distractors nor robust enough to handle deformation. Therefore, we propose an online module with an attention mechanism for offline siamese networks to extract target-specific features under L2 error. We further propose a filter update strategy adaptive to treacherous background noises for discriminative learning, and a template update strategy to handle large target deformations for robust learning. Effectiveness is validated by the consistent improvement over three siamese baselines: SiamFC, SiamRPN++, and SiamMask. Beyond that, our model based on SiamRPN++ obtains the best results over six popular tracking benchmarks and can operate beyond real-time.

【Keywords】:

1597. Deep Domain-Adversarial Image Generation for Domain Generalisation.

Paper Link】 【Pages】:13025-13032

【Authors】: Kaiyang Zhou ; Yongxin Yang ; Timothy M. Hospedales ; Tao Xiang

【Abstract】: Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset of different distribution. To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains. In this paper, we propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically, DDAIG consists of three components, namely a label classifier, a domain classifier and a domain transformation network (DoTNet). The goal for DoTNet is to map the source training data to unseen domains. This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier. By augmenting the source training data with the generated unseen domain data, we can make the label classifier more robust to unknown domain changes. Extensive experiments on four DG datasets demonstrate the effectiveness of our approach.

【Keywords】:

1598. Progressive Bi-C3D Pose Grammar for Human Pose Estimation.

Paper Link】 【Pages】:13033-13040

【Authors】: Lu Zhou ; Yingying Chen ; Jinqiao Wang ; Hanqing Lu

【Abstract】: In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among the human body parts proves effective in solving problems such as complex articulation and occlusion. Therefore, we propose two articulated grammars learned with Bi-C3D to build the relationships of the human joints and exploit the contextual information of human body structure. Firstly, a local multi-scale Bi-C3D kinematics grammar is proposed to promote the message passing process among the locally related joints. The multi-scale kinematics grammar excavates different levels of human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among the human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.

【Keywords】:

1599. Unified Vision-Language Pre-Training for Image Captioning and VQA.

Paper Link】 【Pages】:13041-13049

【Authors】: Luowei Zhou ; Hamid Palangi ; Lei Zhang ; Houdong Hu ; Jason J. Corso ; Jianfeng Gao

【Abstract】: This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large amount of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
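The abstract states that the two pre-training tasks differ only in the self-attention mask applied to the shared transformer. The sketch below illustrates one common way to build such masks; the exact layout used in VLP is not given here, so the split into `n_context` leading positions (e.g., image regions) and `n_target` trailing positions (e.g., caption tokens) is an assumption, as is the function name `attention_mask`.

```python
def attention_mask(n_context, n_target, mode):
    """Build a 0/1 mask where mask[i][j] == 1 means position i may attend to j.

    Positions [0, n_context) are context, the rest are target tokens.
    "bidirectional": every position attends to every position.
    "seq2seq": context attends only to context; each target token attends
    to all context and to target tokens up to and including itself.
    (Assumed convention for this sketch.)
    """
    if mode not in ("bidirectional", "seq2seq"):
        raise ValueError(f"unknown mode: {mode}")

    n = n_context + n_target
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "bidirectional":
                allowed = True
            elif j < n_context:
                # Context columns are visible to every row.
                allowed = True
            else:
                # Target columns are visible only to target rows, causally.
                allowed = i >= n_context and j <= i
            mask[i][j] = 1 if allowed else 0
    return mask
```

With two context and two target positions, the seq2seq mask is lower-triangular over the target block and zero in the context-to-target block, so the same network can serve as an encoder or as a left-to-right decoder depending only on which mask is supplied.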

【Keywords】:

1600. Ladder Loss for Coherent Visual-Semantic Embedding.

Paper Link】 【Pages】:13050-13057

【Authors】: Mo Zhou ; Zhenxing Niu ; Le Wang ; Zhanning Gao ; Qilin Zhang ; Gang Hua

【Abstract】: For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way – relevant or irrelevant, and all “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their various proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce a continuous variable to model the relevance degree between queries and multiple candidates, and propose to learn a coherent embedding space, where candidates with higher relevance degrees are mapped closer to the query than those with lower relevance degrees. In particular, the new ladder loss is proposed by extending the triplet loss inequality to a more general inequality chain, which implements variable push-away margins according to respective relevance degrees. In addition, a proper Coherent Score metric is proposed to better measure the ranking results including those “irrelevant” candidates. Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvement over existing state-of-the-art methods.

【Keywords】:

1601. Generate, Segment, and Refine: Towards Generic Manipulation Segmentation.

Paper Link】 【Pages】:13058-13065

【Authors】: Peng Zhou ; Bor-Chun Chen ; Xintong Han ; Mahyar Najibi ; Abhinav Shrivastava ; Ser-Nam Lim ; Larry Davis

【Abstract】: Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns about the spread of false news and misinformation are growing. Current state-of-the-art methods for detecting these manipulated images suffer from the lack of training data due to the laborious labeling process. We address this problem in this paper by introducing a manipulated image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples. In addition, we propose to further create examples that force the algorithm to focus on boundary artifacts during training. Strong experimental results validate our proposal.

【Keywords】:

1602. Motion-Attentive Transition for Zero-Shot Video Object Segmentation.

Paper Link】 【Pages】:13066-13073

【Authors】: Tianfei Zhou ; Shunzhou Wang ; Yi Zhou ; Yazhou Yao ; Jianwu Li ; Ling Shao

【Abstract】: In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against the state-of-the-arts. Code is available at: https://github.com/tfzhou/MATNet.

【Keywords】:

1603. When AWGN-Based Denoiser Meets Real Noises.

Paper Link】 【Pages】:13074-13081

【Authors】: Yuqian Zhou ; Jianbo Jiao ; Haibin Huang ; Yang Wang ; Jue Wang ; Honghui Shi ; Thomas S. Huang

【Abstract】: Discriminative learning based image denoisers have achieved promising performance on synthetic noises such as Additive White Gaussian Noise (AWGN). The synthetic noises adopted in most previous work are pixel-independent, but real noises are mostly spatially/channel-correlated and spatially/channel-variant. This domain gap yields unsatisfactory performance on images with real noises if the model is only trained with AWGN. In this paper, we propose a novel approach to boost the performance of a real image denoiser which is trained only with synthetic pixel-independent noise data dominated by AWGN. First, we train a deep model that consists of a noise estimator and a denoiser with mixed AWGN and Random Value Impulse Noise (RVIN). We then investigate a Pixel-shuffle Down-sampling (PD) strategy to adapt the trained model to real noises. Extensive experiments demonstrate the effectiveness and generalization of the proposed approach. Notably, our method achieves state-of-the-art performance on real sRGB images in the DND benchmark among models trained with synthetic noises. Codes are available at https://github.com/yzhouas/PD-Denoising-pytorch.
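The abstract names Pixel-shuffle Down-sampling without describing it, so the following is a sketch of the general pixel-shuffle idea rather than the authors' exact procedure: sampling every `stride`-th pixel yields `stride * stride` smaller images in which formerly adjacent pixels are pulled apart, which plausibly makes spatially correlated noise look closer to pixel-independent noise. The function names and the list-of-lists image layout are assumptions, and the image sides are assumed divisible by `stride`.

```python
def pixel_shuffle_downsample(image, stride):
    """Split `image` into stride*stride sub-images by strided sampling.

    Sub-image k = dy * stride + dx holds the pixels image[dy::stride][dx::stride].
    """
    return [
        [row[dx::stride] for row in image[dy::stride]]
        for dy in range(stride)
        for dx in range(stride)
    ]


def pixel_shuffle_upsample(sub_images, stride):
    """Inverse of pixel_shuffle_downsample: interleave sub-images back."""
    sub_h, sub_w = len(sub_images[0]), len(sub_images[0][0])
    out = [[None] * (sub_w * stride) for _ in range(sub_h * stride)]
    for k, sub in enumerate(sub_images):
        dy, dx = divmod(k, stride)
        for y in range(sub_h):
            for x in range(sub_w):
                out[y * stride + dy][x * stride + dx] = sub[y][x]
    return out
```

In a pipeline of this kind, a denoiser would be applied to each sub-image between the two calls; that step is omitted here because the abstract gives no details about it.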

【Keywords】:

1604. Multi-Type Self-Attention Guided Degraded Saliency Detection.

Paper Link】 【Pages】:13082-13089

【Authors】: Ziqi Zhou ; Zheng Wang ; Huchuan Lu ; Song Wang ; Meijun Sun

【Abstract】: Existing saliency detection techniques are sensitive to image quality and perform poorly on degraded images. In this paper, we systematically analyze the current status of the research on detecting salient objects from degraded images and then propose a new multi-type self-attention network, namely MSANet, for degraded saliency detection. The main contributions include: 1) Applying attention transfer learning to promote semantic detail perception and internal feature mining of the target network on degraded images; 2) Developing a multi-type self-attention mechanism to achieve the weight recalculation of multi-scale features. By computing global and local attention scores, we obtain the weighted features of different scales, effectively suppress the interference of noise and redundant information, and achieve a more complete boundary extraction. The proposed MSANet converts low-quality inputs to high-quality saliency maps directly in an end-to-end fashion. Experiments on seven widely-used datasets show that our approach produces good performance on both clear and degraded images.

【Keywords】:

1605. Towards Omni-Supervised Face Alignment for Large Scale Unlabeled Videos.

Paper Link】 【Pages】:13090-13097

【Authors】: Congcong Zhu ; Hao Liu ; Zhenhua Yu ; Xuehong Sun

【Abstract】: In this paper, we propose a spatial-temporal relational reasoning networks (STRRN) approach to investigate the problem of omni-supervised face alignment in videos. Unlike existing fully supervised methods which rely on numerous annotations by hand, our learner exploits large scale unlabeled videos plus available labeled data to generate auxiliary plausible training annotations. Motivated by the fact that neighbouring facial landmarks are usually correlated and coherent across consecutive frames, our approach automatically reasons about discriminative spatial-temporal relationships among landmarks for stable face tracking. Specifically, we carefully develop an interpretable and efficient network module, which disentangles facial geometry relationship for every static frame and simultaneously enforces the bi-directional cycle-consistency across adjacent frames, thus allowing the modeling of intrinsic spatial-temporal relations from raw face sequences. Extensive experimental results demonstrate that our approach surpasses the performance of most fully supervised state-of-the-arts.

【Keywords】:

1606. FASTER Recurrent Networks for Efficient Video Classification.

Paper Link】 【Pages】:13098-13105

【Authors】: Linchao Zhu ; Du Tran ; Laura Sevilla-Lara ; Yi Yang ; Matt Feiszli ; Heng Wang

【Abstract】: Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.

【Keywords】:

1607. EEMEFN: Low-Light Image Enhancement via Edge-Enhanced Multi-Exposure Fusion Network.

Paper Link】 【Pages】:13106-13113

【Authors】: Minfeng Zhu ; Pingbo Pan ; Wei Chen ; Yi Yang

【Abstract】: This work focuses on extremely low-light image enhancement, which aims to improve image brightness and reveal hidden information in dark areas. Recently, image enhancement approaches have yielded impressive progress. However, existing methods still suffer from three main problems: (1) low-light images are usually high-contrast, and existing methods may fail to recover image details in extremely dark or bright areas; (2) current methods cannot precisely correct the color of low-light images; (3) when the object edges are unclear, the pixel-wise loss may treat pixels of different objects equally and produce blurry images. In this paper, we propose a two-stage method called Edge-Enhanced Multi-Exposure Fusion Network (EEMEFN) to enhance extremely low-light images. In the first stage, we employ a multi-exposure fusion module to address the high contrast and color bias issues. We synthesize a set of images with different exposure times from a single image and construct an accurate normal-light image by combining well-exposed areas under different illumination conditions. Thus, it can produce realistic initial images with correct color from extremely noisy and low-light images. In the second stage, we introduce an edge enhancement module to refine the initial images with the help of the edge information. Therefore, our method can reconstruct high-quality images with sharp edges when minimizing the pixel-wise loss. Experiments on the See-in-the-Dark dataset indicate that our EEMEFN approach achieves state-of-the-art performance.

【Keywords】:

1608. Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification.

Paper Link】 【Pages】:13114-13121

【Authors】: Zhihui Zhu ; Xinyang Jiang ; Feng Zheng ; Xiaowei Guo ; Feiyue Huang ; Xing Sun ; Weishi Zheng

【Abstract】: Although great progress in supervised person re-identification (Re-ID) has been made recently, due to the viewpoint variation of a person, Re-ID remains a massive visual challenge. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separated and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but ignore the underlying relationship between different viewpoints. To address this problem, we propose a novel approach, called Viewpoint-Aware Loss with Angular Regularization (VA-reID). Instead of one subspace for each viewpoint, our method projects the feature from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity-level and the viewpoint-level. In addition, rather than modeling different viewpoints as hard labels used for conventional viewpoint classification, we introduce viewpoint-aware adaptive label smoothing regularization (VALSR) that assigns the adaptive soft label to feature representation. VALSR can effectively solve the ambiguity of the viewpoint cluster label assignment. Extensive experiments on the Market1501 and DukeMTMC-reID datasets demonstrated that our method outperforms the state-of-the-art supervised Re-ID methods.

【Keywords】:

1609. iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection.

Paper Link】 【Pages】:13122-13129

【Authors】: Chenfan Zhuang ; Xintong Han ; Weilin Huang ; Matthew R. Scott

【Abstract】: Training an object detector on a data-rich domain and applying it to a data-poor one with limited performance drop is highly attractive in industry, because it saves huge annotation cost. Recent research on unsupervised domain adaptive object detection has verified that aligning data distributions between source and target images through adversarial learning is very useful. The key is when, where and how to use it to achieve best practice. We propose Image-Instance Full Alignment Networks (iFAN) to tackle this problem by precisely aligning feature distributions on both image and instance levels: 1) Image-level alignment: multi-scale features are roughly aligned by training adversarial domain classifiers in a hierarchically-nested fashion. 2) Full instance-level alignment: deep semantic information and elaborate instance representations are fully exploited to establish a strong relationship among categories and domains. Establishing these correlations is formulated as a metric learning problem by carefully constructing instance pairs. The above-mentioned adaptations can be integrated into an object detector (e.g., Faster R-CNN), resulting in an end-to-end trainable framework where multiple alignments can work collaboratively in a coarse-to-fine manner. In two domain adaptation tasks, synthetic-to-real (SIM10K → Cityscapes) and normal-to-foggy weather (Cityscapes → Foggy Cityscapes), iFAN outperforms the state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.

【Keywords】:

1610. Learning Attentive Pairwise Interaction for Fine-Grained Classification.

Paper Link】 【Pages】:13130-13137

【Authors】: Peiqin Zhuang ; Yali Wang ; Yu Qiao

【Abstract】: Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning a discriminative representation of each individual input image. On the other hand, humans can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which can progressively recognize a pair of fine-grained images by interaction. Specifically, API-Net first learns a mutual feature vector to capture semantic differences in the input pair. It then compares this mutual vector with individual vectors to generate gates for each input image. These distinct gate vectors inherit mutual context on semantic differences, which allows API-Net to attentively capture contrastive clues by pairwise interaction between two images. Additionally, we train API-Net in an end-to-end manner with a score ranking regularization, which can further generalize API-Net by taking feature priorities into account. We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods on these benchmarks: CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).

【Keywords】:

IAAI Technical Track: Deployed Papers 11

1611. Automated Conversation Review to Surface Virtual Assistant Misunderstandings: Reducing Cost and Increasing Privacy.

Paper Link】 【Pages】:13140-13147

【Authors】: Ian Beaver ; Abdullah Mueen

【Abstract】: With the rise of Intelligent Virtual Assistants (IVAs), there is a necessary rise in human effort to identify conversations containing misunderstood user inputs. These conversations uncover errors in natural language understanding and help prioritize and expedite improvements to the IVA. As human reviewer time is valuable and manual analysis is time consuming, prioritizing the conversations where misunderstanding has likely occurred reduces costs and speeds improvement. In addition, fewer conversations reviewed by humans means less user data is exposed, increasing privacy. We present a scalable system for automated conversation review that can identify potential miscommunications. Our system provides IVA designers with suggested actions to fix errors in IVA understanding, prioritizes areas of language model repair, and automates the review of conversations where desired. Verint - Next IT builds IVAs on behalf of other companies and organizations, and therefore analyzes large volumes of conversational data. Our review system has been in production for over three years, saves our company roughly $1.5 million in annotation costs yearly, and has shortened the refinement cycle of production IVAs. In this paper, the system design is discussed and performance in identifying errors in IVA understanding is compared to that of human reviewers.

【Keywords】:

1612. Day-Ahead Forecasting of Losses in the Distribution Network.

Paper Link】 【Pages】:13148-13155

【Authors】: Nisha Dalal ; Martin Mølnå ; Mette Herrem ; Magne Røen ; Odd Erik Gundersen

【Abstract】: We present a commercially deployed machine learning system that automates the day-ahead nomination of the expected grid loss for a Norwegian utility company. It meets several practical constraints and issues related to, among other things, delayed, missing and incorrect data and a small data set. The system incorporates a total of 24 different models that perform forecasts for three sub-grids. Each day one model is selected for making the hourly day-ahead forecasts for each sub-grid. The deployed system reduces the MAE by 41%, from 3.68 MW to 2.17 MW per hour, from mid July to mid October. It is robust and reduces manual work.

【Keywords】:

1613. Understanding Chat Messages for Sticker Recommendation in Messaging Apps.

Paper Link】 【Pages】:13156-13163

【Authors】: Abhishek Laddha ; Mohamed Hanoosh ; Debdoot Mukherjee ; Parth Patwa ; Ankur Narang

【Abstract】: Stickers are popularly used in messaging apps such as Hike to visually express a nuanced range of thoughts and utterances and to convey exaggerated emotions. However, discovering the right sticker from a large and ever expanding pool of stickers while chatting can be cumbersome. In this paper, we describe a system for recommending stickers in real time as the user is typing based on the context of the conversation. We decompose the sticker recommendation (SR) problem into two steps. First, we predict the message that the user is likely to send in the chat. Second, we substitute the predicted message with an appropriate sticker. The majority of Hike's messages are in the form of text which is transliterated from users' native language to the Roman script. This leads to numerous orthographic variations of the same message and makes accurate message prediction challenging. To address this issue, we learn dense representations of chat messages by employing a character-level convolutional network in an unsupervised manner. We use them to cluster the messages that have the same meaning. In the subsequent steps, we predict the message cluster instead of the message. Our approach does not depend on human-labelled data (except for validation), leading to a fully automatic updating and tuning pipeline for the underlying models. We also propose a novel hybrid message prediction model, which can run with low latency on low-end phones that have severe computational limitations. Our described system has been deployed for more than 6 months and is being used by millions of users along with hundreds of thousands of expressive stickers.

【Keywords】:

1614. Embedding Convolution Neural Network-Based Defect Finder for Deployed Vision Inspector in Manufacturing Company Frontec.

Paper Link】 【Pages】:13164-13171

【Authors】: Kyoung Jun Lee ; Jun Woo Kwon ; Soohong Min ; Jungho Yoon

【Abstract】: In collaboration with Frontec, which produces parts such as bolts and nuts for the automobile industry, Kyung Hee University and Benple Inc. develop and deploy an AI system for automatic quality inspection of weld nuts. Adopting AI in the factory involves various constraints, such as response time and limited computing resources. Our convolutional neural network (CNN) system using large-scale images must classify weld nuts within 0.2 seconds with accuracy over 95%. We designed Circular Hough Transform based preprocessing and an adjusted VGG (Visual Geometry Group) model. The system showed accuracy over 99% and a response time of about 0.14 sec. We use the TCP/IP protocol for communication between the embedded classification system and an existing vision inspector using LabVIEW. We suggest ways to develop and embed a deep learning framework in an existing manufacturing environment without a hardware change.

【Keywords】:

1615. FedVision: An Online Visual Object Detection Platform Powered by Federated Learning.

Paper Link】 【Pages】:13172-13179

【Authors】: Yang Liu ; Anbu Huang ; Yun Luo ; He Huang ; Youzhi Liu ; Yuanyuan Chen ; Lican Feng ; Tianjian Chen ; Han Yu ; Qiang Yang

【Abstract】: Visual object detection is a computer vision-based artificial intelligence (AI) technique which has many practical applications (e.g., fire hazard monitoring). However, due to privacy concerns and the high cost of transmitting video data, it is highly challenging to build object detection models on centrally stored large training datasets following the current approach. Federated learning (FL) is a promising approach to resolve this challenge. Nevertheless, there is currently no easy-to-use tool that enables computer vision application developers who are not experts in federated learning to conveniently leverage this technology and apply it in their systems. In this paper, we report FedVision - a machine learning engineering platform to support the development of federated learning powered computer vision applications. The platform has been deployed through a collaboration between WeBank and Extreme Vision to help customers develop computer vision-based safety monitoring solutions in smart city applications. Over four months of usage, it has achieved significant efficiency improvement and cost reduction while removing the need to transmit sensitive data for three major corporate customers. To the best of our knowledge, this is the first real application of FL in computer vision-based tasks.

【Keywords】:

1616. Feedback-Based Self-Learning in Large-Scale Conversational AI Agents.

Paper Link】 【Pages】:13180-13187

【Authors】: Pragaash Ponnusamy ; Alireza Roshan Ghias ; Chenlei Guo ; Ruhi Sarikaya

【Abstract】: Today, most of the large-scale conversational AI agents such as Alexa, Siri, or Google Assistant are built using manually annotated data to train the different components of the system including Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Entity Resolution (ER). Typically, the accuracy of the machine learning models in these components are improved by manually transcribing and annotating data. As the scope of these systems increase to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time consuming. In this paper, we propose a system that leverages customer/system interaction feedback signals to automate learning without any manual annotation. Users of these systems tend to modify a previous query in hopes of fixing an error in the previous turn to get the right results. These reformulations, which are often preceded by defective experiences caused by either errors in ASR, NLU, ER or the application. In some cases, users may not properly formulate their requests (e.g. providing partial title of a song), but gleaning across a wider pool of users and sessions reveals the underlying recurrent patterns. Our proposed self-learning system automatically detects the errors, generate reformulations and deploys fixes to the runtime system to correct different types of errors occurring in different components of the system. In particular, we propose leveraging an absorbing Markov Chain model as a collaborative filtering mechanism in a novel attempt to mine these patterns. We show that our approach is highly scalable, and able to learn reformulations that reduce Alexa-user errors by pooling anonymized data across millions of customers. The proposed self-learning system achieves a win-loss ratio of 11.8 and effectively reduces the defect rate by more than 30% on utterance level reformulations in our production A/B tests. 
To the best of our knowledge, this is the first self-learning large-scale conversational AI system in production.
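
As a rough illustration of the absorbing Markov chain idea mentioned in the abstract, the sketch below computes, for each "transient" utterance, the probability of ending in each "absorbing" (successful) utterance. The state names, transition probabilities, and the function name `absorption_probabilities` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def absorption_probabilities(transitions, absorbing, tol=1e-12, max_iter=10_000):
    """Return {transient_state: {absorbing_state: probability}}.

    transitions maps each transient state to {next_state: probability}.
    The result is found by fixed-point iteration rather than matrix inversion.
    """
    transient = list(transitions)
    # probs[s][a] = current estimate of P(absorbed in a | start in s)
    probs = {s: {a: 0.0 for a in absorbing} for s in transient}
    for _ in range(max_iter):
        delta = 0.0
        updated = {}
        for s in transient:
            row = {a: 0.0 for a in absorbing}
            for nxt, p in transitions[s].items():
                if nxt in absorbing:
                    row[nxt] += p  # absorbed in one step
                else:
                    for a in absorbing:
                        row[a] += p * probs[nxt][a]  # absorbed later via nxt
            delta = max(delta, max(abs(row[a] - probs[s][a]) for a in absorbing))
            updated[s] = row
        probs = updated
        if delta < tol:
            break
    return probs


# Hypothetical reformulation graph pooled across many sessions.
sessions = {
    "play maj and dragons": {"play imagine dragons": 0.7, "play magic dragons": 0.3},
    "play magic dragons": {"play imagine dragons": 0.5, "play puff the magic dragon": 0.5},
}
successful = {"play imagine dragons", "play puff the magic dragon"}
result = absorption_probabilities(sessions, successful)
best_rewrite = max(result["play maj and dragons"], key=result["play maj and dragons"].get)
```

Under these made-up numbers, the most likely successful endpoint for the first utterance would serve as its candidate rewrite.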

【Keywords】:

Paper Link】 【Pages】:13188-13195

【Authors】: Michael Powell ; Jamison A. Rotz ; Kevin D. O'Malley

【Abstract】: The U.S. Navy is successfully using natural language processing (NLP) and common machine-learning (ML) algorithms to categorize and automatically route plain text support requests at a Navy fleet support center. The algorithms enhance routine IT support tasks with automation and reduce the workload of service desk agents. The ML pipeline works in a five-step process. First, an archive of documents is created from various sources, including standard operating procedure (SOP) memos, frequently asked questions (FAQs), knowledge articles, Wikipedia articles, encyclopedia articles, previously closed support requests, and other relevant documents. Next, a library of words and phrases is generated from the archive. Then, this library is used to vectorize an incoming support request, producing a term frequency-inverse document frequency (TF-IDF) vector. Following that, the TF-IDF vector is used to compute similarity scores between the support request and the documents in the previously created archive. Finally, the similarity scores are processed by support vector machine (SVM) classifiers to categorize and route the incoming support request to the correct support provider. This algorithm was deployed at a U.S. Navy customer support center as part of a pilot study, where it decreased the amount of time agents spend on tickets by 35%; the amount of time required to assign tickets by 74%; and the amount of time to close tickets by 60%. Our internal tests show that, with an error rate of 2%, a 35% reduction in ticket volume could be achieved by fully deploying these algorithms.
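
The middle steps of the pipeline described above (building a vocabulary, TF-IDF vectorization, similarity scoring) can be sketched in plain Python as follows. The tokenizer, the smoothed IDF formula, the use of cosine similarity, and the sample archive are all assumptions made for illustration; the final SVM classification step is omitted, and the resulting scores would be its input features.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; the paper does not specify one).
    return text.lower().split()


def build_idf(archive):
    """Inverse document frequency for every term in the archive."""
    n_docs = len(archive)
    doc_freq = Counter()
    for doc in archive:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF so that no term receives a zero or undefined weight.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector restricted to terms known from the archive."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_scores(request, archive):
    """One similarity score per archive document for an incoming request."""
    idf = build_idf(archive)
    request_vec = tfidf_vector(request, idf)
    return [cosine(request_vec, tfidf_vector(doc, idf)) for doc in archive]


# Hypothetical archive and request.
archive = [
    "reset password for email account",
    "printer not working paper jam",
    "network connection outage on ship",
]
scores = similarity_scores("cannot reset my email password", archive)
```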

【Keywords】:

1618. Question Quality Improvement: Deep Question Understanding for Incident Management in Technical Support Domain.

Paper Link】 【Pages】:13196-13203

【Authors】: Anupama Ray ; Pooja Aggarwal ; Csaba Hadhazi ; Gargi Dasgupta ; Amit M. Paradkar

【Abstract】: The technical support domain involves solving problems from user queries through various channels: voice, web and chat, and is both time-consuming and labour-intensive. The textual queries in web or chat mode are unstructured and often incomplete. This affects information retrieval and makes the queries harder for agents to resolve. Such cases require multiple rounds of interaction between user and agent/chatbot in order to better understand the user query. This paper presents a deployed system called Question Quality Improvement (QQI) that aims to improve the quality of user utterances by understanding and extracting important parts of an utterance and gamifying the user interface, prompting users to enter the remaining relevant information. QQI is guided by an ontology designed for the technical support domain and uses co-reference resolution and deep parsing to understand the sentences. Using the syntax and semantics in the deep parse tree structure, various attributes in the ontology are extracted. The system has been in production for over two years supporting around 800 products, resulting in a reduction in the time-to-resolve cases by around 29% and leading to huge cost savings. As a core natural language understanding and metadata extraction technology, QQI directly affects more than 8K tickets every day. These cases are submitted after 50K edits made to them based on QQI feedback. QQI outputs are used by other technologies such as search and retrieval, case routing for automated dispatch, case-difficulty-prediction, and by the chatbots supported in each product page.

【Keywords】:

1619. Clarity: Data-Driven Automatic Assessment of Product Competitiveness.

Paper Link】 【Pages】:13204-13211

【Authors】: Sheema Usmani ; Mariana Bernagozzi ; Yufeng Huang ; Michelle Morales ; Amir Sabet Sarvestani ; Biplav Srivastava

【Abstract】: Competitive analysis is a critical part of any business. Product managers, sellers, and marketers spend time and resources scouring through an immense amount of online and offline content, aiming to discover what their competitors are doing in the marketplace to understand what type of threat they pose to their business' financial well-being. Currently, this process is time- and labor-intensive, slow and costly. This paper presents Clarity, a data-driven unsupervised system for assessment of products, which is currently deployed at IBM, a large IT company. Clarity has been running for more than a year and is used by over 1,500 people to perform over 160 competitive analyses involving over 800 products. The system considers multiple factors from a collection of online content: numeric ratings by online users, sentiments of reviews for key product performance dimensions, content volume, and recency of content. The results and explanations of factors leading to the results are visualized in an interactive dashboard that allows users to track their product's performance as well as understand main contributing factors. Its efficacy has been tested in a series of cases across IBM's portfolio which spans software, hardware, and services.

【Keywords】:

1620. Accelerating Ranking in E-Commerce Search Engines through Contextual Factor Selection.

Paper Link】 【Pages】:13212-13219

【Authors】: Anxiang Zeng ; Han Yu ; Qing Da ; Yusen Zhan ; Chunyan Miao

【Abstract】: In large-scale search systems, the quality of the ranking results is continually improved with the introduction of more factors from complex procedures. Meanwhile, the increase in factors demands more computation resources and increases system response latency. It has been observed that, under certain contexts, a search instance may require only a small set of useful factors instead of all factors in order to return high quality results. Therefore, removing ineffective factors accordingly can significantly improve system efficiency. In this paper, we report our experience incorporating our Contextual Factor Selection (CFS) approach into the Taobao e-commerce platform to optimize the selection of factors based on the context of each search query in order to simultaneously achieve high quality search results while significantly reducing latency time. This problem is treated as a combinatorial optimization problem which can be tackled through a sequential decision-making procedure. The problem can be efficiently solved by CFS through a deep reinforcement learning method with reward shaping to address the problems of reward signal scarcity and wide reward signal distribution in real-world search engines. Through extensive off-line experiments based on data from the Taobao.com platform, CFS is shown to significantly outperform state-of-the-art approaches. Online deployment on Taobao.com demonstrated that CFS is able to reduce average search latency time by more than 40% compared to the previous approach with negligible reduction in search result quality. Under peak usage during the Singles' Day Shopping Festival (November 11th) in 2017, CFS reduced peak load search latency time by 33% compared to the previous approach, helping Taobao.com achieve 40% higher revenue than the same period during 2016.

Corrigendum: The spelling of coauthor Yusen Zan in the paper "Accelerating Ranking in E-Commerce Search Engines through Contextual Factor Selection" has been changed from Zan to Zhan. The original spelling was a typographical error.

【Keywords】:

1621. PIDS: An Intelligent Electric Power Management Platform.

Paper Link】 【Pages】:13220-13227

【Authors】: Yongqing Zheng ; Han Yu ; Yuliang Shi ; Kun Zhang ; Shuai Zhen ; Lizhen Cui ; Cyril Leung ; Chunyan Miao

【Abstract】: Electricity information tracking systems are increasingly being adopted across China. Such systems can collect real-time power consumption data from users, and provide opportunities for artificial intelligence (AI) to help power companies and authorities make optimal demand-side management decisions. In this paper, we discuss power utilization improvement in Shandong Province, China with a deployed AI application - the Power Intelligent Decision Support (PIDS) platform. Based on improved short-term power consumption gap prediction, PIDS uses an optimal power adjustment plan which enables fine-grained Demand Response (DR) and Orderly Power Utilization (OPU) recommendations to ensure stable operation while minimizing power disruptions and improving fair treatment of participating companies. Deployed in August 2018, the platform is helping over 400 companies optimize their power consumption through DR while dynamically managing the OPU process for around 10,000 companies. Compared to the previous system, power outage under PIDS through planned shutdown has been reduced from 16% to 0.56%, resulting in significant gains in economic activities.

【Keywords】:

IAAI Technical Track: Emerging Papers 27

1622. Combining Real-Time Segmentation and Classification of Rehabilitation Exercises with LSTM Networks and Pointwise Boosting.

Paper Link】 【Pages】:13229-13234

【Authors】: Antonio Bevilacqua ; Giovanni Ciampi ; Rob Argent ; Brian Caulfield ; M. Tahar Kechadi

【Abstract】: Autonomous biofeedback tools in support of rehabilitation patients are commonly built as multi-tier pipelines, where a segmentation algorithm is first responsible for isolating motion primitives, and then classification can be performed on each primitive. In this paper, we present a novel segmentation technique that integrates on-the-fly qualitative classification of physical movements in the process. We adopt Long Short-Term Memory (LSTM) networks to model the temporal patterns of a streaming multivariate time series, obtained by sampling acceleration and angular velocity of the limb in motion, and then we aggregate the pointwise predictions of each isolated movement using different boosting methods. We tested our technique against a dataset composed of four common lower-limb rehabilitation exercises, collected from heterogeneous populations (clinical and healthy). Experimental results are promising and show that combining segmentation and classification of orthopaedic movements is a valid method with many potential real-world applications.

【Keywords】:

1623. Did That Lost Ballot Box Cost Me a Seat? Computing Manipulations of STV Elections.

Paper Link】 【Pages】:13235-13240

【Authors】: Michelle L. Blom ; Andrew Conway ; Peter J. Stuckey ; Vanessa J. Teague

【Abstract】: Mistakes made by humans, or machines, commonly arise when managing ballots cast in an election. In the 2013 Australian Federal Election, for example, 1,370 West Australian Senate ballots were lost, eventually leading to a costly re-run of the election. Other mistakes include ballots that are misrecorded by electronic voting systems, voters that cast invalid ballots, or vote multiple times at different polling locations. We present a method for assessing whether such problems could have made a difference to the outcome of a Single Transferable Vote (STV) election – a complex system of preferential voting for multi-seat elections. It is used widely in Australia, in Ireland, and in a range of local government elections in the United Kingdom and United States.

【Keywords】:

1624. Probabilistic Super Resolution for Mineral Spectroscopy.

Paper Link】 【Pages】:13241-13247

【Authors】: Alberto Candela ; David R. Thompson ; David Wettergreen ; Kerry Cawse-Nicholson ; Sven Geier ; Michael L. Eastwood ; Robert O. Green

【Abstract】: Earth and planetary sciences often rely upon the detailed examination of spectroscopic data for rock and mineral identification. This typically requires the collection of high resolution spectroscopic measurements. However, they tend to be scarce, as compared to low resolution remote spectra. This work addresses the problem of inferring high-resolution mineral spectroscopic measurements from low resolution observations using probability models. We present the Deep Gaussian Conditional Model, a neural network that performs probabilistic super resolution via maximum likelihood estimation. It also provides insight into learned correlations between measurements and spectroscopic features, allowing for the tractability and interpretability that scientists often require for mineral identification. Experiments using remote spectroscopic data demonstrate that our method compares favorably to other analogous probabilistic methods. Finally, we show and discuss how our method provides human-interpretable results, making it a compelling analysis tool for scientists.
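
For readers unfamiliar with maximum likelihood estimation for a Gaussian conditional model, one common formulation is written out below. The notation is an assumption made here for illustration and is not taken from the paper: x is a low-resolution spectrum, y the corresponding high-resolution spectrum, and the network with parameters θ predicts a mean and a covariance. The additive constant of the Gaussian log-density is dropped because it does not affect the minimizer.

```latex
% Assumed notation: x = low-resolution spectrum, y = high-resolution spectrum,
% \mu_\theta and \Sigma_\theta = mean and covariance predicted by the network.
p_\theta(y \mid x) = \mathcal{N}\!\big(y;\ \mu_\theta(x),\ \Sigma_\theta(x)\big)

\theta^\ast = \arg\min_\theta \sum_{i=1}^{N} \Big[
    \tfrac{1}{2} \log\det \Sigma_\theta(x_i)
  + \tfrac{1}{2} \big(y_i - \mu_\theta(x_i)\big)^{\top}
    \Sigma_\theta(x_i)^{-1} \big(y_i - \mu_\theta(x_i)\big)
\Big]
```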

【Keywords】:

1625. Detecting Suspicious Timber Trades.

Paper Link】 【Pages】:13248-13254

【Authors】: Debanjan Datta ; Mohammad Raihanul Islam ; Nathan Self ; Amelia Meadows ; John Simeone ; Willow Outhwaite ; Chen Hin Keong ; Amy Smith ; Linda Walker ; Naren Ramakrishnan

【Abstract】: Developing algorithms that identify potentially illegal trade shipments is a non-trivial task, exacerbated by the size of shipment data as well as the unavailability of positive training data. In collaboration with conservation organizations, we develop a framework that incorporates machine learning and domain knowledge to tackle this challenge. Modeling the task as anomaly detection, we propose a simple and effective embedding-based anomaly detection approach for categorical data that provides better performance and scalability than the current state-of-the-art, along with a negative sampling approach that can efficiently train the proposed model. Additionally, we show how our model aids the interpretability of results, which is crucial for the task. Domain knowledge, though sparse and scattered across multiple open data sources, is ingested with input of domain experts to create rules that highlight actionable results. The application framework demonstrates the applicability of our proposed approach on real world trade data. An interface combined with the framework presents a complete system that can ingest, detect and aid in the analysis of suspicious timber trades.

【Keywords】:

1626. Automatic Building and Labeling of HD Maps with Deep Learning.

Paper Link】 【Pages】:13255-13260

【Authors】: Mahdi Elhousni ; Yecheng Lyu ; Ziming Zhang ; Xinming Huang

【Abstract】: In a world where autonomous cars are becoming increasingly common, creating an adequate infrastructure for this new technology is essential. This includes building and labeling high-definition (HD) maps accurately and efficiently. Today, the process of creating HD maps requires a lot of human input, which takes time and is prone to errors. In this paper, we propose a novel method capable of generating labelled HD maps from raw sensor data. We implemented and tested our methods on several urban scenarios using data collected from our test vehicle. The results show that the proposed deep learning based method can produce highly accurate HD maps. This approach speeds up the process of building and labeling HD maps, which can make a meaningful contribution to the deployment of autonomous vehicles.

【Keywords】:

1627. Analog Accelerator for Simulation and Diagnostics.

Paper Link】 【Pages】:13261-13266

【Authors】: Alexander Feldman ; Ion Matei ; Emil Totev ; Johan de Kleer

【Abstract】: We propose a new method for solving Initial Value Problems (IVPs). Our method is based on analog computing and has the potential to almost eliminate traditional switching time in digital computing. The approach can be used to simulate large systems longer, faster, and with higher accuracy. Many algorithms for Model-Based Diagnosis use numerical integration to simulate physical systems. The numerical integration process is often either computationally expensive or imprecise. We propose a new method, based on Field-Programmable Analog Arrays (FPAAs), that has the potential to overcome many practical problems. We envision a software/hardware framework for solving systems of simultaneous Ordinary Differential Equations (ODEs) in a fraction of the time of traditional numerical algorithms. In this paper we describe the solving of an IVP with the help of an Analog Computing Unit (ACU). To do this we build a special calculus based on operational amplifiers (op-amps) with local feedback. We discuss the implementation of the ACU on an Integrated Circuit (IC). We analyze the working of the IC and simulate the dynamic Lotka-Volterra system with the de-facto standard tool for electrical simulation: Spice.
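
For context, the conventional digital approach that the abstract contrasts with analog computing can be sketched as a fixed-step fourth-order Runge-Kutta (RK4) integrator applied to the Lotka-Volterra system. This is not the paper's analog method; the coefficients, initial state, and step size below are arbitrary values assumed for illustration.

```python
def lotka_volterra(state, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4):
    """Right-hand side of the predator-prey ODEs (coefficients are assumed)."""
    prey, predator = state
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return (d_prey, d_predator)


def rk4_step(f, state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve_ivp_rk4(f, initial_state, dt, steps):
    """Solve the initial value problem and return the whole trajectory."""
    trajectory = [initial_state]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], dt))
    return trajectory


# Simulate 10 time units starting from 10 prey and 5 predators.
trajectory = solve_ivp_rk4(lotka_volterra, (10.0, 5.0), dt=0.01, steps=1000)
```

Each step costs four evaluations of the right-hand side, which is the kind of per-step digital work the analog approach aims to avoid.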

【Keywords】:

1628. Multi-Task Learning for Diabetic Retinopathy Grading and Lesion Segmentation.

Paper Link】 【Pages】:13267-13272

【Authors】: Alex Foo ; Wynne Hsu ; Mong-Li Lee ; Gilbert Lim ; Tien Yin Wong

【Abstract】: Although deep learning for Diabetic Retinopathy (DR) screening has shown great success in achieving clinically acceptable accuracy for referable versus non-referable DR, there remains a need to provide more fine-grained grading of the DR severity level as well as automated segmentation of lesions (if any) in the retina images. We observe that the DR severity level of an image is dependent on the presence of different types of lesions and their prevalence. In this work, we adopt a multi-task learning approach to perform the DR grading and lesion segmentation tasks. In light of the lack of lesion segmentation mask ground-truths, we further propose a semi-supervised learning process to obtain the segmentation masks for the various datasets. Experimental results on publicly available datasets and a real world dataset obtained from population screening demonstrate the effectiveness of the multi-task solution over state-of-the-art networks.

【Keywords】:

1629. Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments.

Paper Link】 【Pages】:13273-13279

【Authors】: Tong Geng ; Xiliang Lin ; Harikesh S. Nair

【Abstract】: Firms implementing digital advertising campaigns face a complex problem in determining the right match between their advertising creatives and target audiences. Typical solutions to the problem have leveraged non-experimental methods, or used “split-testing” strategies that have not explicitly addressed the complexities induced by targeted audiences that can potentially overlap with one another. This paper presents an adaptive algorithm that addresses the problem via online experimentation. The algorithm is set up as a contextual bandit and addresses the overlap issue by partitioning the target audiences into disjoint, non-overlapping sub-populations. It learns an optimal creative display policy in the disjoint space, while assessing in parallel which creative has the best match in the space of possibly overlapping target audiences. Experiments show that the proposed method is more efficient compared to naive “split-testing” or non-adaptive “A/B/n” testing based methods. We also describe a testing product we built that uses the algorithm. The product is currently deployed on the advertising platform of JD.com, an eCommerce company and a publisher of digital ads in China.
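
The partitioning step described above, which turns possibly overlapping target audiences into disjoint sub-populations, can be illustrated with plain set operations. The audience names, user ids, and the function name `partition_audiences` are hypothetical; the bandit policy that would then be learned over the disjoint cells is not shown.

```python
def partition_audiences(audiences):
    """Split possibly overlapping audiences into disjoint sub-populations.

    audiences maps an audience name to a set of user ids. Each key of the
    result is the frozenset of audience names that all users in that cell
    belong to, so every user lands in exactly one cell.
    """
    membership = {}
    for name, users in audiences.items():
        for user in users:
            membership.setdefault(user, set()).add(name)
    cells = {}
    for user, names in membership.items():
        cells.setdefault(frozenset(names), set()).add(user)
    return cells


# Hypothetical target audiences that share user 3.
audiences = {"sports_fans": {1, 2, 3}, "new_parents": {3, 4}}
cells = partition_audiences(audiences)
```

Here user 3 falls into the cell keyed by both audience names, while users 1 and 2 and user 4 fall into the single-audience cells. Results for an original audience can be recovered by combining the cells whose key contains that audience.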

【Keywords】:

1630. Improving ECG Classification Using Generative Adversarial Networks.

Paper Link】 【Pages】:13280-13285

【Authors】: Tomer Golany ; Gal Lavee ; Shai Tejman Yarden ; Kira Radinsky

【Abstract】: The Electrocardiogram (ECG) is performed routinely by medical personnel to identify structural, functional and electrical cardiac events. Many attempts were made to automate this task using machine learning algorithms. Numerous supervised learning algorithms were proposed, requiring manual feature extraction. Lately, deep neural networks were also proposed for this task, reaching state-of-the-art results. The ECG signal conveys the specific electrical cardiac activity of each subject, thus extreme variations are observed between patients. These variations and the low amount of training data available for each arrhythmia are challenging for deep learning algorithms, and impede generalization. In this work, the use of generative adversarial networks is studied for the synthesis of ECG signals, which can then be used as additional training data to improve the classifier performance. Empirical results prove that the generated signals significantly improve ECG classification.

【Keywords】:

1631. Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation.

Paper Link】 【Pages】:13286-13293

【Authors】: Akshay Gugnani ; Hemant Misra

【Abstract】: This paper presents a job recommender system to match resumes to job descriptions (JD), both of which are non-standard and unstructured/semi-structured in form. First, the paper proposes a combination of natural language processing (NLP) techniques for the task of skill extraction. The performance of the combined techniques on an industrial-scale dataset yielded a precision and recall of 0.78 and 0.88 respectively. The paper then introduces the concept of extracting implicit skills – the skills which are not explicitly mentioned in a JD but may be implicit in the context of geography, industry or role. To mine and infer implicit skills for a JD, we find the other JDs similar to this JD. This similarity match is done in the semantic space. A Doc2Vec model is trained on 1.1 million JDs covering several domains crawled from the web, and all the JDs are projected onto this semantic space. The skills absent in the JD but present in similar JDs are obtained, and the obtained skills are weighted using several techniques to obtain the set of final implicit skills. Finally, several similarity measures are explored to match the skills extracted from a candidate's resume to explicit and implicit skills of JDs. Empirical results for matching resumes and JDs demonstrate that the proposed approach gives a mean reciprocal rank of 0.88, an improvement of 29.4% when compared to the performance of a baseline method that uses only explicit skills.
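
Two ideas from the abstract lend themselves to a short sketch: collecting skills that are absent from a JD but present in similar JDs, and scoring rankings with mean reciprocal rank (MRR). The frequency-share weighting, the `min_share` threshold, and all sample data are assumptions made for illustration; the paper mentions several weighting techniques without detailing them here, and finding the similar JDs (via Doc2Vec in the paper) is taken as given.

```python
from collections import Counter


def implicit_skills(jd_skills, similar_jds_skills, min_share=0.5):
    """Skills missing from a JD but present in at least min_share of similar JDs.

    Returns {skill: share of similar JDs that mention it}.
    """
    counts = Counter()
    for skills in similar_jds_skills:
        counts.update(set(skills) - set(jd_skills))
    n = len(similar_jds_skills)
    return {skill: c / n for skill, c in counts.items() if c / n >= min_share}


def mean_reciprocal_rank(ranked_lists, relevant_items):
    """Average of 1/rank of the first relevant item in each ranking (0 if none)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_items):
        for position, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists) if ranked_lists else 0.0


# Hypothetical data: one JD, three similar JDs, and three resume rankings.
inferred = implicit_skills(
    {"python", "sql"},
    [{"python", "pandas"}, {"sql", "pandas", "spark"}, {"pandas"}],
)
mrr = mean_reciprocal_rank(
    [["jd_a", "jd_b"], ["jd_c", "jd_d", "jd_e"], ["jd_f"]],
    [{"jd_a"}, {"jd_e"}, {"jd_x"}],
)
```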

【Keywords】:

1632. Improving Lives of Indebted Farmers Using Deep Learning: Predicting Agricultural Produce Prices Using Convolutional Neural Networks.

Paper Link】 【Pages】:13294-13299

【Authors】: Hangzhi Guo ; Alexander Woodruff ; Amulya Yadav

【Abstract】: Farmer suicides have become an urgent social problem which governments around the world are trying hard to solve. Most farmers are driven to suicide due to an inability to sell their produce at desired profit levels, which is caused by the widespread uncertainty/fluctuation in produce prices resulting from varying market conditions. To prevent farmer suicides, this paper takes the first step towards resolving the issue of produce price uncertainty by presenting PECAD, a deep learning algorithm for accurate prediction of future produce prices based on past pricing and volume patterns. While previous work presents machine learning algorithms for prediction of produce prices, they suffer from two limitations: (i) they do not explicitly consider the spatio-temporal dependence of future prices on past data; and as a result, (ii) they rely on classical ML prediction models which often perform poorly when applied to spatio-temporal datasets. PECAD addresses these limitations via three major contributions: (i) we gather real-world daily price and (produced) volume data of different crops over a period of 11 years from an official Indian government-administered website; (ii) we pre-process this raw dataset via state-of-the-art imputation techniques to account for missing data entries; and (iii) PECAD proposes a novel wide and deep neural network architecture which consists of two separate convolutional neural network models (trained for pricing and volume data respectively). Our simulation results show that PECAD outperforms existing state-of-the-art baseline methods by achieving significantly lower root mean squared error (RMSE) - PECAD achieves ∼25% lower coefficient of variance than state-of-the-art baselines. Our work is done in collaboration with a non-profit agency that works on preventing farmer suicides in the Indian state of Jharkhand, and PECAD is currently being reviewed by them for potential deployment.

【Keywords】:

1633. A Machine Learning Approach to Identify Houses with High Lead Tap Water Concentrations.

Paper Link】 【Pages】:13300-13305

【Authors】: Seyedsaeed Hajiseyedjavadi ; Michael Blackhurst ; Hassan A. Karimi

【Abstract】: Over a century separates initial lead service lateral installations from the federal regulation of lead in drinking water. As such, municipalities often do not have adequate information describing installations of lead plumbing. Municipalities thus face challenges such as reducing exposure to lead in drinking water, spreading scarce resources for gathering information, adopting short-term protection measures (e.g., providing filters), and developing longer-term prevention strategies (e.g., replacing lead laterals). Given the spatial and temporal patterns to properties, machine learning is seen as a useful tool to reduce uncertainty in decision making by authorities when addressing lead in water. The Pittsburgh Water and Sewer Authority (PWSA) is currently addressing these challenges in Pittsburgh and this paper describes the development and application of a model predicting high tap water concentrations (> 15 ppb) for PWSA customers. The model was developed using spatial cross validation to support PWSA’s interest in applying predictions in areas without training data. The model’s AUROC is 71.6% and primarily relies on publicly available property tax assessment data and indicators of lateral material collected by PWSA as they meet regulatory requirements.

【Keywords】:

1634. Calorie Estimation in a Real-World Recipe Service.

Paper Link】 【Pages】:13306-13313

【Authors】: Jun Harashima ; Makoto Hiramatsu ; Satoshi Sanjo

【Abstract】: Cooking recipes play an important role in promoting a healthy lifestyle, and a vast number of user-generated recipes are currently available on the Internet. Allied to this growth in the amount of information is an increase in the number of studies on the use of such data for recipe analysis, recipe generation, and recipe search. However, there have been few attempts to estimate the number of calories per serving in a recipe. This study considers this task and introduces two challenging subtasks: ingredient normalization and serving estimation. The ingredient normalization task aims to convert the ingredients written in a recipe (e.g., a Japanese entry that says “sesame oil (for finishing)”) into their canonical forms (e.g., sesame oil) so that their calorific content can be looked up in an ingredient dictionary. The serving estimation task aims to convert the amount written in the recipe (e.g., N pieces) into the number of servings (e.g., M people), thus enabling the calories per serving to be calculated. We apply machine learning-based methods to these tasks and describe their practical deployment in Cookpad, the largest recipe service in the world. A series of experiments demonstrate that the performance of our methods is sufficient for use in real-world services.
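
To make the calculation concrete, the sketch below normalizes ingredient strings, looks them up in an ingredient dictionary, and divides the total by the number of servings. It swaps in a simple rule (stripping parenthetical notes) where the paper uses machine learning-based methods, takes the number of servings as already estimated, and uses a hypothetical two-entry dictionary with illustrative calorie values.

```python
import re

# Hypothetical ingredient dictionary: canonical name -> kcal per 100 g
# (values are illustrative, not taken from the paper).
KCAL_PER_100G = {"sesame oil": 884.0, "rice": 130.0}


def normalize_ingredient(raw):
    """Strip parenthetical notes such as '(for finishing)' and tidy whitespace."""
    without_notes = re.sub(r"\([^)]*\)", "", raw)
    return " ".join(without_notes.lower().split())


def calories_per_serving(ingredients, servings):
    """ingredients: list of (raw_name, grams); servings: number of people served.

    Raises KeyError if a normalized ingredient is not in the dictionary.
    """
    total = 0.0
    for raw_name, grams in ingredients:
        canonical = normalize_ingredient(raw_name)
        total += KCAL_PER_100G[canonical] * grams / 100.0
    return total / servings


recipe = [("Sesame Oil (for finishing)", 10), ("rice", 300)]
per_serving = calories_per_serving(recipe, servings=2)
```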

【Keywords】:

1635. A System for Medical Information Extraction and Verification from Unstructured Text.

Paper Link】 【Pages】:13314-13319

【Authors】: Damir Juric ; Giorgos Stoilos ; André Melo ; Jonathan Moore ; Mohammad Khodadadi

【Abstract】: A wealth of medical knowledge has been encoded in terminologies like SNOMED CT, NCI, FMA, and more. However, these resources usually lack information like relations between diseases, symptoms, and risk factors, preventing their use in diagnostic or other decision making applications. In this paper we present a pipeline for extracting such information from unstructured text and enriching medical knowledge bases. Our approach uses Semantic Role Labelling and is unsupervised. We show how we dealt with several deficiencies of SRL-based extraction, like copula verbs, relations expressed through nouns, and assigning scores to extracted triples. The system has so far extracted about 120K relations and in-house doctors verified about 5K relations. We compared the output of the system with a manually constructed network of diseases, symptoms and risk factors built by doctors in the course of a year. Our results show that our pipeline extracts good quality and precise relations and speeds up the knowledge acquisition process considerably.

【Keywords】:

1636. Can Eruptions Be Predicted? Short-Term Prediction of Volcanic Eruptions via Attention-Based Long Short-Term Memory.

Paper Link】 【Pages】:13320-13325

【Authors】: Hiep V. Le ; Tsuyoshi Murata ; Masato Iguchi

【Abstract】: Short-term prediction of volcanic eruptions is one of the ultimate objectives of volcanology. At Sakurajima volcano, an active volcano in Japan, experts monitor the volcanic sensor data and analyze the prior signal to predict the eruptions. Even though experts derived some patterns, it is hard to make a good prediction due to handcrafted features. To address this issue, we propose to predict eruptions using machine learning. In this paper, we attempt to predict the eruptions hourly by adapting several machine learning methods including traditional and deep learning approaches. As recurrent neural networks are well known for extracting time-sensitive features, we propose a model designed specifically for volcanic eruption prediction, named VepNet. The model rests on the domain-knowledge assumption that some specific triggers are the main causes of future eruptions. To take advantage of this, VepNet deploys an attention layer to locate and prioritize these triggers in decision making. Extensive experiments conducted using data from Sakurajima volcano showed the effectiveness of the deep learning approach over the traditional approach. On top of that, VepNet showed its effectiveness on prediction with an AUC score of up to 0.8665. Moreover, an attempt has been made to explain the mechanism of the eruptions by analyzing the attention layer of VepNet. Lastly, to support volcano experts in issuing warnings and to protect the safety of people living around Sakurajima, a warning system named 3LWS is proposed. The system predicted the eruptions hourly with high accuracy and reliability with the eruption rate up to 68.97% in the High-Risk level.

【Keywords】:

1637. Machine-Learning-Based Functional Microcirculation Analysis.

Paper Link】 【Pages】:13326-13331

【Authors】: Ossama Mahmoud ; G. H. Janssen ; Mahmoud R. El-Sakka

【Abstract】: Analysis of microcirculation is an important clinical and research task. Functional analysis of the microcirculation allows researchers to understand how blood flowing in a tissue’s smallest vessels affects disease progression, organ function, and overall health. Current methods of manual analysis of microcirculation are tedious and time-consuming, limiting the quick turnover of results. There has been limited research on automating functional analysis of microcirculation. As such, in this paper, we propose a two-step machine-learning-based algorithm to functionally assess microcirculation videos. The first step uses a modified vessel segmentation algorithm to extract the location of vessel-like structures. The second step uses a 3D-CNN to assess whether the vessel-like structures contain flowing blood. To our knowledge, this is the first application of machine learning for functional analysis of microcirculation. We use real-world labelled microcirculation videos to train and test our algorithm and assess its performance. More precisely, we demonstrate that our two-step algorithm can efficiently analyze real data with high accuracy (90%).

【Keywords】:

1638. Iterative Data Programming for Expanding Text Classification Corpora.

Paper Link】 【Pages】:13332-13337

【Authors】: Neil Mallinar ; Abhishek Shah ; Tin Kam Ho ; Rajendra Ugrani ; Ayush Gupta

【Abstract】: Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data sets quickly via a general framework for building weak models, also known as labeling functions, and denoising them through ensemble learning techniques. We present a fast, simple data programming method for augmenting text data sets by generating neighborhood-based weak models with minimal supervision. Furthermore, our method employs an iterative procedure to identify sparsely distributed examples from large volumes of unlabeled data. The iterative data programming techniques improve newer weak models as more labeled data is confirmed with a human in the loop. We show empirical results on sentence classification tasks, including those from a task of improving intent recognition in conversational agents.

【Keywords】:
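
The data programming idea in the abstract above (weak models that vote or abstain, followed by a combination step) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Jaccard similarity, the threshold, the seed sentences, and the plain majority vote, which stands in for the ensemble denoising the abstract mentions. It is not the authors' method.

```python
from collections import Counter

ABSTAIN = None  # a weak model returns this when it has no opinion


def jaccard(a, b):
    """Token-set Jaccard similarity between two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def make_neighborhood_lf(seed_text, seed_label, threshold=0.5):
    """Build a weak model that votes seed_label for sentences near seed_text."""
    def lf(sentence):
        if jaccard(sentence, seed_text) >= threshold:
            return seed_label
        return ABSTAIN
    return lf


def majority_vote(sentence, lfs):
    """Combine the weak votes; return ABSTAIN when every weak model abstains."""
    votes = Counter(v for v in (lf(sentence) for lf in lfs) if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    return votes.most_common(1)[0][0]


# Hypothetical labeled seeds (one weak model per seed sentence).
seeds = [
    ("reset my password please", "account"),
    ("i forgot my password", "account"),
    ("what is my current balance", "billing"),
]
lfs = [make_neighborhood_lf(text, label, threshold=0.4) for text, label in seeds]

unlabeled = ["please reset my password", "show my current balance", "hello there"]
proposed = {s: majority_vote(s, lfs) for s in unlabeled}
```

In an iterative setting, sentences whose proposed label a human confirms could be appended to `seeds`, yielding new weak models for the next round; sentences where every weak model abstains stay unlabeled.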

1639. Chemical and Textual Embeddings for Drug Repurposing.

Paper Link】 【Pages】:13338-13343

【Authors】: Galia Nordon ; Levi Gottlieb ; Kira Radinsky

【Abstract】: Drug approval is a long and expensive process that can take 10-15 years and more than 2 billion dollars. Therefore, alternative techniques such as drug repositioning, which identifies new uses for approved drugs, have been gaining increasing attention. We examine the employment of different drug embeddings to predict successful drug repositioning. We study the employment of drug molecular structure and show that using larger chemical constructs, such as large functional chemical groups, is much more effective than small sub-structures. We then study embeddings that are based on textual medical publications and compare them with the chemical-structure-based embeddings. We eventually present a novel embedding technique to combine the merit of the textual and chemical-based approaches. We provide empirical results on a repositioning benchmark set. Additionally, we present an application of such embedding as part of an ongoing repositioning research conducted with a major health care supplier, and identify a novel drug and indication. The pair has been verified on a corpus of 1.5 million patient EHR data.

【Keywords】:

1640. Automated Utterance Generation.

Paper Link】 【Pages】:13344-13349

【Authors】: Soham Parikh ; Quaizar Vohra ; Mitul Tiwari

【Abstract】: Conversational AI assistants are becoming popular and question-answering is an important part of any conversational assistant. Using relevant utterances as features in question-answering has been shown to improve both the precision and recall for retrieving the right answer by a conversational assistant. Hence, utterance generation has become an important problem with the goal of generating relevant utterances (sentences or phrases) from a knowledge base article that consists of a title and a description. However, generating good utterances usually requires a lot of manual effort, creating the need for automated utterance generation. In this paper, we propose an utterance generation system which 1) uses extractive summarization to extract important sentences from the description, 2) uses multiple paraphrasing techniques to generate a diverse set of paraphrases of the title and summary sentences, and 3) selects good candidate paraphrases with the help of a novel candidate selection algorithm.

【Keywords】:

1641. EMSContExt: EMS Protocol-Driven Concept Extraction for Cognitive Assistance in Emergency Response.

Paper Link】 【Pages】:13350-13355

【Authors】: Sarah Masud Preum ; Sile Shu ; Homa Alemzadeh ; John A. Stankovic

【Abstract】: This paper presents a technique for automated curation of a domain-specific knowledge base or lexicon for resource-constrained domains, such as Emergency Medical Services (EMS) and its application to real-time concept extraction and cognitive assistance in emergency response. The EMS responders often verbalize critical information describing the situations at an incident scene, including patients' physical condition and medical history. Automated extraction of EMS protocol-specific concepts from responders' speech data can facilitate cognitive support through the selection and execution of the proper EMS protocols for patient treatment. Although this task is similar to the traditional NLP task of concept extraction, the underlying application domain poses major challenges, including low training resources availability (e.g., no existing EMS ontology, lexicon, or annotated EMS corpus) and domain mismatch. Hence, we develop EMSContExt, a weakly-supervised concept extraction approach for EMS concepts. It utilizes different knowledge bases and a semantic concept model based on a corpus of over 9400 EMS narratives for lexicon expansion. The expanded EMS lexicon is then used to automatically extract critical EMS protocol-specific concepts from real-time EMS speech narratives. Our experimental results show that EMSContExt achieves 0.85 recall and 0.82 F1-score for EMS concept extraction and significantly outperforms MetaMap, a state-of-the-art medical concept extraction tool. We also demonstrate the application of EMSContExt to EMS protocol selection and execution and real-time recommendation of protocol-specific interventions to the EMS responders. Here, EMSContExt outperforms MetaMap with a 6% increase and six times speedup in weighted recall and execution time, respectively.

【Keywords】:

1642. GRACE: Generating Summary Reports Automatically for Cognitive Assistance in Emergency Response.

Paper Link】 【Pages】:13356-13362

【Authors】: M. Arif Rahman ; Sarah Masud Preum ; Ronald D. Williams ; Homa Alemzadeh ; John A. Stankovic

【Abstract】: EMS (emergency medical service) plays an important role in saving lives in emergency and accident situations. When first responders, including EMS providers and firefighters, arrive at an incident, they communicate with the patients (if conscious), family members and other witnesses, other first responders, and the command center. The first responders utilize a microphone and headset to support these communications. After the incident, the first responders are required to document the incident by filling out a form. Today, this is performed manually. Manual documentation of patient summary reports is time-consuming, tedious, and error-prone. We have addressed these form-filling problems by transcribing the audio from the scene, identifying the relevant information from all the conversations, and automatically filling out the form. An informal survey of first responders indicates that this application would be exceedingly helpful to them. Results show that we can fill out a model summary report form with an F1 score as high as 94%, 78%, 96%, and 83% when the data is noise-free audio, noisy audio, noise-free textual narratives, and noisy textual narratives, respectively.

【Keywords】:

1643. Draining the Water Hole: Mitigating Social Engineering Attacks with CyberTWEAK.

Paper Link】 【Pages】:13363-13368

【Authors】: Zheyuan Ryan Shi ; Aaron Schlenker ; Brian Hay ; Daniel Bittleston ; Siyu Gao ; Emily Peterson ; John Trezza ; Fei Fang

【Abstract】: Cyber adversaries have increasingly leveraged social engineering attacks to breach large organizations and threaten the well-being of today's online users. One clever technique, the “watering hole” attack, compromises a legitimate website to execute drive-by download attacks by redirecting users to another malicious domain. We introduce a game-theoretic model that captures the salient aspects for an organization protecting itself from a watering hole attack by altering the environment information in web traffic so as to deceive the attackers. Our main contributions are (1) a novel Social Engineering Deception (SED) game model that features a continuous action set for the attacker, (2) an in-depth analysis of the SED model to identify computationally feasible real-world cases, and (3) the CyberTWEAK algorithm which solves for the optimal protection policy. To illustrate the potential use of our framework, we built a browser extension based on our algorithms which is now publicly available online. The CyberTWEAK extension will be vital to the continued development and deployment of countermeasures for social engineering.

【Keywords】:

1644. Improving Efficiency of Volunteer-Based Food Rescue Operations.

Paper Link】 【Pages】:13369-13375

【Authors】: Zheyuan Ryan Shi ; Yiwen Yuan ; Kimberly Lo ; Leah Lizarondo ; Fei Fang

【Abstract】: Food waste and food insecurity are two challenges that coexist in many communities. To mitigate the problem, food rescue platforms match excess food with the communities in need, and leverage external volunteers to transport the food. However, the external volunteers bring significant uncertainty to the food rescue operation. We work with a large food rescue organization to predict the uncertainty and furthermore to find ways to reduce the human dispatcher's workload and the redundant notifications sent to volunteers. We make two main contributions. (1) We train a stacking model which predicts whether a rescue will be claimed with high precision and AUC. This model can help the dispatcher better plan for backup options and alleviate their uncertainty. (2) We develop a data-driven optimization algorithm to compute the optimal intervention and notification scheme. The algorithm uses a novel counterfactual data generation approach and the branch and bound framework. Our result reduces the number of notifications and interventions required in the food rescue operation. We are working with the organization to deploy our results in the near future.

【Keywords】:

1645. A Natural Language Processing System for Extracting Evidence of Drug Repurposing from Scientific Publications.

Paper Link】 【Pages】:13369-13381

【Authors】: Shivashankar Subramanian ; Ioana Baldini ; Sushma Ravichandran ; Dmitriy A. Katz-Rogozhnikov ; Karthikeyan Natesan Ramamurthy ; Prasanna Sattigeri ; Kush R. Varshney ; Annmarie Wang ; Pradeep Mangalath ; Laura B. Kleiman

【Abstract】: More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but trying to manually extract such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate non-cancer generic drug evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.

【Keywords】:

1646. Kanji Workbook: A Writing-Based Intelligent Tutoring System for Learning Proper Japanese Kanji Writing Technique with Instructor-Emulated Assessment.

Paper Link】 【Pages】:13382-13389

【Authors】: Paul Taele ; Jung In Koh ; Tracy Hammond

【Abstract】: Kanji script writing is a skill that is often introduced to novice Japanese foreign language students for achieving Japanese writing mastery, but often poses difficulties to students with primarily English fluency due to its vast differences with written English. Instructors often introduce various pedagogical methods, such as visual structure and written techniques, to assist students in kanji study, but may lack availability to provide direct feedback on students' writing outside of class. Current educational applications are also limited due to lacking richer instructor-emulated feedback. We introduce Kanji Workbook, a writing-based intelligent tutoring system for students to receive intelligent assessment that emulates human instructor feedback. Our interface not only leverages students' computing devices for allowing them to learn, practice, and review the writing of prompted characters from their course's kanji script lessons, but also provides a diverse set of writing assessment metrics, derived from instructor interviews and classroom observation insights, through intelligent scoring and visual animations. We deployed our interface onto novice- and intermediate-level university courses over an entire academic year, and observed that interface users on average achieved higher course grades than their peers and also reacted positively to our interface's various features.

【Keywords】:

1647. Discovery News: A Generic Framework for Financial News Recommendation.

Paper Link】 【Pages】:13390-13395

【Authors】: Chong Wang ; Lisa Kim ; Grace Bang ; Himani Singh ; Russell Kociuba ; Steven Pomerville ; Xiaomo Liu

【Abstract】: In the financial services industry, it is crucial for analysts to constantly monitor and stay informed on the latest developments of their portfolio of companies. This ensures that analysts are up-to-date in their analysis and provide highly credible and timely insights. Currently, analysts receive news alerts through manually created news alert subscriptions that are often noisy and difficult to manage. The manual review process is time-consuming and error-prone. We demonstrate Discovery News, a framework for an automated news recommender system for financial analysis at S&P's Global Ratings. This system includes the automated ingestion, relevancy, clustering, and ranking of news. The proposed framework is adaptable to any form of input news data and can seamlessly integrate with other data used for analysis like financial data.

【Keywords】:

1648. Using Small Business Banking Data for Explainable Credit Risk Scoring.

Paper Link】 【Pages】:13396-13401

【Authors】: Wei Wang ; Christopher Lesner ; Alexander Ran ; Marko Rukonic ; Jason Xue ; Eric Shiu

【Abstract】: Machine learning applied to financial transaction records can predict how likely a small business is to repay a loan. For this purpose we compared a traditional scorecard credit risk model against various machine learning models and found that XGBoost with monotonic constraints outperformed the scorecard model by 7% in K-S statistic. To deploy such a machine learning model in production for loan application risk scoring it must comply with lending industry regulations that require lenders to provide understandable and specific reasons for credit decisions. Thus we also developed a loan decision explanation technique based on the ideas of WoE and SHAP. Our research was carried out using a historical dataset of tens of thousands of loans and millions of associated financial transactions. The credit risk scoring model based on XGBoost with monotonic constraints and SHAP explanations described in this paper has been deployed by QuickBooks Capital to assess incoming loan applications since July 2019.

【Keywords】:
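
The abstract above reports results in terms of the K-S statistic and builds on weight of evidence (WoE). As a reminder of what these two quantities measure, here is a small pure-Python sketch of their textbook definitions. The toy scores, labels, bins, and the `smoothing` parameter are assumptions for illustration; the sketch does not reproduce the paper's model or its explanation technique.

```python
import math
from collections import Counter


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes.

    labels: 1 = bad (defaulted), 0 = good (repaid).
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    cum_good = cum_bad = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all rows sharing the same score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


def weight_of_evidence(bins, labels, smoothing=0.0):
    """WoE per bin, using the convention ln(share of goods / share of bads).

    With smoothing=0.0, a bin that lacks one of the classes raises an error;
    a small positive smoothing value (an assumption, not a standard) avoids that.
    """
    goods, bads = Counter(), Counter()
    for b, y in zip(bins, labels):
        (bads if y == 1 else goods)[b] += 1
    names = sorted(set(bins))
    total_good = sum(goods.values()) + smoothing * len(names)
    total_bad = sum(bads.values()) + smoothing * len(names)
    return {
        b: math.log(((goods[b] + smoothing) / total_good)
                    / ((bads[b] + smoothing) / total_bad))
        for b in names
    }


# Toy data: a higher score is meant to indicate higher risk.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
bins = ["low"] * 4 + ["high"] * 4

ks = ks_statistic(scores, labels)
woe = weight_of_evidence(bins, labels)
```

A larger K-S value means the score separates repaid and defaulted loans more cleanly; a positive WoE marks a bin in which good loans are over-represented relative to bad ones.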

IAAI Technical: Challenge Papers 1

1649. AI Trust in Business Processes: The Need for Process-Aware Explanations.

Paper Link】 【Pages】:13403-13404

【Authors】: Steve T. K. Jan ; Vatche Ishakian ; Vinod Muthusamy

【Abstract】: Business processes underpin a large number of enterprise operations including processing loan applications, managing invoices, and insurance claims. The business process management (BPM) industry is expected to grow to approximately 16 billion dollars by 2023. There is a large opportunity for infusing AI to reduce cost or provide better customer experience with a $15.7 trillion “potential contribution to the global economy by 2030”. To this end, the BPM literature is rich in machine learning solutions including unsupervised learning to gain insights on clusters of process traces, classification models to predict the outcomes, duration, or paths of partial process traces, extracting business process from documents, and models to recommend how to optimize a business process or navigate decision points. More recently, deep learning models including those from the NLP domain have been applied to process predictions. Unfortunately, very few of these innovations have been applied and adopted by enterprise companies. We assert that a large reason for the lack of adoption of AI models in BPM is that business users are risk-averse and do not implicitly trust AI models. There has, unfortunately, been little attention paid to explaining model predictions to business users with process context. We challenge the BPM community to build on the AI interpretability literature, and the AI Trust community to understand what it means to take advantage of business process artifacts in order to provide business level explanations.

【Keywords】:

EAAI Symposium: Full Papers 12

1650. Semi-Supervised Learning to Perceive Children's Affective States in a Tablet Tutor.

Paper Link】 【Pages】:13350-13357

【Authors】: Mansi Agarwal ; Jack Mostow

【Abstract】: Like good human tutors, intelligent tutoring systems should detect and respond to students' affective states. However, accuracy in detecting affective states automatically has been limited by the time and expense of manually labeling training data for supervised learning. To combat this limitation, we use semi-supervised learning to train an affective state detector on a sparsely labeled, culturally novel, authentic data set in the form of screen capture videos from a Swahili literacy and numeracy tablet tutor in Tanzania that shows the face of the child using it. We achieved 88% leave-1-child-out cross-validated accuracy in distinguishing pleasant, unpleasant, and neutral affective states, compared to only 61% for the best supervised learning method we tested. This work contributes toward using automated affect detection both off-line to improve the design of intelligent tutors, and at runtime to respond to student affect based on input from a user-facing tablet camera or webcam.

【Keywords】:
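
The "leave-1-child-out" evaluation named in the abstract above holds out all samples from one child at a time, so that no child contributes to both training and testing in the same fold. A minimal sketch of that splitting scheme follows. The toy data, the majority-class classifier, and the pooled-accuracy aggregation are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test


def cross_validated_accuracy(features, labels, groups, fit, predict):
    """Pooled accuracy over all folds; each sample is tested exactly once."""
    correct = 0
    for _, train, test in leave_one_group_out(groups):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Hypothetical stand-in classifier: always predict the most common training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature):
    return model


# Toy data: five samples from three children ("a", "b", "c").
groups = ["a", "a", "b", "b", "c"]
features = [0, 1, 2, 3, 4]  # placeholders; the majority classifier ignores them
labels = ["pleasant", "pleasant", "neutral", "pleasant", "pleasant"]

folds = list(leave_one_group_out(groups))
accuracy = cross_validated_accuracy(
    features, labels, groups, fit_majority, predict_majority
)
```

Splitting by child rather than by sample matters here because frames from the same child are highly correlated; a random split would let the detector see the test child's face during training.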

1651. Geospatial Clustering for Balanced and Proximal Schools.

Paper Link】 【Pages】:13358-13365

【Authors】: Subhodip Biswas ; Fanglan Chen ; Andreea Sistrunk ; Sathappan Muthiah ; Zhiqian Chen ; Nathan Self ; Chang-Tien Lu ; Naren Ramakrishnan

【Abstract】: Public school boundaries are redrawn from time to time to ensure effective functioning of school systems. This process, also called school redistricting, is non-trivial due to (1) the presence of multiple design criteria such as capacity utilization, proximity and travel time which are hard for planners to consider simultaneously, (2) the fixed locations of schools with widely differing capacities that need to be balanced, (3) the spatial nature of the data and the need to preserve contiguity in school zones, and (4) the difficulty in quantifying local factors that may arise. Motivated by these challenges and the intricacy of the process, we propose a geospatial clustering algorithm called GeoKmeans for assisting planners in designing school boundaries such that students are assigned to proximal schools while ensuring effective utilization of school capacities. The algorithm operates on polygonal geometries and connects them into geographically contiguous school boundaries while balancing problem-specific constraints. We evaluate our approach on real-world data of two rapidly growing school districts in the US. Results indicate the efficacy of our approach in designing boundaries. Additionally, a case study is included to demonstrate the potential of GeoKmeans to assist planners in drawing boundaries.

【Keywords】:

1652. Teaching Constraint Programming Using Fable-Based Learning.

Paper Link】 【Pages】:13366-13373

【Authors】: Mavis Chan ; Cecilia Chun ; Holly Fung ; Jimmy H. M. Lee ; Peter J. Stuckey

【Abstract】: The paper presents the pedagogical innovations and experience of the co-development of three MOOCs on the subject of “Modeling and Solving Discrete Optimization Problems” by two universities. In a nutshell, the MOOCs feature the Fable-Based Learning approach, which is a form of problem-based learning encapsulated in a coherent story plot. Each lecture video begins with an animation that tells a story following a novel. The protagonists of the story encounter a problem requiring technical assistance from the two professors from modern time via a magical tablet granted to them by a fairy god. The new pedagogy aims at increasing learners' motivation and interests as well as situating the learners in a coherent learning context. In addition to scriptwriting, animation production and situating the teaching materials in the story plot, another challenge of the project is the remote distance between the two institutions as well as the need to produce all teaching materials in both (Mandarin) Chinese and English to cater for different geographic learning needs. The MOOCs have been running recurrently on Coursera since 2017. We present learner statistics and feedback, and discuss our experience with and preliminary observations of adopting the online materials in a Flipped Classroom setting.

【Keywords】:

1653. Teaching Undergraduate Artificial Intelligence Classes: An Experiment with an Attendance Requirement.

Paper Link】 【Pages】:13374-13380

【Authors】: Sven Koenig ; Tansel Uras ; Liron Cohen

【Abstract】: We report on an experiment that we performed when we taught the undergraduate artificial intelligence class at the University of Southern California. We taught it – under very similar conditions – once with and once without an attendance requirement. The attendance requirement substantially increased the attendance of the students. It did not substantially affect their performance but decreased their course ratings across all categories in the official course evaluation, whose results happened to be biased toward the opinions of the students attending the lectures. For example, the overall rating of the instructor was 0.89 lower (on a 1-5 scale) with the attendance requirement and the overall rating of the class was 0.85 lower. Thus, the attendance requirement, combined with the policy for administering the course evaluation, had a large impact on the course ratings, which is a problem if the course ratings influence decisions on promotions, tenure, and salary increments for the instructors but also demonstrates the potential for the manipulation of course ratings.

【Keywords】:

1654. Zhorai: Designing a Conversational Agent for Children to Explore Machine Learning Concepts.

Paper Link】 【Pages】:13381-13388

【Authors】: Phoebe Lin ; Jessica Van Brummelen ; Galit Lukin ; Randi Williams ; Cynthia Breazeal

【Abstract】: Understanding how machines learn is critical for children to develop useful mental models for exploring artificial intelligence (AI) and smart devices that they now frequently interact with. Although children are very familiar with having conversations with conversational agents like Siri and Alexa, children often have limited knowledge about AI and machine learning. We leverage their existing familiarity and present Zhorai, a conversational platform and curriculum designed to help young children understand how machines learn. Children ages eight to eleven train an agent through conversation and understand how the knowledge is represented using visualizations. This paper describes how we designed the curriculum and evaluated its effectiveness with 14 children in small groups. We found that the conversational aspect of the platform increased engagement during learning and the novel visualizations helped make machine knowledge understandable. As a result, we make recommendations for future iterations of Zhorai and approaches for teaching AI to children.

【Keywords】:

1655. Multiple Data Augmentation Strategies for Improving Performance on Automatic Short Answer Scoring.

Paper Link】 【Pages】:13389-13396

【Authors】: Jiaqi Lun ; Jia Zhu ; Yong Tang ; Min Yang

【Abstract】: Automatic short answer scoring (ASAS) is a research subject of intelligent education, which is a hot field of natural language understanding. Many experiments have confirmed that the ASAS system is not good enough, because its performance is limited by the training data. Focusing on the problem, we propose MDA-ASAS, multiple data augmentation strategies for improving performance on automatic short answer scoring. MDA-ASAS is designed to learn language representation enhanced by data augmentation strategies, which include back-translation, correct answer as reference answer, and swap content. We argue that external knowledge has a profound impact on the ASAS process. Meanwhile, the Bidirectional Encoder Representations from Transformers (BERT) model, which acquires semantic, grammatical and other features from large amounts of unsupervised data and thereby adds external knowledge, has been shown to be effective for improving many natural language processing tasks. Combining with the latest BERT model, our experimental results on the ASAS dataset show that MDA-ASAS brings a significant gain over the state of the art. We also perform extensive ablation studies and suggest parameters for practical use.

【Keywords】:

1656. Lessons Learned from Teaching Machine Learning and Natural Language Processing to High School Students.

Paper Link】 【Pages】:13397-13403

【Authors】: Narges Norouzi ; Snigdha Chaturvedi ; Matthew Rutledge

【Abstract】: This paper describes an experience in teaching Machine Learning (ML) and Natural Language Processing (NLP) to a group of high school students over an intense one-month period. In this work, we provide an outline of an AI course curriculum we designed for high school students and then evaluate its effectiveness by analyzing students' feedback and student outcomes. After closely observing students, evaluating their responses to our surveys, and analyzing their contribution to the course project, we identified some possible impediments in teaching AI to high school students and propose some measures to avoid them. These measures include employing a combination of objectivist and constructivist pedagogies, reviewing/introducing basic programming concepts at the beginning of the course, and addressing gender discrepancies throughout the course.

【Keywords】:

1657. Teaching Game AI as an Undergraduate Course in Computational Media.

Paper Link】 【Pages】:13404-13411

【Authors】: Adam M. Smith ; Daniel G. Shapiro

【Abstract】: We need to teach AI to students in and outside of traditional computer science degree programs, including those designer-engineer hybrid students who will design and implement games or engage in technical games research later. The need to rethink AI curriculum is pressing in a design education context because AI powers many emerging practical techniques such as drama management, procedural content generation, player modeling, and machine playtesting. In this paper, we describe a 5-year experimental effort to teach a Game AI course structured around a broad and expanding set of roles AI can play in game design (e.g., Adversary and Actor, as well as Design Assistant and Storyteller). This course sets up computer science and computer game design students to transform practices in the game industry as well as create new forms of media that were previously unreachable. Our students gained mastery over the relevant techniques and further demonstrated (via novel prototype systems) many new roles for AI along the way.

【Keywords】:

1658. Making High-Performance Robots Safe and Easy to Use For an Introduction to Computing.

Paper Link】 【Pages】:13412-13419

【Authors】: Joseph Spitzer ; Joydeep Biswas ; Arjun Guha

【Abstract】: Robots are a popular platform for introducing computing and artificial intelligence to novice programmers. However, programming state-of-the-art robots is very challenging, and requires knowledge of concurrency, operation safety, and software engineering skills, which can take years to teach. In this paper, we present an approach to introducing computing that allows students to safely and easily program high-performance robots. We develop a platform for students to program RoboCup Small Size League robots using JavaScript. The platform 1) ensures physical safety at several levels of abstraction, 2) allows students to program robots using JavaScript in the browser, without the need to install software, and 3) presents a simplified JavaScript semantics that shields students from confusing language features. We discuss our experience running a week-long workshop using this platform, and analyze over 3,000 student-written program revisions to provide empirical evidence that our approach does help students.

【Keywords】:

1659. Using AI Techniques in a Serious Game for Socio-Moral Reasoning Development.

Paper Link】 【Pages】:13420-13427

【Authors】: Ange Tato ; Roger Nkambou ; Aude Dufresne

【Abstract】: We present a serious game designed to help players/learners develop socio-moral reasoning (SMR) maturity. It is based on an existing computerized task that was converted into a game to improve the motivation of learners. The learner model is computed using a hybrid deep learning architecture, and adaptation rules are provided by both human experts and machine learning techniques. We conducted some experiments with two versions of the game (the initial version and the adaptive version with AI-based learner modeling). The results show that the adaptive version provides significantly better results in terms of learning gain.

【Keywords】:

1660. An Experimental Ethics Approach to Robot Ethics Education.

Paper Link】 【Pages】:13428-13435

【Authors】: Thomas Emrys Williams ; Qin Zhu ; Daniel H. Grollman

【Abstract】: We propose an experimental ethics-based curricular module for an undergraduate course on Robot Ethics. The proposed module aims to teach students how human subjects research methods can be used to investigate potential ethical concerns arising in human-robot interaction, by engaging those students in real experimental ethics research. In this paper we describe the proposed curricular module, describe our implementation of that module within a Robot Ethics course offered at a medium-sized engineering university, and statistically evaluate the effectiveness of the proposed curricular module in achieving desired learning objectives. While our results do not provide clear evidence of a quantifiable benefit to undergraduate achievement of the described learning objectives, we note that the module did provide additional learning opportunities for graduate students in the course, as they helped to supervise, analyze, and write up the results of this undergraduate-performed research experiment.

【Keywords】:

1661. AISpace2: An Interactive Visualization Tool for Learning and Teaching Artificial Intelligence.

Paper Link】 【Pages】:13436-13443

【Authors】: Chenliang Zhou ; Dominic Kuang ; Jingru Liu ; Hanbo Yang ; Zijia Zhang ; Alan K. Mackworth ; David L. Poole

【Abstract】: AIspace is a set of tools used to learn and teach fundamental AI algorithms. The original version of AIspace was written in Java. There was not a clean separation of the algorithms and visualization; it was too complicated for students to modify the underlying algorithms. Its next generation, AISpace2, is built on AIPython, open source Python code that is designed to be as close as possible to pseudocode. AISpace2, visualized in JupyterLab, keeps the simple Python code, and uses hooks in AIPython to allow visualization of the algorithms. This allows students to see and modify the high-level algorithms in Python, and to visualize the output in a graphical form, aiming to better help them to build confidence and comfort in AI concepts and algorithms. So far we have tools for search, constraint satisfaction problems (CSP), planning and Bayesian networks. In this paper we outline the tools and give some evaluations based on user feedback.

【Keywords】:

EAAI Symposium: Poster Papers 3

1662. Using Cloud Tools for Literate Programming to Redesign an AI Course for Non-Traditional College Students.

Paper Link】 【Pages】:13502-13503

【Authors】: Maria Hwang ; Calvin Williamson

【Abstract】: As more open source educational software applications become available, higher educational institutions have the opportunity to utilize these cost-efficient tools to deliver the instruction traditionally taught offline with heavy associated costs. Here we introduce a machine learning course that uses a simple, cloud computing approach to creating course materials. We see this type of serverless, cloud-based, literate programming to be the future of computer science education in non-traditional higher educational institutions, in particular those serving students who will need the basic literacy for computing and computation but will not pursue the traditional computer scientist path.

【Keywords】:

1663. Minecraft as a Platform for Project-Based Learning in AI.

Paper Link】 【Pages】:13504-13505

【Authors】: Sameer Singh

【Abstract】: Undergraduate courses that focus on open-ended, project-based learning teach students how to define concrete goals, transfer conceptual understanding of algorithms to code, and evaluate/analyze/present their solution. However, AI, along with machine learning, is getting increasingly varied in terms of both the approaches and applications, making it challenging to design project courses that span a sufficiently wide spectrum of AI. For these reasons, existing AI project courses are restricted to a narrow set of approaches (e.g. only reinforcement learning) or applications (e.g. only computer vision). In this paper, we propose to use Minecraft as the platform for teaching AI via project-based learning. Minecraft is an open-world sandbox game with elements of exploration, resource gathering, crafting, construction, and combat, and is supported by the Malmo library that provides a programmatic interface to the player observations and actions at various levels of granularity. In Minecraft, students can design projects to use approaches like search-based AI, reinforcement learning, supervised learning, and constraint satisfaction, on data types like text, audio, images, and tabular data. We describe our experience with an open-ended, undergraduate AI projects course using Minecraft that includes 82 different projects, covering themes that ranged from navigation, instruction following, object detection, and combat to music/image generation.

【Keywords】:

1664. Coding in the Liberal Arts through Natural Language Processing and Machine Learning.

Paper Link】 【Pages】:13506-13507

【Authors】: Ursula Wolz ; Jennifer Wilson

【Abstract】: An initiative recently established at our institution is creating new opportunities for students to deepen their understanding of code and computational thinking, and to embrace questions of access, equity and social justice. In this short paper we report on two contextualized computing courses in this initiative that introduce coding and computational thinking through contextualizing two subfields of AI: Natural Language Processing and Machine Learning. The goal was two-fold: to help students gain foundational computational skills to further their own creative and critical practices; and more broadly, to help them develop better-informed critiques of the use of algorithmic systems, especially AI technology.

【Keywords】:

EAAI Symposium: Model AI Assignments 1

1665. Model AI Assignments 2020.

Paper Link】 【Pages】:13509-13511

【Authors】: Todd W. Neller ; Stephen Keeley ; Michael Guerzhoy ; Wolfgang Hönig ; Jiaoyang Li ; Sven Koenig ; Ameet Soni ; Krista Thomason ; Lisa Zhang ; Bibin Sebastian ; Cinjon Resnick ; Avital Oliver ; Surya Bhupatiraju ; Kumar Krishna Agrawal ; James Allingham ; Sejong Yoon ; Jonathan Chen ; Tom Larsen ; Marion Neumann ; Narges Norouzi ; Ryan Hausen ; Matthew Evett

【Abstract】: The Model AI Assignments session seeks to gather and disseminate the best assignment designs of the Artificial Intelligence (AI) Education community. Recognizing that assignments form the core of student learning experience, we here present abstracts of nine AI assignments from the 2020 session that are easily adoptable, playfully engaging, and flexible for a variety of instructor needs. Assignment specifications and supporting resources may be found at http://modelai.gettysburg.edu.

【Keywords】:

Senior Member Presentation Track: Blue Sky Papers 8

1666. Back to the Future for Dialogue Research.

Paper Link】 【Pages】:13514-13519

【Authors】: Philip R. Cohen

【Abstract】: This “blue sky” paper argues that future conversational systems that can engage in multiparty, collaborative dialogues will require a more fundamental approach than existing technology. This paper identifies significant limitations of the state of the art, and argues that our returning to the plan-based approach to dialogue will provide a stronger foundation. Finally, I suggest a research strategy that couples neural network-based semantic parsing with plan-based reasoning in order to build a collaborative dialogue manager.

【Keywords】:

1667. Collective Information.

Paper Link】 【Pages】:13520-13524

【Authors】: Ulle Endriss

【Abstract】: Many challenging problems of scientific, technological, and societal significance require us to aggregate information supplied by multiple agents into a single piece of information of the same type—the collective information representing the stance of the group as a whole. Examples include expressive forms of voting and democratic decision making (where citizens supply information regarding their preferences), peer evaluation (where participants supply information in the form of assessments of their peers), and crowdsourcing (where volunteers supply information by annotating data). In this position paper, I outline the challenge of modelling, handling, and analysing all of these diverse instances of collective information using a common methodology. Addressing this challenge will facilitate a transfer of knowledge between different application domains, thereby enabling progress in all of them.

【Keywords】:

1668. Assessing Ethical Thinking about AI.

Paper Link】 【Pages】:13525-13528

【Authors】: Judy Goldsmith ; Emanuelle Burton ; David M. Dueber ; Beth Goldstein ; Shannon Sampson ; Michael D. Toland

【Abstract】: As is evidenced by the associated AI, Ethics and Society conference, we now take as given the need for ethics education in the AI and general CS curricula. The anticipated surge in AI ethics education will force the field to reckon with delineating and then evaluating learner outcomes to determine what is working and improve what is not. We argue for a more descriptive than normative focus of this ethics education, and propose the development of assessments that can measure descriptive ethical thinking about AI. Such an assessment tool for measuring ethical reasoning capacity in CS contexts must be designed to produce reliable scores for which there is established validity evidence concerning their interpretation and use.

【Keywords】:

1669. AI for Software Quality Assurance Blue Sky Ideas Talk.

Paper Link】 【Pages】:13529-13533

【Authors】: Meir Kalech ; Roni Stern

【Abstract】: Modern software systems are highly complex and often have multiple dependencies on external parts such as other processes or services. This poses new challenges and exacerbates existing challenges in different aspects of software Quality Assurance (QA), including testing, debugging and repair. The goal of this talk is to present a novel AI paradigm for software QA (AI4QA). A quality assessment AI agent uses machine-learning techniques to predict where coding errors are likely to occur. Then a test generation AI agent considers the error predictions to direct automated test generation. Then a test execution AI agent executes the tests, which are passed to the root-cause analysis AI agent, which applies automatic debugging algorithms. The candidate root causes are passed to a code repair AI agent that tries to create a patch for correcting the isolated error.

【Keywords】:

1670. AI for Explaining Decisions in Multi-Agent Environments.

Paper Link】 【Pages】:13534-13538

【Authors】: Sarit Kraus ; Amos Azaria ; Jelena Fiosina ; Maike Greve ; Noam Hazon ; Lutz M. Kolbe ; Tim-Benjamin Lembcke ; Jörg P. Müller ; Sören Schleibaum ; Mark Vollrath

【Abstract】: Explanation is necessary for humans to understand and accept decisions made by an AI system when the system's goal is known. It is even more important when the AI system makes decisions in multi-agent environments where the human does not know the system's goals, since they may depend on other agents' preferences. In such situations, explanations should aim to increase user satisfaction, taking into account the system's decision, the user's and the other agents' preferences, the environment settings and properties such as fairness, envy and privacy. Generating explanations that will increase user satisfaction is very challenging; to this end, we propose a new research direction: Explainable decisions in Multi-Agent Environments (xMASE). We then review the state of the art and discuss research directions towards efficient methodologies and algorithms for generating explanations that will increase users' satisfaction with AI systems' decisions in multi-agent environments.

【Keywords】:

1671. Open-World Learning for Radically Autonomous Agents.

Paper Link】 【Pages】:13539-13543

【Authors】: Pat Langley

【Abstract】: In this paper, I pose a new research challenge – to develop intelligent agents that exhibit radical autonomy by responding to sudden, long-term changes in their environments. I illustrate this idea with examples, identify abilities that support it, and argue that, although each ability has been studied in isolation, they have not been combined into integrated systems. In addition, I propose a framework for characterizing environments in which goal-directed physical agents operate, along with specifying the ways in which those environments can change over time. In closing, I outline some approaches to the empirical study of such open-world learning.

【Keywords】:

1672. Learning on the Job: Online Lifelong and Continual Learning.

Paper Link】 【Pages】:13544-13549

【Authors】: Bing Liu

【Abstract】: One of the hallmarks of human intelligence is the ability to learn continuously, accumulate the knowledge learned in the past and use that knowledge to help learn more and learn better. It is hard to imagine a truly intelligent system without this capability. This type of learning differs significantly from the classic machine learning (ML) paradigm of isolated single-task learning. Although there is already research on learning a sequence of tasks incrementally under the names of lifelong learning or continual learning, these approaches still follow the traditional two-phase paradigm of separate training and testing in learning each task. The tasks are also given by the user. This paper adds on-the-job learning to the mix to emphasize the need to learn during application (thus online) after the model has been deployed, which traditional ML cannot do. It aims to leverage the learned knowledge to discover new tasks, interact with humans and the environment, make inferences, and incrementally learn the new tasks on the fly during applications in a self-supervised and interactive manner. This is analogous to human on-the-job learning after formal training. We use chatbots and self-driving cars as examples to discuss the need, some initial work, and key challenges and opportunities in building this capability.

【Keywords】:

1673. Unveiling Hidden Intentions.

Paper Link】 【Pages】:13550-13555

【Authors】: Gerardo Ocampo Diaz ; Vincent Ng

【Abstract】: Recent years have seen significant advances in machine perception, which have enabled AI systems to become grounded in the world. While AI systems can now "read" and "see", they still cannot read between the lines and see through the lens, unlike humans. We propose the novel task of hidden message and intention identification: given some perceptual input (i.e., a text, an image), the goal is to produce a short description of the message the input transmits and the hidden intention of its author, if any. Not only will a solution to this task enable machine perception technologies to reach the next level of complexity, but it will be an important step towards addressing a task that has recently received a lot of public attention, political manipulation in social media.

【Keywords】:

Senior Member Presentation Track: Summary Talks 6

1674. Online Fair Division: A Survey.

Paper Link】 【Pages】:13557-13562

【Authors】: Martin Aleksandrov ; Toby Walsh

【Abstract】: We survey a burgeoning and promising new research area that considers the online nature of many practical fair division problems. We identify a wide variety of such online fair division problems and discuss new mechanisms and normative properties that apply to this online setting. The online nature of such fair division problems provides both opportunities and challenges, such as the possibility of developing new online mechanisms and the difficulty of dealing with an uncertain future.

【Keywords】:

1675. Developments in Multi-Agent Fair Allocation.

Paper Link】 【Pages】:13563-13568

【Authors】: Haris Aziz

【Abstract】: Fairness is becoming an increasingly important concern when designing markets, allocation procedures, and computer systems. I survey some recent developments in the field of multi-agent fair allocation.

【Keywords】:

1676. Let's Learn Their Language? A Case for Planning with Automata-Network Languages from Model Checking.

Paper Link】 【Pages】:13569-13575

【Authors】: Jörg Hoffmann ; Holger Hermanns ; Michaela Klauck ; Marcel Steinmetz ; Erez Karpas ; Daniele Magazzeni

【Abstract】: It is widely known that AI planning and model checking are closely related. Compilations have been devised between various pairs of language fragments. What has barely been voiced yet, though, is the idea to let go of one's own modeling language, and use one from the other area instead. We advocate that idea here – to use automata-network languages from model checking instead of PDDL – motivated by modeling difficulties relating to planning agents surrounded by exogenous agents in complex environments. One could, of course, address this by designing additional extended planning languages. But one can also leverage decades of work on modeling in the formal methods community, creating potential for deep synergy and integration with their techniques as a side effect. We believe there's a case to be made for the latter, as one modeling alternative in planning among others.

【Keywords】:

1677. Software Testing for Machine Learning.

Paper Link】 【Pages】:13576-13582

【Authors】: Dusica Marijan ; Arnaud Gotlieb

【Abstract】: Machine learning has become prevalent across a wide variety of applications. Unfortunately, machine learning has also been shown to be susceptible to deception, leading to errors and even fatal failures. This circumstance calls into question the widespread use of machine learning, especially in safety-critical applications, unless we are able to assure its correctness and trustworthiness properties. Software verification and testing are established techniques for assuring such properties, for example by detecting errors. However, software testing challenges for machine learning are vast and profuse, yet critical to address. This summary talk discusses the current state-of-the-art of software testing for machine learning. More specifically, it discusses six key challenge areas for software testing of machine learning systems, examines current approaches to these challenges and highlights their limitations. The paper provides a research agenda with elaborated directions for making progress toward advancing the state-of-the-art on testing of machine learning.

【Keywords】:

1678. On the Robustness of Face Recognition Algorithms Against Attacks and Bias.

Paper Link】 【Pages】:13583-13589

【Authors】: Richa Singh ; Akshay Agarwal ; Maneet Singh ; Shruti Nagpal ; Mayank Vatsa

【Abstract】: Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models.

【Keywords】:

1679. Generalized Arc Consistency Algorithms for Table Constraints: A Summary of Algorithmic Ideas.

Paper Link】 【Pages】:13590-13597

【Authors】: Roland H. C. Yap ; Wei Xia ; Ruiwei Wang

【Abstract】: Constraint Programming is a powerful paradigm to model and solve combinatorial problems. While there are many kinds of constraints, the table constraint (also called a CSP) is perhaps the most significant: it is the most well-studied and can encode any other constraint defined on finite variables. Thus, designing efficient filtering algorithms on table constraints has attracted significant research efforts. In turn, there have been great improvements in efficiency over time with the evolution and development of AC and GAC algorithms. In this paper, we survey the existing filtering algorithms for table constraints, focusing on historically important ideas and recent successful techniques shown to be effective.

【Keywords】:
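
To make the filtering idea in entry 1679 concrete, the following is a minimal Python sketch of enforcing generalized arc consistency (GAC) on a single positive table constraint. It is an illustration only and not one of the algorithms surveyed in the paper: it rescans the whole table on every call, whereas practical GAC algorithms rely on incremental data structures. The names `gac_table`, `scope`, `table`, and `domains` are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def gac_table(
    scope: Sequence[str],
    table: List[Tuple[int, ...]],
    domains: Dict[str, Set[int]],
) -> Tuple[Dict[str, Set[int]], List[Tuple[int, ...]]]:
    """Return (filtered domains, still-valid tuples) for one table constraint."""
    # A tuple is valid if each of its values is still in the matching domain.
    valid = [
        row for row in table
        if all(value in domains[var] for var, value in zip(scope, row))
    ]
    # A value is supported if at least one valid tuple uses it.
    supported: Dict[str, Set[int]] = {var: set() for var in scope}
    for row in valid:
        for var, value in zip(scope, row):
            supported[var].add(value)
    # Variables outside the scope keep their domains unchanged.
    filtered = {var: set(values) for var, values in domains.items()}
    for var in scope:
        filtered[var] = domains[var] & supported[var]
    return filtered, valid


if __name__ == "__main__":
    scope = ("x", "y", "z")
    table = [(1, 1, 2), (1, 2, 3), (2, 2, 1), (3, 1, 1)]
    domains = {"x": {1, 2}, "y": {2, 3}, "z": {1, 2, 3}}
    filtered, valid = gac_table(scope, table, domains)
    print(filtered)
    print(valid)
```

For a single table, one pass is sufficient, because every surviving tuple uses only supported values. With several constraints, a solver would repeat this filtering until no domain changes.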

Demonstration Track 25

1680. TraceHub - A Platform to Bridge the Gap between State-of-the-Art Time-Series Analytics and Datasets.

Paper Link】 【Pages】:13600-13601

【Authors】: Shubham Agarwal ; Christian Muise ; Mayank Agarwal ; Sohini Upadhyay ; Zilu Tang ; Zhongshen Zeng ; Yasaman Khazaeni

【Abstract】: In this paper, we present TraceHub, a platform that connects new, non-trivial, state-of-the-art time-series analytics with datasets from different domains. Analytics owners can run their analytics on new datasets in an automated setting to gauge the potential of an insight and improve it. Dataset owners can find all possible types of non-trivial insights based on the latest research. We provide a plug-n-play system as a set of Dataset, Transformer pipeline, and Analytics APIs for both kinds of users. We show a usefulness measure of generated insights across various types of analytics in the system. We believe that this platform can be used to bridge the gap between time-series analytics and datasets by significantly reducing the time needed to find the true potential of budding time-series research and to improve on it.

【Keywords】:

1681. MAPF Scenario: Software for Evaluating MAPF Plans on Real Robots.

Paper Link】 【Pages】:13602-13603

【Authors】: Roman Barták ; Jirí Svancara ; Ivan Krasicenko

【Abstract】: Multi-Agent Path Finding (MAPF) deals with finding collision-free paths for a set of agents (robots) moving on a graph. Interest in MAPF in the research community has recently increased, partly due to practical applications in areas such as warehousing and computer games. However, the academic community focuses mostly on solving the abstract version of the problem (moving agents on the graph), with only a few results on real robots. The presented software, MAPF Scenario, provides a tool for specifying MAPF problems on grid maps, solving the problems using various abstractions (for example, assuming rotation actions or not), simulating execution of plans, and translating the abstract plans to control programs for small Ozobot robots. The tool is intended as a research platform for evaluating abstract MAPF plans on real robots and as an educational and demonstration tool bridging the areas of artificial intelligence and robotics.

【Keywords】:
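
As a small illustration of what "collision-free" means in entry 1681, the Python sketch below checks a set of grid paths for the two standard MAPF conflict types: two agents in the same cell at the same time step (vertex conflict) and two agents swapping cells in one step (edge conflict). It is not code from MAPF Scenario. The function names, the path representation (one cell per time step), and the assumption that an agent waits at its last cell after finishing are all choices made for this example.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Conflict = Tuple[str, str, str, int]  # (kind, agent_a, agent_b, time step)


def position_at(path: List[Cell], t: int) -> Cell:
    """Cell occupied at step t; the agent is assumed to wait at its last cell."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths: Dict[str, List[Cell]]) -> List[Conflict]:
    """List vertex and edge (swap) conflicts between all pairs of agents."""
    conflicts: List[Conflict] = []
    horizon = max((len(p) for p in paths.values()), default=0)
    for t in range(horizon):
        for a, b in combinations(sorted(paths), 2):
            a_now, b_now = position_at(paths[a], t), position_at(paths[b], t)
            if a_now == b_now:
                conflicts.append(("vertex", a, b, t))
                continue
            if t + 1 < horizon:
                a_next = position_at(paths[a], t + 1)
                b_next = position_at(paths[b], t + 1)
                # The agents exchange cells between steps t and t + 1.
                if a_next == b_now and b_next == a_now:
                    conflicts.append(("edge", a, b, t))
    return conflicts


if __name__ == "__main__":
    plan = {
        "r1": [(0, 0), (0, 1), (0, 2)],
        "r2": [(0, 2), (0, 1), (0, 0)],
    }
    print(find_conflicts(plan))
```

A plan is conflict-free under these two definitions when `find_conflicts` returns an empty list. Real robots add further concerns, such as rotation time, that this abstract check ignores.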

1682. Doc2Dial: A Framework for Dialogue Composition Grounded in Documents.

Paper Link】 【Pages】:13604-13605

【Authors】: Song Feng ; Kshitij P. Fadnis ; Q. Vera Liao ; Luis A. Lastras

【Abstract】: We introduce Doc2Dial, an end-to-end framework for generating conversational data grounded in given documents. It takes the documents as input and generates the pipelined tasks for obtaining the annotations specifically for producing the simulated dialog flows. Then, the dialog flows are used to guide the collection of the utterances via the integrated crowdsourcing tool. The outcomes include the human-human dialogue data grounded in the given documents, as well as various types of automatically or human labeled annotations that help ensure the quality of the dialog data with the flexibility to (re)compose dialogues. We expect such data can facilitate building automated dialogue agents for goal-oriented tasks. We demonstrate the Doc2Dial system with documents from various domains for customer care.

【Keywords】:

1683. MatchU: An Interactive Matching Platform.

Paper Link】 【Pages】:13606-13607

【Authors】: James Ferris ; Hadi Hosseini

【Abstract】: MatchU is a web-based platform that offers an interactive framework to find how to form mutually-beneficial relationships, decide how to distribute resources, or resolve conflicts through a suite of matching algorithms rooted in economics and artificial intelligence. In this paper, we discuss MatchU's vision, solutions, and future directions.

【Keywords】:

1684. Embedding High-Level Knowledge into DQNs to Learn Faster and More Safely.

Paper Link】 【Pages】:13608-13609

【Authors】: Zihang Gao ; Fangzhen Lin ; Yi Zhou ; Hao Zhang ; Kaishun Wu ; Haodi Zhang

【Abstract】: Deep reinforcement learning has been successfully applied in many decision making scenarios. However, the slow training process and the difficulty of explaining its decisions limit its application. In this paper, we attempt to address some of these problems by proposing a framework of Rule-interposing Learning (RIL) that embeds knowledge into deep reinforcement learning. In this framework, the rules dynamically affect the training progress and accelerate the learning. The embedded knowledge, in the form of rules, not only improves learning efficiency but also prevents unnecessary or disastrous explorations at an early stage of training. Moreover, the modularity of the framework makes it straightforward to transfer high-level knowledge among similar tasks.

【Keywords】:
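
One way to picture how rules could interpose on exploration, as entry 1684 describes at a high level, is an action-selection step in which a rule may override the learner's choice before the usual epsilon-greedy policy is applied. The Python sketch below is an assumption about the general idea, not the RIL framework itself: the abstract does not specify the mechanism, and `select_action`, `Rule`, and the `near_cliff` state flag are hypothetical names introduced for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

State = Dict[str, bool]
# A rule inspects the state and returns an action index, or None to abstain.
Rule = Callable[[State], Optional[int]]


def select_action(
    state: State,
    q_values: Sequence[float],
    rules: List[Rule],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Let the first firing rule decide; otherwise fall back to epsilon-greedy."""
    for rule in rules:
        action = rule(state)
        if action is not None:
            return action
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


def avoid_cliff(state: State) -> Optional[int]:
    """Hypothetical safety rule: force action 0 (step back) near a cliff."""
    return 0 if state.get("near_cliff", False) else None


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]
    print(select_action({"near_cliff": True}, q, [avoid_cliff], 0.5, rng))
    print(select_action({"near_cliff": False}, q, [avoid_cliff], 0.0, rng))
```

Because the rule lives outside the network, the same rule list could in principle be reused on a similar task, which is in the spirit of the modularity the abstract mentions.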

1685. Causal Knowledge Extraction through Large-Scale Text Mining.

Paper Link】 【Pages】:13610-13611

【Authors】: Oktie Hassanzadeh ; Debarun Bhattacharjya ; Mark Feblowitz ; Kavitha Srinivas ; Michael Perrone ; Shirin Sohrabi ; Michael Katz

【Abstract】: In this demonstration, we present a system for mining causal knowledge from large corpuses of text documents, such as millions of news articles. Our system provides a collection of APIs for causal analysis and retrieval. These APIs enable searching for the effects of a given cause and the causes of a given effect, as well as analyzing whether a causal relation exists between a given pair of phrases. The analysis includes a score that indicates the likelihood that a causal relation exists. It also provides evidence from an input corpus supporting the existence of a causal relation between input phrases. Our system uses generic unsupervised and weakly supervised methods of causal relation extraction that do not impose semantic constraints on causes and effects. We show example use cases developed for a commercial application in enterprise risk management.

【Keywords】:

1686. DAMN: Defeasible Reasoning Tool for Multi-Agent Reasoning.

Paper Link】 【Pages】:13612-13613

【Authors】: Abdelraouf Hecham ; Madalina Croitoru ; Pierre Bisquert

【Abstract】: This demonstration paper introduces DAMN: a defeasible reasoning platform available on the web. It is geared towards decision making where each agent has its own knowledge base that can be combined with other agents to detect and visualize conflicts and potentially solve them using a semantics. It allows the use of different defeasible reasoning semantics (ambiguity blocking/propagating with or without team defeat) and integrates agent collaboration and visualization features.

【Keywords】:

1687. D-Agree: Crowd Discussion Support System Based on Automated Facilitation Agent.

Paper Link】 【Pages】:13614-13615

【Authors】: Takayuki Ito ; Shota Suzuki ; Naoko Yamaguchi ; Tomohiro Nishida ; Kentaro Hiraishi ; Kai Yoshino

【Abstract】: Large-scale online discussion platforms are receiving great attention as potential next-generation methods for smart democratic citizen platforms. One study identified a critical problem faced by human facilitators: the difficulty of facilitating large-scale online discussions. In this demonstration, we present our current implementation of D-Agree, a crowd-scale discussion support system based on an automated facilitation agent. We conducted a large-scale social experiment with Nagoya local government. The results demonstrate that the agent worked well compared with human facilitators.

【Keywords】:

1688. 'Watch the Flu': A Tweet Monitoring Tool for Epidemic Intelligence of Influenza in Australia.

Paper Link】 【Pages】:13616-13617

【Authors】: Brian Jin ; Aditya Joshi ; Ross Sparks ; Stephen Wan ; Cécile Paris ; C. Raina MacIntyre

【Abstract】: ‘Watch The Flu’ is a tool that monitors tweets posted in Australia for symptoms of influenza. The tool is a unique combination of two areas of artificial intelligence: natural language processing and time series monitoring, in order to assist public health surveillance. Using a real-time data pipeline, it deploys a web-based dashboard for visual analysis, and sends out emails to a set of users when an outbreak is detected. We expect that the tool will assist public health experts with their decision-making for disease outbreaks, by providing them insights from social media.

【Keywords】:

1689. Diana's World: A Situated Multimodal Interactive Agent.

Paper Link】 【Pages】:13618-13619

【Authors】: Nikhil Krishnaswamy ; Pradyumna Narayana ; Rahul Bangar ; Kyeongmin Rim ; Dhruva Patil ; David G. McNeely-White ; Jaime Ruiz ; Bruce A. Draper ; J. Ross Beveridge ; James Pustejovsky

【Abstract】: State of the art unimodal dialogue agents lack some core aspects of peer-to-peer communication—the nonverbal and visual cues that are a fundamental aspect of human interaction. To facilitate true peer-to-peer communication with a computer, we present Diana, a situated multimodal agent who exists in a mixed-reality environment with a human interlocutor, is situation- and context-aware, and responds to the human's language, gesture, and affect to complete collaborative tasks.

【Keywords】:

1690. GENO - Optimization for Classical Machine Learning Made Fast and Easy.

Paper Link】 【Pages】:13620-13621

【Authors】: Sören Laue ; Matthias Mitterreiter ; Joachim Giesen

【Abstract】: Most problems from classical machine learning can be cast as an optimization problem. We introduce GENO (GENeric Optimization), a framework that lets the user specify a constrained or unconstrained optimization problem in an easy-to-read modeling language. GENO then generates a solver, i.e., Python code, that can solve this class of optimization problems. The generated solver is usually as fast as hand-written, problem-specific, and well-engineered solvers. Often the solvers generated by GENO are faster by a large margin compared to recently developed solvers that are tailored to a specific problem class. An online interface to our framework can be found at http://www.geno-project.org.

【Keywords】:

1691. CAiRE: An End-to-End Empathetic Chatbot.

Paper Link】 【Pages】:13622-13623

【Authors】: Zhaojiang Lin ; Peng Xu ; Genta Indra Winata ; Farhad Bin Siddique ; Zihan Liu ; Jamin Shin ; Pascale Fung

【Abstract】: We present CAiRE, an end-to-end generative empathetic chatbot designed to recognize user emotions and respond in an empathetic manner. Our system adapts the Generative Pre-trained Transformer (GPT) to the empathetic response generation task via transfer learning. CAiRE is built primarily to focus on empathy integration in fully data-driven generative dialogue systems. We create a web-based user interface which allows multiple users to asynchronously chat with CAiRE. CAiRE also collects user feedback and continues to improve its response quality by discarding undesirable generations via active learning and negative training.

【Keywords】:

1692. Plan2Dance: Planning Based Choreographing from Music.

Paper Link】 【Pages】:13624-13625

【Authors】: Yuechang Liu ; Dongbo Xie ; Hankz Hankui Zhuo ; Liqian Lai

【Abstract】: The field of dancing robots has drawn much attention from numerous sources. Despite the success of previous systems that choreograph robot dances in response to external stimuli, they are often either limited to a pre-defined set of movements or fail to consider “hard” relations among dancing motions. In this demonstration, we design a planning-based choreographing system, which views choreography with music as a planning problem and solves it with off-the-shelf planners. Our demonstration exhibits the effectiveness of our system by evaluating it with various pieces of music.

【Keywords】:

1693. Deep Poetry: A Chinese Classical Poetry Generation System.

Paper Link】 【Pages】:13626-13627

【Authors】: Yusen Liu ; Dayiheng Liu ; Jiancheng Lv

【Abstract】: In this work, we demonstrate a Chinese classical poetry generation system called Deep Poetry. Existing systems for Chinese classical poetry generation are mostly template-based, and very few of them can accept multi-modal input. Unlike previous systems, Deep Poetry uses neural networks that are trained on over 200 thousand poems and 3 million ancient Chinese prose texts. Our system can accept plain text, images or artistic conceptions as inputs to generate Chinese classical poetry. More importantly, our system allows users to participate in the process of writing poetry. For the user's convenience, we deploy the system on the WeChat applet platform, so users can use the system on a mobile device whenever and wherever they wish.

【Keywords】:

1694. PulseSatellite: A Tool Using Human-AI Feedback Loops for Satellite Image Analysis in Humanitarian Contexts.

Paper Link】 【Pages】:13628-13629

【Authors】: Tomaz Logar ; Joseph Bullock ; Edoardo Nemni ; Lars Bromley ; John A. Quinn ; Miguel A. Luengo-Oroz

【Abstract】: Humanitarian response to natural disasters and conflicts can be assisted by satellite image analysis. In a humanitarian context, very specific satellite image analysis tasks must be done accurately and in a timely manner to provide operational support. We present PulseSatellite, a collaborative satellite image analysis tool which leverages neural network models that can be retrained on the fly and adapted to specific humanitarian contexts and geographies. We present two case studies, in mapping shelters and floods respectively, that illustrate the capabilities of PulseSatellite.

【Keywords】:

1695. LearnIt: On-Demand Rapid Customization for Event-Event Relation Extraction.

Paper Link】 【Pages】:13630-13631

【Authors】: Bonan Min ; Manaj Srivastava ; Haoling Qiu ; Prasannakumar Muthukumar ; Joshua Fasching

【Abstract】: We present a system which allows a user to create event-event relation extractors on-demand with a small amount of effort. The system provides a suite of algorithms, flexible workflows, and a user interface (UI), to allow rapid customization of event-event relation extractors for new types and domains of interest. Experiments show that it enables users to create extractors for 6 types of causal and temporal relations, with less than 20 minutes of effort per type. Our system (source code, UI) is available at https://github.com/BBN-E/LearnIt. A demonstration video is available at https://vimeo.com/329950144.

【Keywords】:

Paper Link】 【Pages】:13632-13633

【Authors】: Natwar Modani ; Paridhi Maheshwari ; Harsh Deshpande ; Saurab Sirpurkar ; Diviya ; Somak Aditya

【Abstract】: Navigating a collection of documents can be facilitated by obtaining a human-understandable concept hierarchy with links to the content. This is a non-trivial task for two reasons. First, defining concepts that are understandable by an average consumer and yet meaningful for a large variety of corpora is hard. Second, creating a semantically meaningful yet intuitive hierarchical representation is hard, and can be task dependent. We present our system Navigation.ai, which automatically processes a document collection, induces a concept hierarchy using Wikipedia, and presents an interactive interface that helps users navigate to individual paragraphs using concepts.

【Keywords】:

1697. PARTNER: Human-in-the-Loop Entity Name Understanding with Deep Learning.

Paper Link】 【Pages】:13634-13635

【Authors】: Kun Qian ; Poornima Chozhiyath Raman ; Yunyao Li ; Lucian Popa

【Abstract】: Entity name disambiguation is an important task for many text-based AI tasks. Entity names usually have internal semantic structures that are useful for resolving different variations of the same entity. We present PARTNER, a deep learning-based interactive system for entity name understanding. Powered by effective active learning and weak supervision, PARTNER can learn deep learning-based models for identifying entity name structure with low human effort. PARTNER also allows the user to design complex normalization and variant generation functions without coding skills.

【Keywords】:

1698. Cognitive Compliance: Assessing Regulatory Risk in Financial Advice Documents.

Paper Link】 【Pages】:13636-13637

【Authors】: Wanita Sherchan ; Sue Ann Chen ; Simon Harris ; Nebula Alam ; Khoi-Nguyen Tran ; Christopher J. Butler

【Abstract】: This paper describes Cognitive Compliance - a solution that automates the complex manual process of assessing regulatory compliance of personal financial advice. The solution uses natural language processing (NLP), machine learning and deep learning to characterise the regulatory risk status of personal financial advice documents with a traffic-light rating for various risk factors. This enables comprehensive coverage of the review and rapid identification of documents at high risk of non-compliance with government regulations.

【Keywords】:

1699. DICR: AI Assisted, Adaptive Platform for Contract Review.

Paper Link】 【Pages】:13638-13639

【Authors】: Dan G. Tecuci ; Ravi Palla ; Hamid R. Motahari Nezhad ; Nishchal Ahuja ; Alex Monteiro ; Tigran Ishkhanov ; Nigel Duffy

【Abstract】: In the regular course of business, companies spend a lot of effort reading and interpreting documents, a highly manual process that involves tedious tasks, such as identifying dates and names or locating the presence or absence of certain clauses in a contract. Dealing with natural language is complex and further complicated by the fact that these documents come in various formats (scanned image, digital formats) and have different degrees of internal structure (spreadsheets, invoices, text documents). We present DICR, an end-to-end, modular, and trainable system that automates the mundane aspects of document review and allows humans to perform the validation. The system is able to speed up this work while increasing quality of information extracted, consistency, throughput, and decreasing time to decision. Extracted data can be fed into other downstream applications (from dashboards to Q&A and to report generation).

【Keywords】:

1700. Data-Driven Ranking and Visualization of Products by Competitiveness.

Paper Link】 【Pages】:13640-13641

【Authors】: Sheema Usmani ; Mariana Bernagozzi ; Yufeng Huang ; Michelle Morales ; Amir Sabet Sarvestani ; Biplav Srivastava

【Abstract】: Competitive analysis is a critical part of any business. Product managers, sellers, and marketers spend time and resources scouring through a huge volume of online and offline content, aiming to discover what their competitors are doing in the marketplace and to understand what type of threat they pose to their business' financial well-being. Currently, this process is slow, costly and labor-intensive. We demonstrate Clarity, a data-driven unsupervised system for assessment of products, which is currently in deployment at IBM. Clarity has been running for more than a year and is used by over 1,500 people to perform over 160 competitive analyses involving over 800 products. The system considers multiple factors from a collection of online content: numeric ratings by users, sentiment towards key product drivers, content volume, and recency of content. The results and explanations of factors leading to the results are visualized in an interactive dashboard that allows users to track the performance of their products as well as understand the main contributing factors.

【Keywords】:

Paper Link】 【Pages】:13642-13643

【Authors】: Christabel Wayllace ; Sunwoo Ha ; Yuchen Han ; Jiaming Hu ; Shayan Monadjemi ; William Yeoh ; Alvitta Ottley

【Abstract】: We introduce Detection and Recognition of Airplane GOals with Navigational Visualization (DRAGON-V), a visualization system that uses probabilistic goal recognition to infer and display the most probable airport runway that a pilot is approaching. DRAGON-V is especially useful in cases of miscommunication, low visibility, or lack of airport familiarity which may result in a pilot deviating from the assigned taxiing route. The visualization system conveys relevant information, and updates according to the airplane's current geolocation. DRAGON-V aims to assist air traffic controllers in reducing incidents of runway incursions at airports.

【Keywords】:

Paper Link】 【Pages】:13644-13645

【Authors】: Shengzhou Yi ; Hiroshi Yumoto ; Xueting Wang ; Toshihiko Yamasaki

【Abstract】: In order to support the practice of oral presentation, we developed PresentationTrainer, which includes (1) a presentation impression prediction system and (2) a presentation slide analysis system. For the presentation impression prediction system, we proposed two methods, one using a Support Vector Machine and a Markov Random Field and the other using a multimodal neural network, to predict audiences' impressions of speech videos. For the slide analysis system, we used a Convolutional Neural Network and Global Average Pooling to evaluate the design of slides. We then used Class Activation Mapping to provide visual feedback showing which areas should be modified.

【Keywords】:

1703. Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors.

Paper Link】 【Pages】:13646-13647

【Authors】: Wei Zhang ; Yuan Cheng ; Xin Guo ; Qingpei Guo ; Jian Wang ; Qing Wang ; Chen Jiang ; Meng Wang ; Furong Xu ; Wei Chu

【Abstract】: We demonstrate a car damage assessment system for the car insurance field based on artificial intelligence techniques, which can exempt insurance inspectors from checking cars on site and help people without professional knowledge to evaluate car damage when accidents happen. Unlike existing approaches, we utilize videos instead of photos to interact with users to make the whole procedure as simple as possible. We adopt object and video detection and segmentation techniques in computer vision, and take advantage of multiple frames extracted from videos to achieve high damage recognition accuracy. The system uploads video streams captured by mobile devices, recognizes car damage on the cloud asynchronously and then returns damaged components and repair costs to users. The system evaluates car damage and returns results automatically and effectively in seconds, which reduces laboratory costs and decreases insurance claim time significantly.

【Keywords】:

1704. Combining Machine Learning Models Using combo Library.

Paper Link】 【Pages】:13648-13649

【Authors】: Yue Zhao ; Xuejian Wang ; Cheng Cheng ; Xueying Ding

【Abstract】: Model combination, often regarded as a key sub-field of ensemble learning, has been widely used in both academic research and industry applications. To facilitate this process, we propose and implement an easy-to-use Python toolkit, combo, to aggregate models and scores under various scenarios, including classification, clustering, and anomaly detection. In a nutshell, combo provides a unified and consistent way to combine both raw and pretrained models from popular machine learning libraries, e.g., scikit-learn, XGBoost, and LightGBM. With accessibility and robustness in mind, combo is designed with detailed documentation, interactive examples, continuous integration, code coverage, and maintainability checks; it can be installed easily through the Python Package Index (PyPI) or https://github.com/yzhao062/combo.
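As an illustration of the kind of score aggregation this abstract describes, the following minimal sketch averages per-sample scores from several models after rescaling each model's scores to a common range. It is plain Python and deliberately does not use the combo API; the function names `min_max_scale` and `average_scores` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def min_max_scale(scores: Sequence[float]) -> List[float]:
    """Rescale one model's scores to [0, 1]; constant scores map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def average_scores(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample scores across models after min-max scaling."""
    if not model_scores:
        raise ValueError("need at least one model")
    n = len(model_scores[0])
    if any(len(s) != n for s in model_scores):
        raise ValueError("all models must score the same samples")
    scaled = [min_max_scale(s) for s in model_scores]
    # zip(*scaled) walks the samples; each tuple holds one score per model.
    return [sum(per_sample) / len(scaled) for per_sample in zip(*scaled)]


if __name__ == "__main__":
    detector_a = [0.1, 0.2, 0.9]     # hypothetical scores on a 0-1 scale
    detector_b = [10.0, 30.0, 50.0]  # hypothetical scores on a different scale
    print(average_scores([detector_a, detector_b]))
```

Scaling before averaging keeps a model with a large score range from dominating the combined result; other aggregation rules (maximum, weighted average, majority vote) fit the same shape.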

【Keywords】:

Sister Conference Track 15

1705. Interactive Scene Generation via Scene Graphs with Attributes.

Paper Link】 【Pages】:13651-13654

【Authors】: Oron Ashual ; Lior Wolf

【Abstract】: We introduce a simple yet expressive image generation method. On the one hand, it does not require the user to paint the masks or define a bounding box of the various objects, since the model does it by itself. On the other hand, it supports defining a coarse location and size of each object. Based on this, we offer a simple, interactive GUI, that allows a layman user to generate diverse images effortlessly. From a technical perspective, we introduce a dual embedding of layout and appearance. In this scheme, the location, size, and appearance of an object can change independently of each other. This way, the model is able to generate innumerable images per scene graph, to better express the intention of the user. In comparison to previous work, we also offer better quality and higher resolution outputs. This is due to a superior architecture, which is based on a novel set of discriminators. Those discriminators better constrain the shape of the generated mask, as well as capturing the appearance encoding in a counterfactual way. Our code is publicly available at https://www.github.com/ashual/scene_generation.

【Keywords】:

1706. Learning Higher-Order Programs through Predicate Invention.

Paper Link】 【Pages】:13655-13658

【Authors】: Andrew Cropper ; Rolf Morel ; Stephen H. Muggleton

【Abstract】: A key feature of inductive logic programming (ILP) is its ability to learn first-order programs, which are intrinsically more expressive than propositional programs. In this paper, we introduce ILP techniques to learn higher-order programs. We implement our idea in Metagol_ho, an ILP system which can learn higher-order programs with higher-order predicate invention. Our experiments show that, compared to first-order programs, learning higher-order programs can significantly improve predictive accuracies and reduce learning times.

【Keywords】:

1707. Restraining Bolts for Reinforcement Learning Agents.

Paper Link】 【Pages】:13659-13662

【Authors】: Giuseppe De Giacomo ; Luca Iocchi ; Marco Favorito ; Fabio Patrizi

【Abstract】: In this work we have investigated the concept of “restraining bolt”, inspired by Science Fiction. We have two distinct sets of features extracted from the world, one by the agent and one by the authority imposing some restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated since they are of interest to independent parties. However, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent on a set of low-level (subsymbolic) features, while the restraining bolt is specified logically using linear time logic on finite traces LTLf/LDLf over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.

【Keywords】:

1708. Algorithm-in-the-Loop Decision Making.

Paper Link】 【Pages】:13663-13664

【Authors】: Ben Green ; Yiling Chen

【Abstract】: We introduce a new framework for conceiving of and studying algorithms that are deployed to aid human decision making: “algorithm-in-the-loop” systems. The algorithm-in-the-loop framework centers human decision making, providing a more precise lens for studying the social impacts of algorithmic decision making aids. We report on two experiments that evaluate algorithm-in-the-loop decision making and find significant limits to these systems.

【Keywords】:

1709. Explaining Image Classifiers Generating Exemplars and Counter-Exemplars from Latent Representations.

Paper Link】 【Pages】:13665-13668

【Authors】: Riccardo Guidotti ; Anna Monreale ; Stan Matwin ; Dino Pedreschi

【Abstract】: We present an approach to explain the decisions of black box image classifiers through synthetic exemplars and counter-exemplars learnt in the latent feature space. Our explanation method exploits the latent representations learned through an adversarial autoencoder for generating a synthetic neighborhood of the image for which an explanation is required. A decision tree is trained on a set of images represented in the latent space, and its decision rules are used to generate exemplar images showing how the original image can be modified to stay within its class. Counterfactual rules are used to generate counter-exemplars showing how the original image can “morph” into another class. The explanation also includes a saliency map highlighting the areas that contribute to its classification, and areas that push it into another class. A wide and deep experimental evaluation proves that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability, besides providing the most useful and interpretable explanations.

【Keywords】:

1710. Reasoning about Political Bias in Content Moderation.

Paper Link】 【Pages】:13669-13672

【Authors】: Shan Jiang ; Ronald E. Robertson ; Christo Wilson

【Abstract】: Content moderation, the AI-human hybrid process of removing (toxic) content from social media to promote community health, has attracted increasing attention from lawmakers due to allegations of political bias. Hitherto, this allegation has been made based on anecdotes rather than logical reasoning and empirical evidence, which motivates us to audit its validity. In this paper, we first introduce two formal criteria to measure bias (i.e., independence and separation) and their contextual meanings in content moderation, and then use YouTube as a lens to investigate if the political leaning of a video plays a role in the moderation decision for its associated comments. Our results show that when justifiable target variables (e.g., hate speech and extremeness) are controlled with propensity scoring, the likelihood of comment moderation is equal across left- and right-leaning videos.

【Keywords】:

1711. Designing Evaluation Rules That Are Robust to Strategic Behavior.

Paper Link】 【Pages】:13673-13676

【Authors】: Jon M. Kleinberg ; Manish Raghavan

【Abstract】: Machine learning is often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to “gaming” the classifier. We show that whenever any “reasonable” mechanism can do so, a simple linear mechanism suffices. This work is based on “How Do Classifiers Induce Agents To Invest Effort Strategically?” published in Economics and Computation 2019 (Kleinberg and Raghavan 2019).

【Keywords】:

1712. Identifiability from a Combination of Observations and Experiments.

Paper Link】 【Pages】:13677-13680

【Authors】: Sanghack Lee ; Juan D. Correa ; Elias Bareinboim

【Abstract】: We study the problem of causal identification from an arbitrary collection of observational and experimental distributions, and substantive knowledge about the phenomenon under investigation, which usually comes in the form of a causal graph. We call this problem g-identifiability, or gID for short. In this paper, we introduce a general strategy to prove non-gID based on thickets and hedgelets, which leads to a necessary and sufficient graphical condition for the corresponding decision problem. We further develop a procedure for systematically computing the target effect, and prove that it is sound and complete for gID instances. In other words, the failure of the algorithm in returning an expression implies that the target effect is not computable from the available distributions. Finally, as a corollary of these results, we show that do-calculus is complete for the task of g-identifiability.

【Keywords】:

1713. A Commentary on the Unsupervised Learning of Disentangled Representations.

Paper Link】 【Pages】:13681-13684

【Authors】: Francesco Locatello ; Stefan Bauer ; Mario Lucic ; Gunnar Rätsch ; Sylvain Gelly ; Bernhard Schölkopf ; Olivier Bachem

【Abstract】: The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of (Locatello et al. 2019b) and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases and the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research.

【Keywords】:

1714. Constraint Programming for an Efficient and Flexible Block Modeling Solver.

Paper Link】 【Pages】:13685-13688

【Authors】: Alex Lucía Mattenet ; Ian Davidson ; Siegfried Nijssen ; Pierre Schaus

【Abstract】: Constraint Programming (CP) is a powerful paradigm for solving combinatorial problems. In CP, the user creates a model by declaring variables with their domains and expresses the constraints that need to be satisfied in any solution. The solver is then in charge of finding feasible solutions—a value in the domain of each variable that satisfies all the constraints. The discovery of solutions is done by exploring a search tree that is pruned by the constraints in charge of removing impossible values. The CP framework has the advantage of exposing a rich high-level declarative constraint language for modeling, as well as efficient purpose-specific filtering algorithms that can be reused in many problems. In this work, we harness this flexibility and efficiency for the Block Modeling problem. It is a variant of the graph clustering problem that has been used extensively in many domains including social science, spatio-temporal data analysis and even medical imaging. We present a new approach based on constraint programming, allowing discrete optimization of block modeling in a manner that is not only scalable, but also allows the easy incorporation of constraints. We introduce a new constraint filtering algorithm that outperforms earlier approaches. We show its use in the analysis of real datasets.

【Keywords】:

1715. The St. Petersburg Paradox: A Fresh Algorithmic Perspective.

Paper Link】 【Pages】:13689-13692

【Authors】: Ardavan Salehi Nobandegani ; Thomas R. Shultz

【Abstract】: The St. Petersburg paradox is a centuries-old puzzle concerning a lottery with infinite expected payoff on which people are only willing to pay a small amount to play. Despite many attempts and several proposals, no generally-accepted resolution is yet at hand. In a recent paper, we show that this paradox can be understood in terms of the mind optimally using its limited computational resources (Nobandegani et al. 2019). Specifically, we show that the St. Petersburg paradox can be accounted for by a variant of normative expected-utility valuation which acknowledges cognitive limitations: sample-based expected utility (Nobandegani et al. 2018). SbEU provides a unified, algorithmic explanation of major experimental findings on this paradox. We conclude by discussing the implications of our work for algorithmically understanding human cognition and for developing human-like artificial intelligence.
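To make the "infinite expected payoff" concrete, the sketch below assumes the standard form of the lottery: a fair coin is flipped until the first head, and a first head on flip k pays 2^k. It computes the expected payoff when the game is cut off after a fixed number of flips, and a plain sample mean over a few simulated plays. This is an illustration only, not the sample-based expected utility model of the paper; the function names and the seed are assumptions made for the example.

```python
import random
from fractions import Fraction


def truncated_expected_payoff(max_flips: int) -> Fraction:
    """Expected payoff if the game is cut off after max_flips flips.

    A first head on flip k has probability 2**-k and pays 2**k, so every
    term contributes exactly 1 and the sum grows without bound.
    """
    return sum((Fraction(1, 2 ** k) * 2 ** k
                for k in range(1, max_flips + 1)), Fraction(0))


def play_once(rng: random.Random) -> int:
    """Flip a fair coin until the first head; pay 2**k for a head on flip k."""
    k = 1
    while rng.random() < 0.5:  # tails: flip again
        k += 1
    return 2 ** k


def sample_mean_payoff(num_samples: int, seed: int = 0) -> float:
    """Average payoff over a small number of simulated plays."""
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(num_samples)) / num_samples


if __name__ == "__main__":
    print(truncated_expected_payoff(10))   # one unit per allowed flip
    print(sample_mean_payoff(5))
```

The truncated expectation equals the number of allowed flips, which is why the untruncated expectation diverges, while an average over a handful of plays is usually small because large payoffs are rare.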

【Keywords】:

1716. Energy and Policy Considerations for Modern Deep Learning Research.

Paper Link】 【Pages】:13693-13696

【Authors】: Emma Strubell ; Ananya Ganesh ; Andrew McCallum

【Abstract】: The field of artificial intelligence has experienced a dramatic methodological shift towards large neural networks trained on plentiful data. This shift has been fueled by recent advances in hardware and techniques enabling remarkable levels of computation, resulting in impressive advances in AI across many applications. However, the massive computation required to obtain these exciting results is costly both financially, due to the price of specialized hardware and electricity or cloud compute time, and to the environment, as a result of non-renewable energy used to fuel modern tensor processing hardware. In a paper published this year at ACL, we brought this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training and tuning neural network models for NLP (Strubell, Ganesh, and McCallum 2019). In this extended abstract, we briefly summarize our findings in NLP, incorporating updated estimates and broader information from recent related publications, and provide actionable recommendations to reduce costs and improve equity in the machine learning and artificial intelligence community.

【Keywords】:

1717. Abstraction and Refinement in Games with Dynamic Weighted Terrain.

Paper Link】 【Pages】:13697-13699

【Authors】: Nathan R. Sturtevant ; Devon Sigurdson ; Bjorn Taylor ; Tim Gibson

【Abstract】: This abstract looks at one version of the pathfinding problem in games and discusses how it motivated our recent work at the AIIDE 2019 conference.

【Keywords】:

1718. Results on a Super Strong Exponential Time Hypothesis.

Paper Link】 【Pages】:13700-13703

【Authors】: Nikhil Vyas ; Ryan Williams

【Abstract】: All known SAT-solving paradigms (backtracking, local search, and the polynomial method) only yield a 2^{n(1−1/O(k))} time algorithm for solving k-SAT in the worst case, where the big-O constant is independent of k. For this reason, it has been hypothesized that k-SAT cannot be solved in worst-case 2^{n(1−f(k)/k)} time, for any unbounded f : ℕ → ℕ. This hypothesis has been called the “Super-Strong Exponential Time Hypothesis” (Super Strong ETH), modeled after the ETH and the Strong ETH. We prove two results concerning the Super-Strong ETH: 1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the “critical threshold”, where the clause-to-variable ratio is 2^k ln 2 − Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. In particular, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^{n(1−Ω(log k)/k)} time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlak, and Zane (1998). 2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. It is natural to hypothesize that the worst-case (exponential-time) complexity of Unique k-SAT is substantially less than that of k-SAT. Improving prior reductions, we show the time complexities of Unique k-SAT and k-SAT are very tightly related: if Unique k-SAT is in 2^{n(1−f(k)/k)} time for an unbounded f, then k-SAT is in 2^{n(1−f(k)(1−ε)/k)} time for every ε > 0. Thus, refuting Super Strong ETH in the unique solution case would refute Super Strong ETH in general.
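For readers unfamiliar with the PPZ algorithm mentioned above, here is a simplified sketch of its core loop: process the variables in a random order, set a variable to the value forced by a unit clause if one exists, otherwise set it by a coin flip, and repeat until a satisfying assignment is found. The clause encoding (signed integers, as in DIMACS files), the function names, and the number of tries are assumptions made for this example; the sketch does not reproduce the running-time analysis from the paper.

```python
import random
from typing import Dict, List, Optional

Clause = List[int]  # literal v means "variable v is true", -v means "false"


def ppz_try(clauses: List[Clause], num_vars: int,
            rng: random.Random) -> Dict[int, bool]:
    """One round: assign variables in random order, obeying unit clauses."""
    order = list(range(1, num_vars + 1))
    rng.shuffle(order)
    assignment: Dict[int, bool] = {}
    for var in order:
        forced: Optional[bool] = None
        for clause in clauses:
            # A clause already satisfied by the partial assignment forces nothing.
            if any(abs(l) in assignment and assignment[abs(l)] == (l > 0)
                   for l in clause):
                continue
            unassigned = [l for l in clause if abs(l) not in assignment]
            # The clause is a unit clause on var: its value is forced.
            if len(unassigned) == 1 and abs(unassigned[0]) == var:
                forced = unassigned[0] > 0
                break
        assignment[var] = forced if forced is not None else rng.random() < 0.5
    return assignment


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """True if every clause has at least one literal made true."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)


def ppz(clauses: List[Clause], num_vars: int, tries: int = 1000,
        seed: int = 0) -> Optional[Dict[int, bool]]:
    """Repeat random rounds; return a satisfying assignment or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        candidate = ppz_try(clauses, num_vars, rng)
        if satisfies(clauses, candidate):
            return candidate
    return None
```

A returned assignment is always verified against the formula, so the procedure never reports a wrong solution; a `None` result only means that no satisfying assignment was found within the allotted tries.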

【Keywords】:

1719. Ranking and Rating Rankings and Ratings.

Paper Link】 【Pages】:13704-13707

【Authors】: Jingyan Wang ; Nihar B. Shah

【Abstract】: Cardinal scores collected from people are well known to suffer from miscalibrations. A popular approach to address this issue is to assume simplistic models of miscalibration (such as linear biases) to de-bias the scores. This approach, however, often fares poorly because people's miscalibrations are typically far more complex and not well understood. It is widely believed that in the absence of simplifying assumptions on the miscalibration, the only useful information in practice from the cardinal scores is the induced ranking. In this paper we address the fundamental question of whether this widespread folklore belief is actually true. We consider cardinal scores with arbitrary (or even adversarially chosen) miscalibrations that are only required to be consistent with the induced ranking. We design rating-based estimators and prove that despite making no assumptions on the ratings, they strictly and uniformly outperform all possible estimators that rely on only the ranking. These estimators can be used as a plug-in to show the superiority of cardinal scores over ordinal rankings for a variety of applications, including A/B testing and ranking. This work thus provides novel fundamental insights in the eternal debate between cardinal and ordinal data: It ranks the approach of using ratings higher than that of using rankings, and rates both approaches in terms of their estimation errors.

【Keywords】:

Doctoral Consortium Track 17

1720. Modelling a Conversational Agent with Complex Emotional Intelligence.

Paper Link】 【Pages】:13710-13711

【Authors】: Billal Belainine ; Fatiha Sadat ; Hakim Lounis

【Abstract】: Chatbots or conversational agents have enjoyed great popularity in recent years. They surprisingly perform sensitive tasks in modern societies. However, despite the fact that they offer help, support, and fellowship, there is a task that is not yet mastered: dealing with complex emotions and simulating human sensations. This research aims to design an architecture for an emotional conversation agent for long-text (multi-turn) conversations. This agent is intended to work in areas where the analysis of users' feelings plays a leading role. This work refers to natural language understanding and response generation.

【Keywords】:

1721. Towards Adversarially Robust Knowledge Graph Embeddings.

Paper Link】 【Pages】:13712-13713

【Authors】: Peru Bhardwaj

【Abstract】: Knowledge graph embedding models enable representation learning on multi-relational graphs and are used in security-sensitive domains. However, their security analysis has received little attention. I will research the security of these models by designing adversarial attacks against them, improving their adversarial robustness, and evaluating the effect of the proposed improvements on their interpretability.

【Keywords】:

1722. Understanding Generalization in Neural Networks for Robustness against Adversarial Vulnerabilities.

Paper Link】 【Pages】:13714-13715

【Authors】: Subhajit Chaudhury

【Abstract】: Neural networks have contributed to tremendous progress in the domains of computer vision, speech processing, and other real-world applications. However, recent studies have shown that these state-of-the-art models can be easily compromised by adding small imperceptible perturbations. My thesis summary frames the problem of adversarial robustness as an equivalent problem of learning suitable features that lead to good generalization in neural networks. This is motivated by learning in humans, which is not trivially fooled by such perturbations due to robust feature learning that shows good out-of-sample generalization.

【Keywords】:

1723. Interpreting Multimodal Machine Learning Models Trained for Emotion Recognition to Address Robustness and Privacy Concerns.

Paper Link】 【Pages】:13716-13717

【Authors】: Mimansa Jaiswal

【Abstract】: Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. These predicted emotions are used in a variety of downstream applications: (a) generating more human-like dialogues, (b) predicting mental health issues, and (c) hate speech detection and intervention. To enable this, data are transmitted from users' devices and stored on central servers. These data are then processed further, either annotated or used as inputs for training a model for a specific task. Yet, these data contain sensitive information that could be used by mobile applications without the user's consent or, maliciously, by an eavesdropping adversary. My work focuses on two major issues that are faced while training emotion recognition algorithms: (a) privacy of the generated representations and (b) explaining and ensuring that the predictions are robust to various situations. Tackling these issues would lead to emotion-based algorithms that are deployable and helpful at a larger scale, thus enabling a more human-like experience when interacting with AI.

【Keywords】:

1724. Abstract Rule Based Pattern Learning with Neural Networks.

Paper Link】 【Pages】:13718-13719

【Authors】: Radha Manisha Kopparti

【Abstract】: In this research work, the problem of learning abstract rules using neural networks is studied, and a solution called ‘Relation Based Patterns’ (RBP), which models abstract relationships based on equality, is proposed.

【Keywords】:

1725. Partial Correlation-Based Attention for Multivariate Time Series Forecasting.

Paper Link】 【Pages】:13720-13721

【Authors】: Won Kyung Lee

【Abstract】: Multivariate time-series forecasting has great potential in various domains. However, it is challenging to find the dependency structure among the time-series variables and the appropriate time lags for each variable, which change dynamically over time. In this study, I propose a partial correlation-based attention mechanism which overcomes the shortcomings of existing attention mechanisms based on pair-wise comparisons. Moreover, I propose data-driven series-wise multi-resolution convolutional layers to represent the input time-series data for domain-agnostic learning.
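As background for the term "partial correlation", the sketch below computes the first-order partial correlation of two series given a third, using the textbook formula r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). It only illustrates the statistic itself; it is not the attention mechanism proposed in the abstract, and the function names and toy data are assumptions made for the example.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x: Sequence[float], y: Sequence[float],
                        z: Sequence[float]) -> float:
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Toy series: z is uncorrelated with both x and y, so controlling
    # for it leaves the plain correlation of x and y unchanged.
    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.0, 1.0, 4.0, 3.0]
    z = [1.0, -1.0, -1.0, 1.0]
    print(pearson(x, y), partial_correlation(x, y, z))
```

Unlike a pair-wise correlation, the partial correlation discounts dependence that both series share with the conditioning variable, which is the property the abstract contrasts with pair-wise comparisons.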

【Keywords】:

1726. Coalitional Strategic Behaviour in Collective Decision Making.

Paper Link】 【Pages】:13722-13723

【Authors】: Grzegorz Lisowski

【Abstract】: In my PhD project I study the algorithmic aspects of strategic behaviour in collective decision making, with a special focus on voting mechanisms. I investigate two forms of manipulation: (1) strategic selection of candidates from groups of potential representatives and (2) influence on voters located in a social network.

【Keywords】:

1727. Explainable Agency in Reinforcement Learning Agents.

Paper Link】 【Pages】:13724-13725

【Authors】: Prashan Madumal

【Abstract】: This thesis explores how reinforcement learning (RL) agents can provide explanations for their actions and behaviours. As humans, we build causal models to encode cause-effect relations of events and use these to explain why events happen. Taking inspiration from cognitive psychology and social science literature, I build causal explanation models and explanation dialogue models for RL agents. By mimicking human-like explanation models, these agents can provide explanations that are natural and intuitive to humans.

【Keywords】:

1728. Optimal Auction Based Automated Negotiation in Realistic Decentralised Market Environments.

Paper Link】 【Pages】:13726-13727

【Authors】: Pankaj Mishra ; Ahmed Moustafa ; Takayuki Ito ; Minjie Zhang

【Abstract】: Automated negotiations based on learning models have been widely applied in different domains of negotiation, specifically for resource allocation in decentralised open market environments with multiple vendors and multiple buyers. In such open market environments, there exists dynamically changing supply and demand of resources, with dynamic arrival of buyers in the market. Besides, each buyer has their own set of constraints, such as budget constraints, time constraints, etc. In this context, efficient negotiation policies should be capable of maintaining the equilibrium between the utilities of both the vendors and the buyers. In this research, we aim to design a mechanism for an optimal auction paradigm, considering the existence of interdependent undisclosed preferences of both buyers and vendors. Therefore, learning-based negotiation models are well suited to such open market environments, wherein self-interested autonomous vendors and buyers cooperate/compete to maximize their utilities based on their undisclosed preferences. Toward this end, we present our current proposal, the two-stage learning-based resource allocation mechanism, wherein utilities of vendors and buyers are optimised at each stage. We aim to compare our proposed learning-based resource allocation mechanism with two state-of-the-art bidding-based resource allocation mechanisms, which are based on a fixed bidding policy (Samimi, Teimouri, and Mukhtar 2016) and a demand-based bidding policy (Kong, Zhang, and Ye 2015), respectively. The comparison is to be done based on the overall performance of the open market environment and also based on the individual performances of vendors and buyers.

【Keywords】:

1729. Abstract Constraints for Safe and Robust Robot Learning from Demonstration.

Paper Link】 【Pages】:13728-13729

【Authors】: Carl L. Mueller

【Abstract】: My thesis research incorporates high-level abstract behavioral requirements, called ‘conceptual constraints’, into the modeling processes of robot Learning from Demonstration (LfD) techniques. My most recent work introduces an LfD algorithm called Concept Constrained Learning from Demonstration. This algorithm encodes motion planning constraints as temporal Boolean operators that enforce high-level constraints over portions of the robot's motion plan during learned skill execution. This results in more easily trained, more robust, and safer learned skills. Future work will incorporate conceptual constraints into human-aware motion planning algorithms. Additionally, my research will investigate how these concept constrained algorithms and models are best incorporated into effective interfaces for end-users.

【Keywords】:

1730. Quantum Probabilistic Models Using Feynman Diagram Rules for Better Understanding the Information Diffusion Dynamics in Online Social Networks.

Paper Link】 【Pages】:13730-13731

【Authors】: Ece C. Mutlu

【Abstract】: This doctoral consortium presents an overview of my anticipated PhD dissertation, which focuses on employing quantum Bayesian networks for social learning. The project mainly aims to expand the use of current quantum probabilistic models in human decision-making from two agents to multi-agent systems. First, I cultivate the classical Bayesian networks which are used to understand information diffusion through human interaction on online social networks (OSNs) by taking into account the relevance of a multitude of social, psychological, behavioral and cognitive factors influencing the process of information transmission. Since quantum-like models require quantum probability amplitudes, the complexity will increase exponentially with increasing uncertainty in the complex system. Therefore, the research will be followed by a study on optimization of heuristics. Here, I suggest using a belief-entropy-based heuristic approach. This is interdisciplinary research related to the branches of complex systems, quantum physics, network science, information theory, cognitive science and mathematics. Therefore, findings can contribute significantly to the areas related mainly to the social learning behavior of people, and also to the aforementioned branches of complex systems. In addition, understanding the interactions in complex systems might be more viable via the findings of this research, since probabilistic approaches are used not only for predictive purposes but also for explanatory aims.

【Keywords】:

1731. Hybrid Approaches to Fine-Grained Emotion Detection in Social Media Data.

Paper Link】 【Pages】:13732-13733

【Authors】: Annika Marie Schoene

【Abstract】: This paper states the challenges in fine-grained target-dependent Sentiment Analysis for social media data using recurrent neural networks. First, the problem statement is outlined and an overview of related work in the area is given. Then a summary of progress and results achieved to date and a research plan and future directions of this work are given.

【Keywords】:

1732. A Reinforcement Learning Approach to Strategic Belief Revelation with Social Influence.

Paper Link】 【Pages】:13734-13735

【Authors】: Patrick Shepherd ; Judy Goldsmith

【Abstract】: The study of social networks has increased rapidly in the past few decades. Of recent interest are the dynamics of changing opinions over a network. Some research has investigated how interpersonal influence can affect opinion change, how to maximize/minimize the spread of opinion change over a network, and recently, if/how agents can act strategically to effect some outcome in the network's opinion distribution. This latter problem can be modeled and addressed as a reinforcement learning problem; we introduce an approach to help network agents find strategies that outperform hand-crafted policies. Our preliminary results show that our approach is promising in networks with dynamic topologies.

【Keywords】:

1733. Modeling Dynamic Behaviors within Population.

Paper Link】 【Pages】:13736-13737

【Authors】: Nazgol Tavabi

【Abstract】: The abundance of temporal data generated by mankind in recent years gives us the opportunity to better understand human behaviors along with the similarities and differences in groups of people. Better understanding of human behaviors could be very beneficial in choosing strategies, from group-level to society-level depending on the domain. This type of data could range from physiological data collected from sensors to activity patterns in social media. Identifying frequent behavioral patterns in sensor data could give more insight into the health of a community and provoke strategies towards improving it; by analyzing patterns of behaviors in social media, a platform's attributes could be adjusted to the user's needs. This type of modeling introduces numerous challenges that vary depending on the data. The goal of my doctoral research is to introduce ways to better understand and capture human behavior by modeling individuals' behaviors as time series and extracting interesting patterns within them.

【Keywords】:

1734. Explainability in Autonomous Pedagogical Agents.

Paper Link】 【Pages】:13738-13739

【Authors】: Silvia Tulli

【Abstract】: The research presented herein addresses the topic of explainability in autonomous pedagogical agents. We will be investigating possible ways to explain the decision-making process of such pedagogical agents (which can be embodied as robots) with a focus on the effect of these explanations in concrete learning scenarios for children. The hypothesis is that the agents' explanations about their decision making will support mutual modeling and a better understanding of the learning tasks and how learners perceive them. The objective is to develop a computational model that will allow agents to express internal states and actions and adapt to the human expectations of cooperative behavior accordingly. In addition, we would like to provide a comprehensive taxonomy of both the desiderata and methods in the explainable AI research applied to children's learning scenarios.

【Keywords】:

1735. Efficient Predictive Uncertainty Estimators for Deep Probabilistic Models.

Paper Link】 【Pages】:13740-13741

【Authors】: Julissa Villanueva Llerena ; Denis Deratani Mauá

【Abstract】: Deep Probabilistic Models (DPM) based on arithmetic circuits representation, such as Sum-Product Networks (SPN) and Probabilistic Sentential Decision Diagrams (PSDD), have shown competitive performance in several machine learning tasks with interesting properties (Poon and Domingos 2011; Kisa et al. 2014). Due to the high number of parameters and scarce data, DPMs can produce unreliable and overconfident inference. This research aims at increasing the robustness of predictive inference with DPMs by obtaining new estimators of the predictive uncertainty. This problem is not new and the literature on deep models contains many solutions. However, the probabilistic nature of DPMs offers new possibilities to achieve accurate estimates at low computational costs, but also new challenges, as the range of different types of predictions is much larger than with traditional deep models. To cope with such issues, we plan on investigating two different approaches. The first approach is to perform a global sensitivity analysis on the parameters, measuring the variability of the output to perturbations of the model weights. The second approach is to capture the variability of the prediction with respect to changes in the model architecture. Our approaches shall be evaluated on challenging tasks such as image completion and multilabel classification.

【Keywords】:

1736. Developing a Machine Learning Tool for Dynamic Cancer Treatment Strategies.

Paper Link】 【Pages】:13742-13743

【Authors】: Jiaming Zeng

【Abstract】: With the rising number and complexity of cancer therapies, it is increasingly difficult for clinicians to identify an optimal combination of treatments for a patient. Our research aims to provide a decision support tool to optimize and supplant cancer treatment decisions. Leveraging machine learning, causal inference, and decision analysis, we will utilize electronic medical records to develop dynamic cancer treatment strategies that advise clinicians and patients based on patient characteristics, medical history, etc. The research hopes to bridge the understanding between causal inference and decision analysis and ultimately develop an artificial intelligence tool that improves clinical outcomes over current practices.

【Keywords】:

Student Abstract Track 129

1737. Sample Complexity Bounds for RNNs with Application to Combinatorial Graph Problems (Student Abstract).

Paper Link】 【Pages】:13745-13746

【Authors】: Nil-Jana Akpinar ; Bernhard Kratzwald ; Stefan Feuerriegel

【Abstract】: Learning to predict solutions to real-valued combinatorial graph problems promises efficient approximations. As demonstrated based on the NP-hard edge clique cover number, recurrent neural networks (RNNs) are particularly suited for this task and can even outperform state-of-the-art heuristics. However, the theoretical framework for estimating real-valued RNNs is understood only poorly. As our primary contribution, this is the first work that upper bounds the sample complexity for learning real-valued RNNs. While such derivations have been made earlier for feed-forward and convolutional neural networks, our work presents the first such attempt for recurrent neural networks. Given a single-layer RNN with a rectified linear units and input of length b, we show that a population prediction error of ε can be realized with at most Õ(a⁴b/ε²) samples. We further derive comparable results for multi-layer RNNs. Accordingly, a size-adaptive RNN fed with graphs of at most n vertices can be learned in Õ(n⁶/ε²), i.e., with only a polynomial number of samples. For combinatorial graph problems, this provides a theoretical foundation that renders RNNs competitive.

【Keywords】:
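
To get a feel for how the bounds in the abstract above scale, the following sketch evaluates only their leading terms, a⁴b/ε² and n⁶/ε². The Õ notation hides constants and logarithmic factors, so these numbers show growth rates, not actual sample counts; the function names are hypothetical and chosen only for illustration.

```python
def rnn_sample_bound(units: int, input_length: int, eps: float) -> float:
    """Leading term a^4 * b / eps^2 of the single-layer bound.

    Constants and log factors hidden by the O-tilde notation are dropped,
    so the result is only meaningful for comparing growth rates.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return units**4 * input_length / eps**2


def graph_sample_bound(n_vertices: int, eps: float) -> float:
    """Leading term n^6 / eps^2 of the size-adaptive bound for graphs."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return n_vertices**6 / eps**2


if __name__ == "__main__":
    # Halving the target error multiplies the leading term by four.
    for eps in (0.2, 0.1, 0.05):
        print(eps, rnn_sample_bound(8, 10, eps), graph_sample_bound(10, eps))
```

The polynomial dependence on n is the point of the abstract's claim: doubling the number of vertices multiplies the leading term by 2⁶ = 64 rather than by an exponential factor.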

1738. LatRec: Recognizing Goals in Latent Space (Student Abstract).

Paper Link】 【Pages】:13747-13748

【Authors】: Leonardo Amado ; Felipe Meneguzzi

【Abstract】: Recent approaches to goal recognition have progressively relaxed the requirements about the amount of domain knowledge and available observations, yielding accurate and efficient algorithms. These approaches, however, assume that there is a domain expert capable of building complete and correct domain knowledge to successfully recognize an agent's goal. This is too strong for most real-world applications. We overcome these limitations by combining goal recognition techniques from automated planning, and deep autoencoders to carry out unsupervised learning to generate domain theories from data streams and use the resulting domain theories to deal with incomplete and noisy observations. Moving forward, we aim to develop a new data-driven goal recognition technique that infers the domain model using the same set of observations used in recognition itself.

【Keywords】:

1739. An Iterative Approach for Identifying Complaint Based Tweets in Social Media Platforms (Student Abstract).

Paper Link】 【Pages】:13749-13750

【Authors】: Gyanesh Anand ; Akash Kumar Gautam ; Puneet Mathur ; Debanjan Mahata ; Rajiv Ratn Shah ; Ramit Sawhney

【Abstract】: Twitter is a social media platform where users express opinions over a variety of issues. Posts offering grievances or complaints can be utilized by private/public organizations to improve their service and promptly gauge a low-cost assessment. In this paper, we propose an iterative methodology which aims to identify complaint-based posts pertaining to the transport domain. We perform comprehensive evaluations along with releasing a novel dataset for research purposes.

【Keywords】:

1740. Entity Type Enhanced Neural Model for Distantly Supervised Relation Extraction (Student Abstract).

Paper Link】 【Pages】:13751-13752

【Authors】: Long Bai ; Xiaolong Jin ; Chuanzhi Zhuang ; Xueqi Cheng

【Abstract】: Distantly Supervised Relation Extraction (DSRE) has been widely studied, since it can automatically extract relations from very large corpora. However, existing DSRE methods only use little semantic information about entities, such as the information of entity type. Thus, in this paper, we propose a method for integrating entity type information into a neural network based DSRE model. It also adopts two attention mechanisms, namely, sentence attention and type attention. The former selects the representative sentences for a sentence bag, while the latter selects appropriate type information for entities. Experimental comparison with existing methods on a benchmark dataset demonstrates its merits.

【Keywords】:

1741. Analysis of Parliamentary Debate Transcripts Using Community-Based Graphical Approaches (Student Abstract).

Paper Link】 【Pages】:13753-13754

【Authors】: Anjali Bhavan ; Mohit Sharma ; Ramit Sawhney ; Rajiv Ratn Shah

【Abstract】: Gauging political sentiments and analyzing stances of elected representatives pose an important challenge today, and one with wide-ranging ramifications. Community-based analysis of parliamentary debate sentiments could pave a way for better insights into the political happenings of a nation and help in keeping the voters informed. Such analysis could be given another dimension by studying the underlying connections and networks in such data. We present a sentiment classification method for UK Parliament debate transcripts, which is a combination of a graphical method based on DeepWalk embeddings and text-based analytical methods. We also present proof for our hypothesis that parliamentarians with similar voting patterns tend to deliver similar speeches. We also provide some further avenues and future work towards the end.

【Keywords】:

1742. Complex Emotional Intelligence Learning Using Deep Neural Networks (Student Abstract).

Paper Link】 【Pages】:13755-13756

【Authors】: Belainine Billal ; Fatiha Sadat ; Hakim Lounis

【Abstract】: Emotion recognition and mining tasks are often limited by the availability of manually annotated data. Several researchers have used emojis and specific hashtags as forms of training and supervision. This research paper proposes a new textual and social corpus, labeled using basic emotions following Plutchik's theory. Thus, this paper proposes a first study for the representation and interpretation of complex emotional interactions, using deep neural networks.

【Keywords】:

1743. Improving Semantic Parsing Using Statistical Word Sense Disambiguation (Student Abstract).

Paper Link】 【Pages】:13757-13758

【Authors】: Ritwik Bose ; Siddharth Vashishtha ; James F. Allen

【Abstract】: A Semantic Parser generates a logical form graph from an utterance where the edges are semantic roles and nodes are word senses in an ontology that supports reasoning. The generated representation attempts to capture the full meaning of the utterance. While the process of parsing works to resolve lexical ambiguity, a number of errors in the logical forms arise from incorrectly assigned word sense determinations. This is especially true in logical and rule-based semantic parsers. Although the performance of statistical word sense disambiguation methods is superior to the word sense output of a semantic parser, these systems do not produce the rich role structure or a detailed semantic representation of the sentence content. In this work, we use decisions from a statistical WSD system to inform a logical semantic parser and greatly improve semantic type assignments in the resulting logical forms.

【Keywords】:

1744. Towards an Integrative Educational Recommender for Lifelong Learners (Student Abstract).

Paper Link】 【Pages】:13759-13760

【Authors】: Sahan Bulathwela ; María Pérez-Ortiz ; Emine Yilmaz ; John Shawe-Taylor

【Abstract】: One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model.

【Keywords】:

Paper Link】 【Pages】:13761-13762

【Authors】: Junghun Byun ; Yong-Ho Cho ; Tae Ho Im ; Hak-Lim Ko ; Kyung-Seop Shin ; Ohyun Jo

【Abstract】: This paper describes an iterative learning framework consisting of multi-layer prediction processes for underwater link adaptation. To obtain a dataset in real underwater environments, we implemented OFDM (Orthogonal Frequency Division Multiplexing)-based acoustic communications testbeds for the first time. Actual underwater data measured in Yellow Sea, South Korea, were used for training the iterative learning model. Remarkably, the iterative learning model achieves up to 25% performance improvement over the conventional benchmark model.

【Keywords】:

1746. SATNet: Symmetric Adversarial Transfer Network Based on Two-Level Alignment Strategy towards Cross-Domain Sentiment Classification (Student Abstract).

Paper Link】 【Pages】:13763-13764

【Authors】: Yu Cao ; Hua Xu

【Abstract】: In recent years, domain adaptation tasks have attracted much attention, especially the task of cross-domain sentiment classification (CDSC). In this paper, we propose a novel domain adaptation method called Symmetric Adversarial Transfer Network (SATNet). Experiments on the Amazon reviews dataset demonstrate the effectiveness of SATNet.

【Keywords】:

1747. CORAL-DMOEA: Correlation Alignment-Based Information Transfer for Dynamic Multi-Objective Optimization (Student Abstract).

Paper Link】 【Pages】:13765-13766

【Authors】: Li Chen ; Hua Xu

【Abstract】: One essential characteristic of dynamic multi-objective optimization problems is that the Pareto-Optimal Front/Set (POF/POS) varies over time. Tracking the time-dependent POF/POS is a challenging problem. Since continuous environments are usually highly correlated, past information is critical for the next optimization process. In this paper, we integrate the CORAL methodology into a dynamic multi-objective evolutionary algorithm, named CORAL-DMOEA. This approach employs CORAL to construct a transfer model which transfers past well-performing solutions to form an initial population for the next optimization process. Experimental results demonstrate that CORAL-DMOEA can effectively improve the quality of solutions and accelerate the evolution process.

【Keywords】:
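
As background for the entry above, correlation alignment (CORAL) in its generic form whitens source features with the inverse square root of the source covariance and then re-colors them with the square root of the target covariance. The sketch below shows only that generic transform with NumPy. How CORAL-DMOEA chooses its source and target sets and builds the initial population is not described in the abstract, so the `past_solutions` and `new_env_samples` arrays and the function names are assumptions for illustration.

```python
import numpy as np


def _sym_matrix_power(mat: np.ndarray, power: float) -> np.ndarray:
    """Raise a symmetric positive-definite matrix to a real power."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V * diag(eigvals**power) * V^T, using broadcasting to scale columns.
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral_align(source: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Align second-order statistics of `source` rows to those of `target`.

    Both inputs are (n_samples, n_features) with n_features >= 2.
    `eps` adds a small ridge so both covariance matrices are invertible.
    Means are not shifted; only the covariance is aligned.
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whiten = _sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # impose target correlations
    return source @ whiten @ recolor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: solutions from a past environment and a few
    # samples drawn in the new environment.
    past_solutions = rng.normal(size=(100, 4))
    new_env_samples = rng.normal(size=(30, 4)) * np.array([1.0, 2.0, 0.5, 3.0])
    initial_population = coral_align(past_solutions, new_env_samples)
    print(initial_population.shape)
```

After alignment, the covariance of the transformed source rows matches the target covariance up to the ridge term `eps`.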

1748. Optimizing the Feature Selection Process for Better Accuracy in Datasets with a Large Number of Features (Student Abstract).

Paper Link】 【Pages】:13767-13768

【Authors】: Xi Chen ; Afsaneh Doryab

【Abstract】: Most feature selection methods only perform well on datasets with a relatively small set of features. In the case of large feature sets and a small number of data points, almost none of the existing feature selection methods help in achieving high accuracy. This paper proposes a novel approach to optimize the feature selection process through the Frequent Pattern Growth algorithm to find sets of features that appear frequently among the top features selected by the main feature selection methods. Our experimental evaluation on two datasets containing a small and a very large number of features shows that our approach significantly improves the accuracy results of the dataset with a very large number of features.

【Keywords】:
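
The idea in the entry above, finding feature sets that recur among the top features chosen by several selection methods, can be sketched as frequent-itemset counting. The code below is a brute-force enumeration of small itemsets, not the Frequent Pattern Growth algorithm the paper uses; it returns the same frequent sets on small inputs but does not scale the same way. The selector names and feature labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_feature_sets(top_lists, min_support, max_size=2):
    """Count feature sets shared by the top-feature lists of several selectors.

    top_lists   : iterable of lists, one list of top features per selector
    min_support : minimum number of selectors that must contain the set
    max_size    : largest itemset size to enumerate (brute force)
    """
    counts = Counter()
    for selected in top_lists:
        unique = sorted(set(selected))  # sorted so tuples are canonical
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    # Hypothetical top-3 features from three different selection methods.
    top_lists = [
        ["f1", "f2", "f3"],  # e.g. a filter method
        ["f1", "f3", "f4"],  # e.g. a wrapper method
        ["f1", "f3", "f5"],  # e.g. an embedded method
    ]
    print(frequent_feature_sets(top_lists, min_support=3))
```

Here `f1` and `f3` appear in every list, individually and as a pair, so they are the candidates a downstream model would keep.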

Paper Link】 【Pages】:13769-13770

【Authors】: Xiuying Chen ; Daorui Xiao ; Shen Gao ; Guojun Liu ; Wei Lin ; Bo Zheng ; Dongyan Zhao ; Rui Yan

【Abstract】: Sponsored search optimizes revenue and relevance, which is estimated by Revenue Per Mille (RPM). Existing sponsored search models are all based on traditional statistical models, which have poor RPM performance when queries follow a heavy-tailed distribution. Here, we propose an RPM-oriented Query Rewriting Framework (RQRF) which outputs related bid keywords that can yield high RPM. RQRF embeds both queries and bid keywords to vectors in the same implicit space, converting the rewriting probability between each query and keyword to the distance between the two vectors. For label construction, we propose an RPM-oriented sample construction method, labeling keywords based on whether or not they can lead to high RPM. Extensive experiments are conducted to evaluate the performance of RQRF. On one month of large-scale real-world traffic from an e-commerce sponsored search system, the proposed model significantly outperforms the traditional baseline.

【Keywords】:

1750. Learning to Model Opponent Learning (Student Abstract).

Paper Link】 【Pages】:13771-13772

【Authors】: Ian Davies ; Zheng Tian ; Jun Wang

【Abstract】: Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.

【Keywords】:

1751. When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract).

Paper Link】 【Pages】:13773-13774

【Authors】: Shumin Deng ; Ningyu Zhang ; Zhanlin Sun ; Jiaoyan Chen ; Huajun Chen

【Abstract】: Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus negating implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces a state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.

【Keywords】:

1752. Efficient Spatial-Temporal Rebalancing of Shareable Bikes (Student Abstract).

Paper Link】 【Pages】:13775-13776

【Authors】: Zichao Deng ; Anqi Tu ; Zelei Liu ; Han Yu

【Abstract】: Bike sharing systems are popular worldwide now. However, these systems are facing a problem - rebalancing of shareable bikes among different docking stations. To address this challenge, we propose an approach for the spatial-temporal rebalancing of shareable bikes which allows domain experts to optimize the rebalancing operation with their knowledge and preferences without relying on learning by trial-and-error.

【Keywords】:

1753. Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract).

Paper Link】 【Pages】:13777-13778

【Authors】: Akshay Dharmavaram ; Matthew Riemer ; Shalabh Bhatnagar

【Abstract】: Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem for the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady-state of the Markov chain defined by the agent's policy. Furthermore, we use an ordinary differential equation based approach for our convergence analysis and prove that the parameters of the intra-option policies, termination functions, and value functions, converge to their corresponding optimal values, with probability one. Finally, we illustrate the competitive advantage of learning options, in the average reward setting, on a grid-world environment with sparse rewards.

【Keywords】:

1754. Multi-Agent Pattern Formation with Deep Reinforcement Learning (Student Abstract).

Paper Link】 【Pages】:13779-13780

【Authors】: Elhadji Amadou Oury Diallo ; Toshiharu Sugawara

【Abstract】: We propose a decentralized multi-agent deep reinforcement learning architecture to investigate pattern formation under the local information provided by the agents' sensors. It consists of tasking a large number of homogeneous agents to move to a set of specified goal locations, addressing both the assignment and trajectory planning sub-problems concurrently. We then show that agents trained on random patterns can organize themselves into very complex shapes.

【Keywords】:

1755. American Sign Language Recognition Using an FMCW Wireless Sensor (Student Abstract).

Paper Link】 【Pages】:13781-13782

【Authors】: Yuanqi Du ; Nguyen Dang ; Riley Wilkerson ; Parth H. Pathak ; Huzefa Rangwala ; Jana Kosecka

【Abstract】: In today's digital world, rapid technological advancements continue to lessen the burden of tasks for individuals. Among these tasks is communication across perceived language barriers. Indeed, increased attention has been drawn to American Sign Language (ASL) recognition in recent years. Camera-based and motion detection-based methods have been researched extensively; however, there remains a divide in communication between ASL users and non-users. Therefore, this research team proposes the use of a novel wireless sensor (Frequency-Modulated Continuous-Wave Radar) to help bridge the gap in communication. In short, this device sends out signals that detect the user's body positioning in space. These signals then reflect off the body and back to the sensor, developing thousands of cloud points per second, indicating where the body is positioned in space. These cloud points can then be examined for movement over multiple consecutive time frames using a cell division algorithm, ultimately showing how the body moves through space as it completes a single gesture or sentence. At the end of the project, 95% accuracy was achieved in one-object prediction as well as 80% accuracy on cross-object prediction with 30% other objects' data introduced on 19 commonly used gestures. There are 30 samples for each gesture per person from three persons.

【Keywords】:

Paper Link】 【Pages】:13783-13784

【Authors】: Deanna Flynn ; P. Michael Furlong ; Brian Coltin

【Abstract】: Our neural architecture search algorithm progressively searches a tree of neural network architectures. Child nodes are created by inserting new layers determined by a transition graph into a parent network up to a maximum depth and pruned when performance is worse than its parent. This increases efficiency but makes the algorithm greedy. Simpler networks are successfully found before more complex ones that can achieve benchmark performance similar to other top-performing networks.

【Keywords】:

1757. Exploring Abstract Concepts for Image Privacy Prediction in Social Networks (Student Abstract).

Paper Link】 【Pages】:13785-13786

【Authors】: Gabriele Galfré ; Cornelia Caragea

【Abstract】: Automatically detecting the private nature of images posted in social networks such as Facebook, Flickr, and Instagram is a long-standing goal considering the pervasiveness of these networks. Several prior works on image privacy prediction showed that object tags from images are highly informative about images' privacy. However, we conjecture that other aspects of images captured by abstract concepts (e.g., religion, sikhism, spirituality) can improve the performance of models that use only the concrete objects from an image (e.g., temple and person). Experimental results on a Flickr dataset show that the abstract concepts and concrete object tags complement each other and yield the best performance when used in combination as features for image privacy prediction.

【Keywords】:

1758. Predicting Opioid Overdose Crude Rates with Text-Based Twitter Features (Student Abstract).

Paper Link】 【Pages】:13787-13788

【Authors】: Nupoor Gandhi ; Alex Morales ; Sally Man-Pui Chan ; Dolores Albarracin ; ChengXiang Zhai

【Abstract】: Drug use reporting is often a bottleneck for modern public health surveillance; social media data provides a real-time signal which allows for tracking and monitoring opioid overdoses. In this work we focus on text-based feature construction for the prediction task of opioid overdose rates at the county level. More specifically, using a Twitter dataset with over 3.4 billion tweets, we explore semantic features, such as topic features, to show that social media could be a good indicator for forecasting opioid overdose crude rates in public health monitoring systems. Specifically, combining topic and TF-IDF features in conjunction with demographic features can predict opioid overdose rates at the county level.

【Keywords】:
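
The entry above mentions TF-IDF features built from tweet text. As a reminder of what such a feature is, the sketch below computes a basic TF-IDF weighting (term frequency times the log of inverse document frequency) over tokenized documents. The exact variant, tokenization, and aggregation to county level used in the paper are not given in the abstract, so this is a generic formulation on toy data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * ln(N / document frequency)
    A term that occurs in every document therefore gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


if __name__ == "__main__":
    # Toy tokenized documents, not real tweets.
    docs = [
        ["clinic", "visit", "clinic"],
        ["visit", "weekend"],
        ["weekend", "music"],
    ]
    for vector in tfidf(docs):
        print(vector)
```

In a pipeline like the one described, such vectors would be concatenated with topic and demographic features before fitting a regression model.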

1759. I Am Guessing You Can't Recognize This: Generating Adversarial Images for Object Detection Using Spatial Commonsense (Student Abstract).

Paper Link】 【Pages】:13789-13790

【Authors】: Anurag Garg ; Niket Tandon ; Aparna S. Varde

【Abstract】: Can we automatically predict failures of an object detection model on images from a target domain? We characterize errors of a state-of-the-art object detection model on the currently popular smart mobility domain, and find that a large number of errors can be identified using spatial commonsense. We propose a system that automatically identifies a large number of such errors based on commonsense knowledge. Our system does not require any new annotations and can still find object detection errors with high accuracy (more than 80% when measured by humans). This work lays the foundation to answer exciting research questions on domain adaptation, including the ability to automatically create adversarial datasets for a target domain.

【Keywords】:

1760. VECA: A Method for Detecting Overfitting in Neural Networks (Student Abstract).

Paper Link】 【Pages】:13791-13792

【Authors】: Liangzhu Ge ; Yuexian Hou ; Yaju Jiang ; Shuai Yao ; Chao Yang

【Abstract】: Despite their widespread applications, deep neural networks often tend to overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of Covariance matrix of Activation matrix) and demonstrate that VECA is a good predictor of networks' generalization performance during the training process. Experiments performed on fully-connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA, which suggests that we can calculate the VECA to estimate generalization performance without sacrificing training data to be used as a validation set.

【Keywords】:
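
The abstract above names VECA only by its expansion, so the following is a minimal Python sketch of one plausible reading of that name, not the authors' implementation. It assumes the activation matrix has one row per sample and one column per unit, uses the unbiased sample covariance, and takes the population variance of the eigenvalues; all function names are hypothetical. Because a covariance matrix is symmetric, the sum of its eigenvalues equals its trace and the sum of their squares equals the sum of its squared entries, so no eigendecomposition is needed.

```python
def covariance_matrix(activations):
    """Sample covariance between units; rows are samples, columns are units."""
    n = len(activations)
    d = len(activations[0])
    means = [sum(row[j] for row in activations) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in activations]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def eigenvalue_variance(cov):
    """Population variance of the eigenvalues of a symmetric matrix.

    trace(C) is the sum of the eigenvalues and sum(C_ij ** 2) is the sum of
    their squares, so the variance follows as E[x^2] - E[x]^2.
    """
    d = len(cov)
    mean_eig = sum(cov[i][i] for i in range(d)) / d
    mean_sq_eig = sum(c * c for row in cov for c in row) / d
    return mean_sq_eig - mean_eig ** 2


def veca(activations):
    """Assumed reading of VECA: eigenvalue variance of the activation covariance."""
    return eigenvalue_variance(covariance_matrix(activations))


if __name__ == "__main__":
    # Two uncorrelated units with different spreads.
    acts = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    print(veca(acts))
```

Under this reading, the value is zero when all eigenvalues are equal and grows as the activation variance concentrates in a few directions.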

1761. Does Speech Enhancement of Publicly Available Data Help Build Robust Speech Recognition Systems? (Student Abstract).

Paper Link】 【Pages】:13793-13794

【Authors】: Bhavya Ghai ; Buvana Ramanan ; Klaus Mueller

【Abstract】: Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. Typically, they require large amounts of high-quality speech data for training, which gives an undue advantage to large organizations that hold large volumes of private data. We investigated whether speech data obtained from publicly available sources can be further enhanced to train better speech recognition models. We begin with noisy/contaminated speech data, apply speech enhancement to produce a 'cleaned' version, and use both versions to train the ASR model. We have found that using speech enhancement gives a 9.5% better word error rate than training on just the original noisy data and 9% better than training on just the ground truth 'clean' data. Its performance is also comparable to the ideal-case scenario of training on the noisy data and its ground truth 'clean' version.

【Keywords】:

1762. An Automatic Shoplifting Detection from Surveillance Videos (Student Abstract).

Paper Link】 【Pages】:13795-13796

【Authors】: U.-ju Gim ; Jae-Jun Lee ; Jeong-Hun Kim ; Young-Ho Park ; Aziz Nasridinov

【Abstract】: The use of closed circuit television (CCTV) surveillance devices is increasing every year to prevent abnormal behaviors, including shoplifting. However, damage from shoplifting is also increasing every year. Thus, there is a need for intelligent CCTV surveillance systems that ensure the integrity of shops, despite workforce shortages. In this study, we propose an automatic detection system of shoplifting behaviors from surveillance videos. Instead of extracting features from the whole frame, we use the Region of Interest (ROI) optical-flow fusion network to highlight the necessary features more accurately.

【Keywords】:

1763. ESAS: Towards Practical and Explainable Short Answer Scoring (Student Abstract).

Paper Link】 【Pages】:13797-13798

【Authors】: Palak Goenka ; Mehak Piplani ; Ramit Sawhney ; Puneet Mathur ; Rajiv Ratn Shah

【Abstract】: Motivated by the mandate to design and deploy a practical, real-world educational tool for grading, we extensively explore linguistic patterns for Short Answer Scoring (SAS) as well as authorship feedback. We approach the SAS task via a multipronged approach that employs linguistic context features for capturing domain-specific knowledge while emphasizing domain-agnostic grading and detailed feedback via an ensemble of explainable statistical models. Our methodology quantitatively supersedes multiple automatic short answer scoring systems.

【Keywords】:

1764. Modeling Involuntary Dynamic Behaviors to Support Intelligent Tutoring (Student Abstract).

Paper Link】 【Pages】:13799-13800

【Authors】: Mononito Goswami ; Lujie Chen ; Chufan Gao ; Artur Dubrawski

【Abstract】: Problem solving is one of the most important 21st century skills. However, effectively coaching young students in problem solving is challenging because teachers must continuously monitor their cognitive and affective states and make real-time pedagogical interventions to maximize students' learning outcomes. It is an even more challenging task in social environments with limited human coaching resources. To lessen the cognitive load on a teacher and enable affect-sensitive intelligent tutoring, many researchers have investigated automated cognitive and affective detection methods. However, most of the studies use culturally-sensitive indices of affect that are prone to social editing such as facial expressions, and only few studies have explored involuntary dynamic behavioral signals such as gross body movements. In addition, most current methods rely on expensive labelled data from trained annotators for supervised learning. In this paper, we explore a semi-supervised learning framework that can learn low-dimensional representations of involuntary dynamic behavioral signals (mainly gross-body movements) from a modest number of short time series segments. Experiments on a real-world dataset reveal a significant utility of these representations in discriminating cognitive disequilibrium and flow and demonstrate their potential in transferring learned models to previously unseen subjects.

【Keywords】:

1765. Hypergraph Convolutional Network for Multi-Hop Knowledge Base Question Answering (Student Abstract).

Paper Link】 【Pages】:13801-13802

【Authors】: Jiale Han ; Bo Cheng ; Xu Wang

【Abstract】: Graph convolutional networks (GCN) have been applied to the knowledge base question answering (KBQA) task. However, the pairwise connection between nodes of a GCN limits its capability to represent high-order data correlation. Furthermore, most previous work does not fully utilize the semantic relation information, which is vital to reasoning. In this paper, we propose a novel multi-hop KBQA model based on a hypergraph convolutional network. By constructing a hypergraph, the pairwise connection between nodes is converted to a high-level connection between nodes and edges, which effectively encodes complex related data. To better exploit the semantic information of relations, we apply a co-attention method to learn the similarity between relation and query, and assign weights to different relations. Experimental results demonstrate the effectiveness of the model.

【Keywords】:

1766. Trimodal Attention Module for Multimodal Sentiment Analysis (Student Abstract).

Paper Link】 【Pages】:13803-13804

【Authors】: Anirudh Bindiganavale Harish ; Fatiha Sadat

【Abstract】: In our research, we propose a new multimodal fusion architecture for the task of sentiment analysis. The 3 modalities used in this paper are text, audio and video. Most of the current methods deal with either a feature level or a decision level fusion. In contrast, we propose an attention-based deep neural network and a training approach to facilitate both feature and decision level fusion. Our network effectively leverages information across all three modalities using a 2 stage fusion process. We test our network on the individual utterance based contextual information extracted from the CMU-MOSI Dataset. A comparison is drawn between the state-of-the-art and our network.

【Keywords】:

1767. Action Graphs for Goal Recognition Problems with Inaccurate Initial States (Student Abstract).

Paper Link】 【Pages】:13805-13806

【Authors】: Helen Harman ; Pieter Simoens

【Abstract】: Goal recognisers attempt to infer an agent's intentions from a sequence of observations. Approaches that adapt classical planning techniques to goal recognition have previously been proposed but, generally, they assume the initial world state is accurately defined. In this paper, a state is inaccurate if any fluent's value is unknown or incorrect. To cope with this, a cyclic Action Graph, which models the order constraints between actions, is traversed to label each node with its distance from each hypothesis goal. These distances are used to calculate the posterior goal probabilities. Our experimental results, for 15 different domains, demonstrate that our approach is unaffected by an inaccurately defined initial state.

【Keywords】:

1768. A Bias Trick for Centered Robust Principal Component Analysis (Student Abstract).

Paper Link】 【Pages】:13807-13808

【Authors】: Baokun He ; Guihong Wan ; Haim Schweitzer

【Abstract】: Outlier based Robust Principal Component Analysis (RPCA) requires centering of the non-outliers. We show a “bias trick” that automatically centers these non-outliers. Using this bias trick we obtain the first RPCA algorithm that is optimal with respect to centering.

【Keywords】:

1769. Inception LSTM for Next-frame Video Prediction (Student Abstract).

Paper Link】 【Pages】:13809-13810

【Authors】: Matin Hosseini ; Anthony S. Maida ; Majid Hosseini ; Raju Gottumukkala

【Abstract】: In this paper, we propose a novel deep-learning method called Inception LSTM for video frame prediction. A standard convolutional LSTM uses a single kernel size for each of its gates. Having multiple kernel sizes within a single gate provides richer features than would be possible with a single kernel. Our key idea is to introduce inception-like kernels within the LSTM gates to capture features from a larger area of the image while retaining the fine resolution of small details. We implemented the proposed Inception LSTM on the PredNet network with both Inception version 1 and Inception version 2 modules. The proposed idea was evaluated on both KITTI and KTH data. Our results show that the Inception LSTM has better predictive performance than the convolutional LSTM. We also observe that the LSTM with Inception version 1 has better predictive performance than with Inception version 2, but Inception version 2 has a lower computational cost.

【Keywords】:

1770. Multi-View Deep Attention Network for Reinforcement Learning (Student Abstract).

Paper Link】 【Pages】:13811-13812

【Authors】: Yueyue Hu ; Shiliang Sun ; Xin Xu ; Jing Zhao

【Abstract】: The representation approximated by a single deep network is usually limited for reinforcement learning agents. We propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning task for the first time. The proposed model approximates a set of strategies from multiple representations and combines these strategies based on attention mechanisms to provide a comprehensive strategy for a single agent. Experimental results on eight Atari video games show that the MvDAN achieves more competitive performance than single-view reinforcement learning methods.

【Keywords】:

1771. Streaming Batch Gradient Tracking for Neural Network Training (Student Abstract).

Paper Link】 【Pages】:13813-13814

【Authors】: Siyuan Huang ; Brian D. Hoskins ; Matthew W. Daniels ; Mark D. Stiles ; Gina C. Adam

【Abstract】: Faster and more energy efficient hardware accelerators are critical for machine learning on very large datasets. The energy cost of performing vector-matrix multiplication and repeatedly moving neural network models in and out of memory motivates a search for alternative hardware and algorithms. We propose to use streaming batch principal component analysis (SBPCA) to compress batch data during training by using a rank-k approximation of the total batch update. This approach yields comparable training performance to minibatch gradient descent (MBGD) at the same batch size while reducing overall memory and compute requirements.

【Keywords】:

1772. Self-Supervised, Semi-Supervised, Multi-Context Learning for the Combined Classification and Segmentation of Medical Images (Student Abstract).

Paper Link】 【Pages】:13815-13816

【Authors】: Abdullah-Al-Zubaer Imran ; Chao Huang ; Hui Tang ; Wei Fan ; Yuan Xiao ; Dingjun Hao ; Zhen Qian ; Demetri Terzopoulos

【Abstract】: To tackle the problem of limited annotated data, semi-supervised learning is attracting attention as an alternative to fully supervised models. Moreover, optimizing a multiple-task model to learn “multiple contexts” can provide better generalizability compared to single-task models. We propose a novel semi-supervised multiple-task model leveraging self-supervision and adversarial training—namely, self-supervised, semi-supervised, multi-context learning (S4MCL)—and apply it to two crucial medical imaging tasks, classification and segmentation. Our experiments on spine X-rays reveal that the S4MCL model significantly outperforms semi-supervised single-task, semi-supervised multi-context, and fully-supervised single-task models, even with a 50% reduction of classification and segmentation labels.

【Keywords】:

1773. A Multi-Task Approach to Open Domain Suggestion Mining (Student Abstract).

Paper Link】 【Pages】:13817-13818

【Authors】: Minni Jain ; Maitree Leekha ; Mononito Goswami

【Abstract】: Consumer reviews online may contain suggestions useful for improving the target products and services. Mining suggestions is challenging because the field lacks large labelled and balanced datasets. Furthermore, most prior studies have only focused on mining suggestions in a single domain. In this work, we introduce a novel up-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our up-sampling technique coupled with the multi-task framework outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC.

【Keywords】:

1774. Third-Person Imitation Learning via Image Difference and Variational Discriminator Bottleneck (Student Abstract).

Paper Link】 【Pages】:13819-13820

【Authors】: Chong Jiang ; Zongzhang Zhang ; Zixuan Chen ; Jiacheng Zhu ; Junpeng Jiang

【Abstract】: Third-person imitation learning (TPIL) is a variant of generative adversarial imitation learning and can learn an expert-like policy from third-person expert demonstrations. Third-person expert demonstrations usually exist in the form of videos recorded from a third-person perspective, and there is a lack of direct correspondence with samples generated by the agent. To alleviate this problem, we improve TPIL by applying image difference and a variational discriminator bottleneck. Empirically, our new method has better performance than TPIL on two MuJoCo tasks, Reacher and Inverted Pendulum.

【Keywords】:

1775. Automatic Text-Based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings (Student Abstract).

Paper Link】 【Pages】:13821-13822

【Authors】: Hang Jiang ; Xianzhe Zhang ; Jinho D. Choi

【Abstract】: Previous works related to automatic personality recognition focus on using traditional classification models with linguistic features. However, attentive neural networks with contextual embeddings, which have achieved huge success in text classification, are rarely explored for this task. In this project, we make two major contributions. First, we create the first dialogue-based personality dataset, FriendsPersona, by annotating 5 personality traits of speakers from the Friends TV show through crowdsourcing. Second, we present a novel approach to automatic personality recognition using pre-trained contextual embeddings (BERT and RoBERTa) and attentive neural networks. Our models improve the state-of-the-art results on the monologue Essays dataset by 2.49%, and establish a solid benchmark on our FriendsPersona. By comparing results on the two datasets, we demonstrate the challenges of modeling personality in multi-party dialogue.

【Keywords】:

1776. Incremental Sense Weight Training for In-Depth Interpretation of Contextualized Word Embeddings (Student Abstract).

Paper Link】 【Pages】:13823-13824

【Authors】: Xinyi Jiang ; Zhengzhe Yang ; Jinho D. Choi

【Abstract】: We present a novel online algorithm that learns the essence of each dimension in word embeddings. We first mask dimensions determined unessential by our algorithm, apply the masked word embeddings to a word sense disambiguation task (WSD), and compare its performance against the one achieved by the original embeddings. Our results show that the masked word embeddings do not hurt the performance and can improve it by 3%.

【Keywords】:

1777. Learning Directional Sentence-Pair Embedding for Natural Language Reasoning (Student Abstract).

Paper Link】 【Pages】:13825-13826

【Authors】: Yuchen Jiang ; Zhenxin Xiao ; Kai-Wei Chang

【Abstract】: Enabling models to reason and draw inferences over text is one of the core missions of natural language understanding. Although deep learning models have shown strong performance on various cross-sentence inference benchmarks, recent work has shown that they are leveraging spurious statistical cues rather than capturing deeper implied relations between pairs of sentences. In this paper, we show that state-of-the-art language encoding models are especially bad at modeling directional relations between sentences by proposing a new evaluation task: the Cause-and-Effect relation prediction task. Backed by our curated Cause-and-Effect Relation dataset (Cℰℛ), we also demonstrate that a mutual attention mechanism can guide the model to focus on capturing directional relations between sentences when added to existing transformer-based models. Experimental results show that the proposed approach improves the performance on downstream applications, such as the abductive reasoning task.

【Keywords】:

1778. Re-Thinking LiDAR-Stereo Fusion Frameworks (Student Abstract).

Paper Link】 【Pages】:13827-13828

【Authors】: Qilin Jin ; Parasara Sridhar Duggirala

【Abstract】: In this paper, we present a 2-step framework for high-precision dense depth perception from stereo RGB images and sparse LiDAR input. In the first step, we train a deep neural network to predict a dense depth map from the left image and sparse LiDAR data, in a novel self-supervised manner. Then, in the second step, we compute a disparity map from the predicted depths and refine it by making sure that, for every pixel in the left image, its match in the right image, according to the final disparity, is the local optimum.

【Keywords】:

1779. Leveraging BERT with Mixup for Sentence Classification (Student Abstract).

Paper Link】 【Pages】:13829-13830

【Authors】: Amit Jindal ; Dwaraknath Gnaneshwar ; Ramit Sawhney ; Rajiv Ratn Shah

【Abstract】: Good generalization capability is an important quality of well-trained and robust neural networks. However, networks usually struggle when faced with samples outside the training distribution. Mixup is a technique that improves generalization, reduces memorization, and increases adversarial robustness. We apply a variant of Mixup called Manifold Mixup to the sentence classification problem, and present the results along with an ablation study. Our methodology outperforms CNN, LSTM, and vanilla BERT models in generalization.

【Keywords】:
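
The abstract above relies on Mixup, which trains on convex combinations of pairs of examples and their labels. The sketch below shows only that interpolation step on plain Python lists; in Manifold Mixup the same interpolation is applied to hidden representations rather than raw inputs. The function names and the default `alpha` are illustrative assumptions, not values taken from the paper.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Interpolate two feature vectors and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); alpha is a free choice."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    # Hypothetical sentence representations with one-hot class labels.
    h_a, y_a = [1.0, 0.0], [1.0, 0.0]
    h_b, y_b = [0.0, 2.0], [0.0, 1.0]
    print(mixup(h_a, y_a, h_b, y_b, sample_lambda()))
```

Small values of `alpha` push the mixing weight towards 0 or 1, so most mixed examples stay close to one of the two originals.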

1780. Determining the Possibility of Transfer Learning in Deep Reinforcement Learning Using Grad-CAM (Student Abstract).

Paper Link】 【Pages】:13831-13832

【Authors】: Ho-Taek Joo ; Kyung-Joong Kim

【Abstract】: Humans are usually good at guessing whether two games are similar to each other and can easily estimate how much time it will take to master a new game based on that similarity. Although Deep Reinforcement Learning (DRL) has been successful in various domains, it takes a long training time to obtain a successful controller for a single game. Therefore, there has been much demand for the use of transfer learning to speed up reinforcement learning across multiple tasks. If we could automatically determine the possibility of transfer learning in the DRL domain before training, knowledge could be transferred efficiently across multiple games. In this work, we propose a simple testing method, Determining the Possibility of Transfer Learning (DPTL), to determine the transferability of models based on Grad-CAM visualization of the CNN layer from the source model. Experimental results on Atari games show that the transferability measure successfully indicates the possibility of transfer learning.

【Keywords】:

1781. Exploring the Benefits of Depth Information in Object Pixel Masking (Student Abstract).

Paper Link】 【Pages】:13833-13834

【Authors】: Anish Kachinthaya ; Yi Ding ; Tobias Höllerer

【Abstract】: In this paper, we look at how depth data can benefit existing object masking methods applied in occluded scenes. Masking the pixel locations of objects within scenes helps computers get a spatial awareness of where objects are within images. The current state-of-the-art algorithm for masking objects in images is Mask R-CNN, which builds on the Faster R-CNN network to mask object pixels rather than just detecting their bounding boxes. This paper examines the weaknesses Mask R-CNN has in masking people when they are occluded in a frame. It then looks at how depth data gathered from an RGB-D sensor can be used. We provide a case study to show how simply applying thresholding methods on the depth information can aid in distinguishing occluded persons. The intention of our research is to examine how features from depth data can benefit object pixel masking methods in an explainable manner, especially in complex scenes with multiple objects.

【Keywords】:
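
The abstract above reports that simple thresholding on depth can help separate occluded persons. As a rough illustration only, the sketch below splits a single binary mask into a near part and a far part using a fixed depth cut-off. The data layout (row-major lists, depth in metres), the function name, and the choice of threshold are all assumptions; the abstract does not describe the case study's actual procedure.

```python
def split_mask_by_depth(mask, depth, threshold):
    """Split a binary mask into near and far parts at a depth cut-off.

    mask  : rows of booleans, True where the detector marked a person pixel
    depth : rows of floats, per-pixel distance from the sensor (assumed metres)
    """
    near = [
        [m and d < threshold for m, d in zip(mask_row, depth_row)]
        for mask_row, depth_row in zip(mask, depth)
    ]
    far = [
        [m and d >= threshold for m, d in zip(mask_row, depth_row)]
        for mask_row, depth_row in zip(mask, depth)
    ]
    return near, far


if __name__ == "__main__":
    # One merged mask covering a person at about 1 m and another at about 3 m.
    mask = [[True, True, False], [True, True, True]]
    depth = [[1.0, 3.1, 5.0], [1.1, 3.0, 2.9]]
    near, far = split_mask_by_depth(mask, depth, threshold=2.0)
    print(near)
    print(far)
```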

1782. A Critique of the Smooth Inverse Frequency Sentence Embeddings (Student Abstract).

Paper Link】 【Pages】:13835-13836

【Authors】: Aidana Karipbayeva ; Alena Sorokina ; Zhenisbek Assylbekov

【Abstract】: We critically review the smooth inverse frequency sentence embedding method of Arora, Liang, and Ma (2017), and show inconsistencies in its setup, derivation and evaluation.

【Keywords】:

1783. Multidimensional Analysis of Trust in News Articles (Student Abstract).

Paper Link】 【Pages】:13837-13838

【Authors】: Avneet Kaur ; Maitree Leekha ; Utkarsh Chawla ; Ayush Agarwal ; Mudit Saxena ; Nishtha Madaan ; Kalapriya Kannan ; Sameep Mehta

【Abstract】: The advancements in the field of Information Communication Technology have engendered revolutionary changes in the journalism industry, not only for journalists and media personnel, but also for the people consuming these news stories, who today are only a click away from all the updates they need. However, these advances have also exposed the prevailing venality, wearing away the public's trust in news media. How, then, does an individual discern which of the countless news stories about an incident should be trusted? This work introduces a system that presents the user with a multidimensional analysis of trust in news from various media sources, based on the textual content of the articles, an assessment of the journalists' perspectives, and the temporal diversity of the issues being covered by the media houses publishing the news articles. Our experiments on a self-collected dataset confirm that the system aids in a comprehensive analysis of trust.

【Keywords】:

1784. Algorithmic Bias in Recidivism Prediction: A Causal Perspective (Student Abstract).

Paper Link】 【Pages】:13839-13840

【Authors】: Aria Khademi ; Vasant G. Honavar

【Abstract】: ProPublica's analysis of recidivism predictions produced by the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software tool has shown that the predictions were racially biased against African American defendants. We analyze the COMPAS data using a causal reformulation of the underlying algorithmic fairness problem. Specifically, we assess whether COMPAS exhibits racial bias against African American defendants using FACT, a recently introduced causality-grounded measure of algorithmic fairness. We use the Neyman-Rubin potential outcomes framework for causal inference from observational data to estimate FACT from COMPAS data. Our analysis offers strong evidence that COMPAS exhibits racial bias against African American defendants. We further show that the FACT estimates from COMPAS data are robust in the presence of unmeasured confounding.

【Keywords】:

1785. New Off-Board Solution for Predicting Vehicles' Intentions in the Highway On-Ramp Using Probabilistic Classifiers (Student Abstract).

Paper Link】 【Pages】:13841-13842

【Authors】: Zine El Abidine Kherroubi ; Samir Aknine ; Rebiha Bacha

【Abstract】: This paper proposes a new approach for predicting drivers' intentions in a Highway on-ramp merge situation using a central road side unit (RSU) with probabilistic classifiers.

【Keywords】:

1786. Learning to Classify the Wrong Answers for Multiple Choice Question Answering (Student Abstract).

Paper Link】 【Pages】:13843-13844

【Authors】: Hyeondey Kim ; Pascale Fung

【Abstract】: Multiple-Choice Question Answering (MCQA) is the most challenging area of Machine Reading Comprehension (MRC) and Question Answering (QA), since it not only requires natural language understanding, but also problem-solving techniques. We propose a novel method, Wrong Answer Ensemble (WAE), which can be applied to various MCQA tasks easily. To improve performance of MCQA tasks, humans intuitively exclude unlikely options to solve the MCQA problem. Mimicking this strategy, we train our model with the wrong answer loss and correct answer loss to generalize the features of our model, and exclude likely but wrong options. An experiment on a dialogue-based examination dataset shows the effectiveness of our approach. Our method improves the results on a fine-tuned transformer by 2.7%.

【Keywords】:

1787. Task Scoping for Efficient Planning in Open Worlds (Student Abstract).

Paper Link】 【Pages】:13845-13846

【Authors】: Nishanth Kumar ; Michael Fishman ; Natasha Danas ; Stefanie Tellex ; Michael L. Littman ; George Konidaris

【Abstract】: We propose an abstraction method for open-world environments expressed as Factored Markov Decision Processes (FMDPs) with very large state and action spaces. Our method prunes state and action variables that are irrelevant to the optimal value function on the state subspace the agent would visit when following any optimal policy from the initial state. This method thus enables tractable fast planning within large open-world FMDPs.

【Keywords】:

1788. Toward Operational Safety Verification of AI-Enabled CPS (Student Abstract).

Paper Link】 【Pages】:13847-13848

【Authors】: Imane Lamrani ; Ayan Banerjee ; Sandeep K. S. Gupta

【Abstract】: AI-enabled cyber-physical systems (CPS) such as the artificial pancreas (AP) or autonomous cars use machine learning to make several critical decisions. Such a system is subject to inputs and scenarios that were not observed during training and for which the expected outputs are not known. Hence, popular model-based verification techniques that characterize the behavior of a control system before deployment using predictive models may be inaccurate and often yield incorrect safety analysis results. In addition, regulatory agencies are required to regulate safety-critical AI-enabled CPS to ensure their operational safety. However, the high complexity of the system results in a myriad of safety concerns, not all of which can be comprehensively tested before deployment, and some of which may not even be detected during the design and testing phases. In this work, we propose a tool to help regulatory agencies compare the operation of the CPS with the specifications given by the manufacturer to ensure that the operation results conform with the safety-assured design of a CPS.

【Keywords】:

1789. BattleNet: Capturing Advantageous Battlefield in RTS Games (Student Abstract).

Paper Link】 【Pages】:13849-13850

【Authors】: Donghyeon Lee ; Man-Je Kim ; Chang Wook Ahn

【Abstract】: In a real-time strategy (RTS) game, StarCraft II, players need to know the consequences before making a decision in combat. We propose a combat outcome predictor which utilizes terrain information as well as squad information. For training the model, we generated a StarCraft II combat dataset by simulating diverse and large-scale combat situations. The overall accuracy of our model was 89.7%. Our predictor can be integrated into the artificial intelligence agent for RTS games as a short-term decision-making module.

【Keywords】:

1790. Submodel Decomposition for Solving Limited Memory Influence Diagrams (Student Abstract).

Paper Link】 【Pages】:13851-13852

【Authors】: Junkyu Lee

【Abstract】: This paper presents a systematic way of decomposing a limited memory influence diagram (LIMID) to a tree of single-stage decision problems, or submodels and solving it by message passing. The relevance in LIMIDs is formalized by the notion of the partial evaluation of the maximum expected utility, and the graph separation criteria for identifying submodels follow. The submodel decomposition provides a graphical model approach for updating the beliefs and propagating the conditional expected utilities for solving LIMIDs with the worst-case complexity bounded by the maximum treewidth of the individual submodels.

【Keywords】:

1791. Who Are Controlled by The Same User? Multiple Identities Deception Detection via Social Interaction Activity (Student Abstract).

Paper Link】 【Pages】:13853-13854

【Authors】: Jiacheng Li ; Chunyuan Yuan ; Wei Zhou ; Jingli Wang ; Songlin Hu

【Abstract】: Social media has become a preferred place for sharing information. However, some users may create multiple accounts and manipulate them to deceive legitimate users. Most previous studies use methods based on verbal or behavioral features to solve this problem, but they are designed only for particular platforms, which limits their generality. In this paper, to support multiple platforms, we construct an interaction tree for each account based on its social interactions, which are a common characteristic of social platforms. Then we propose a new method to calculate the social interaction entropy of each account and detect the accounts that are controlled by the same user. Experimental results on two real-world datasets show that the method is robustly superior to state-of-the-art methods.

【Keywords】:

1792. Travel Time Prediction on Un-Monitored Roads: A Spatial Factorization Machine Based Approach (Student Abstract).

Paper Link】 【Pages】:13855-13856

【Authors】: Lile Li ; Wei Liu

【Abstract】: Real-time traffic monitoring is one of the most important factors for route planning and estimated time of arrival (ETA). Many major roads in large cities are equipped with live traffic monitoring systems, which infer the current traffic congestion status and ETAs to other locations. However, there are also many other roads, especially small roads and paths, that are not monitored. Yet live traffic status on such un-monitored small roads can play a non-negligible role in personalized route planning and re-routing when a road incident happens. How to estimate the traffic status on such un-monitored roads is thus a valuable problem to be addressed. In this paper, we propose a model called Spatial Factorization Machines (SFM) to address this problem. A major advantage of the SFM model is that it incorporates physical distances and structures of road networks into the estimation of traffic status on un-monitored roads. Our experiments on real-world traffic data demonstrate that the SFM model significantly outperforms other existing models on ETA of un-monitored roads.

【Keywords】:

1793. Selecting Portfolios Directly Using Recurrent Reinforcement Learning (Student Abstract).

Paper Link】 【Pages】:13857-13858

【Authors】: Lin Li

【Abstract】: Portfolio selection has attracted increasing attention in the machine learning and AI communities recently. Existing portfolio selection using recurrent reinforcement learning (RRL) relies heavily on a single-asset trading system to heuristically obtain the portfolio weights. In this paper, we propose a novel method, direct portfolio selection using recurrent reinforcement learning (DPS-RRL), to select portfolios directly. Instead of trading single assets one by one to obtain portfolio weights, our method learns to quantify the asset allocation weights directly by optimizing the Sharpe ratio of financial portfolios. We empirically demonstrate the effectiveness of our method, which is able to outperform state-of-the-art portfolio selection methods.

【Keywords】:
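
The objective named in the abstract above, the Sharpe ratio of a portfolio, can be illustrated with a minimal sketch. This is not the DPS-RRL method itself: it only evaluates the ratio for a fixed weight vector, and the function names, the zero risk-free rate, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev

def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return: weighted sum of the asset returns.

    asset_returns is a list of periods, each a list with one return per asset.
    """
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # needs at least two periods
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / spread

# Hypothetical data: two assets, two periods, equal weights.
returns = portfolio_returns([0.5, 0.5], [[0.02, 0.00], [0.04, 0.02]])
ratio = sharpe_ratio(returns)
```

A learning method would adjust the weights to increase this quantity; the sketch leaves that optimization out.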

1794. Towards Minimal Supervision BERT-Based Grammar Error Correction (Student Abstract).

Paper Link】 【Pages】:13859-13860

【Authors】: Yiyuan Li ; Antonios Anastasopoulos ; Alan W. Black

【Abstract】: Current grammatical error correction (GEC) models typically consider the task as sequence generation, which requires large amounts of annotated data and limits the applications in data-limited settings. We try to incorporate contextual information from a pre-trained language model to leverage annotation and benefit multilingual scenarios. Results show strong potential of Bidirectional Encoder Representations from Transformers (BERT) in the grammatical error correction task.

【Keywords】:

1795. Adabot: Fault-Tolerant Java Decompiler (Student Abstract).

Paper Link】 【Pages】:13861-13862

【Authors】: Zhiming Li ; Qing Wu ; Kun Qian

【Abstract】: Reverse engineering has been an extremely important field in software engineering; it helps us to better understand and analyze the internal architecture and interrelations of executables. The classical Java reverse engineering task includes disassembly and decompilation. Traditional Abstract Syntax Tree (AST) based disassemblers and decompilers are strictly rule-defined and thus highly fault-intolerant when bytecode obfuscation is introduced for safety reasons. In this work, we view decompilation as a statistical machine translation task and propose a decompilation framework which is fully based on the self-attention mechanism. Through better adaptation to the linguistic uniqueness of bytecode, our model fully outperforms rule-based models and previous works based on the recurrence mechanism.

【Keywords】:

1796. Constrained Self-Supervised Clustering for Discovering New Intents (Student Abstract).

Paper Link】 【Pages】:13863-13864

【Authors】: Ting-En Lin ; Hua Xu ; Hanlei Zhang

【Abstract】: Discovering new user intents is an emerging task in dialogue systems. In this paper, we propose a self-supervised clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process and does not require intensive feature engineering. Extensive experiments on three benchmark datasets show that our method can yield significant improvements over strong baselines.

【Keywords】:

1797. Generating Engaging Promotional Videos for E-commerce Platforms (Student Abstract).

Paper Link】 【Pages】:13865-13866

【Authors】: Chang Liu ; Han Yu ; Yi Dong ; Zhiqi Shen ; Yingxue Yu ; Ian Dixon ; Zhanning Gao ; Pan Wang ; Peiran Ren ; Xuansong Xie ; Lizhen Cui ; Chunyan Miao

【Abstract】: There is an emerging trend for sellers to use videos to promote their products on e-commerce platforms such as Taobao.com. Current video production workflow includes the production of visual storyline by human directors. We propose a system to automatically generate visual storyline based on the input set of visual materials (e.g. video clips or still images) and then produce a promotional video. In particular, we propose an algorithm called Shot Composition, Selection and Plotting (ShotCSP), which generates visual storylines leveraging film-making principles to improve viewing experience and perceived persuasiveness.

【Keywords】:

1798. Bayesian Adversarial Attack on Graph Neural Networks (Student Abstract).

Paper Link】 【Pages】:13867-13868

【Authors】: Xiao Liu ; Jing Zhao ; Shiliang Sun

【Abstract】: Adversarial attack on graph neural networks (GNNs) is distinctive as it often jointly trains the available nodes to generate a graph as an adversarial example. Existing attack approaches usually consider the case where the entire training set is available, which may be impractical. In this paper, we propose a novel Bayesian adversarial attack approach based on projected gradient descent optimization, called Bayesian PGD attack, which gets more general attack examples than deterministic attack approaches. The adversarial examples generated by our approach, using the same partial dataset as deterministic attack approaches, make the GNN have a higher misclassification rate on graph node classification. Specifically, in our approach, the edge perturbation Z is used for generating adversarial examples, which is viewed as a random variable with scale constraint, and the optimization target of the edge perturbation is to maximize the KL divergence between its true posterior distribution p(Z|D) and its approximate variational distribution qθ(Z). We experimentally find that the attack performance decreases as the number of available nodes is reduced, and that the effect of attacks using different nodes varies greatly, especially when the number of nodes is small. Through experimental comparison with the state-of-the-art attack approaches on GNNs, our approach is demonstrated to have better and more robust attack performance.

【Keywords】:

1799. Towards Consistent Variational Auto-Encoding (Student Abstract).

Paper Link】 【Pages】:13869-13870

【Authors】: Yijing Liu ; Shuyu Lin ; Ronald Clark

【Abstract】: Variational autoencoders (VAEs) have been a successful approach to learning meaningful representations of data in an unsupervised manner. However, suboptimal representations are often learned because the approximate inference model fails to match the true posterior of the generative model, i.e. an inconsistency exists between the learnt inference and generative models. In this paper, we introduce a novel consistency loss that directly requires the encoding of the reconstructed data point to match the encoding of the original data, leading to better representations. Through experiments on MNIST and Fashion MNIST, we demonstrate the existence of the inconsistency in VAE learning and that our method can effectively reduce such inconsistency.

【Keywords】:

1800. Gifting in Multi-Agent Reinforcement Learning (Student Abstract).

Paper Link】 【Pages】:13871-13872

【Authors】: Andrei Lupu ; Doina Precup

【Abstract】: This work performs a first study on multi-agent reinforcement learning with deliberate reward passing between agents. We empirically demonstrate that such mechanics can greatly improve the learning progression in a resource appropriation setting and provide a preliminary discussion of the complex effects of gifting on the learning dynamics.

【Keywords】:

1801. Suicide Risk Assessment via Temporal Psycholinguistic Modeling (Student Abstract).

Paper Link】 【Pages】:13873-13874

【Authors】: Puneet Mathur ; Ramit Sawhney ; Rajiv Ratn Shah

【Abstract】: Social media platforms are increasingly being used for studying psycho-linguistic phenomena to model expressions of suicidal intent in tweets. Most recent work in suicidal ideation detection doesn't leverage contextual psychological cues. In this work, we hypothesize that the contextual information embedded in the form of historical activities of users and homophily networks formed between like-minded individuals on Twitter can substantially improve existing techniques for automated identification of suicidal tweets. This premise is extensively tested to yield state-of-the-art results as compared to linguistic-only models and the state-of-the-art model.

【Keywords】:

1802. Meta-Learning on Graph with Curvature-Based Analysis (Student Abstract).

Paper Link】 【Pages】:13875-13876

【Authors】: Tae Hong Moon ; Sungsu Lim

【Abstract】: Learning latent representations in graphs means finding a mapping that embeds nodes or edges as data points in a low-dimensional vector space. This paper introduces a flexible framework to enhance existing methodologies that have difficulty capturing local proximity and global relationships at the same time. Our approach generates a virtual edge between non-adjacent nodes based on the Forman-Ricci curvature in the network. By analyzing the network using topological information, structurally similar global relationships can easily be detected and successfully integrated with previous works.

【Keywords】:

1803. Random Projections and α-Shape to Support the Kernel Design (Student Abstract).

Paper Link】 【Pages】:13877-13878

【Authors】: Daniel Moreira Cestari ; Rodrigo Fernandes de Mello

【Abstract】: We demonstrate that projecting data points into hyperplanes is a good strategy for general-purpose kernel design. We used three different hyperplane generation schemes (random, convex hull and α-shape) and evaluated the results on two synthetic and three well-known image-based datasets. The results showed considerable improvement in the classification performance in almost all scenarios, corroborating the claim that such an approach can be used as a general-purpose kernel transformation. Also, we discuss some connections with Convolutional Neural Networks and how such an approach could be used to understand such networks better.

【Keywords】:

1804. An Analytical Workflow for Clustering Forensic Images (Student Abstract).

Paper Link】 【Pages】:13879-13880

【Authors】: Sara Mousavi ; Dylan Lee ; Tatianna Griffin ; Dawnie W. Steadman ; Audris Mockus

【Abstract】: Large collections of images, if curated, drastically contribute to the quality of research in many domains. Unsupervised clustering is an intuitive, yet effective step towards curating such datasets. In this work, we present a workflow for unsupervised clustering of a large collection of forensic images. The workflow utilizes classic clustering on deep feature representations of the images, in addition to domain-related data, to group them together. Our manual evaluation shows a purity of 89% for the resulting clusters.

【Keywords】:

1805. A QSAT Benchmark Based on Vertex-Folkman Problems (Student Abstract).

Paper Link】 【Pages】:13881-13882

【Authors】: David E. Narváez

【Abstract】: The purpose of this paper is to draw attention to a particular family of quantified Boolean formulas (QBFs) stemming from encodings of some vertex Folkman problems in extremal graph theory. We argue that this family of formulas is interesting for QSAT research because it is both conceptually simple and parametrized in a way that allows for a fine-grained diversity in the level of difficulty of its instances. Additionally, when coupled with symmetry breaking, the formulas in this family exhibit backbones (unique satisfying assignments) at the top-level existential variables. This benchmark is thus suitable for addressing questions regarding the connection between the existence of backbones and the hardness of QBFs.

【Keywords】:

1806. MUSIC COLLAB: An IoT and ML Based Solution for Remote Music Collaboration (Student Abstract).

Paper Link】 【Pages】:13883-13884

【Authors】: Nishtha Nayar ; Divya Lohani

【Abstract】: Communication using mediums like video and audio is essential for a lot of professions. In this paper, interaction with real-time audio transmission is examined using tools from the domains of IoT and machine learning. Two transport layer protocols, TCP and UDP, are examined for audio transmission quality. Further, different RNN models are examined for their efficiency in predicting music and being used as a substitute in case of loss of packets during transmission.

【Keywords】:

1807. Transformer-Capsule Model for Intent Detection (Student Abstract).

Paper Link】 【Pages】:13885-13886

【Authors】: Aleksander Obuchowski ; Michal Lew

【Abstract】: Intent recognition is one of the most crucial tasks in NLU systems, which are nowadays especially important for designing intelligent conversation. We propose a novel approach to intent recognition which involves combining the transformer architecture with capsule networks. Our results show that such an architecture performs better than original capsule-NLU network implementations and achieves state-of-the-art results on datasets such as ATIS, AskUbuntu, and WebApp.

【Keywords】:

1808. How to Predict Seawater Temperature for Sustainable Marine Aquaculture (Student Abstract).

Paper Link】 【Pages】:13887-13888

【Authors】: Masahito Okuno ; Takanobu Otsuka

【Abstract】: The increasing global demand for marine products has turned attention to marine aquaculture. In marine aquaculture, appropriate environment control is important for a stable supply. The influence of seawater temperature on this environment is significant and accurate prediction is therefore required. In this paper, we propose and describe the implementation of a seawater prediction method using data acquired from real aquaculture areas and neural networks. Our evaluation experiment showed that hourly next-day prediction has an average error of about 0.2 to 0.4 °C and daily prediction of up to one week has an average error of about 0.2 to 0.5 °C. This is enough to meet actual worker need, which is within 1 °C error, thus confirming that our seawater prediction method is suitable for actual sites.

【Keywords】:

1809. A Simple Deconvolutional Mechanism for Point Clouds and Sparse Unordered Data (Student Abstract).

Paper Link】 【Pages】:13889-13890

【Authors】: Thomas Paniagua ; John Lagergren ; Greg Foderaro

【Abstract】: This paper presents a novel deconvolution mechanism, called the Sparse Deconvolution, that generalizes the classical transpose convolution operation to sparse unstructured domains, enabling the fast and accurate generation and upsampling of point clouds and other irregular data. Specifically, the approach uses deconvolutional kernels, which each map an input feature vector and set of trainable scalar weights to the feature vectors of multiple child output elements. Unlike previous approaches, the Sparse Deconvolution does not require any voxelization or structured formulation of data, it is scalable to a large number of elements, and it is capable of utilizing local feature information. As a result, these capabilities allow for the practical generation of unstructured data in unsupervised settings. Preliminary experiments are performed here, where Sparse Deconvolution layers are used as a generator within an autoencoder trained on the 3D MNIST dataset.

【Keywords】:

Paper Link】 【Pages】:13891-13892

【Authors】: Alexandre Parmentier ; Robin Cohen

【Abstract】: In this paper we show how integrating both domain specific and generic trust indicators into a prediction of trust links between users in social networks can improve upon methods for recommending content to users and how clustering of users to deliver personalized solutions offers even greater advantages.

【Keywords】:

1811. Video Person Re-ID: Fantastic Techniques and Where to Find Them (Student Abstract).

Paper Link】 【Pages】:13893-13894

【Authors】: Priyank Pathak ; Amir Erfan Eshratifar ; Michael Gormish

【Abstract】: The ability to identify the same person from multiple camera views without the explicit use of facial recognition is receiving commercial and academic interest. The current status-quo solutions are based on attention neural models. In this paper, we propose Attention and CL loss, which is a hybrid of center and Online Soft Mining (OSM) loss added to the attention loss on top of a temporal attention-based neural network. The proposed loss function applied with bag-of-tricks for training surpasses the state of the art on the common person Re-ID datasets, MARS and PRID 2011. Our source code is publicly available on GitHub.

【Keywords】:

1812. Predicting Students' Attention Level with Interpretable Facial and Head Dynamic Features in an Online Tutoring System (Student Abstract).

Paper Link】 【Pages】:13895-13896

【Authors】: Shimeng Peng ; Lujie Chen ; Chufan Gao ; Richard Jiarui Tong

【Abstract】: Engaged learners are effective learners. Even though it is widely recognized that engagement plays a vital role in learning effectiveness, engagement remains an elusive psychological construct that is yet to find a consensus definition and reliable measurement. In this study, we attempted to discover plausible operational definitions of engagement within an online learning context. We achieved this goal by first deriving a set of interpretable features on the dynamics of eye, head and mouth movement from facial landmarks extracted from video recordings of students interacting with an online tutoring system. We then assessed their predictive value for engagement, which was approximated by synchronized measurements from a commercial EEG brainwave headset worn by students. Our preliminary results show that those features reduce root mean-squared error by 29% compared with a default predictor, and we found that the random forest model performs better than a linear regressor.

【Keywords】:

1813. Attribute Noise Robust Binary Classification (Student Abstract).

Paper Link】 【Pages】:13897-13898

【Authors】: Aditya Petety ; Sandhya Tripathi ; N. Hemachandra

【Abstract】: We consider the problem of learning linear classifiers when both features and labels are binary. In addition, the features are noisy, i.e., they could be flipped with an unknown probability. In the Sy-De attribute noise model, where all features could be noisy together with the same probability, we show that 0-1 loss (l0−1) need not be robust but a popular surrogate, squared loss (lsq), is. In the Asy-In attribute noise model, we prove that l0−1 is robust for any distribution over a 2-dimensional feature space. However, due to the computational intractability of l0−1, we resort to lsq and observe that it need not be Asy-In noise robust. Our empirical results support Sy-De robustness of squared loss for low to moderate noise rates.

【Keywords】:

1814. Opening the Black Box: Automatically Characterizing Software for Algorithm Selection (Student Abstract).

Paper Link】 【Pages】:13899-13900

【Authors】: Damir Pulatov ; Lars Kotthoff

【Abstract】: Meta-algorithmics, the field of leveraging machine learning to use algorithms more efficiently, has achieved impressive performance improvements in many areas of AI. It treats the algorithms to improve on as black boxes – nothing is known about their inner workings. This allows meta-algorithmic techniques to be deployed in many applications, but leaves potential performance improvements untapped by ignoring information that the algorithms could provide. In this paper, we open the black box without sacrificing the universal applicability of meta-algorithmic techniques by automatically analyzing the source code of the algorithms under consideration and show how to use it to improve algorithm selection performance. We demonstrate improvements of up to 82% on the standard ASlib benchmark library.

【Keywords】:

1815. Distill BERT to Traditional Models in Chinese Machine Reading Comprehension (Student Abstract).

Paper Link】 【Pages】:13901-13902

【Authors】: Xingkai Ren ; Ronghua Shi ; Fangfang Li

【Abstract】: Recently, unsupervised representation learning has been extremely successful in the field of natural language processing. More and more pre-trained language models have been proposed and have achieved the most advanced results, especially in machine reading comprehension. However, these pre-trained language models are huge, with hundreds of millions of parameters that have to be trained. It is quite time consuming to use them in actual industry. Thus we propose a method that employs a distillation traditional reading comprehension model to simplify the pre-trained language model so that the distillation model has faster reasoning speed and higher inference accuracy in the field of machine reading comprehension. We evaluate our proposed method on the Chinese machine reading comprehension dataset CMRC2018 and greatly improve the accuracy of the original model. To the best of our knowledge, we are the first to propose a method that employs the distillation pre-trained language model in Chinese machine reading comprehension.

【Keywords】:

1816. KnowBias: Detecting Political Polarity in Long Text Content (Student Abstract).

Paper Link】 【Pages】:13903-13904

【Authors】: Aditya Saligrama

【Abstract】: We introduce a classification scheme for detecting political bias in long text content such as newspaper opinion articles. Obtaining long text data and annotations at sufficient scale for training is difficult, but it is relatively easy to extract political polarity from tweets through their authorship. We train on tweets and perform inference on articles. Universal sentence encoders and other existing methods that aim to address this domain-adaptation scenario deliver inaccurate and inconsistent predictions on articles, which we show is due to a difference in opinion concentration between tweets and articles. We propose a two-step classification scheme that uses a neutral detector trained on tweets to remove neutral sentences from articles in order to align opinion concentration and therefore improve accuracy on that domain. Our implementation is available for public use at https://knowbias.ml.

【Keywords】:

1817. ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract).

Paper Link】 【Pages】:13905-13906

【Authors】: Rohan Saphal ; Balaraman Ravindran ; Dheevatsa Mudigere ; Sasikanth Avancha ; Bharat Kaul

【Abstract】: Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning and tweaking for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present here a methodology to create multiple models from a single training instance that can be used in an ensemble through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state of the art (SOTA) approaches.

【Keywords】:

1818. A Multi-Task Learning Approach to Sarcasm Detection (Student Abstract).

Paper Link】 【Pages】:13907-13908

【Authors】: Edoardo Savini ; Cornelia Caragea

【Abstract】: Sarcasm detection plays an important role in natural language processing as it has been considered one of the most challenging subtasks in sentiment analysis and opinion mining applications. Our work aims to detect sarcasm in social media sites and discussion forums, exploiting the potential of deep neural networks and multi-task learning. Specifically, relying on the strong correlation between sarcasm and (implied negative) sentiment, we explore a multi-task learning framework that uses sentiment classification as an auxiliary task to inform the main task of sarcasm detection. Our proposed model outperforms many previous baseline methods on an existing large dataset annotated with sarcasm.

【Keywords】:

1819. LGML: Logic Guided Machine Learning (Student Abstract).

Paper Link】 【Pages】:13909-13910

【Authors】: Joseph Scott ; Maysum Panju ; Vijay Ganesh

【Abstract】: We introduce Logic Guided Machine Learning (LGML), a novel approach that symbiotically combines machine learning (ML) and logic solvers to learn mathematical functions from data. LGML consists of two phases, namely a learning-phase and a logic-phase with a corrective feedback loop, such that the learning-phase learns symbolic expressions from input data, and the logic-phase cross-verifies the consistency of the learned expression with known auxiliary truths. If inconsistent, the logic-phase feeds back "counterexamples" to the learning-phase. This process is repeated until the learned expression is consistent with the auxiliary truths. Using LGML, we were able to learn expressions that correspond to the Pythagorean theorem and the sine function, with several orders of magnitude improvements in data efficiency compared to an approach based on an out-of-the-box multi-layered perceptron (MLP).

【Keywords】:

1820. Fairness Does Not Imply Satisfaction (Student Abstract).

Paper Link】 【Pages】:13911-13912

【Authors】: Andrew Searns ; Hadi Hosseini

【Abstract】: Fair division is a subfield of multiagent systems that is concerned with object distribution. When objects are indivisible, the Maximin Share Guarantee (MMS) is a desirable fairness notion; however, it is not guaranteed to exist. While MMS allocations may not always exist, a relaxation of MMS is guaranteed to exist. We show that there exists a family of instances for which this relaxation fails to guarantee the MMS value for all but a small constant number of agents.

【Keywords】:
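
To make the Maximin Share notion in the abstract above concrete, the following minimal sketch computes one agent's MMS value by brute force: the agent imagines every way of splitting the items into as many bundles as there are agents and takes the best achievable worst bundle. Additive, non-negative item values and the function name are assumptions for illustration; the abstract does not describe an algorithm, and this enumeration is only practical for a handful of items.

```python
from itertools import product

def maximin_share(values, n_agents):
    """Brute-force MMS value for one agent with additive, non-negative values.

    Enumerates every assignment of items to n_agents bundles and returns the
    largest value that the least valuable bundle can reach.
    """
    best = 0
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for item_value, bundle in zip(values, assignment):
            bundles[bundle] += item_value
        best = max(best, min(bundles))
    return best

# Hypothetical instance: four items valued 1, 2, 3, 4, shared by two agents.
mms = maximin_share([1, 2, 3, 4], 2)  # best split is {1, 4} and {2, 3}
```

An allocation gives an agent its MMS guarantee when the bundle it receives is worth at least this value to that agent.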

1821. Providing Uncertainty-Based Advice for Deep Reinforcement Learning Agents (Student Abstract).

Paper Link】 【Pages】:13913-13914

【Authors】: Felipe Leno da Silva ; Pablo Hernandez-Leal ; Bilal Kartal ; Matthew E. Taylor

【Abstract】: The sample-complexity of Reinforcement Learning (RL) techniques still represents a challenge for scaling up RL to unsolved domains. One way to alleviate this problem is to leverage samples from the policy of a demonstrator to learn faster. However, advice is normally limited, hence advice should ideally be directed to states where the agent is uncertain on the best action to be applied. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework where the agent asks for advice when its uncertainty is high. We describe a technique to estimate the agent uncertainty with minor modifications in standard value-based RL methods. RCMP is shown to perform better than several baselines in the Atari Pong domain.

【Keywords】:
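
The idea of asking for advice only when uncertainty is high can be sketched as follows. The abstract above does not say how uncertainty is estimated, so this sketch assumes one possible estimate: the disagreement (variance) among several value heads that each predict a Q-value per action. The function names, the threshold, and the advice budget are all hypothetical.

```python
from statistics import pvariance

def uncertainty(head_q_values):
    """Average, over actions, of the variance of Q-values across heads.

    head_q_values is a list with one entry per head, each a list of Q-values
    indexed by action. (Assumed estimator, not taken from the abstract.)
    """
    n_actions = len(head_q_values[0])
    per_action = [pvariance([head[a] for head in head_q_values])
                  for a in range(n_actions)]
    return sum(per_action) / n_actions

def choose_action(head_q_values, advisor, state, threshold, budget):
    """Return (action, remaining_budget), asking the advisor when uncertain."""
    if budget > 0 and uncertainty(head_q_values) > threshold:
        return advisor(state), budget - 1
    n_actions = len(head_q_values[0])
    mean_q = [sum(head[a] for head in head_q_values) / len(head_q_values)
              for a in range(n_actions)]
    # Greedy action with respect to the mean Q-value over heads.
    return max(range(n_actions), key=mean_q.__getitem__), budget

# Hypothetical advisor that always recommends action 0.
advisor = lambda state: 0
confident = choose_action([[1.0, 2.0], [1.0, 2.0]], advisor, None, 1.0, 3)
unsure = choose_action([[0.0, 5.0], [5.0, 0.0]], advisor, None, 1.0, 3)
```

When the heads agree the agent acts greedily and keeps its budget; when they disagree it spends one unit of advice.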

1822. SpotFake+: A Multimodal Framework for Fake News Detection via Transfer Learning (Student Abstract).

Paper Link】 【Pages】:13915-13916

【Authors】: Shivangi Singhal ; Anubha Kabra ; Mohit Sharma ; Rajiv Ratn Shah ; Tanmoy Chakraborty ; Ponnurangam Kumaraguru

【Abstract】: In recent years, there has been a substantial rise in the consumption of news via online platforms. The ease of publication and lack of editorial rigour in some of these platforms have further led to the proliferation of fake news. In this paper, we study the problem of detecting fake news on the FakeNewsNet repository, a collection of full-length articles along with associated images. We present SpotFake+, a multimodal approach that leverages transfer learning to capture semantic and contextual information from the news articles and their associated images and achieves better accuracy for fake news detection. To the best of our knowledge, this is the first work that performs a multimodal approach for fake news detection on a dataset that consists of full-length articles. It outperforms both single-modality and multiple-modality models. We also release the pretrained model for the benefit of the community.

【Keywords】:

1823. On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract).

Paper Link】 【Pages】:13917-13918

【Authors】: Dean L. Slack ; Mariann Hardey ; Noura Al Moubayed

【Abstract】: Contextual word embeddings produced by neural language models, such as BERT or ELMo, have seen widespread application and performance gains across many Natural Language Processing tasks, suggesting rich linguistic features encoded in their representations. This work aims to investigate to what extent any linguistic hierarchical information is encoded into a single contextual embedding. Using labelled constituency trees, we train simple linear classifiers on top of single contextualised word representations for ancestor sentiment analysis tasks at multiple constituency levels of a sentence. To assess the presence of hierarchical information throughout the networks, the linear classifiers are trained using representations produced by each intermediate layer of BERT and ELMo variants. We show that with no fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline for classifying 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree. Additionally, we show that both LSTM and transformer architectures trained on similarly sized datasets achieve similar levels of performance on these tasks. Future work looks to expand the analysis to a wider range of NLP tasks and contextualisers.

【Keywords】:

1824. Bayesian Optimisation for Premise Selection in Automated Theorem Proving (Student Abstract).

Paper Link】 【Pages】:13919-13920

【Authors】: Agnieszka Slowik ; Chaitanya Mangla ; Mateja Jamnik ; Sean B. Holden ; Lawrence C. Paulson

【Abstract】: Modern theorem provers utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assignment. In this work, we introduce a principled probabilistic framework for heuristic optimisation in theorem provers. We present results using a heuristic for premise selection and the Archive of Formal Proofs (AFP) as a case study.

【Keywords】:

1825. Using Chinese Glyphs for Named Entity Recognition (Student Abstract).

Paper Link】 【Pages】:13921-13922

【Authors】: Chan Hee Song ; Arijit Sehanobish

【Abstract】: Most Named Entity Recognition (NER) systems use additional features like part-of-speech (POS) tags, shallow parsing, gazetteers, etc. Adding these external features to NER systems has been shown to have a positive impact. However, creating gazetteers or taggers can take a lot of time and may require extensive data cleaning. In this work, instead of using these traditional features, we use lexicographic features of Chinese characters. Chinese characters are composed of graphical components called radicals, and these components often have some semantic indicators. We propose CNN-based models that incorporate this semantic information and use them for NER. Our models show an improvement over the baseline BERT-BiLSTM-CRF model. We present one of the first studies on Chinese OntoNotes v5.0 and show an improvement of +0.64 F1 score over the baseline. We present a state-of-the-art (SOTA) F1 score of 71.81 on the Weibo dataset, show a competitive improvement of +0.72 over the baseline on the ResumeNER dataset, and a SOTA F1 score of 96.49 on the MSRA dataset.

【Keywords】:

1826. Leakage-Robust Classifier via Mask-Enhanced Training (Student Abstract).

Paper Link】 【Pages】:13923-13924

【Authors】: Damian Stachura ; Christopher Galias ; Konrad Zolna

【Abstract】: We synthetically add data leakage to well-known image datasets, which results in predictions of convolutional neural networks trained naively on these spoiled datasets becoming wildly inaccurate. We propose a method, dubbed Mask-Enhanced Training, that automatically identifies the possible leakage and makes the classifier robust. The method enables the model to focus on all features needed to solve the task, making its predictions on the original validation set accurate, even if the whole training dataset is spoiled with the leakage.

【Keywords】:

Paper Link】 【Pages】:13925-13926

【Authors】: Jialin Su ; Yuanzhuo Wang ; Xiaolong Jin ; Yantao Jia ; Xueqi Cheng

【Abstract】: Link prediction in knowledge graphs (KGs) aims at predicting potential links between entities in KGs. Existing knowledge graph embedding (KGE) based methods represent individual entities and links in KGs as vectors in a low-dimensional space. However, these methods focus mainly on the link prediction of individual entities, yet neglect that between group entities, which exist widely in real-world KGs. In this paper, we propose a KGE-based method, called GTransA, for link prediction between group entities in a heterogeneous network by integrating individual entity links into group entity links during prediction. Experiments show that GTransA decreases mean rank by 5.4% compared to TransA.

【Keywords】:
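As a rough illustration of the mean rank metric cited in the GTransA abstract above: for each test link, all candidates are scored, and the position of the true candidate in the sorted list is averaged over the test set (lower is better). The sketch below is not the authors' evaluation code; the function names and the "higher score means more plausible" convention are assumptions made for illustration.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true candidate.

    Assumption: a higher score means a more plausible link.
    """
    true_score = scores[true_index]
    # Every candidate scored strictly higher pushes the true one down a rank.
    return 1 + sum(1 for s in scores if s > true_score)


def mean_rank(all_scores: Sequence[Sequence[float]],
              true_indices: Sequence[int]) -> float:
    """Average the rank of the true candidate over all test queries."""
    ranks = [rank_of_true(s, t) for s, t in zip(all_scores, true_indices)]
    return sum(ranks) / len(ranks)
```

A relative decrease in this value, such as the 5.4% reported in the abstract, means the true links are on average placed closer to the top of the candidate list.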

1828. Structure-Based Drug-Drug Interaction Detection via Expressive Graph Convolutional Networks and Deep Sets (Student Abstract).

Paper Link】 【Pages】:13927-13928

【Authors】: Mengying Sun ; Fei Wang ; Olivier Elemento ; Jiayu Zhou

【Abstract】: In this work, we proposed a DDI detection method based on molecular structures using graph convolutional networks and deep sets. We proposed a more discriminative convolutional layer compared to conventional GCN and achieved permutation invariant prediction without losing the capability of capturing complicated interactions.

【Keywords】:

1829. Sampling Random Chordal Graphs by MCMC (Student Abstract).

Paper Link】 【Pages】:13929-13930

【Authors】: Wenbo Sun ; Ivona Bezáková

【Abstract】: Chordal graphs are a widely studied graph class, with applications in several areas of computer science, including structural learning of Bayesian networks. Many problems that are hard on general graphs become solvable on chordal graphs. The random generation of instances of chordal graphs for testing these algorithms is often required. Nevertheless, there are only few known algorithms that generate random chordal graphs, and, as far as we know, none of them generate chordal graphs uniformly at random (where each chordal graph appears with equal probability). In this paper we propose a Markov chain Monte Carlo (MCMC) method to sample connected chordal graphs uniformly at random. Additionally, we propose a Markov chain that generates connected chordal graphs with a bounded treewidth uniformly at random. Bounding the treewidth parameter (which bounds the largest clique) has direct implications on the running time of various algorithms on chordal graphs. For each of the proposed Markov chains we prove that they are ergodic and therefore converge to the uniform distribution. Finally, as initial evidence that the Markov chains have the potential to mix rapidly, we prove that the chain on graphs with bounded treewidth mixes rapidly for trees (chordal graphs with treewidth bound of one).

【Keywords】:

1830. Keyphrase Generation for Scientific Articles Using GANs (Student Abstract).

Paper Link】 【Pages】:13931-13932

【Authors】: Avinash Swaminathan ; Raj Kuwar Gupta ; Haimin Zhang ; Debanjan Mahata ; Rakesh Gosangi ; Rajiv Ratn Shah

【Abstract】: In this paper, we present a keyphrase generation approach using conditional Generative Adversarial Networks (GAN). In our GAN model, the generator outputs a sequence of keyphrases based on the title and abstract of a scientific article. The discriminator learns to distinguish between machine-generated and human-curated keyphrases. We evaluate this approach on standard benchmark datasets. Our model achieves state-of-the-art performance in generation of abstractive keyphrases and is also comparable to the best performing extractive techniques. We also demonstrate that our method generates more diverse keyphrases and make our implementation publicly available.

【Keywords】:

1831. Biologically Inspired Sleep Algorithm for Reducing Catastrophic Forgetting in Neural Networks.

Paper Link】 【Pages】:13933-13934

【Authors】: Timothy Tadros ; Giri P. Krishnan ; Ramyaa Ramyaa ; Maxim Bazhenov

【Abstract】: Artificial neural networks (ANNs) are known to suffer from catastrophic forgetting: when learning multiple tasks, they perform well on the most recently learned task while failing to perform on previously learned tasks. In biological networks, sleep is known to play a role in memory consolidation and incremental learning. Motivated by the processes that are known to be involved in sleep generation in biological networks, we developed an algorithm that implements a sleep-like phase in ANNs. In an incremental learning framework, we demonstrate that sleep is able to recover older tasks that were otherwise forgotten. We show that sleep creates unique representations of each class of inputs and neurons that were relevant to previous tasks fire during sleep, simulating replay of previously learned memories.

【Keywords】:

1832. Improving First-Order Optimization Algorithms (Student Abstract).

Paper Link】 【Pages】:13935-13936

【Authors】: Ange Tato ; Roger Nkambou

【Abstract】: This paper presents a simple and intuitive technique to accelerate the convergence of first-order optimization algorithms. The proposed solution modifies the update rule, based on the variation of the direction of the gradient and the previous step taken during training. Results after tests show that the technique has the potential to significantly improve the performance of existing first-order optimization algorithms.

【Keywords】:

Paper Link】 【Pages】:13937-13938

【Authors】: Maxat Tezekbayev ; Zhenisbek Assylbekov ; Rustem Takhanov

【Abstract】: We show that the skip-gram embedding of any word can be decomposed into two subvectors which roughly correspond to semantic and syntactic roles of the word.

【Keywords】:

1834. Robust Multi-View Representation Learning (Student Abstract).

Paper Link】 【Pages】:13939-13940

【Authors】: Sibi Venkatesan ; James K. Miller ; Artur Dubrawski

【Abstract】: Multi-view data has become ubiquitous, especially with multi-sensor systems like self-driving cars or medical patient-side monitors. We propose two methods to approach robust multi-view representation learning with the aim of leveraging local relationships between views. The first is an extension of Canonical Correlation Analysis (CCA) where we consider multiple one-vs-rest CCA problems, one for each view. We use a group-sparsity penalty to encourage finding local relationships. The second method is a straightforward extension of a multi-view AutoEncoder with view-level drop-out. We demonstrate the effectiveness of these methods in simple synthetic experiments. We also describe heuristics and extensions to improve and/or expand on these methods.

【Keywords】:

1835. Emergence of Writing Systems through Multi-Agent Cooperation (Student Abstract).

Paper Link】 【Pages】:13941-13942

【Authors】: Shresth Verma ; Joydip Dhar

【Abstract】: Learning to communicate is considered an essential task to develop a general AI. While recent literature in language evolution has studied emergent language through discrete or continuous message symbols, there has been little work in the emergence of writing systems in artificial agents. In this paper, we present a referential game setup with two agents, where the mode of communication is a written language system that emerges during the play. We show that the agents can learn to coordinate successfully using this mode of communication. Further, we study how the game rules affect the writing system taxonomy by proposing a consistency metric.

【Keywords】:

1836. Towards Interpretable Semantic Segmentation via Gradient-Weighted Class Activation Mapping (Student Abstract).

Paper Link】 【Pages】:13943-13944

【Authors】: Kira Vinogradova ; Alexandr Dibrov ; Gene Myers

【Abstract】: Convolutional neural networks have become state-of-the-art in a wide range of image recognition tasks. The interpretation of their predictions, however, is an active area of research. Whereas various interpretation methods have been suggested for image classification, the interpretation of image segmentation still remains largely unexplored. To that end, we propose seg-grad-cam, a gradient-based method for interpreting semantic segmentation. Our method is an extension of the widely-used Grad-CAM method, applied locally to produce heatmaps showing the relevance of individual pixels for semantic segmentation.

【Keywords】:
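For orientation, the standard Grad-CAM computation that the abstract above extends can be written as follows. Here $A^k$ is the $k$-th feature map of a chosen convolutional layer, $y^c$ is the class score, and $Z$ is the number of spatial positions in the feature map. The second line is one plausible way to apply the method locally to a pixel set $\mathcal{M}$; the abstract does not give the formula, so treat it as an assumption rather than the authors' definition.

```latex
% Standard Grad-CAM (classification):
\alpha_k^c = \frac{1}{Z} \sum_{u} \sum_{v} \frac{\partial y^c}{\partial A^k_{uv}},
\qquad
L^c = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c A^k \right)

% Assumed local variant for segmentation: replace the scalar class score
% with the sum of class-c logits over a region of interest M.
y^c \;\longrightarrow\; \sum_{(i,j) \in \mathcal{M}} y^c_{ij}
```

Choosing $\mathcal{M}$ as a single pixel, an object instance, or the whole image would give heatmaps at correspondingly different levels of locality.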

1837. Action Recognition and State Change Prediction in a Recipe Understanding Task Using a Lightweight Neural Network Model (Student Abstract).

Paper Link】 【Pages】:13945-13946

【Authors】: Qing Wan ; Yoonsuck Choe

【Abstract】: Consider a natural language sentence describing a specific step in a food recipe. In such instructions, recognizing actions (such as press, bake, etc.) and the resulting changes in the state of the ingredients (shape molded, custard cooked, temperature hot, etc.) is a challenging task. One way to cope with this challenge is to explicitly model a simulator module that applies actions to entities and predicts the resulting outcome (Bosselut et al. 2018). However, such a model can be unnecessarily complex. In this paper, we propose a simplified neural network model that separates action recognition and state change prediction, while coupling the two through a novel loss function. This allows the two learning processes to indirectly influence each other. Our model, although simpler, achieves higher state change prediction performance (67% average accuracy for ours vs. 55% in (Bosselut et al. 2018)) and takes fewer samples to train (10K ours vs. 65K+ by (Bosselut et al. 2018)).

【Keywords】:

1838. Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation (Student Abstract).

Paper Link】 【Pages】:13947-13948

【Authors】: Jie Wang ; Zhenxin Fu ; Moxin Li ; Haisong Zhang ; Dongyan Zhao ; Rui Yan

【Abstract】: Unsupervised WSD methods do not rely on annotated training datasets and can use WordNet. Since each ambiguous word in the WSD task exists in WordNet and each sense of the word has a gloss, we propose SGM and MGM to learn sense representations for words in WordNet using the glosses. In the WSD task, we calculate the similarity between each sense of the ambiguous word and its context to select the sense with the highest similarity. We evaluate our method on several benchmark WSD datasets and achieve better performance than the state-of-the-art unsupervised WSD systems.

【Keywords】:
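The selection step described in the WSD abstract above (score each sense against the context, keep the most similar one) can be sketched as below. This is a minimal illustration, not the SGM/MGM models themselves: the vectors, the sense labels, and the use of cosine similarity are assumptions made for the example.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context_vec: Sequence[float],
                 sense_vecs: Dict[str, Sequence[float]]) -> str:
    """Pick the sense whose representation is most similar to the context."""
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


# Hypothetical two-dimensional sense representations for the word "bank".
senses = {"bank%finance": [1.0, 0.0], "bank%river": [0.0, 1.0]}
chosen = disambiguate([0.9, 0.1], senses)
```

In practice the sense vectors would come from the gloss-based models the abstract mentions, and the context vector from the words surrounding the ambiguous term.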

1839. Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract).

Paper Link】 【Pages】:13949-13950

【Authors】: Qisheng Wang ; Qichao Wang ; Xiao Li

【Abstract】: Exploration efficiency is a challenge for multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the interaction among agents. Less informative rewards also restrict the learning speed of MARL in comparison with the informative labels in supervised learning. This paper proposes a novel communication method which helps agents focus on different exploration subareas to guide MARL and accelerate exploration. We propose a predictive network to forecast the reward of the current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to help agents better take advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.

【Keywords】:

1840. Combining Fine-Tuning with a Feature-Based Approach for Aspect Extraction on Reviews (Student Abstract).

Paper Link】 【Pages】:13951-13952

【Authors】: Xili Wang ; Hua Xu ; Xiaomin Sun ; Guangcan Tao

【Abstract】: One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. Generally, fine-tuning BERT with sophisticated task-specific layers can achieve better performance than only adding one extra task-specific layer (e.g., a fully-connected + softmax layer), since not all tasks can easily be represented by the Transformer encoder architecture and a special task-specific layer can capture task-specific features. However, BERT fine-tuning may be unstable on a small-scale dataset. Besides, in our experiments, directly fine-tuning BERT extended with sophisticated task-specific layers did not take advantage of the features of the task-specific layers and even restricted the performance of the BERT module. To address these considerations, this paper combines fine-tuning with a feature-based approach to extract aspects. To the best of our knowledge, this is the first paper to combine fine-tuning with a feature-based approach for aspect extraction.

【Keywords】:

1841. HGMAN: Multi-Hop and Multi-Answer Question Answering Based on Heterogeneous Knowledge Graph (Student Abstract).

Paper Link】 【Pages】:13953-13954

【Authors】: Xu Wang ; Shuai Zhao ; Bo Cheng ; Jiale Han ; Yingting Li ; Hao Yang ; Guoshun Nan

【Abstract】: Multi-hop question answering models based on knowledge graphs have been extensively studied. Most existing models predict a single answer with the highest probability by ranking candidate answers. However, the ranking method prevents them from predicting all the right answers. In this paper, we propose a novel model that converts the ranking of candidate answers into individual predictions for each candidate, named heterogeneous knowledge graph based multi-hop and multi-answer model (HGMAN). HGMAN is capable of capturing more informative representations for relations assisted by our heterogeneous graph, which consists of multiple entity nodes and relation nodes. We rely on a graph convolutional network for multi-hop reasoning and then binary classification for each node to get multiple answers. Experimental results on the MetaQA dataset show the improved performance of our proposed model over all baselines.

【Keywords】:
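The difference between ranking candidates and classifying each one independently, as contrasted in the HGMAN abstract above, can be illustrated with a small sketch. The candidate names, the probabilities, and the 0.5 threshold are hypothetical; the sketch says nothing about how HGMAN computes its node scores.

```python
from typing import Dict, List


def top1_answer(scores: Dict[str, float]) -> str:
    """Ranking-style prediction: only the single best candidate survives."""
    return max(scores, key=scores.get)


def multi_answers(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Per-candidate binary decision: keep every candidate above the threshold."""
    return sorted(entity for entity, p in scores.items() if p >= threshold)


# Hypothetical per-entity probabilities for a question with two correct answers.
candidate_scores = {"Film A": 0.9, "Film B": 0.8, "Film C": 0.1}
single = top1_answer(candidate_scores)
several = multi_answers(candidate_scores)
```

With ranking, the second correct answer is lost by construction; with independent decisions, the number of returned answers can vary per question.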

1842. Topic Enhanced Controllable CVAE for Dialogue Generation (Student Abstract).

Paper Link】 【Pages】:13955-13956

【Authors】: Yiru Wang ; Pengda Si ; Zeyang Lei ; Yujiu Yang

【Abstract】: Neural generation models have shown great potential in conversation generation recently. However, these methods tend to generate uninformative or irrelevant responses. In this paper, we present a novel topic-enhanced controllable CVAE (TEC-CVAE) model to address this issue. On the one hand, the model learns the context-interactive topic knowledge through a novel multi-hop hybrid attention in the encoder. On the other hand, we design a topic-aware controllable decoder to constrain the expression of the stochastic latent variable in the CVAE to reduce irrelevant responses. Experimental results on two public datasets show that the two mechanisms synchronize to improve both relevance and diversity, and the proposed model outperforms other competitive methods.

【Keywords】:

1843. Neural Dynamics and Gamma Oscillation on a Hybrid Excitatory-Inhibitory Complex Network (Student Abstract).

Paper Link】 【Pages】:13957-13958

【Authors】: Yuan Wang ; Xia Shi ; Bo Cheng ; Junliang Chen

【Abstract】: This paper investigates the neural dynamics and gamma oscillation on a complex network with excitatory and inhibitory neurons (E-I network), as such networks are ubiquitous in the brain. The system consists of a small-world network of neurons, which are emulated by the Izhikevich model. Moreover, mixed Regular Spiking (RS) and Chattering (CH) neurons are considered to imitate excitatory neurons, and Fast Spiking (FS) neurons are used to mimic inhibitory neurons. Besides, the relationship between synchronization and gamma rhythm is explored by adjusting the critical parameters of our model. Experiments visually demonstrate that the gamma oscillations are generated by synchronous behaviors of our neural network. We also discover that the Chattering (CH) excitatory neurons can make the system easier to synchronize.

【Keywords】:
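For readers unfamiliar with the Izhikevich model named in the abstract above, a single neuron can be simulated in a few lines. The sketch uses the commonly published equations and the usual textbook parameter sets for RS, CH, and FS neurons; the input current, the time step, and the plain forward-Euler integration are assumptions for illustration, and the sketch does not reproduce the authors' small-world network.

```python
from typing import List

# Commonly cited (a, b, c, d) parameters for three Izhikevich neuron types.
PARAMS = {
    "RS": (0.02, 0.2, -65.0, 8.0),  # Regular Spiking (excitatory)
    "CH": (0.02, 0.2, -50.0, 2.0),  # Chattering (excitatory)
    "FS": (0.10, 0.2, -65.0, 2.0),  # Fast Spiking (inhibitory)
}


def simulate_izhikevich(a: float, b: float, c: float, d: float,
                        current: float = 10.0, t_ms: float = 1000.0,
                        dt: float = 0.5) -> List[float]:
    """Simulate one neuron with forward Euler and return spike times in ms."""
    v = -65.0      # membrane potential (mV)
    u = b * v      # recovery variable
    spikes: List[float] = []
    for step in range(int(t_ms / dt)):
        if v >= 30.0:              # spike threshold reached: record and reset
            spikes.append(step * dt)
            v = c
            u += d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
    return spikes


spike_times = {name: simulate_izhikevich(*p) for name, p in PARAMS.items()}
```

Comparing the spike trains of the three types under the same input gives a feel for why the mix of RS and CH neurons could matter for synchronization, which is the question the abstract studies at network scale.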

1844. Supervised Discovery of Unknown Unknowns through Test Sample Mining (Student Abstract).

Paper Link】 【Pages】:13959-13960

【Authors】: Zheng Wang ; Bruno Abrahao ; Ece Kamar

【Abstract】: Given a fixed hypothesis space, defined to model class structure in a particular domain of application, unknown unknowns (u.u.s) are data examples that form classes in the feature space whose structure is not represented in a trained model. Accordingly, this leads to incorrect class prediction with high confidence, which represents one of the major sources of blind spots in machine learning. Our method seeks to reduce the structural mismatch between the training model and that of the target space in a supervised way. We illuminate further structure through cross-validation on a modified training model, set up to mine and trap u.u.s in a marginal training class, created from examples of a random sample of the test set. Contrary to previous approaches, our method simplifies the solution, as it does not rely on budgeted queries to an Oracle whose outcomes inform adjustments to training. In addition, our empirical results exhibit consistent performance improvements over baselines, on both synthetic and real-world data sets.

【Keywords】:

1845. Few Sample Learning without Data Storage for Lifelong Stream Mining (Student Abstract).

Paper Link】 【Pages】:13961-13962

【Authors】: Zhuoyi Wang ; Yigong Wang ; Yu Lin ; Bo Dong ; Hemeng Tao ; Latifur Khan

【Abstract】: Continuously mining complex data streams has recently been attracting an increasing amount of attention, due to the rapid growth of real-world vision/signal applications such as self-driving cars and online social media messages. In this paper, we aim to address two significant problems in the lifelong/incremental stream mining scenario: first, how to make the learning algorithms generalize to unseen classes from only a few labeled samples; second, is it possible to avoid storing instances from previously seen classes to solve the catastrophic forgetting problem? We introduce a novel stream mining framework to classify an infinite stream of data with different categories that occur at different times. We apply a few-sample learning strategy to make the model recognize a novel class with limited samples; at the same time, we implement an incremental generative model to maintain old knowledge when learning newly arriving categories, while also avoiding the violation of data privacy and memory restrictions. We evaluate our approach in the continual class-incremental setup on classification tasks and ensure sufficient model capacity to accommodate learning the new incoming categories.

【Keywords】:

1846. A Multi-Task Learning Machine Reading Comprehension Model for Noisy Document (Student Abstract).

Paper Link】 【Pages】:13963-13964

【Authors】: Zhijing Wu ; Hua Xu

【Abstract】: Current neural models for Machine Reading Comprehension (MRC) have achieved successful performance in recent years. However, these models are too fragile and lack the robustness to tackle imperceptible adversarial perturbations to the input. In this work, we propose a multi-task learning MRC model with hierarchical knowledge enrichment to further improve the robustness for noisy documents. Our model follows a typical encode-align-decode framework. Additionally, we apply a hierarchical, coarse-to-fine method of adding background knowledge into the model to enhance the language representations. Besides, we optimize our model by jointly training the answer span and unanswerability prediction, aiming to improve the robustness to noise. Experimental results on benchmark datasets confirm the superiority of our method, and our method can achieve competitive performance compared with other strong baselines.

【Keywords】:

1847. Multi-Agent/Robot Deep Reinforcement Learning with Macro-Actions (Student Abstract).

Paper Link】 【Pages】:13965-13966

【Authors】: Yuchen Xiao ; Joshua Hoffman ; Tian Xia ; Christopher Amato

【Abstract】: We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets including how to properly update each macro-action value and accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning macro-action-value function and joint macro-action-value function. Furthermore, we present two new approaches of learning decentralized macro-action-based policies, which involve a new double Q-update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selection. Our approaches are evaluated both in simulation and on real robots.

【Keywords】:

1848. Multi-Channel Convolutional Neural Networks with Adversarial Training for Few-Shot Relation Classification (Student Abstract).

Paper Link】 【Pages】:13967-13968

【Authors】: Yuxiang Xie ; Hua Xu ; Congcong Yang ; Kai Gao

【Abstract】: The distant supervised (DS) method has improved the performance of relation classification (RC) by means of extending the dataset. However, DS also brings the problem of wrong labeling. Contrary to DS, the few-shot method relies on few supervised data to predict the unseen classes. In this paper, we use word embedding and position embedding to construct a multi-channel vector representation and use the multi-channel convolutional method to extract features of sentences. Moreover, to alleviate the sensitivity of few-shot learning to overfitting, we introduce adversarial learning for training a robust model. Experiments on the FewRel dataset show that our model achieves significant and consistent improvements on few-shot RC as compared with baselines.

【Keywords】:

1849. Breakdown Detection in Negotiation Dialogues (Student Abstract).

Paper Link】 【Pages】:13969-13970

【Authors】: Atsuki Yamaguchi ; Katsuhide Fujita

【Abstract】: In human-human negotiation, reaching a rational agreement can be difficult, and unfortunately, the negotiations sometimes break down because of conflicts of interests. If artificial intelligence can play a role in assisting with human-human negotiation, it can assist in avoiding negotiation breakdown, leading to a rational agreement. Therefore, this study focuses on end-to-end tasks for predicting the outcome of a negotiation dialogue in natural language. Our task is modeled using a gated recurrent unit and a pre-trained language model: BERT as the baseline. Experimental results demonstrate that the proposed tasks are feasible on two negotiation dialogue datasets, and that signs of a breakdown can be detected in the early stages using the baselines even if the models are used in a partial dialogue history.

【Keywords】:

1850. I Know Where You Are Coming From: On the Impact of Social Media Sources on AI Model Performance (Student Abstract).

Paper Link】 【Pages】:13971-13972

【Authors】: Yang Qi ; Farseev Aleksandr ; Filchenkov Andrey

【Abstract】: Nowadays, social networks play a crucial role in everyday human life and are no longer purely associated with spare-time activities. In fact, instant communication with friends and colleagues has become an essential component of our daily interaction, giving rise to multiple new types of social networks. By participating in such networks, individuals generate a multitude of data points that describe their activities from different perspectives and, for example, can be further used for applications such as personalized recommendation or user profiling. However, the impact of the different social media networks on machine learning model performance has not been studied comprehensively yet. Particularly, the literature on modeling multi-modal data from multiple social networks is relatively sparse, which has inspired us to take a deeper dive into the topic in this preliminary study. Specifically, in this work, we study the performance of different machine learning models when trained on multi-modal data from different social networks. Our initial experimental results reveal that social network choice impacts the performance and that the proper selection of data source is crucial.

【Keywords】:

1851. Session-Level User Satisfaction Prediction for Customer Service Chatbot in E-Commerce (Student Abstract).

Paper Link】 【Pages】:13973-13974

【Authors】: Riheng Yao ; Shuangyong Song ; Qiudan Li ; Chao Wang ; Huan Chen ; Haiqing Chen ; Daniel Dajun Zeng

【Abstract】: This paper aims to predict user satisfaction for customer service chatbots at the session level, which is of great practical significance yet remains largely unexplored. It requires exploring the relationship between questions and answers across different rounds of interactions, and handling user bias. We propose an approach to model multi-round conversations within one session and take user information into account. Experimental results on a dataset from a real-world industrial customer service chatbot, Alime, demonstrate the good performance of our proposed model.

【Keywords】:

1852. Deep Ranking for Style-Aware Room Recommendations (Student Abstract).

Paper Link】 【Pages】:13975-13976

【Authors】: Ilkay Yildiz ; Esra Ataer Cansizoglu ; Hantian Liu ; Peter B. Golbus ; Ozan Tezcan ; Jae-Woo Choi

【Abstract】: We present a deep learning based room image retrieval framework that is based on style understanding. Given a dataset of room images labeled by interior design experts, we map the noisy style labels to comparison labels. Our framework learns the style spectrum of each image from the generated comparisons and makes significantly more accurate recommendations compared to discrete classification baselines.

【Keywords】:

1853. Interactive Neural Network: Leveraging Part-of-Speech Window for Aspect Term Extraction (Student Abstract).

Paper Link】 【Pages】:13977-13978

【Authors】: Da Yin ; Xiuyu Wu ; Baobao Chang

【Abstract】: Aspect term extraction is a fundamental task for aspect-level sentiment analysis. Previous methods tend to extract noun aspect terms due to the large quantities of them, and perform badly on extracting aspect terms containing words with other POS tags, according to experimental results. In addition, few works focus on the POS tags of adjacent words which are critical to aspect term extraction. We propose a novel model which combines POS and word features in an interactive way, and makes full use of the POS tags of adjacent words by POS window. We conduct experiments on two datasets, and prove the effectiveness of our model.

【Keywords】:

1854. Domain Knowledge-Assisted Automatic Diagnosis of Idiopathic Pulmonary Fibrosis (IPF) Using High Resolution Computed Tomography (HRCT) (Student Abstract).

Paper Link】 【Pages】:13979-13980

【Authors】: Wenxi Yu ; Hua Zhou ; Jonathan G. Goldin ; Hyun J. Grace Kim

【Abstract】: Domain knowledge acquired from pilot studies is important for medical diagnosis. This paper leverages the population-level domain knowledge based on the D-optimal design criterion to judiciously select CT slices that are meaningful for the disease diagnosis task. As an illustrative example, the diagnosis of idiopathic pulmonary fibrosis (IPF) among interstitial lung disease (ILD) patients is used for this work. IPF diagnosis is complicated and is subject to inter-observer variability. We aim to construct a time/memory-efficient IPF diagnosis model using high resolution computed tomography (HRCT) with domain knowledge-assisted data dimension reduction methods. Four two-dimensional convolutional neural network (2D-CNN) architectures (MobileNet, VGG16, ResNet, and DenseNet) are implemented for an automatic diagnosis of IPF among ILD patients. Axial lung CT images are acquired from five multi-center clinical trials, which sum up to 330 IPF patients and 650 non-IPF ILD patients. Model performance is evaluated using five-fold cross-validation. Depending on the model setup, MobileNet achieved satisfactory results with overall sensitivity, specificity, and accuracy greater than 90%. Further evaluation on independent datasets is underway. To our knowledge, this is the first work that (1) uses population-level domain knowledge with an optimal design criterion in selecting CT slices and (2) focuses on patient-level IPF diagnosis.

【Keywords】:

1855. Cancer Treatment Classification with Electronic Medical Health Records (Student Abstract).

Paper Link】 【Pages】:13981-13982

【Authors】: Jiaming Zeng ; Imon Banerjee ; Michael Francis Gensheimer ; Daniel Rubin

【Abstract】: We built a natural language processing (NLP) language model that can be used to extract cancer treatment information using structured and unstructured electronic medical records (EMR). Our work appears to be the first that combines EMR and NLP for treatment identification.

【Keywords】:

1856. Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract).

Paper Link】 【Pages】:13983-13984

【Authors】: Qizhen Zhang ; Audrey Durand ; Joelle Pineau

【Abstract】: Applications of machine learning in biomedical prediction tasks are often limited by datasets that are unrepresentative of the sampling population. In these situations, we can no longer rely only on the training data to learn the relations between features and the prediction outcome. Our method proposes to learn an inductive bias that indicates the relevance of each feature to outcomes through literature mining in PubMed, a centralized source of biomedical documents. The inductive bias acts as a source of prior knowledge from experts, which we leverage by imposing an extra penalty for model weights that differ from this inductive bias. We empirically evaluate our method on a medical prediction task and highlight the importance of incorporating expert knowledge that can capture relations not present in the training data.

【Keywords】:
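The penalty described in the abstract above can be pictured as an extra regularization term. The form below is only one possibility, stated as an assumption: the abstract says that weights differing from the literature-derived bias are penalized, but it does not say that the penalty is quadratic. Here $w_j$ is the model weight for feature $j$, $b_j$ is the hypothetical relevance value mined for that feature, and $\lambda \ge 0$ controls how strongly the prior is enforced.

```latex
\mathcal{L}(w) \;=\; \mathcal{L}_{\text{data}}(w) \;+\; \lambda \sum_{j} \left( w_j - b_j \right)^2
```

Setting $\lambda = 0$ recovers ordinary training on the data alone, while a large $\lambda$ pulls the weights toward the mined prior regardless of what the (possibly unrepresentative) training data suggests.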

1857. Shoreline: Data-Driven Threshold Estimation of Online Reserves of Cryptocurrency Trading Platforms (Student Abstract).

Paper Link】 【Pages】:13985-13986

【Authors】: Xitong Zhang ; He Zhu ; Jiayu Zhou

【Abstract】: With the proliferation of blockchain projects and applications, cryptocurrency exchanges, which provide exchange services among different types of cryptocurrencies, have become pivotal platforms that allow customers to trade digital assets on different blockchains. Because of the anonymous and trustless nature of cryptocurrency, one major challenge of crypto-exchanges is asset safety, and the all-time amount hacked from crypto-exchanges until 2018 is over $1.5 billion even with carefully maintained secure trading systems. The most critical vulnerability of crypto-exchanges is the so-called hot wallet, which is used to store a certain portion of an exchange's total assets online and programmatically sign transactions when a withdrawal happens. It is important to develop network security mechanisms. However, there is no guarantee that the system can defend against all attacks. Thus, accurately controlling the available assets in the hot wallets becomes the key to minimizing the risk of running an exchange. In this paper, we propose Shoreline, a deep learning-based threshold estimation framework that estimates the optimal threshold of hot wallets from historical wallet activities and dynamic trading networks.

【Keywords】:

1858. Rception: Wide and Deep Interaction Networks for Machine Reading Comprehension (Student Abstract).

Paper Link】 【Pages】:13987-13988

【Authors】: Xuanyu Zhang ; Zhichun Wang

【Abstract】: Most models for machine reading comprehension (MRC) focus on recurrent neural networks (RNNs) and attention mechanisms, though convolutional neural networks (CNNs) are also involved for time efficiency. However, little attention has been paid to leveraging both CNNs and RNNs in MRC. For a deeper understanding, humans sometimes need local information for short phrases and sometimes need global context for long passages. In this paper, we propose a novel architecture, i.e., Rception, to capture and leverage both local deep information and global wide context. It fuses different kinds of networks and hyper-parameters horizontally rather than simply stacking them layer by layer vertically. Experiments on the Stanford Question Answering Dataset (SQuAD) show that our proposed architecture achieves good performance.

【Keywords】:

Paper Link】 【Pages】:13989-13990

【Authors】: Zeyu Zhao ; John P. Dickerson

【Abstract】: Kidney exchange is an organized barter market that allows patients with end-stage renal disease to trade willing donors—and thus kidneys—with other patient-donor pairs. The central clearing problem is to find an arrangement of swaps that maximizes the number of transplants. It is known to be NP-hard in almost all cases. Most existing approaches have modeled this problem as a mixed integer program (MIP), using classical branch-and-price-based tree search techniques to optimize. In this paper, we frame the clearing problem as a Maximum Weighted Independent Set (MWIS) problem, and use a Graph Neural Network guided Monte Carlo Tree Search to find a solution. Our initial results show that this approach outperforms baseline (non-optimal but scalable) algorithms. We believe that a learning-based optimization algorithm can improve upon existing approaches to the kidney exchange clearing problem.

【Keywords】:
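
To illustrate the Maximum Weighted Independent Set framing mentioned in the abstract above, here is a minimal sketch. It assumes a toy compatibility digraph, a cycle cap of 3, and a plain greedy heuristic standing in for the paper's Graph Neural Network guided Monte Carlo Tree Search, which is not reproduced here. The function names `find_cycles`, `conflict_graph`, and `greedy_mwis` are hypothetical.

```python
from itertools import permutations


def find_cycles(edges, max_len=3):
    """Enumerate directed cycles of length 2..max_len, each reported once."""
    edge_set = set(edges)
    vertices = sorted({v for e in edges for v in e})
    cycles = []
    for k in range(2, max_len + 1):
        for perm in permutations(vertices, k):
            if perm[0] != min(perm):
                continue  # keep one canonical rotation per cycle
            if all((perm[i], perm[(i + 1) % k]) in edge_set for i in range(k)):
                cycles.append(perm)
    return cycles


def conflict_graph(cycles):
    """MWIS instance: one node per cycle, an edge when two cycles share a pair."""
    return {
        i: {j for j in range(len(cycles))
            if j != i and set(cycles[i]) & set(cycles[j])}
        for i in range(len(cycles))
    }


def greedy_mwis(weights, adj):
    """Assumed baseline: take the heaviest unblocked node, block its neighbours."""
    chosen, blocked = [], set()
    for i in sorted(range(len(weights)), key=lambda i: (-weights[i], i)):
        if i not in blocked:
            chosen.append(i)
            blocked.add(i)
            blocked |= adj[i]
    return chosen


# Toy instance: an edge (u, v) means the donor of pair u can give to patient v.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 1), (3, 4), (4, 3)]
cycles = find_cycles(edges)
weights = [len(c) for c in cycles]  # weight = transplants in the cycle
picked = greedy_mwis(weights, conflict_graph(cycles))
print([cycles[i] for i in picked], sum(weights[i] for i in picked))
```

On this toy instance the greedy choice takes the single 3-cycle (3 transplants), although the two disjoint 2-cycles would give 4. This is the kind of gap between scalable heuristics and optimal clearing that motivates a guided search.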

1860. Focusing on Detail: Deep Hashing Based on Multiple Region Details (Student Abstract).

Paper Link】 【Pages】:13991-13992

【Authors】: Quan Zhou ; Xiushan Nie ; Yang Shi ; Xingbo Liu ; Yilong Yin

【Abstract】: Hashing, which aims to convert multimedia data into a set of short binary codes while preserving the similarity of the original data, has been widely studied in recent years for its fast retrieval efficiency and high performance. The majority of existing deep supervised hashing methods utilize only the semantics of a whole image in learning hash codes and ignore local image details, which are important in hash learning. To fully utilize this detailed information, we propose a novel deep multi-region hashing (DMRH) method, which learns hash codes from local regions and obtains the final hash codes of an image by fusing the local hash codes of those regions. In addition, we propose a self-similarity loss term to address the imbalance problem of methods based on pairwise similarity (i.e., the number of dissimilar pairs is significantly larger than the number of similar ones).

【Keywords】:
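
The pairwise imbalance mentioned in the abstract above can be made concrete with a little arithmetic. The sketch below assumes a hypothetical dataset with equally sized classes in which two images count as similar only when they share a class label. It does not implement the self-similarity loss itself, and the function name `pair_counts` is an illustrative choice.

```python
from math import comb


def pair_counts(num_items, num_classes):
    """Count similar and dissimilar pairs for equally sized classes (assumption)."""
    per_class = num_items // num_classes
    similar = num_classes * comb(per_class, 2)   # pairs within one class
    total = comb(per_class * num_classes, 2)     # all unordered pairs
    return similar, total - similar


similar, dissimilar = pair_counts(1000, 10)
print(similar, dissimilar, round(dissimilar / similar, 2))
```

With 1000 images over 10 balanced classes there are 49500 similar pairs against 450000 dissimilar ones, so a pairwise loss sees roughly nine dissimilar pairs for every similar pair.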

1861. HARK: Harshness-Aware Sentiment Analysis Framework for Product Review (Student Abstract).

Paper Link】 【Pages】:13993-13994

【Authors】: Ting Zhou ; Xun Wang ; Yili Fang

【Abstract】: Sentiment analysis is a helpful mechanism that aims to understand market feedback on commodities by utilizing user comments. Each user generates comments according to his or her own preference, which is referred to as harshness. Existing methods mainly apply majority voting or its variants to directly infer the evaluation of products. However, because they ignore the harshness of users, these methods lead to low-quality inference outcomes that are far from the results of expert analysis reports. To this end, we propose HARK, a harshness-aware product analysis framework. First, we employ a Bayesian-based model for sentiment analysis. Moreover, in order to infer a reliable sentiment for each product from all the comments, we present a probabilistic graphical model in which the harshness is incorporated. Extensive experimental evaluations show that the results of our method are more consistent with expert evaluation than those of state-of-the-art methods. Our method also outperforms a method that infers the final sentiment from the ground truth of comments but without involving the harshness of users.

【Keywords】:

1862. Contention-Aware Mapping and Scheduling Optimization for NoC-Based MPSoCs (Student Abstract).

Paper Link】 【Pages】:13995-13996

【Authors】: Yupeng Zhou ; Rongjie Yan ; Anyu Cai ; Yige Yan ; Minghao Yin

【Abstract】: We consider spatial and temporal aspects of communication to avoid contention in Network-on-Chip (NoC) architectures. A constraint model is constructed such that the design concerns can be evaluated, and an efficient evolutionary algorithm with various heuristics is proposed to search for better solutions. Experiments on random benchmarks demonstrate the efficiency of our method in multi-objective optimization and the effectiveness of our techniques in avoiding network contention.

【Keywords】:

1863. Generative Adversarial Imitation Learning from Failed Experiences (Student Abstract).

Paper Link】 【Pages】:13997-13998

【Authors】: Jiacheng Zhu ; Jiahao Lin ; Meng Wang ; Yingfeng Chen ; Changjie Fan ; Chong Jiang ; Zongzhang Zhang

【Abstract】: Imitation learning provides a family of promising methods that learn policies from expert demonstrations directly. As a model-free and on-line imitation learning method, generative adversarial imitation learning (GAIL) generalizes well to unseen situations and can handle complex problems. In this paper, we propose a novel variant of GAIL called GAIL from failed experiences (GAILFE). GAILFE allows an agent to utilize failed experiences in the training process. Moreover, a constrained optimization objective is formalized in GAILFE to balance learning from given demonstrations and from self-generated failed experiences. Empirically, compared with GAIL, GAILFE can improve sample efficiency and learning speed over different tasks.

【Keywords】:

1864. Combating False Negatives in Adversarial Imitation Learning (Student Abstract).

Paper Link】 【Pages】:13999-14000

【Authors】: Konrad Zolna ; Chitwan Saharia ; Léonard Boussioux ; David Yu-Tung Hui ; Maxime Chevalier-Boisvert ; Dzmitry Bahdanau ; Yoshua Bengio

【Abstract】: We define the False Negatives problem and show that it is a significant limitation in adversarial imitation learning. We propose a method that solves the problem by leveraging the nature of goal-conditioned tasks. The method, dubbed Fake Conditioning, is tested on instruction following tasks in BabyAI environments, where it improves sample efficiency over the baselines by at least an order of magnitude.

【Keywords】:

1865. Position-Based Social Choice Methods for Intransitive Incomplete Pairwise Vote Sets (Student Abstract).

Paper Link】 【Pages】:14001-14002

【Authors】: Julian Zucker

【Abstract】: Combining the decisions of multiple agents into a final decision requires the use of social choice mechanisms. Pairwise decisions are often incomplete and intransitive, preventing the use of Borda count and other position-based social choice mechanisms. We propose and compare multiple methods for converting incomplete intransitive pairwise vote sets to complete rankings, enabling position-based social choice methods. The algorithms are evaluated on their output's Kendall's τ similarity when implementing pairwise social choice mechanisms. We show that there is only a small difference between the outputs of social choice methods on the original pairwise vote set and the generated ranking set on a real-world pairwise voting dataset. Source code for the analysis is available.

【Keywords】:
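
The pipeline described in the abstract above (pairwise votes, then complete rankings, then a position-based rule, then a Kendall's τ comparison) can be sketched as follows. The conversion used here, ranking candidates by net pairwise wins with ties broken by candidate order, is an assumption for illustration and is not claimed to be one of the paper's methods. The function names are hypothetical.

```python
from itertools import combinations


def ranking_from_pairwise(votes, candidates):
    """Turn (winner, loser) votes into a complete ranking by net wins.

    Assumed conversion; ties are broken by the order of `candidates`.
    """
    score = {c: 0 for c in candidates}
    for winner, loser in votes:
        score[winner] += 1
        score[loser] -= 1
    return sorted(candidates, key=lambda c: (-score[c], candidates.index(c)))


def borda(rankings, candidates):
    """Borda count: position p (0-based) among n candidates earns n - 1 - p points."""
    n = len(candidates)
    points = {c: 0 for c in candidates}
    for ranking in rankings:
        for pos, c in enumerate(ranking):
            points[c] += n - 1 - pos
    return sorted(candidates, key=lambda c: (-points[c], candidates.index(c)))


def kendall_tau(r1, r2):
    """Kendall's tau between two complete rankings of the same items (no ties)."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    concordant = discordant = 0
    for a, b in combinations(r1, 2):
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


candidates = ["a", "b", "c"]
voters = [
    [("a", "b"), ("b", "c"), ("c", "a")],  # intransitive
    [("b", "a")],                          # incomplete
    [("a", "c"), ("b", "c")],
]
rankings = [ranking_from_pairwise(v, candidates) for v in voters]
print(rankings, borda(rankings, candidates))
print(kendall_tau(["a", "b", "c"], ["b", "c", "a"]))
```

The tie-breaking rule matters for intransitive voters such as the first one, whose net scores are all zero. A different rule would give a different ranking and possibly a different Borda outcome.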